Can Rack v2 do block processing?

maw · October 9, 2021, 1:01am

My understanding based on v0.6 was that the engine did not allow for processing a block of samples. Has that changed & if so, is there any kind of guide or example?

I imagine there may be a list of changes from v1 to v2 somewhere, but my search powers seem to be failing me.

gc3 · October 9, 2021, 2:36am

Hi @maw!

So modules can always do whatever they want internally (fill a block, then process it once it's full while filling the next block), and a fair number of them, especially spectral processing ones, already do this. Obviously that adds latency to those modules. If that's what you're after, here's a good starting point.
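The fill-a-block-then-process pattern can be sketched as a double buffer. This is a minimal, framework-free C++ sketch, not Rack SDK code. `BlockDelayProcessor`, `processSample()` and `processBlock()` are hypothetical names, and `processBlock()` is only a placeholder (a gain) where a module would do its FFT or other block work. It shows where the latency comes from: every output sample is one full block late.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-sample wrapper around a block algorithm (not Rack SDK API).
// Input is collected into inBuf; when it is full, the whole block is processed
// into outBuf, which is then played back while the next block fills.
class BlockDelayProcessor {
public:
    explicit BlockDelayProcessor(std::size_t blockSize)
        : inBuf(blockSize, 0.f), outBuf(blockSize, 0.f) {
        assert(blockSize > 0);
    }

    // Called once per sample, the way a module's process() is.
    float processSample(float in) {
        float out = outBuf[pos];  // result computed from the previous block
        inBuf[pos] = in;
        if (++pos == inBuf.size()) {
            processBlock();       // block is full: process it all at once
            pos = 0;
        }
        return out;
    }

    std::size_t latency() const { return inBuf.size(); }

private:
    // Placeholder block algorithm; a spectral module would do its FFT work here.
    void processBlock() {
        for (std::size_t i = 0; i < inBuf.size(); ++i)
            outBuf[i] = 0.5f * inBuf[i];
    }

    std::vector<float> inBuf, outBuf;
    std::size_t pos = 0;
};
```

With a block size of 4, the output at sample n is half the input at sample n - 4; the delay equals the block size.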

I think what you’re asking, though, is whether the engine itself supports block processing. I’m pretty confident that it does not and will not as of V2; even though it’s feeding the audio device in blocks, those are built sample-by-sample as the engine steps through the modules, then cables.
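To make the sample-by-sample engine concrete, here is a toy model that steps every module once, then copies every cable's output to its destination input. It is a simplified sketch, not Rack's actual engine code: `ToyModule`, `ToyCable`, `stepEngine()` and the one-input/one-output modules are assumptions made for brevity. In this model, the modules-then-cables order means each cable delays its signal by one sample.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Toy model (not Rack's real API): each module has one input and one output.
struct ToyModule {
    std::function<float(float)> fn;  // per-sample processing
    float input = 0.f;
    float output = 0.f;
    void process() { output = fn(input); }
};

struct ToyCable {
    std::size_t from, to;  // output of modules[from] -> input of modules[to]
};

// One engine step = one sample: step every module, then every cable.
void stepEngine(std::vector<ToyModule>& modules, const std::vector<ToyCable>& cables) {
    for (ToyModule& m : modules)
        m.process();
    for (const ToyCable& c : cables)
        modules[c.to].input = modules[c.from].output;
}
```

Feeding an impulse into a chain of three pass-through modules joined by two cables, the impulse reaches the last module's output two steps later, one sample per cable.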

There's some (veeeeery early) indication that this may change as of Rack V3, which has (again tentatively) been described as the performance update: see the dev blog here. I don't think this would change any outputs, though; it would just potentially speed up processing on subgraphs without cycles (to somewhat simplify the issue…).
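Finding the subgraphs without cycles is the first step. A minimal sketch, assuming the patch is reduced to a module-level directed graph that conservatively treats every input as affecting every output (module internals are opaque): Tarjan's strongly connected components algorithm marks every module that sits on a feedback loop, and the rest are candidates for block processing. `modulesOnCycles()` is a hypothetical helper, not part of Rack.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// adj[u] lists the modules that module u's outputs are cabled into.
// Returns onCycle[u] == true if u lies on at least one feedback loop.
std::vector<bool> modulesOnCycles(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> index(n, -1), low(n, 0);
    std::vector<bool> onStack(n, false), onCycle(n, false);
    std::vector<int> stack;
    int counter = 0;

    // Tarjan's strongly connected components algorithm (recursive version).
    std::function<void(int)> visit = [&](int u) {
        index[u] = low[u] = counter++;
        stack.push_back(u);
        onStack[u] = true;
        for (int v : adj[u]) {
            if (index[v] == -1) {
                visit(v);
                low[u] = std::min(low[u], low[v]);
            } else if (onStack[v]) {
                low[u] = std::min(low[u], index[v]);
            }
        }
        if (low[u] == index[u]) {  // u is the root of an SCC: pop the whole SCC
            std::vector<int> scc;
            int w;
            do {
                w = stack.back();
                stack.pop_back();
                onStack[w] = false;
                scc.push_back(w);
            } while (w != u);
            // An SCC is a feedback loop if it has several modules,
            // or a single module patched into itself.
            bool selfLoop = std::find(adj[u].begin(), adj[u].end(), u) != adj[u].end();
            if (scc.size() > 1 || selfLoop)
                for (int m : scc) onCycle[m] = true;
        }
    };

    for (int u = 0; u < n; ++u)
        if (index[u] == -1) visit(u);
    return onCycle;
}
```

For the graph `{{1}, {2}, {1, 3}, {}, {4}}`, modules 1 and 2 form a loop and module 4 feeds itself, so only modules 0 and 3 are acyclic. The analysis would have to be redone whenever a cable is added or removed.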

PS. The V2 changelog is here, by the way.

Squinky · October 9, 2021, 4:18am

This is exactly right. Also, if I might add, it would be a pretty rotten modular synth if every cable had a block of delay in it. For obvious reasons.
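The cost is easy to put numbers on. Assuming, hypothetically, that every cable carried one full block of delay, a signal crossing k cables would arrive k × blockSize samples late, and no feedback loop could be shorter than one block per cable in it. `blockCableDelayMs()` is an illustrative helper; it is only arithmetic.

```cpp
#include <cassert>

// Delay, in milliseconds, of a signal path that crosses `cables` cables when each
// cable is assumed to add one block of `blockSize` samples at `sampleRate` Hz.
double blockCableDelayMs(int cables, int blockSize, double sampleRate) {
    assert(cables >= 0 && blockSize > 0 && sampleRate > 0.0);
    return 1000.0 * cables * blockSize / sampleRate;
}
```

For example, a three-cable feedback loop with 256-sample blocks at 48kHz would have 16 ms of round-trip delay, far too much for audio-rate feedback or FM. With one sample of delay per cable, the same loop is three samples, or 0.0625 ms.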

gc3 · October 9, 2021, 6:12am

This is a really important point. It’s a little like why you’d think twice about a big hardware modular filled with digital modules imposing little A/D and D/A costs everywhere. Writing this out, I’ve just realized that I loosely think about block-processing modules in Rack the way I think about DSP in Eurorack; they’re both great in the right spot, and they can do unique things (like FFTs!), but you want to be on top of where they are and you don’t necessarily want all that many of them.

I don't know what Andrew has in mind with processBuffer() (mentioned in the V3 post), but my guess is that it's intended to give a module a more efficient way to compute the same output if and when the hypothetically block-permitting V3 engine determines that block processing is graph-theoretically safe (meaning that block and sample processing will be equivalent; heuristics for that are going to be the crux of this effort). Anyway, if that's right, I imagine the default implementation would just loop process() over every sample in the block while managing the args appropriately. Someday we shall see! But let's enjoy V2 first.
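The guessed default could look something like this. Everything here is hypothetical: the real V3 `processBuffer()` signature is not known from this thread, so `SketchModule`, this `ProcessArgs` (only loosely modelled on the per-sample args Rack passes to `process()`) and the one-input/one-output buffer layout are assumptions. The base-class version just loops the per-sample `process()` and advances the frame counter; a module that can do better would override it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-sample arguments.
struct ProcessArgs {
    float sampleRate;
    float sampleTime;
    std::int64_t frame;  // index of the current sample
};

// Hypothetical module base class with one input and one output.
class SketchModule {
public:
    virtual ~SketchModule() = default;

    // Per-sample processing, as modules do today.
    virtual float process(const ProcessArgs& args, float in) = 0;

    // Default buffer processing: loop process() over the block, keeping the
    // args consistent for each sample. Overrides must produce the same output.
    virtual void processBuffer(ProcessArgs args, const float* in, float* out, int frames) {
        assert(frames >= 0);
        for (int i = 0; i < frames; ++i) {
            out[i] = process(args, in[i]);
            ++args.frame;
        }
    }
};

// Example module: a one-pole lowpass with state but no block-specific optimization.
class OnePole : public SketchModule {
public:
    float process(const ProcessArgs&, float in) override {
        y += 0.5f * (in - y);
        return y;
    }
private:
    float y = 0.f;
};
```

Because the default is just the old per-sample loop, an existing module would produce identical output either way; any speed-up would come only from modules that override it, and only where the engine decides buffering is safe.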

maw · October 9, 2021, 12:28pm

Thank you @gc3. Yes, this is actually the reason I asked. I was wondering what Andrew knew that I did not (probably in addition to many other things).

dylan.mcnamee · December 1, 2021, 7:00pm

For other reasons, the lack of buffering limits Rack's ability to take advantage of many cores, and also poses challenges for I- and D-cache locality. At high sample rates, even a microsecond of buffering would enable inter-core pipelining and improve cache behavior. It's complicated, though, and would probably require lots of profiling and tuning to get right. That's probably on the slate for V3 development.
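Part of the cache argument is about loop order. Without buffering, a per-sample engine runs every module once per sample (sample-major order). With a buffer, a feedforward chain can run each module over the whole buffer before moving on (module-major order), which keeps that module's code and state hot in cache. This sketch uses hypothetical one-pole `Stage`s as stand-in modules and ignores per-cable delay; it only shows that the two orders agree when there is no feedback, and does not measure any speed-up.

```cpp
#include <vector>

// Hypothetical stand-in for a module: a one-pole lowpass with internal state.
struct Stage {
    explicit Stage(float c) : coeff(c) {}
    float coeff;
    float z = 0.f;
    float process(float x) { z += coeff * (x - z); return z; }
};

// Sample-major: for each sample, run the whole chain (per-sample engine order).
std::vector<float> runSampleMajor(std::vector<Stage> chain, std::vector<float> buf) {
    for (float& x : buf)
        for (Stage& s : chain)
            x = s.process(x);
    return buf;
}

// Module-major: run each stage over the whole buffer, then the next stage.
std::vector<float> runModuleMajor(std::vector<Stage> chain, std::vector<float> buf) {
    for (Stage& s : chain)
        for (float& x : buf)
            x = s.process(x);
    return buf;
}
```

Both functions take the chain by value, so each run starts from fresh state. Whether module-major order actually wins depends on module size, cache size and buffer length, which is where the profiling comes in.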

Squinky · December 1, 2021, 8:23pm

Yes, we all know that buffering makes the processing more efficient. Feel free to find a way to implement it in vcv and still make it patchable like a modular.

gc3 · December 1, 2021, 10:08pm

Hi @dylan.mcnamee! Welcome to the forums!

Per the dev blog linked in my post above, I think it’s definitely on the slate for V3, assuming it’s a solvable problem.

I’ve thought about this more for the traditional larger blocks, and I’m interested to know not just what a single microsecond of buffering would look like, but how it would help.

Isn’t a single sample at 192kHz (which I think most people would think of as a high rate) still around 5 microseconds? Of course, Rack oversamples up to 768kHz out of the box–would the kind of very-short buffering you’re contemplating be an optimization that mainly helps only at the very highest setting(s)?
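The arithmetic holds: one sample lasts 1/fs seconds. A quick check with an illustrative helper shows that even at 768kHz a single sample lasts more than a microsecond, so a microsecond of buffering would be less than one sample at every rate mentioned here.

```cpp
#include <cassert>

// Duration of one sample, in microseconds, at sample rate fs (Hz).
double samplePeriodUs(double fs) {
    assert(fs > 0.0);
    return 1e6 / fs;
}
```

For example, `samplePeriodUs(192000.0)` is about 5.21 µs and `samplePeriodUs(768000.0)` is about 1.30 µs.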

dylan.mcnamee · December 2, 2021, 6:43am

Thanks!

Oops, you're right! I also should have started with end-to-end latency (a top-down approach, shooting for <5ms?) instead of specific buffer counts (bottom-up). I'm interested in exploring the space of I/D cache locality vs. cache sizes vs. buffer size & location, core & thread counts, and scheduling policy; this is a challenging (/fun!) design space. I'll be thinking about this and will post any helpful thoughts I come up with.

Squinky · December 2, 2021, 6:53am

Yes, you are exactly right, that is what you would have to do. And you would have to re-compute it for every change in patch cord. And you would have to assume that for a given module every input affected every output, since you couldn’t really know.

So that means it would have to be able to switch from block to sample processing at any time while playing, without any pops or clicks. Wow, that sounds difficult!

So, yeah, an interesting problem. But I’ll bet you $10 that VCV 3 will not attempt to solve that problem.

fwiw - I won the last $10 bet I made about VCV, so be afraid!

dylan.mcnamee · December 2, 2021, 7:17am

Aha - I hadn’t thought of the need to reconfigure live and not have that ever result in any pops or other glitches. Strategic buffering and scheduling should enable a static configuration of much higher complexity than currently possible, but that’s likely to be brittle for really complex patches. Thanks for the explanation.

gc3 · December 2, 2021, 7:55am

I wouldn’t dare

Right–it’s a challenging graph-theory problem, and that’s before the actual optimizations are even implemented. I mean, you could understand the whole graph well enough that you might be able to do only local (sub-graph) recomputations, and you could dream up a whole system of curated information about modules’ internals that permitted some further heuristics–like a much, much more complicated version of the new bypass stuff, but as actual metadata, not in-code implementation–but still, yikes. I’ve been interested in this line of optimization for a while but I think this may be where it falls apart, if the need is to invisibly integrate the optimization in the background.

But:

OK, this brings up a design choice I hadn't thought about before. What if we relax the constraint that the patch needs to be repatchable in real time when using block optimizations? I can see some real use cases for a feature that carefully optimized a given patch, even if it paused playback for some arbitrary amount of time to do so, and then turned itself off (presumably with some glitches) as soon as the graph changed. It would be almost like a blend between "baking" (in the computer graphics sense) and an optimizing compiler. Parameter changes wouldn't affect the graph, so the patch would still be "playable" in an ordinary sense.

Don't get me wrong–I think patching a modular instrument is part of playing it (that's the main reason I'm writing TapPatch, to allow patching from an instrument rather than with the mouse). But I can imagine patching together something really hairy at 11.025kHz, then optimizing it and running it at 48 or 96kHz once I was satisfied; or keeping oversampling off on a complex feedbacking patch (where it might really matter), then optimizing it and turning oversampling on; or developing a patch on a more powerful machine, then optimizing it and running it on a laptop (or in a particular VST instance…)
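One way to picture "bake until the graph changes": compute an optimized plan for a fixed set of cables, remember which cable set it was computed for, and fall back to ordinary per-sample processing the moment the cables differ. This is a hypothetical sketch; `BakedPlan`, `CableSet` and the idea of a cable as a (source, destination) module pair are illustrative assumptions, and the actual optimization is left out. Since parameter changes don't touch the cable set, they never invalidate the plan.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// A cable as (source module, destination module); hypothetical simplification.
using CableSet = std::vector<std::pair<int, int>>;

// Hypothetical "baked" optimization, valid only for the exact cable set it was built for.
class BakedPlan {
public:
    // Baking may be slow (it could pause playback); here it only records the graph.
    void bake(CableSet cables) {
        std::sort(cables.begin(), cables.end());  // compare independent of cable order
        bakedFor = std::move(cables);
        baked = true;
    }

    // True while the current patch still matches the baked graph.
    bool isValid(CableSet current) const {
        if (!baked) return false;
        std::sort(current.begin(), current.end());
        return current == bakedFor;
    }

    // Checked by the engine each block: use the plan if still valid, otherwise
    // drop it and fall back to per-sample processing until it is re-baked.
    bool useBlockPath(const CableSet& current) {
        if (isValid(current)) return true;
        baked = false;
        return false;
    }

private:
    CableSet bakedFor;
    bool baked = false;
};
```

Once any cable is added or removed, the plan switches off and stays off until the patch is baked again.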

A fork of V2 Free would be a very interesting testbed for both the graph-theory questions and the processor-level stuff that you're interested in, @dylan.mcnamee. Andrew doesn't accept pull requests for Rack, but that doesn't mean research branches couldn't be useful to the main branch, even as proofs of concept. Were you around for the initial work on making Rack multicore? There are some interesting precedents there.