efficiency: inadvertent synchronization of 2^n ClockDividers across modules?

EDIT: already well-discussed over at Watch for param value changes - #13 by marc_boule – thanks all!!

EDIT 2: @pachde’s pre-existing solution is written up in his excellent rack-dev-notes.

Hi all–

A very common approach to making Rack modules more CPU-efficient is to spin up a bunch of dsp::ClockDividers, call .setDivision(512) or whatever on them in the constructor, and then, in process(), do something like if (lightUpdateDivider.process()) { expensively_update_lights(); }.
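For anyone who hasn't seen the pattern, here's a minimal sketch. `SimpleClockDivider` is a stand-in with the same semantics as the SDK's `dsp::ClockDivider` so the example compiles standalone, and `MyModule` is a hypothetical module, not anyone's real code:

```cpp
#include <cstdint>

// Stand-in for rack::dsp::ClockDivider so the sketch compiles outside the SDK.
// Same semantics: process() returns true once every `division` calls.
struct SimpleClockDivider {
    uint32_t clock = 0;
    uint32_t division = 1;
    void setDivision(uint32_t d) { division = d; }
    bool process() {
        if (++clock >= division) {
            clock = 0;
            return true;
        }
        return false;
    }
};

// Hypothetical module using the common pattern: the expensive light update
// runs only once every 512 audio samples instead of on every sample.
struct MyModule {
    SimpleClockDivider lightUpdateDivider;
    int lightUpdates = 0;  // stands in for expensively_update_lights()

    MyModule() {
        lightUpdateDivider.setDivision(512);
    }

    void process() {
        if (lightUpdateDivider.process()) {
            lightUpdates++;
        }
    }
};
```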

I’m tuning some dividers for a sequencer module (about which more soon!) and something occurred to me. From the code I’ve seen, many developers pick divisions from the same “menu” of powers of two (as in Fundamental, for example). This is fine in isolation–although multiple 2^n dividers will sync up within a module, the maximum expense is likely to be unconcerning for any given module. And while a patch is being built, each new module’s dividers start at an arbitrary point in time modulo each division, so dividers in different modules aren’t going to synchronize.

However, unless I’m missing something, when a patch gets re-loaded, all modules are instantiated at the same time. Therefore, all dividers across all modules in the patch are going to be getting .process() calls in lockstep, in which case the expensive cycles are going to be maximally correlated! If that’s true, some patches might, at least in principle, get CPU spikes/underruns/etc. on reload that they didn’t have during construction, which seems insidious.
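A quick simulation illustrates the worry. This is a sketch assuming `dsp::ClockDivider`-like semantics (the `SimpleClockDivider` below is a standalone stand-in): when every divider starts from clock 0 on the same frame, as on a patch reload, all the expensive updates land on the same sample.

```cpp
#include <cstdint>
#include <vector>
#include <algorithm>

// Stand-in for rack::dsp::ClockDivider.
struct SimpleClockDivider {
    uint32_t clock = 0;
    uint32_t division = 1;
    void setDivision(uint32_t d) { division = d; }
    bool process() {
        if (++clock >= division) {
            clock = 0;
            return true;
        }
        return false;
    }
};

// Simulate a patch reload: n identical dividers are all constructed on the
// same frame, so they stay in lockstep. Returns the largest number of
// dividers that fire on any single frame.
int maxCoincidentFires(int n, uint32_t division, int frames) {
    std::vector<SimpleClockDivider> divs(n);
    for (auto& d : divs) d.setDivision(division);
    int worst = 0;
    for (int f = 0; f < frames; f++) {
        int fires = 0;
        for (auto& d : divs) fires += d.process() ? 1 : 0;
        worst = std::max(worst, fires);
    }
    return worst;
}
```

With 20 modules at division 512, the worst frame sees all 20 expensive updates at once.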

Picking from a larger menu, preferably with prime numbers, would be one way to decorrelate. Rack V2 uses 7 for plug lights, presumably for this reason, and 37 for performance measurement to avoid measuring 2^n buffered processors on their output cycles, as seen here. (The comment was a little more explicit in the V1 code, which used 7).
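The decorrelation benefit of coprime divisions is easy to quantify: two free-running dividers that start in lockstep coincide once every lcm of their divisions, so powers of two (where one division divides the other) coincide as often as the larger one fires, while a prime like 7 pushes coincidences far apart. A one-line sketch:

```cpp
#include <cstdint>
#include <numeric>  // std::lcm (C++17)

// Two dividers started on the same frame fire together once every
// lcm(a, b) frames: often for nested powers of two, rarely for coprimes.
uint64_t coincidencePeriod(uint64_t a, uint64_t b) {
    return std::lcm(a, b);
}
```

Dividers of 256 and 512 collide every 512 frames; 512 and 7 collide only every 3584.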

I wonder if the better practice wouldn’t be to randomly set the ClockDivider.clock to something below the division after calling .setDivision in the constructor (since there isn’t a .process(n)). This should basically simulate the normal state of affairs during patch construction. It would be trivial to write an API-compatible DesynchronizedClockDivider that did this automatically as part of its setDivision (and I’ll probably do this in my own module[s]; doesn’t seem as though there would be any downside.)
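Here's roughly what I have in mind. This is a self-contained sketch (the name `DesynchronizedClockDivider` is mine, and the class mirrors `dsp::ClockDivider`'s interface rather than using the SDK): setDivision randomizes the starting phase, simulating modules having been created at arbitrary times.

```cpp
#include <cstdint>
#include <random>

// Hypothetical API-compatible divider that randomizes its phase whenever
// the division is set, so dividers constructed on the same frame (e.g. on
// patch reload) don't fire their expensive cycles in lockstep.
struct DesynchronizedClockDivider {
    uint32_t clock = 0;
    uint32_t division = 1;

    void setDivision(uint32_t d) {
        division = d;
        // Start at a random point in the cycle instead of at 0.
        static std::mt19937 rng{std::random_device{}()};
        clock = std::uniform_int_distribution<uint32_t>(0, d - 1)(rng);
    }

    bool process() {
        if (++clock >= division) {
            clock = 0;
            return true;
        }
        return false;
    }
};
```

The long-run firing rate is unchanged (exactly once per `division` calls); only the phase moves.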

Has anyone already worried about this, on the forum or elsewhere? Am I missing some existing compensation for it? I don’t have an existence proof of the problem–it’s just theoretical–but I may work one up to confirm.

1 Like

This is an interesting observation. It makes me think about how when there is a conflict between two different Ethernet cards on a local network trying to transmit data at the same time, they both detect the conflict and retry after a delay. But the delay is supposed to be a random amount of time, to minimize the chance they conflict on the retry.

And you are absolutely right about everything starting with lockstep process calls when the patch is restarted.

However, I’m not too worried about this mainly because of the following thought process. Suppose a developer realizes a module is using too much CPU time doing some part of the algorithm (call it X) and figures out a way to run X every 256 process calls instead of every single time.

Now X has 1/256 the burden on the CPU. The worst case behavior would be to have a bunch of these modules running the same X on the same sample. And you would need (roughly) 256 of them to get back to the horrible loading. In reality, with multi-threading overhead, maybe you need 50 or 100 of those modules to get that same performance crunch every 256 samples.

I guess it really matters how slow the X step is. If it’s enough that only a few modules running in lockstep cause pain, then in other situations, the X pain from a single module will matter.

So my gut is telling me yes, this is a potential problem, but no, it’s not likely to bother anybody in practice, because you just won’t have that many modules in a patch in the first place, at least not ones that gang up on a particular audio sample and overburden the CPU.

2 Likes

What I do in my modules to mitigate this peaky synchronization is to add jitter using the module id, which is randomly distributed, guaranteed to be unique, and precomputed (zero cost).

4 Likes

I had also thought about this a little while back, and brought it up here, which might be relevant to the discussion:

4 Likes

Awesome! Thanks, all. Glad this is asked and answered :slight_smile:

I’ll adopt either the @pachde or @marc_boule solution in my own code and leave this thread up for discoverability (I searched for a while without stumbling across that part of the other thread…)

And I definitely hear you, @cosinekitty, that this is much more likely to be a theoretical problem than a practical one! :slight_smile:

2 Likes

I have observed these CPU spikes in the Rack performance monitors, and the effect disappeared when I implemented jittering with (args.frame + getId()) % INTERVAL.
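For concreteness, a sketch of that condition outside the SDK (`INTERVAL` and `shouldUpdate` are illustrative names; `frame` stands for args.frame and `id` for the module id): modules whose ids differ modulo the interval fire their expensive work on different frames, even when every frame counter starts at zero on reload.

```cpp
#include <cstdint>

// Id-based jitter: do the expensive work when (frame + id) % INTERVAL == 0.
// Two modules whose ids differ mod INTERVAL never coincide.
constexpr int64_t INTERVAL = 512;

bool shouldUpdate(int64_t frame, int64_t id) {
    return (frame + id) % INTERVAL == 0;
}
```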

4 Likes

Very cool seeing all the responses about using the module ID to spread out where the expensive sample is. That is so simple and low-cost, why not do it that way?

1 Like

If your underlying buffer size is even 32 (which is very small), this isn’t really significant, especially compared to the idiotic inefficiencies of so many modules.

Hello,

Using args.frame demands more CPU than dsp::ClockDivider, because the former uses a 64-bit (unsigned) integer while the latter uses a 32-bit integer.

For solid lights (and other “non-urgent” tasks), you can go up to .setDivision(4096), so the clock divider will be invoked about every 0.085 s at a 48000 Hz sample rate. For my (in-development) FranKe analog step-sequencer module, I’ve used this implementation and it works fine (control & CV scanning uses another clock divider, set to a division of 32).
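The arithmetic behind that figure is just the division over the sample rate (a trivial sketch; the function name is illustrative):

```cpp
// A divider with division N fires once every N / sampleRate seconds,
// e.g. 4096 / 48000 Hz ≈ 0.085 s.
double fireIntervalSeconds(double division, double sampleRate) {
    return division / sampleRate;
}
```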

On our 64-bit CPUs there should be no performance difference between reading a 32-bit or a 64-bit variable. The key is to understand “cache lines”. See e.g. https://www.reddit.com/r/C_Programming/comments/1875mkv/do_64bit_cpus_access_64_bytes_of_memory_at_a_time

2 Likes

Thank you Lars for this clarification! :wink: