Question about V1 CPU handling

Hi guys, I’ve got a non-expert question that may seem stupid; I guess I just didn’t understand how it works.

VCV Rack is handling multiple CPUs… cool :slight_smile: my fan will say thank you!

But when I load, say, 4 modules, the task manager says Rack takes 7% (a little bit less than 0.6)
and when I set Rack to 4 CPUs, the task manager says Rack takes 50% of the CPU, for the exact same modules… so what is the point?

I hope my question is not too stupid but I really don’t understand what is the benefit of this…

It’s only multithreaded within a single sample. So you are not going to see the same difference as, e.g., giving three u-he Divas their own threads.

Turning up the CPU count is not going to be worth it unless your small number of modules is particularly hungry (Vult?) or you have quite a lot of modules mounted.

@Vortico Are you using hot spin locks to synchronize the threads? That might explain some of it if true :coffee:

Not sure how this confusion is so common, but multithreading takes more CPU, not less. This means more electricity, more heat, and fewer resources for other programs.
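Imagine you have 1000 hours of paperwork to do. If you hire 3 other people, the total man-hours is not decreased but increased to maybe 1500 hours, as there is overhead of collaborating with others. What you gain is that the job finishes sooner: each of the four people puts in about 375 hours instead of one person doing all 1000.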

Yes, see https://github.com/VCVRack/Rack/blob/v1/src/engine/Engine.cpp#L104. All threads spin after they are finished until the last thread finishes. If modules like Core AUDIO call Engine::yieldWorkers(), this spinlock turns into a mutex until the last thread finishes.
All multithreading overhead is due to both serial code at the end of each timestep and thread orchestration such as spinning.
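To make the spinning part concrete, here is a minimal sketch of a spin barrier. It is not the code at the link above: `SpinBarrier` and `runSteps` are hypothetical names invented for illustration, and the mutex fallback triggered by `Engine::yieldWorkers()` is left out. It only shows why threads that finish early still occupy a full core until the last one arrives.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sketch, not Rack's actual code: a reusable barrier whose
// waiters spin in a loop instead of sleeping.
struct SpinBarrier {
    explicit SpinBarrier(int threadCount) : threadCount(threadCount) {}

    void wait() {
        int gen = generation.load();
        if (arrived.fetch_add(1) + 1 == threadCount) {
            // Last thread to arrive: reset the count, then release the others.
            arrived.store(0);
            generation.fetch_add(1);
        } else {
            // Everyone else keeps a core fully busy until released.
            while (generation.load() == gen) {
            }
        }
    }

    const int threadCount;
    std::atomic<int> arrived{0};
    std::atomic<int> generation{0};
};

// Runs `steps` timesteps on `threadCount` threads. Each thread does one unit
// of "work" per step (a counter increment) and then waits at the barrier.
// Returns the total work done, or -1 if a thread got past the barrier early.
inline int runSteps(int threadCount, int steps) {
    SpinBarrier barrier(threadCount);
    std::atomic<int> work{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; t++) {
        threads.emplace_back([&] {
            for (int s = 0; s < steps; s++) {
                work.fetch_add(1);
                barrier.wait();
                // After the barrier, every thread must have finished step s.
                if (work.load() < threadCount * (s + 1))
                    ok.store(false);
            }
        });
    }
    for (auto& th : threads)
        th.join();
    return ok.load() ? work.load() : -1;
}
```

With almost no work per step, as here, nearly all the CPU time the task manager reports is spent in the `while` loop, which matches the "4 modules on 4 CPUs" observation at the top of the thread.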

Depends on the chipset really :confused: Having more but weaker cores (or a chip with an elastic power profile) might be good enough for certain workloads but require less effort spent on cooling. I don’t think desktops fit this model though.

You will only benefit from adding another thread once the first one is close to maxed out.

The best idea is to only use extra cores when you need to; there’s quite a lot of overhead. Strangely, there’s less overhead when you’re actually using the extra cores. When you’re using cores you don’t need, they spend a lot of time spinning the CPU.

I’m sure we’ll work out a better way to do this, but for now you can get some extra modules this way.

One option is yielding for as much time as scheduling precision allows and then hot looping for the remainder, but that would take some changes and tuning to get right.
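A rough sketch of that sleep-then-spin idea, assuming the wait can be expressed as a deadline. `hybridWaitUntil` and `spinMargin` are hypothetical names; `spinMargin` stands in for the scheduler's wake-up precision and is exactly the part that would need tuning per OS and machine.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: wait until `deadline`, sleeping while more than
// `spinMargin` remains and busy-waiting for the final stretch.
inline void hybridWaitUntil(Clock::time_point deadline,
                            std::chrono::microseconds spinMargin) {
    // Coarse phase: give the core back to the OS. The wake-up may be late,
    // which is why we stop sleeping `spinMargin` before the deadline.
    auto coarse = deadline - spinMargin;
    if (Clock::now() < coarse)
        std::this_thread::sleep_until(coarse);

    // Fine phase: hot loop until the deadline is actually reached.
    while (Clock::now() < deadline) {
    }
}
```

If the sleep overshoots by more than `spinMargin`, the deadline is missed, so a larger margin is safer but spins longer.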

That’s what Andrew does.

There is a difference between nanosleep/usleep and _mm_pause:

Sleeping releases the core back to idle; _mm_pause just hints that now is a good time to do something else (but the core stays allocated).

For game loops you can calculate a time budget for one frame, do some logic and then sleep for the remainder. You can have 30fps or 60fps appear to use almost no CPU power this way. (I’m assuming Rack doesn’t do this because the timing precision needed to sleep 44,100 times per second is… less accessible.)
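The game-loop pattern can be sketched like this. `runFrames` is a hypothetical helper written for illustration, not taken from any particular engine. At 60fps the budget is about 16.7 ms per frame, which ordinary OS sleeps can hit; an audio sample period is roughly 20 microseconds, which is where this approach stops being practical.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: runs `update` once per frame and sleeps away whatever
// is left of each frame's time budget. Returns the number of frames run.
inline int runFrames(int frameCount, double fps,
                     const std::function<void()>& update) {
    using Clock = std::chrono::steady_clock;
    // Time budget for one frame, e.g. about 16.7 ms at 60fps.
    const auto budget = std::chrono::duration_cast<Clock::duration>(
        std::chrono::duration<double>(1.0 / fps));

    auto next = Clock::now() + budget;
    int done = 0;
    for (int i = 0; i < frameCount; i++) {
        update();  // the frame's actual work
        done++;
        // The core is idle from here until the frame's deadline.
        std::this_thread::sleep_until(next);
        next += budget;
    }
    return done;
}
```

Scheduling against absolute deadlines (`next += budget`) rather than sleeping for a fixed duration keeps one late wake-up from shifting every later frame.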

Not to mention it’s impossible for the engine to accurately estimate how long a module will take to process anything.

I have extensively researched the problem of decreasing CPU usage while not sacrificing module count or the 1-sample latency rule. I am interested in well-tested solutions for improvements, but I doubt this is possible at this point, so most speculated solutions will increase CPU usage or sacrifice module count.

Thank you for clearing up this misunderstanding of mine.