VCV Rack on a multicore processor


(Dominique Julian) #1

Hi,
When I load a patch, the sound sometimes cuts out. It seems that my processor is not powerful enough… but when I check the CPU usage, it is only… 20%…
What do you think about this problem?
Is VCV not compatible with multicore?
Thank you
:slight_smile:


(Lars Bjerregaard) #2

No, Rack is not multicore, so it needs a good CPU with the best possible single-core performance. It also needs a decent GPU/graphics card, not high-end but decent. Desktop machines are preferable to laptops because of thermal throttling.

There are a few tricks to get better performance:

  • Increase the block-size in the Audio module.
  • Learn which modules are more efficient than others.
  • Minimize the Rack window when just listening or recording to a WAV file. Alternatively, make the window smaller or run Rack in “low-resolution mode” in your OS.
  • Make sure the sample rate of the Audio module is the same as the sample rate of the Rack audio engine (selected in the toolbar); otherwise, resampling uses additional CPU.

(Antonio Tuzzi) #3

Some NYSTHI modules are multicore-friendly (7seas, all the distortion modules, the FFT visualisers, the tuner).


#4

Make sure to turn on VCV’s CPU meter. Often there is one “pig” plugin hogging all the processor. Usually there is an alternative that is better.


(Nik Jewell) #5

AFAIK, the VST Host module runs the VST on a different core.


(Lars Bjerregaard) #6

Yup, that seemed to be what Andrew said.


#7

there is a multi-threaded experimental version of vcv rack here: https://github.com/Rcomian/Rack


(Andrew Belt) #8

Be careful of the XY Problem here. What you really want is to 1) use more modules before the audio buffer starts missing its deadlines and 2) decrease CPU so other software (e.g. another DAW) can simultaneously take advantage of computer resources.

To be rigorous, “implementing multithreading” in Rack means having a pool of n threads process the modules’ DSP routines and rejoin at the end of a buffer of some size (which could be 1 sample depending on how it’s implemented). Amdahl’s law implies that goal (2) above will almost certainly be worsened, not improved. So I will assume that you’re only interested in goal (1).
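For readers unfamiliar with Amdahl's law, here is a minimal sketch of the formula in Python. The parallel fractions used in the loop are made-up illustration values, not measurements of Rack, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def efficiency(parallel_fraction, threads):
    """Speedup per thread; 1.0 means every thread is fully useful."""
    return amdahl_speedup(parallel_fraction, threads) / threads


# Illustration values only (assumed, not measured on Rack).
for p in (0.5, 0.9, 1.0):
    for n in (1, 2, 4, 8):
        print(f"p={p:.1f} threads={n}: "
              f"speedup={amdahl_speedup(p, n):.2f} "
              f"efficiency={efficiency(p, n):.2f}")
```

Whenever the parallel fraction is below 1, efficiency drops as threads are added, so some thread time is spent waiting rather than working. If the waiting threads spin, that shows up as extra total CPU use, which is one way to read the concern about goal (2).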

The fundamental question is then “How many modules (e.g. copies of Fundamental VCO-1) can you use now? And how many could you use with a given implementation of a pool of n threads?” The answer could be more or fewer modules. I have not seriously begun testing this, and I’m not going to announce my educated guesses until I have actual data.

Also, see https://github.com/VCVRack/Rack/issues/195 for the GitHub issue, which you may have found before posting this thread.


(Jim T) #9

I have data on this.

The VCV Rack workload can be perfectly multithreaded.

To apply Amdahl’s law to this software we need to take the largest indivisible processing unit that defines our job size. That unit is the single sample processing of a single module. We can add extra cores/threads until we have as many cores as we have modules. This is basically unlimited multithreading ability - a perfect workload. The difficulty is in the synchronisation, but the underlying problem is a perfect workload with no diminishing returns until you have more cores than modules.

Other parts of the workload, such as sample rate conversion, are a fixed, constant load and represent a single subtraction from the processing power available for the workload.

The only other scaling load is the wire processing, which is trivial, but could also be done in parallel.

Limiting the number of threads used by rack is a simple way to ensure that there are resources available for the rest of the system. Although if threads automatically scale to handle the load, a user will simply see their system become less usable with larger patches, something they can expect, understand and control.

Using a heavy module and one thread I was able to host 12 instances of the module before stuttering. Using 2 threads, I was able to host 25 instances. 6 threads hosted 73 instances of the module. That’s basically linear growth. Multithreading works; even naive, simple multithreading that syncs once per sample with an atomic spinlock works.
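The "basically linear" claim can be checked with a few lines of arithmetic. This sketch uses only the three data points quoted above (1, 2 and 6 threads); the variable names are made up for illustration.

```python
# Instances hosted before stuttering, keyed by thread count (from the post).
measurements = {1: 12, 2: 25, 6: 73}

baseline = measurements[1]
scaling = {}
for threads, instances in measurements.items():
    speedup = instances / baseline   # relative to the single-thread run
    per_thread = speedup / threads   # 1.0 would be perfectly linear
    scaling[threads] = (speedup, per_thread)
    print(f"{threads} thread(s): {instances} instances, "
          f"speedup {speedup:.2f}x, {per_thread:.2f} per thread")
```

The per-thread ratio stays at about 1.0 for all three points. The values slightly above 1.0 are most likely measurement granularity (instances are whole numbers), not real superlinear scaling.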

What we actually want is to be able to use all the resources on the machines we’ve purchased for the purposes we’re using them for. If I have 8 cores and I’m stuttering on a patch when my CPU is <25% used, I’m not going to be happy. The fact that my DAW has room to breathe doesn’t matter when it can’t be fed.

You can say “I want to use more modules before stuttering”. Yes, that’s obviously what we’re after, as users. But you cannot keep fitting more processing into a single core. It’s not a matter of always saying “write your modules better, choose better modules”. At some point the workload gets too large for a single core. There’s no way around that.

Good modules are going to take good amounts of cpu. Limiting CPU usage involves compromises to the sound quality. There is already a hard compromise as the processing must be able to be done in 1 sample time. With a single core limit, the compromise is that the module must leave room for the entire rest of the patch to run in a single sample time as well.
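To put a number on "1 sample time", here is a rough budget calculation. The sample rates and module counts are example values, and the `cores` parameter assumes perfect scaling across cores, which is an idealisation; the function names are hypothetical.

```python
def sample_budget_us(sample_rate_hz):
    """Time available to process one sample, in microseconds."""
    return 1_000_000 / sample_rate_hz


def per_module_budget_us(sample_rate_hz, module_count, cores=1):
    """Average time each module may take per sample.

    Assumes the work divides evenly over `cores` (idealised).
    """
    return sample_budget_us(sample_rate_hz) * cores / module_count


print(f"{sample_budget_us(44100):.2f} us per sample at 44.1 kHz")
print(f"{per_module_budget_us(44100, 12):.2f} us per module, 12 modules, 1 core")
print(f"{per_module_budget_us(44100, 12, cores=6):.2f} us per module, 12 modules, 6 cores")
```

At 44.1 kHz the whole patch has roughly 22.7 microseconds per sample, so on one core every module added shrinks the average share available to the others.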

For a module to put its own processing on a separate thread it must buffer like a VST, which massively limits the real-time response of the patch if you have multiple chained modules.

Core speed is not getting significantly faster. CPUs with more cores are exactly where the industry is going and this workload can take advantage of them perfectly.


(Andrew Belt) #10

Looks like you’re using normal C++ mutexes for synchronization: https://github.com/Rcomian/Rack/blob/features/experiments/src/engine.cpp#L265 Those results sound promising, so I’ll take a look within a couple of weeks when I sweep through the engine code on the v1 branch.


(Jim T) #11

that’s a secondary feature. I’m sleeping the spinning threads with mutexes but only when the audio module sleeps the entire engine. that’s not how the main synchronisation works, it’s just a CPU saving feature.


(Jim T) #12

my implementation is documented in the commit steps that make up the feature bit by bit. it’s not ideal. my aim is just pathfinding for you, not telling you what and how things should be done.


(Pat) #13

I have multicore VCV Rack running fine in Linux. All I do is start 4 separate VCV Rack processes, then have them communicate through an ES-8 module. I think it’s actually nicer having 4 separate racks running, in that I can save off sub-rack configurations. On the ES-8, I sometimes send an output to another input. This also allows me easy integration with my hardware racks, and the outputs are further routed to my mixing console and recording systems.


(Patman / NYSTHI Manual) #14

@pat I never thought of running multiple instances of Rack (because I’m on a single screen potato), but with a hardware interface (the ES-x ) it totally makes sense!


(Skrylar) #15

@Patman The Windows version will tell you that running multiple instances is not supported (probably to avoid confusing Bridge?)

@JimT Aren’t you still screwed when it comes to serial patches? The Jack2 docs have some comments about A > B > C chained processes not being able to be run in parallel.


(Jim T) #16

not at all. our saving feature is that every wire introduces a single sample delay. this means that we have the following sequence: “process all modules, move data along wires, process all modules, move data along wires”.

the process all modules part is really really easy to do in parallel. making sure you’ve finished processing all the modules before you move the data along the wires is really really hard (at audio rates).
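The two-phase sequence described above can be sketched as a toy model. This is not Rack's engine code: the module functions, the `run_patch` helper and the A > B > C patch are invented for illustration, and Python threads will not give a real speedup for this kind of work. The point is only that phase 1 gives the same answer whether modules run serially or concurrently, as long as every module has finished before the wires are updated.

```python
from concurrent.futures import ThreadPoolExecutor


def run_patch(modules, wires, steps, workers=1):
    """Toy engine: process all modules, then move data along wires."""
    inputs = {name: 0.0 for name in modules}
    history = []
    names = list(modules)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            # Phase 1: each module reads only its own input, so the
            # modules can run in any order or at the same time.
            # list() waits until every module has finished (the barrier).
            results = list(pool.map(lambda n: modules[n](inputs[n]), names))
            outputs = dict(zip(names, results))
            # Phase 2: copy outputs along wires; this is where each wire
            # adds its one-sample delay.
            for src, dst in wires:
                inputs[dst] = outputs[src]
            history.append(outputs)
    return history


# Hypothetical serial chain A > B > C.
modules = {
    "A": lambda x: 1.0,       # constant source
    "B": lambda x: 2.0 * x,   # gain of 2
    "C": lambda x: x + 1.0,   # offset of 1
}
wires = [("A", "B"), ("B", "C")]

serial = run_patch(modules, wires, steps=3, workers=1)
parallel = run_patch(modules, wires, steps=3, workers=3)
print(serial == parallel)
print([h["C"] for h in serial])
```

In this toy patch, C only sees A's value after two wire hops, so its output reaches 3.0 on the third sample. The hard part mentioned above is the barrier between the two phases, which here is hidden inside `list(pool.map(...))` and would be far too slow at audio rates in this form.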


#17

Btw, you are correct that we want to use all our cores. Which will give most users a 3X increase, and some perhaps 7X. You seem to dismiss making the plugins themselves efficient, but out there in the real world of rack you see more than 10X differences in CPU usage between two plugins that “do the same thing”.

Multi-threading is great, but reasonably efficient software is better.


(Skrylar) #18

I think the effort slots that would have been spent on efficiency get burned satisfying the single sample hard real time requirements, but that’s just my guess.


#19

It’s not true. Efficiency makes a huge difference. Just look at the CPU meters on a big patch. Some modules use very little CPU, some use a huge amount. And they are all (of course) doing single-sample hard real-time processing. Here’s an app-note about putting VCO-1 on a diet. https://github.com/squinkylabs/SquinkyVCV/blob/master/docs/vco-optimization.md


(Skrylar) #20

Oh. Those are much lower level optimizations than I had in mind. I’m vaguely aware of sysprof and hawktracer but nobody has complained about the performance of my modules :sweat: