Modules with ARM builds!

I’m pretty sure it’s not.

Well, that happens to lots of people, especially on laptops, Apple Silicon or not. Nothing specifically to do with your M1 Air. Also I’m a bit surprised, because when I tested the M1 Air I could go up to 3 threads when trying to squeeze it to the limit. Anyway, don’t run on more than 1 thread until the audio starts to crackle and only then turn up the thread count. Also up the buffer size a bit when it happens.

I think there’s a misunderstanding here that is still out there. More threads do NOT equal more performance. It doesn’t work like that; on the contrary. It simply raises the headroom for the biggest patches you can run, which you are saying you don’t need. It also won’t help you at all with that 96 kHz goal you have, and might I suggest you remember that you’re running on Apple’s smallest machine, a fanless design. It’s very performant but has its limits.

Nope, simply not true. I have proven otherwise myself, and I implore you to stop that meme before it grows wings and flies away.

That’s how Rack works on all machines, it only raises the top bar for how big patches you can run. Turning on more threads in itself lowers performance because it has an overhead.

I’m surprised we still need to be talking about how Rack threads work 2 years in…

Bottom line, guys:

  • Turning on more threads in Rack gives you worse performance, it eats CPU simply because of the overhead, which is why you should wait to do it until you have to.
  • Turning on more threads in Rack only means you have more room available, to possibly run bigger patches than on 1 thread, if your machine can handle it, which lots of laptops can’t.

It’s even in the manual.

Oooh, looks like a nice project for me over the holidays. I now regret hand coding Dexter and other modules in raw Intel SIMD instructions :sweat_smile: Thank the stars that things like “SIMD Everywhere” exist.

5 Likes

Hi,

OK I am having a hard time understanding what the implications of those different settings are exactly, can someone please be patient and explain it slooooowly ?

Because from what I understand the situation is that with one thread my CPU consumption (as it shows in VCV) is higher than with the “more modules” setting (4 threads in my lame laptop’s case), so I should use this higher CPU using setting and increase it as I patch along ? Am I getting this right ?

What would the advantage be over just leaving it at the setting that allows for more modules ?

Clearly the setting changes “something”, because with one thread and a basic init patch (mixer, effects on send, basic limiter for master, audio output, maximum buffer size) I get this with one thread :

And this with four threads (everything else staying the same) :

Also, what is it with the difference between what VCV shows and what the activity monitor shows ? And what is it about that 311% of CPU usage that does not result in audio problems ?

All this is a bit off topic, if this post has to be moved please do it, I am interested in anyone’s help on the subject though.

I had a crack at it here, but you certainly know more than me on the topic!

VCV Rack running native on Apple m1 - #47 by hemmer

The thread limitation is a known issue, acknowledged by VCV in various support emails. It’s great if you haven’t run into the issue. However, it is a current issue with x86 builds running on Apple Silicon. It seems it’s more prevalent on the M1 Pro/Max/Ultra family than the OG M1.

I see, that’s probably why we’ve never heard of it :slight_smile: If you can share from those support emails, please do.

Okay, so something changed, it wasn’t an issue before, at least in my own testing and what I’ve heard from everyone in here using Apple Silicon machines.

Sure, see at the bottom of this post. And this is the general issue with a forum of this type - it’s been talked over till death, but then the posts fall down the stack, never to be noticed, nobody searches, and so we can just explain the same things over and over again to each other, it’s not great and we need that wiki. Anyho…

Nope, it’s the other way around - more threads consume more CPU, in and of themselves, because there’s a CPU overhead for running each thread. That’s why you should run as few as possible.

Exactly, you see 4 threads consuming much more CPU than 1 thread. Now, you’re running macOS and the process monitor works differently than it does in Windows (why keep it simple). So, in macOS that figure means “how many percent OF ONE CPU CORE does the process use?”. So 311% means “using a bit more than 3 cores fully”.
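
To make that arithmetic concrete, here is a minimal sketch of how to read the Activity Monitor figure. The 8-core count is only an assumed example machine, not something taken from the screenshots.

```python
def cores_in_use(activity_monitor_percent: float) -> float:
    """macOS reports percent of ONE core, so 100% equals one fully used core."""
    return activity_monitor_percent / 100.0


def share_of_machine(activity_monitor_percent: float, core_count: int) -> float:
    """The same load expressed as a percentage of the whole machine."""
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    return activity_monitor_percent / core_count


# 311% in Activity Monitor: a bit more than 3 cores fully busy.
print(cores_in_use(311))         # 3.11
# On a hypothetical 8-core machine that is under 40% of the total capacity.
print(share_of_machine(311, 8))  # 38.875
```

That is why 311% does not have to mean audio problems: it is not "311% of the computer", it is roughly three cores' worth of work.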


Ok, so here goes - Lars’ explanation of Rack performance, and how to make the most of it, with most important things on top, and please do also read what the manual says about it:

  • Only run as many threads as you absolutely need. So in the Engine->Threads menu you start with 1 thread. Only when your audio starts crackling do you add 1 more thread, and so on, and only after also applying the performance tips below. The reason is that adding more threads is NOT free! Each additional thread consumes some CPU because of thread scheduling overhead, sometimes a lot more, so always start with 1. In my own patches, on my own machine (a 2013 Core-i7 iMac), I never go above 1 thread, because by the time I get to adding another one the machine is already unpleasant to use, plus I never make monster patches.

  • Have a decent graphics (GPU) card. Doesn’t have to be fancy just “discrete” meaning “not the inbuilt Intel graphics” (or AMD) in your CPU. The reason is, that if the GPU cannot cope with the load of rendering the graphics, the CPU takes over that duty, and so you end up burning a lot of CPU for just rendering graphics. The card has to be external to your CPU, so like the manual says:

    * Graphics: Dedicated Nvidia/AMD graphics card from 2013 or later with recent driver update
      * Integrated (non-dedicated) graphics such as Intel HD are not recommended and may cause significantly increased CPU usage.

  • Run with as high a block size as you can in the audio module. If you are using external MIDI sources, e.g. a MIDI keyboard, and you need snappy/responsive/low latency, then gradually lower the blocksize until the latency is acceptable, but no more. The reason is that the lower the blocksize the more CPU is used.

  • In the audio module, run with a sample rate of 44.1 or 48 kHz. Only go higher if you have a really good reason to. The reason is that the higher the sample rate the more CPU is used.

  • In the Engine->Sample rate menu, run with the same sample rate as the audio module, always, unless you have a really good reason not to. It should usually be the “Auto” setting in that menu I believe, where auto means “same as audio module”. The reason is that if you don’t run the Rack engine and the audio module at the same sample rate, you will burn a lot of CPU re-sampling everything needlessly.

  • Use CPU efficient modules. This is one of @Squinky’s pet peeves but also mine. There is a lot of variance between modules in CPU consumption, even if the modules otherwise look the same, with the same amount of functionality etc. It’s got everything to do with how well they are coded. There can be 10 x or a lot more variance in CPU consumption between otherwise identical modules. So put it to the test. When you’re deciding “which LFO should I use?”, put the 3-4 candidates next to each other, patch them up so they’re working, also very much with respect to whether you’ll be using them mono or polyphonically, hit F3 so you can see the CPU usage of each one and then decide. Using efficient or inefficient modules in your patch, especially big patches, has an awfully big consequence on the CPU usage of that patch.

  • In the View->Frame rate menu, lower the graphics framerate if the machine is struggling. That can save quite a bit of CPU as well.

  • Minimize the Rack window when you can. If you’re at a place in your workflow where you’ve hooked up a recorder and the patch is playing by itself, or you’re jamming on your MIDI keyboard, minimize the window. That will moderately, or sometimes dramatically, lower the CPU usage. The reason is (I think) that, depending on graphics drivers, OS, etc., the GPU and CPU don’t have to draw any graphics when the window is invisible. Drawing Rack and all the modules and graphics does take a toll on both GPU and CPU, so you’ll be saving cooling/heat/fan-noise plus some/a-lot of CPU.

  • Have the Engine->Performance meters (F3) off when you’re not actively using them, because they use some CPU as well.
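
The block size and sample rate tips above come down to simple arithmetic. This is a rough sketch of the trade-off, not anything taken from Rack's code; the function names and the block sizes in the loop are just illustrative choices.

```python
def block_latency_ms(block_size: int, sample_rate_hz: float) -> float:
    """How much audio time one block covers, in milliseconds."""
    return 1000.0 * block_size / sample_rate_hz


def blocks_per_second(block_size: int, sample_rate_hz: float) -> float:
    """How often a block must be delivered to the audio device."""
    return sample_rate_hz / block_size


# Bigger blocks: more latency, but fewer deadlines per second to hit.
for block_size in (128, 256, 512, 1024):
    latency = round(block_latency_ms(block_size, 48_000), 2)
    rate = blocks_per_second(block_size, 48_000)
    print(block_size, "samples:", latency, "ms per block,", rate, "blocks/s")

# Doubling the sample rate doubles the number of samples to compute every second.
print(96_000 / 48_000)  # 2.0
```

So if you do not play from an external controller, a large block size costs you nothing you would notice, and at 96 kHz every module simply has twice the work to do compared to 48 kHz.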

That’s everything I could think of that’s accumulated over the years. There are probably a couple of things I’ve forgotten, but it’s going to be a while before I’ll bother to write that up again (again again), so now it’s up to y’all to point to this post when people are wondering about performance :slight_smile:

11 Likes

This is a superb summary. There is also a section of the VCV manual online about optimizing performance. There is much overlap.

1 Like

Soooo, back on topic! Any other ARM plugin builds out there?

Thanks! Yes, it’s linked in the text.

If you include rack.hpp, all those still work. I still use _mm_add_ps everywhere. simde has been working for us for a couple of years, and Rack includes it for you directly. Your hand-coded stuff may just work. Mine did, both in Surge proper 2 years ago and in Surge for Rack now.

1 Like

Thank you for your answer with a clear explanation.

Personally, I always use the maximum buffer size because I never use any external controller, and the slowest frame rate for CPU reasons, and I also work at 44100Hz all the time (I never heard any difference between that and 48k and I don’t think my computer will allow me to test anything else so…).

I just wasn’t clear as to why the CPU meter shows different values in VCV and in the “Activity monitor”, which still seems like a problem to me, because why show a number all the time in the program interface if it is not accurate, or does not reflect the complexity of the situation ? Anyway now I know, thank you.

I think it is a good practice to have a well written post about a subject and then reroute people toward it when needed, if only this could be a “sticky” thread, by itself, on the front page… That and then a few others as necessity arises…

4 Likes

Because they measure different things. The OS activity monitor shows how processes are consuming resources on the whole computer. The Rack meter shows… well, from the manual:

Performance meters

Enables/disables the measurement of time for each module to generate each sample (see Sample rate). This displays a meter on each module with the percentage of time spent in the module’s assigned thread (see Threads).

The CPU meter consumes engine CPU time itself, so it is recommended to disable it when it is not needed for best performance.

So it’s a relative measurement, only valid inside Rack, showing (my words) how much time a module is spending doing its processing, out of the maximum time it could possibly spend. I might not be wording this accurately. It’s most useful for comparing modules to each other, to get an idea of how efficient/inefficient each one is.
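
One way to picture that relative number is below. This is only my reading of the manual text, not how Rack actually computes the meter, and the 1 microsecond module time is a made-up figure for illustration.

```python
def sample_period_us(sample_rate_hz: float) -> float:
    """Time available to produce one sample, in microseconds."""
    return 1_000_000 / sample_rate_hz


def meter_percent(module_time_us: float, sample_rate_hz: float) -> float:
    """Share of one sample period that a module spends processing (assumed model)."""
    return 100.0 * module_time_us / sample_period_us(sample_rate_hz)


# A hypothetical module needing 1 microsecond per sample:
print(round(meter_percent(1.0, 48_000), 1))  # 4.8
print(round(meter_percent(1.0, 96_000), 1))  # 9.6
```

Under that reading, the figure says nothing about the whole computer, which is what Activity Monitor reports. It only tells you how much of the per-sample time budget a module eats, so the same module looks twice as expensive at 96 kHz.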

That’s a good idea - It might be good to split this into two threads. Any mods reading who can do that for us?

I just spent two good hours on a patch and no sign of my fans starting, usually they start about 17s after I open VCV, so it is a HUGE improvement on my quality of life, thank you very much !

=> One thread and up only if needed for the win !!

3 Likes

HetrickCV: Release Nightly · mhetrick/hetrickcv · GitHub

Nonlinear Circuits: Release Nightly · mhetrick/nonlinearcircuits · GitHub

2 Likes

RPJ: Update c-cpp-dev.yml · kockie69/RPJ@41469c3 · GitHub

1 Like

As the rack-plugin-toolchain now includes the arm build it is automatically here in the new release of

2 Likes

Does a new version point update have to be submitted for an automatic library ARM build?

If any of you are running the arm free beta, there was a great change in the 222 sdk which I think is awesome - I was a big advocate for the change! - but which means the 222 sdk makes an arm plugin with different names than 221.

So short version: if you use the surge arm beta it now uses the 222 sdk and so you need 222 free

And plugins built with 221 that haven’t updated yet may not load properly. If they don’t, just rename plugin.dylib to plugin-arm64.dylib.

(This change means you can have a Rack user directory with both arm and x64 plugins in one spot, which is super operationally useful, and I’m thankful to vortico and team for adding it)
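
If you have a lot of 221-built ARM plugins, the rename can be scripted. This is a hedged sketch: it assumes one folder per plugin with the dylib directly inside it, and the function name and `dry_run` flag are my own. It cannot tell an ARM build from an x64 build, so only point it at a folder that holds ARM builds.

```python
from pathlib import Path


def rename_arm_plugins(plugins_dir: Path, dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Rename <plugin>/plugin.dylib to <plugin>/plugin-arm64.dylib.

    Returns the (old, new) pairs. With dry_run=True nothing is touched.
    """
    planned = []
    for old in sorted(plugins_dir.glob("*/plugin.dylib")):
        new = old.with_name("plugin-arm64.dylib")
        if new.exists():
            # Already has a 222-style ARM binary, leave this plugin alone.
            continue
        if not dry_run:
            old.rename(new)
        planned.append((old, new))
    return planned
```

Call it once with the default `dry_run=True` to see what would change, then again with `dry_run=False` once the list looks right. Where your plugins folder lives depends on your setup, so pass that path in yourself.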

3 Likes

question related to this, is there a reason not to simply have universal builds?

with the sdk using makefiles the build would only need a few extra flags to support it…

4 Likes