optimal setting for make -j

Most of us are used to running make with four jobs - make -j4 - and indeed the VCV manual suggests this as a reasonable default.

With Windows 7 becoming unsupported, I was forced to get a modern computer. This one is no speed demon, but it does have 6 physical cores and 12 logical ones.

On this system there was measurable speedup all the way up to make -j12, and going up to 32 didn’t slow it down (once it hit 12, all the cores were at 100% CPU).

so - ymmv, but try something bigger than -j4 next time.
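If you want to check this on your own machine, here is a sketch of a toy benchmark (it assumes GNU make is installed; the Makefile with four sleeping targets is a stand-in for real compile steps):

```shell
#!/bin/sh
# Toy benchmark: four independent targets that each sleep 1 second,
# standing in for compile jobs. Compare a serial vs a parallel build.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'all: a b c d\na b c d:\n\tsleep 1\n\ttouch $@\n' > Makefile

build_time() {                  # build_time JOBS -> wall seconds for a clean build
    rm -f a b c d
    start=$(date +%s)
    make -j"$1" > /dev/null
    echo $(( $(date +%s) - start ))
}

t1=$(build_time 1)              # serial: roughly 4 seconds
t4=$(build_time 4)              # parallel: roughly 1 second
echo "-j1: ${t1}s   -j4: ${t4}s"
```

Swap in a clean build of your actual project and a loop over job counts to find the knee on your hardware.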

The optimal number of parallel jobs is the number of the logical cores.


On linux it’s easiest to just call make -j$(nproc). I don’t know if nproc is available on other unixes, too.


oh, interesting. Have to see if msys2 has that variable. The thing I found slightly surprising is that using crazy large values doesn’t seem to have a negative impact on build time.

On Mac you can use $(sysctl -n hw.ncpu) instead of $(nproc), which doesn’t work there.


Actually, nproc is not a variable, it’s a command. The $(command) syntax of bash is called command substitution. The string "$(nproc)" makes bash execute the command nproc and then replace the string with its output.
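Putting those two together, a portable way to pick the job count might look like this (a sketch; it falls back to 4 if neither command exists, e.g. on msys2 without coreutils):

```shell
#!/bin/sh
# Pick a job count portably: nproc on Linux, sysctl on macOS/BSD,
# and a conservative fallback of 4 elsewhere.
CORES=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
echo "make -j$CORES"
```

Both failing commands are silenced with 2>/dev/null, so the first one that exists wins.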

Depending on the project, the optimal process count may be a number greater than the number of cores. That way processes that need to wait for IO can yield their core to another process. But this is only speculation, I haven’t measured anything properly.

And of course ssd vs spinning disk.

TL;DR – no point in telling make to run more processes than there are available CPU cores. Even shorter: What @Vortico said.

But if you want to spelunk the topic, there are two things that happen if you do (e.g.)

make -j99

  1. You give make the option of running 99 subcommands at once.
  2. Make’s reason for existence is to manage dependencies – you can’t link until you compile all the files that need to be linked.

#2 is going to influence how many processes get spawned, because at some point dependencies converge. If a particular target depends on 10 other targets, then at least 10 jobs have to run before it can be built. If each of those has its own dependencies, those kick off more jobs, potentially up to the limit of 99 processes.

GNU make is smart in that it keeps track, across all its sub-makes, of how many jobs it has spawned, so even if there are 1000 dependencies, it will only run 99 of them at any given time.

If you specify a large number of parallel processes, you can run out of independent targets to build, and make waits around for their dependencies to be built before it spawns more parallel jobs.

That’s why, past a certain point, you don’t get any more speedup.
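A concrete (hypothetical) illustration: in a Makefile like this, the link can only start after all three objects exist, so even -j99 runs at most 3 jobs at once:

```make
# Hypothetical three-file project: app can't be linked until
# a.o, b.o and c.o all exist, so -j99 still tops out at 3 jobs here.
app: a.o b.o c.o
	cc -o app a.o b.o c.o

%.o: %.c
	cc -c -o $@ $<
```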

To some extent you can get more performance by specifying more sub-builds than you have processor cores, because processes that are I/O bound suspend until their input is ready. But that is less of an issue on modern computers because disk I/O is so fast - particularly on SSDs - that there’s not a lot of ‘wait for I/O’ time available for other processes to run.

And I’ve spent more time writing this than is even justified by the topic. Unfortunately I can’t run make -j99 in my brain.

Until recently I built Rack on an auxiliary disk that was not an SSD, so - don’t assume everyone always has SSD.

My main point is not that you will get better results by using n > num cores, but that you don’t have to worry about it much: it’s perfectly fine to use a huge number and not go rummaging around to see whether your cores are physical or virtual.

And if using a huge number (99?) means you get best performance no matter what your hardware, then just use 99 all the time.

Any of these are better than what the manual recommends, which is 4.

Counting down the seconds until we get another essay from @staircrusher

Sweet Jebus I thought I killed the topic dead the first time out.


What matters is that -j$(nproc) is typically at least very close to optimal, and since it’s so easy to use, that’s what I almost always use. I do have an older machine that has 2 cores but very little RAM. On that machine I use -j1, because otherwise it runs out of physical RAM with some bigger projects.
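For that situation, one option is to cap the job count by available memory as well as by cores - a Linux-only sketch (the 2 GiB-per-job figure is a rough assumption, not a measured number):

```shell
#!/bin/sh
# Sketch (Linux-only): cap -j so each job gets ~2 GiB of RAM,
# a rough guard against parallel builds exhausting physical memory.
cores=$(nproc 2>/dev/null || echo 4)
mem_kib=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
mem_jobs=$(( mem_kib / (2 * 1024 * 1024) ))   # jobs that fit in RAM
[ "$mem_jobs" -lt 1 ] && mem_jobs=1
jobs=$(( cores < mem_jobs ? cores : mem_jobs ))
echo "make -j$jobs"
```

On a 2-core box with 2 GiB of RAM this lands on -j1, which matches the experience above.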

It’s nice to speculate, but that’s my short concrete story from everyday life (as a gentoo user).