I’m sure many of us are interested in how Rack v1 will improve performance. As everyone is by now aware, there are two main factors in this: CPU and GPU.
Andrew (@vortico) has recently made some changes that affect CPU performance and I was interested to test these. I will hereafter refer to the v1 development build I used as v1dev; it was compiled locally from commit 510f7b2179dcffcd8f2fafaae66f7c0070ee6215. Locally compiled v1dev Fundamental modules were used for patching.
As you are also aware Jim Tupper (@JimT) has been very actively researching this area, has released a series of his experiments as a fork (https://github.com/Rcomian/Rack) and has published a fascinating paper on this: https://github.com/Rcomian/Rack/wiki/Multi-threading.
I will hereafter refer to this as rcomian and it was natively compiled with v0.6.2c-experiments2. Plugin Manager Fundamental modules were used for patching.
Test system: 27” 5K iMac 2015. CPU: 6700K (4 GHz); GPU: AMD 395X (4 GB); RAM: 32 GB 2400.
As much as possible was stopped in the background: WiFi, Time Machine, Spotlight Indexing, iCloud. The only processes running that exceeded 1% CPU were Rack, iTerm, iStat Menus (used for the stats), windowserver and coreaudiod. Other processes had little spikes and these are what create the glitches at the highest number of VCO-1s. Testing was carried out fullscreen with 4 rows displaying on the iMac’s 5K monitor at 2880 x 1620 resolution. No frame rate limiting was used with either v1dev or rcomian.
Test Patch: Groups of 3 VCO-1s feeding their 12 (4 * 3) oscillator outputs into a Unity (2 * 6). These Unitys feed into further Unitys, then into a Mixer and from there into an Audio. A scope monitors the Mixer’s output to ensure the levels do not clip (considerable phasing due to delays in the signals through the circuits produces rapidly changing levels). Testing was carried out at 44.1 kHz (engine & audio) with a block size of 256. Patches are here:
Testing began with a 36 VCO-1 patch. Each thread count from 1 to 4 was tested for around a minute, any glitches that occurred were noted, and a (necessarily) subjective assessment of their severity was made. CPU percentages were recorded by observing iStat’s display. At higher numbers of VCO-1s these percentages are not stable, so a subjective average was taken (they showed greater ranges with rcomian). VCO-1s were then deleted one by one, beginning at the bottom right and moving left until a row was cleared, and then from the right on the next row up. At no point were any Unitys or their wiring deleted.
Here is a screenshot of the rcomian patch with 36 VCO-1s:
Here is a screenshot of the v1dev patch with 12 VCO-1s:
The data in the spreadsheet covers a range from 12 to 36 VCO-1s in my patch on up to 4 threads (I have 4 physical cores & hyperthreading does not help). Why not go lower or higher? Because data outside this range is not relevant for this patch. The important thing to remember about multithreading in Rack is that you will get no benefit from adding threads until you need them (i.e. your audio is glitching because your patch has too many modules). Adding threads before this point is counterproductive (you will have greater CPU usage/power consumption/heat/fan noise). Why is this? That is above my pay grade; I have half an understanding, but to avoid looking like a complete idiot I refer you to others who can give you the correct explanation. 12 VCO-1s was the point at which I needed to add extra threads in this patch; beyond 36 VCO-1s lay only complete audio degradation (and it was nice and symmetrical!).
The results are best viewed here: https://docs.google.com/spreadsheets/d/1zxWdVyPo_PpgWNQPT3zK69ij8UPZeARmvSqlaTLS24U/edit?usp=sharing
But here is a screenshot:
Dark Green: No glitches
Pale Green: Very occasional glitch (typically a daemon CPU spike)
Yellow: Occasional glitches & audio dropouts
Amber: Glitches & audio dropouts
Red: Severe glitches & audio dropouts
Black: Complete audio degradation
Please be very clear: the numbers here mean very little. They are a test of one particular patch on my system. It is the pattern that is important.
What can we see/conclude? Well, here are some headline figures for the patch (using Dark Green/No Glitches):
Max VCO-1s (1 thread): 12 (v1dev); 13 (rcomian)
Max VCO-1s (2 threads): 21 (v1dev); 22 (rcomian)
Max VCO-1s (3 threads): 24 (v1dev); 19, 22-24, 27-28 (rcomian)
Max VCO-1s (4 threads): 22 (v1dev); 19, 21-24, 27 (rcomian)
Conclusions: v1dev has consistent figures, rcomian has higher but inconsistent figures; 3 threads are optimal, not 4.
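To make the "3 threads, not 4" point concrete, here is a minimal Python sketch that turns the v1dev headline figures above into a speedup relative to 1 thread and a per-thread efficiency. The names (V1DEV_MAX_VCOS, scaling_table) are made up for this illustration, and only the v1dev numbers are used because the rcomian figures for 3 and 4 threads are not single values.

```python
# Maximum glitch-free VCO-1 counts for v1dev, copied from the headline figures.
# (Hypothetical names; this is an illustration, not part of the original test.)
V1DEV_MAX_VCOS = {1: 12, 2: 21, 3: 24, 4: 22}


def scaling_table(max_vcos):
    """Return (threads, max_vcos, speedup, efficiency) rows relative to 1 thread."""
    baseline = max_vcos[1]
    rows = []
    for threads in sorted(max_vcos):
        # Speedup: how many more VCO-1s run glitch-free than on a single thread.
        speedup = max_vcos[threads] / baseline
        # Efficiency: speedup divided by the number of threads used to get it.
        efficiency = speedup / threads
        rows.append((threads, max_vcos[threads], speedup, efficiency))
    return rows


if __name__ == "__main__":
    for threads, vcos, speedup, efficiency in scaling_table(V1DEV_MAX_VCOS):
        print(f"{threads} thread(s): {vcos} VCO-1s, "
              f"speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

On these figures the speedup goes 1.00x, 1.75x, 2.00x and then drops back to about 1.83x on 4 threads, while the efficiency falls at every step.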
CPU Range (1 thread): up to 118% (v1dev); up to 123% (rcomian)
CPU Range (2 threads): 185-219% (v1dev); 142-208% (rcomian)
CPU Range (3 threads): 306-307% (v1dev); 252-250% (rcomian - inconsistencies)
CPU Range (4 threads): 411% (v1dev - no benefit over 3 threads); 297% (rcomian - no benefit over 3 threads)
NB These are calculated from the point that the next thread needs to be turned on.
Conclusions: rcomian has better CPU usage than v1dev, especially at 3 threads, but there are inconsistencies that may create a few glitches (probably) depending on the patch.
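One rough way to look at the CPU side is the CPU percentage spent per glitch-free VCO-1. The Python sketch below does this for v1dev only. It assumes that the upper end of each CPU range was measured at the maximum glitch-free VCO-1 count for that thread count; that pairing is an assumption made for this illustration, not something the figures above state, and the names (V1DEV, cpu_per_vco) are hypothetical.

```python
# (max glitch-free VCO-1s, upper end of the CPU % range) per thread count, v1dev.
# ASSUMPTION: the upper CPU figure corresponds to the max VCO-1 count.
V1DEV = {1: (12, 118), 2: (21, 219), 3: (24, 307), 4: (22, 411)}


def cpu_per_vco(results):
    """Return CPU percentage points spent per glitch-free VCO-1, keyed by threads."""
    return {threads: cpu / vcos for threads, (vcos, cpu) in results.items()}


if __name__ == "__main__":
    for threads, cost in sorted(cpu_per_vco(V1DEV).items()):
        print(f"{threads} thread(s): {cost:.1f}% CPU per VCO-1")
```

Under that assumption the cost per VCO-1 rises with every added thread, which fits the advice not to add threads before the patch needs them.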
Note that (debatably) usable results are available for slightly higher numbers of VCO-1s with rcomian (Pale Green). This is likely to be very dependent on the patch used.
Note also that whilst the direction of travel is the same, v1dev’s results are reasonably linear, whereas rcomian’s are (very much) not. v1dev’s implementation uses little code and is a parsimonious solution (AFAIK); rcomian’s implementation is complex and uses multiple vectors.
It is clear that the law of diminishing returns is very much in play here, as Andrew has previously noted.
I hope that this has been interesting and useful for some of you.
NB: There is an update below testing limiting the frame rate in combination with additional threads.