Haven’t had time for performance comparisons so far. I just threw in every optimization I could think of; prior experience compiling all of id Software's open-source games for the Pi helped there. Compiled with GCC 10 on 32-bit Raspbian Bullseye. I was pretty surprised that I can smoothly run around 30-40 modules (carefully selected, of course), including some heavy ones (Squinky Labs), at a 44 kHz sample rate with a 512-sample buffer (roughly 11-12 ms of latency). When I drop to 24 kHz I can use even more modules while multitasking, for example playing YouTube videos at the same time.
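For reference, the latency figure for one audio buffer is just buffer size divided by sample rate. A minimal sketch of that arithmetic (the function name `buffer_latency_ms` is hypothetical; it counts only the time to play out a single buffer and ignores driver and output-device overhead):

```python
def buffer_latency_ms(buffer_size: int, sample_rate_hz: float) -> float:
    """Time to play out one audio buffer, in milliseconds (single buffer only)."""
    return buffer_size / sample_rate_hz * 1000.0


if __name__ == "__main__":
    # Compare the two sample rates mentioned above at the same 512-sample buffer.
    for rate in (44_100, 24_000):
        print(f"512 samples @ {rate} Hz -> {buffer_latency_ms(512, rate):.1f} ms")
```

Keeping the buffer at 512 samples while halving the sample rate roughly doubles the per-buffer latency (about 11.6 ms vs. 21.3 ms), which is the trade-off for the lower CPU load per second of audio.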