Is anyone using the `target_clones` attribute? It's documented in the GCC manual. It tells the compiler to create multiple versions of a function, one per SIMD target you list. This is an easy way to get a free speed boost for heavy functions with lots of math, while still running on CPUs that lack SSE4.1 or higher. To use it, add the attribute to the function definition.
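Here is a minimal sketch of what that can look like, assuming GCC (or a recent Clang) targeting x86-64 on Linux. The function `mix`, the `MULTIVERSION` macro, and the particular target list are illustrative choices of mine, not anything taken from Rack.

```cpp
#include <cstddef>

// Assumption: target_clones needs x86 and ifunc support (e.g. Linux with glibc).
// On other platforms this hypothetical macro expands to nothing, so the code
// still builds as a single plain version.
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
#define MULTIVERSION __attribute__((target_clones("default", "sse4.1", "avx", "avx2")))
#else
#define MULTIVERSION
#endif

// Hypothetical hot loop: the compiler emits one clone per listed target,
// plus a resolver that picks among them at runtime.
MULTIVERSION
void mix(const float* a, const float* b, float* out, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; i++) {
        out[i] = (a[i] + b[i]) * gain;
    }
}
```

Callers just call `mix(...)` as usual. The `"default"` entry is the fallback version built with your normal compiler flags.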
The compiler creates a resolver to choose at runtime which function version to call. This resolver calls `__cpu_indicator_init` and branches on the (cached) result, which takes a few CPU cycles, so it's not worth using this on tiny functions. Always profile modifications like these.
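To get a feel for what the resolver is doing, here is a rough hand-written approximation using GCC's `__builtin_cpu_init` and `__builtin_cpu_supports` builtins. This is not the code GCC actually generates for `target_clones`. It is only a sketch of the same idea, and the names `scale`, `scale_avx2`, `scale_default`, and `resolve_scale` are made up for illustration.

```cpp
#include <cstddef>

// Baseline version, built with the normal compiler flags.
static void scale_default(float* x, std::size_t n, float g) {
    for (std::size_t i = 0; i < n; i++) x[i] *= g;
}

#if defined(__x86_64__) && defined(__GNUC__)
// Same loop, but the compiler may use AVX2 instructions in this function only.
__attribute__((target("avx2")))
static void scale_avx2(float* x, std::size_t n, float g) {
    for (std::size_t i = 0; i < n; i++) x[i] *= g;
}
#endif

using ScaleFn = void (*)(float*, std::size_t, float);

// Hypothetical resolver: query the CPU and pick a version.
static ScaleFn resolve_scale() {
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return scale_avx2;
#endif
    return scale_default;
}

void scale(float* x, std::size_t n, float g) {
    // The CPU check runs once; later calls reuse the cached pointer.
    static const ScaleFn fn = resolve_scale();
    fn(x, n, g);
}
```

Even with the result cached, every call still goes through an extra indirection, which is why the overhead only pays off when the function body does a meaningful amount of work.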
I haven’t used this in Rack or a VCV plugin yet because it doesn’t address my top optimization priority: making Rack run well on low-end computers. But a second priority is to increase the maximum number of modules you can use on mid-to-high-end computers, so I will probably begin using it more often.
Eventually I’ll drop support for CPUs that don’t support AVX, but maybe in Rack v3 or v4.