Target_clones attribute

Is anyone using the target_clones attribute? Here’s the GCC manual entry. It tells the compiler to build several versions of a function, one for each instruction-set target you list (SSE4.1, AVX, AVX2, and so on), plus a default fallback. This is an easy way to get a speed boost for heavy functions with lots of math, while still supporting CPUs that don’t have SSE4.1 or higher. You can add the following attribute to your function.

__attribute__((target_clones("default,sse4.1,avx,avx2")))

https://godbolt.org/z/9UDokx
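For context, here is a minimal sketch of the attribute on a whole function. The function name `mixBuffers` and the `MULTIVERSION` macro are made up for illustration. The macro is there because I'm assuming target_clones is only safe to rely on with GCC on x86-64 Linux (it needs ifunc support); on other toolchains it expands to nothing and you get a single ordinary function.

```cpp
#include <cstddef>

// Hypothetical helper macro (not part of GCC or Rack): enable target_clones
// only where this sketch assumes it works, otherwise compile one plain version.
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__linux__)
#define MULTIVERSION __attribute__((target_clones("default,sse4.1,avx,avx2")))
#else
#define MULTIVERSION
#endif

// Hypothetical math-heavy loop: out[i] = a[i] * gain + b[i].
// With the attribute active, the compiler emits one clone per listed target
// and auto-vectorizes each clone with that target's instructions.
MULTIVERSION
void mixBuffers(const float* a, const float* b, float* out, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; i++) {
        out[i] = a[i] * gain + b[i];
    }
}
```

Callers just call `mixBuffers(...)` as usual. Which clone actually runs depends on the CPU the program is started on.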

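The compiler creates a resolver to choose at runtime which function version to call. With GCC on Linux this is an ifunc resolver: it calls __cpu_indicator_init, checks the detected CPU features, and returns the best version, and the dynamic linker caches that choice. Every later call then goes through an indirect jump instead of a direct (or inlined) call, which takes a few CPU cycles, so it’s not worth using this on tiny functions. Always profile modifications like these.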
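To make the dispatch idea concrete, here is a hand-written approximation of runtime selection. This is not what GCC actually generates, just a sketch of the same idea using the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins and the `target` attribute. The names `scale`, `scaleGeneric`, `scaleAvx2`, `pickScale`, and the `HAVE_X86_DISPATCH` macro are hypothetical.

```cpp
#include <cstddef>

// Hypothetical guard: the x86 CPU-detection builtins are assumed to be
// available only on GCC-compatible compilers targeting x86.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

// Baseline version, built with the default compiler flags.
static void scaleGeneric(float* x, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; i++) x[i] *= gain;
}

#if HAVE_X86_DISPATCH
// Same loop, but the compiler may use AVX2 instructions inside this function.
__attribute__((target("avx2")))
static void scaleAvx2(float* x, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; i++) x[i] *= gain;
}
#endif

using ScaleFn = void (*)(float*, std::size_t, float);

// Plays the role of the resolver: inspect the CPU and return the best version.
static ScaleFn pickScale() {
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return scaleAvx2;
#endif
    return scaleGeneric;
}

// Public entry point: the choice is made once on the first call and cached,
// so later calls pay only for the indirect call through the pointer.
void scale(float* x, std::size_t n, float gain) {
    static const ScaleFn fn = pickScale();
    fn(x, n, gain);
}
```

The indirect call is the reason tiny functions don't benefit: the call can no longer be inlined, so the dispatch cost can outweigh whatever the wider SIMD saves.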

I haven’t used this in Rack or a VCV plugin yet because it doesn’t help with my top optimization priority, which is making Rack run well on low-end computers. But my second priority is to increase the maximum number of modules you can use on mid-to-high-end computers, so I will probably begin using it more often.

Eventually I’ll drop support for CPUs that don’t support AVX, but maybe in Rack v3 or v4.
