XSIMD

Hi everyone.

I am relatively new to audio DSP and coding in general. I’ve been developing a simple diode clipper using chowdsp_wdf for quite a while now.

I am at the point where the only thing left (I hope) is to optimise the module for polyphony. I’ve been trying to implement XSIMD (which chowdsp_wdf encourages), so far unsuccessfully.

Long story short: Are there any open-source VCV plugins implementing XSIMD instead of Rack’s own SIMD? If there are not any, could someone here with experience help me with the basics?

Look at the source code for any of the VCV modules. Most use SIMD.

I know Rack has its own implementation of SIMD, and it is the only one I’ve seen used in Fundamental and a bunch of other plugins. I explicitly referred to the XSIMD library, which has its own implementation and is not compatible (at least directly) with Rack’s implementation. The chowdsp_wdf library is optimised for XSIMD, which is why I want to use it instead of what is used in Fundamental.

I’d assume that if anybody’s using XSIMD in Rack (or would know how to set it up), it’s @jatinchowdhury18 himself… Haven’t checked his Rack plugin to see if it’s XSIMD or not but that might be a good start.

I don’t use xsimd, but I also don’t use Rack’s simd, and I have no problems. It’s entirely possible to do cross-platform simd using the simde library, which is included with Rack, or by just compiling a program that has a mix of Intel and NEON instructions.

So, while this is not exactly helpful (and I agree that looking at Jatin’s module is good advice), all these things are just wrappers around the simd instruction sets, so nothing compels you to use Rack’s simd, or any wrapper for that matter. Lots of the code in Surge is just hand-coded SSE.

Probably the makefile is the hardest part

I was responding to your last question. I do know that you want to use xsimd, but you also asked for help if that’s not possible. We are on the same page here, yes?

Jatin’s modules do not use his wdf library, so he is using Rack’s simd. I am too much of an amateur to hand code (I am too much of an amateur to wrap my head around the Surge code either, it gives me anxiety :grin:)

My bad :face_holding_back_tears:

Anyway, I’ve made it work. Now the problem is that I cannot use Rack’s simd ways of getting and setting voltages. I’ve loaded the getVoltages() pointer into an xsimd::batch, but I haven’t figured out a more optimal way of setting voltages.

Seems to work… I guess… The best way to check whether simd works or not is when 16 channels are processing faster than 7 channels, right? :face_with_raised_eyebrow: (joking)

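```
// input_batch is assumed to be an array of xsimd::batch<float> declared elsewhere in the module.
const int channels = inputs[INPUT_INPUT].getChannels();
float* voltage_ptr = inputs[INPUT_INPUT].getVoltages();
const int batch_size = xsimd::batch<float>::size;
// Process the channels one batch at a time.
for (int x = 0; x < channels; x += batch_size)
{
    const int batch_index = x / batch_size;
    // load_aligned assumes the voltage buffer meets the batch's alignment requirement.
    input_batch[batch_index] = xsimd::load_aligned(&voltage_ptr[x]);
    input_batch[batch_index] = xsimd::sin(xsimd::tanh(input_batch[batch_index])); // TEST MATH
    // The result is written back into the input port's buffer.
    input_batch[batch_index].store_aligned(&voltage_ptr[x]);
}

// Copy the processed values to the output, one channel at a time.
for (int o = 0; o < channels; ++o)
{
    outputs[OUTPUT_OUTPUT].setVoltage(voltage_ptr[o], o);
}

outputs[OUTPUT_OUTPUT].setChannels(channels);
```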
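On the question of avoiding the per-channel setVoltage() loop, here is a minimal sketch in plain C++ with no Rack or XSIMD dependency. It assumes fixed groups of 4 lanes and 16-float buffers (Rack's maximum channel count). The names `processPoly`, `shape`, `kLanes` and `kMaxChannels` are hypothetical, and the inner scalar loop only stands in for one batch load, compute and store. The point is that results go straight into a separate output buffer instead of back into the input.

```cpp
#include <cmath>

constexpr int kMaxChannels = 16; // Rack's maximum number of polyphony channels
constexpr int kLanes = 4;        // assumed group width (stand-in for the batch size)

// The test math from the post, applied to a single value.
inline float shape(float x) { return std::sin(std::tanh(x)); }

// Processes `channels` values from `in` into `out`, one group of kLanes at a time.
// Both buffers must hold kMaxChannels floats, because the last group is always
// processed in full even when `channels` is not a multiple of kLanes.
void processPoly(const float* in, float* out, int channels) {
    for (int c = 0; c < channels; c += kLanes) {
        // Stand-in for: load a batch from in + c, compute, store to out + c.
        for (int lane = 0; lane < kLanes; ++lane) {
            out[c + lane] = shape(in[c + lane]);
        }
    }
}
```

In a module, the two pointers would presumably come from getVoltages() on the input and output ports, followed by setChannels(channels) on the output. That wiring is an assumption and is not shown here.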

Include the library, choose a SIMD architecture, and use XSIMD's types and functions. For polyphony, make sure the code is thread-safe and handles multiple voices.

Example:

```
#include <xsimd/xsimd.hpp>

xsimd::batch<float, 4> inputBatch(input);
inputBatch += 1.0f;
inputBatch.store(input);
```

Thank you for the reply. I have managed to make it work, and I have also successfully implemented chowdsp’s variable oversampling.

The thing is, the (latest) xsimd library I am using does not allow me to use <float, 4>. I guess it expects an architecture type instead of “4”. So what I did was create 4 batches and use xsimd::batch::size to handle the loops. I guess this might work on different architecture types: it would use all 4 batches for size 4, 2 batches for size 8, or even 1 batch for size 16, leaving the extra batches hanging around unused.
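The bookkeeping described above (how many batches are needed to cover the 16 possible channels at a given batch width) can be sketched as a rounded-up division. `batchesFor` is a hypothetical helper, and the widths 4, 8 and 16 are simply the ones mentioned in the post.

```cpp
// Number of batches needed to cover `channels` values when each batch
// holds `batchSize` values (integer division, rounded up).
constexpr int batchesFor(int channels, int batchSize) {
    return (channels + batchSize - 1) / batchSize;
}

// 16 channels: 4 batches at width 4, 2 at width 8, 1 at width 16.
static_assert(batchesFor(16, 4) == 4, "width 4 needs four batches");
static_assert(batchesFor(16, 8) == 2, "width 8 needs two batches");
static_assert(batchesFor(16, 16) == 1, "width 16 needs one batch");
```

Sizing the batch array for the narrowest width (4) means the same array also works at wider widths, with the unused entries simply left idle.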

BUT I am sure there is a reason behind VCV’s approach of explicitly limiting the batch size to 4. A “dynamic”, architecture-based batch size does not sound stable at all to me.

As of now, I haven’t figured out how to limit the batch size to 4.

I don’t know what a “batch” is, but it sounds like horizontal SIMD. VCV stuff is 4, because there are 4 elements in the underlying SSE vectors. Most ppl (but not all) use vertical SIMD so that you get the same CPU usage processing 4 channels as one channel, but for one channel you don’t get any speedup.

It’s almost trivial to do this “vertical SIMD”, so a lot of ppl do it. I know Surge and a few others do horizontal SIMD. One side effect is that you then get more than a single sample of delay, and this is not always what you want to do. Unlike in VST where you get a whole “buffer” of delay anyway.

So a ton of VST use horizontal SIMD.

Speaking just for me, my VSTs all use horizontal SIMD, many of my VCV modules have used vertical SIMD.
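To make the distinction concrete, here is a small sketch in plain C++ (no SIMD intrinsics, so the lanes are ordinary loops) using the terms the way this thread uses them. `Lanes`, `processAcrossChannels` and `BlockProcessor` are hypothetical names, and a simple gain stands in for real processing. The first function handles one sample from each of four channels at once, with no added latency. The struct collects four consecutive samples of one channel before processing them together, which is where the extra delay comes from.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;
using Lanes = std::array<float, kLanes>;

// "Vertical" in this thread's usage: each lane is the current sample of a
// different channel. Output is available immediately.
Lanes processAcrossChannels(const Lanes& channelSamples, float gain) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        out[i] = channelSamples[i] * gain;
    }
    return out;
}

// "Horizontal" in this thread's usage: each lane is a consecutive sample of
// the same channel. A full group must be collected before it can be
// processed, so the output lags the input by kLanes samples.
struct BlockProcessor {
    Lanes pending{};      // samples collected so far
    Lanes ready{};        // last processed group, read out one by one
    std::size_t fill = 0; // position within the current group
    float gain = 1.0f;

    float push(float x) {
        const float y = ready[fill]; // output from the previous group
        pending[fill] = x;
        if (++fill == kLanes) {
            for (std::size_t i = 0; i < kLanes; ++i) {
                ready[i] = pending[i] * gain; // process the whole group at once
            }
            fill = 0;
        }
        return y;
    }
};
```

With `gain` set to 2, pushing 1, 2, 3, 4 returns zeros, and only the fifth call returns 2 (the processed first sample).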


I suppose the words “horizontal” and “vertical”, as you use them, describe the overall approach to using these batches (basically vectors of values to be processed simultaneously).

What I am trying to achieve is the vertical approach, solely for polyphony: simultaneously processing single samples from many channels.

Horizontal, as I understand you, would mean processing many samples of the same input simultaneously (inevitably adding delay). I can’t think of a place where this could be beneficial, but what do I (newbie) know anyway? :smiling_face_with_tear: Maybe an offline linear wave-shaper?

There are plenty of uses for horizontal simd: adding two signals is one of them. But that is clearly off topic if that is not what a “batch” is. Sorry.