Hello,
I’m working on mixer modules and would like to decrease their CPU load. Since SIMD is a good fit for applying the same instructions to multiple data, I’m trying to take advantage of it to apply the same instructions to multiple tracks (from individual inputs, not polyphonic inputs). I have seen many examples that use SIMD to process the channels of polyphonic modules, but none that do what I’m attempting.
After using SIMD in one of my mixers, I noticed that the CPU load is slightly higher than with the non-SIMD version.
Does it make sense to use SIMD this way?
Is there something obvious that makes this code inefficient? (It works as expected, but the CPU load is worse than in the non-SIMD version.)
My approach was:
First, load the inputs and params into float_4 vectors. Is it a problem to have branches here, since at this stage I’m not using the float_4 vectors to execute the same instructions on multiple data? (This is a simplified version of the code, to focus on the SIMD-related parts.)
simd::float_4 s_v[2] = {0.0f};
simd::float_4 s_gain[2] = {0.0f};
// Both elements are listed explicitly: `= {1.0f}` would only set element [0]
// and leave element [1] zero-initialized.
simd::float_4 s_CV[2] = {1.0f, 1.0f};
simd::float_4 s_pan[2] = {1.0f, 1.0f};
simd::float_4 s_BusL[2] = {0.0f};
simd::float_4 s_BusR[2] = {0.0f};
int vectorCount = 1;
for (int i = 0; i < 4; i++)
{
    s_pan[0][i] = params[PAN_PARAM + i].value;
    if (inputs[AUDIO_INPUT + i].isConnected())
    {
        s_v[0][i] = inputs[AUDIO_INPUT + i].getVoltage();
        if (inputs[CV_INPUT + i].isConnected())
        {
            s_CV[0][i] = inputs[CV_INPUT + i].getVoltage();
        }
        s_gain[0][i] = params[TRACK_LEVEL_PARAM + i].value;
    }
    if (inputs[AUDIO_INPUT + i + 4].isConnected())
    {
        vectorCount = 2;
        // and the same with i + 4 for the second float_4 vector, e.g.:
        // s_pan[1][i] = params[PAN_PARAM + i + 4].value;
    }
}
Then I use these vectors to apply the gains, which depend on the gain param and the CV (no branches):
for (int i = 0; i < vectorCount; i++)
{
    s_CV[i] /= 10.0;
    s_v[i] *= simd::clamp(s_CV[i], 0.f, 1.f) * simd::pow(s_gain[i], 2.0);
}
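For reference, here is a minimal scalar sketch of what I expect each lane of the gain stage to compute. The name `trackGain` is made up for this post (it is not part of Rack or of my module), and the sketch assumes the CV is meant to cover 0 to 10 V.

```cpp
#include <algorithm>

// Hypothetical scalar reference for one track of the gain stage:
// the CV is scaled from volts to 0..1, clamped, then multiplied
// by the squared level parameter.
float trackGain(float cvVolts, float level)
{
    float cv = std::min(std::max(cvVolts / 10.f, 0.f), 1.f);
    return cv * level * level; // same value as cv * pow(level, 2)
}
```

Each SIMD lane should produce the same value as this function for the corresponding track.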
Then I use s_v to drive the VU meters, and finally sum the left and right tracks, each multiplied by a gain calculated from the pan values and by the master gain (branches are avoided by using simd::ifelse).
// process VU meters
for (int i = 0; i < 4; i++)
{
    if (processingFrame)
    {
        for (int vc = 0; vc < vectorCount; vc++)
        {
            s_v[vc].store(trackSignal_for_Vumeter1);
            vuTrack[i + vc * 4].process(args.sampleTime * trackVuDivider.getDivision(), trackSignal_for_Vumeter1[i] / 10.f);
        }
    }
}
// stereo bus routing and apply master gain
float master_gain = params[MASTER_LEVEL_PARAM].value;
for (int i = 0; i < vectorCount; i++)
{
    s_BusL[i] = s_v[i] * simd::ifelse(s_pan[i] >= 1.0f, 1.0 - ((s_pan[i]) - 1.0), 1.0) * master_gain;
    s_BusR[i] = s_v[i] * simd::ifelse(s_pan[i] >= 1.0f, 1.0, s_pan[i]) * master_gain;
}
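To make the pan law explicit, this is a scalar sketch of the two `simd::ifelse` expressions above. The names `panGainLeft` and `panGainRight` are hypothetical, and I am assuming the pan param runs from 0 (hard left) through 1 (center) to 2 (hard right).

```cpp
// Hypothetical scalar equivalents of the per-lane pan gains.
// Assumed pan range: 0 = hard left, 1 = center, 2 = hard right.
float panGainLeft(float pan)
{
    // Right of center the left bus fades from 1 down to 0.
    return (pan >= 1.0f) ? 1.0f - (pan - 1.0f) : 1.0f;
}

float panGainRight(float pan)
{
    // Left of center the right bus fades from 1 down to 0.
    return (pan >= 1.0f) ? 1.0f : pan;
}
```

At center both gains are 1, so a centered track reaches both buses at full level.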
// summing
float outL = 0.0;
float outR = 0.0;
for (int i = 0; i < vectorCount; i++)
{
    s_BusL[i].v = _mm_hadd_ps(s_BusL[i].v, s_BusL[i].v);
    s_BusL[i].v = _mm_hadd_ps(s_BusL[i].v, s_BusL[i].v);
    outL += s_BusL[i][0];
    s_BusR[i].v = _mm_hadd_ps(s_BusR[i].v, s_BusR[i].v);
    s_BusR[i].v = _mm_hadd_ps(s_BusR[i].v, s_BusR[i].v);
    outR += s_BusR[i][0];
}
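In case the summing step is unclear, here is a portable scalar model of what I understand the two `_mm_hadd_ps` calls to do. `Vec4`, `haddModel` and `horizontalSum` are names invented for this sketch; it does not use the real intrinsic, it only mimics its lane arithmetic.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;

// Scalar model of _mm_hadd_ps(a, b): adjacent pairs of a, then of b.
Vec4 haddModel(const Vec4& a, const Vec4& b)
{
    return Vec4{{a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]}};
}

// Two passes with the same vector on both sides leave the total
// of all four lanes in every lane, so lane 0 holds the sum.
float horizontalSum(Vec4 v)
{
    v = haddModel(v, v); // [v0+v1, v2+v3, v0+v1, v2+v3]
    v = haddModel(v, v); // [sum, sum, sum, sum]
    return v[0];
}
```

So for each float_4 the code above pays for two horizontal adds and one lane extraction per bus before the scalar accumulation into outL and outR.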
I’m not familiar with SIMD, and I know it can be pretty difficult to use efficiently. Any advice is welcome, thank you.