Tips on Managing Block-Based Processing (Short-Time Fourier Transform, etc.)

jaakjensen · December 31, 2021, 2:43am

Hello all,

I’m wondering if anyone has any tips on managing frame-based processing in VCV Rack. I’ve got some code pulled together for a real-time pitch shifter using a short-time Fourier transform, but I’m not sure how to manage the frame processing within the VCV Rack framework.

Summary of my thoughts/where I’m at:

A short-time Fourier transform typically requires 512 or more samples to be collected before processing can start.
Once you’ve gathered a block of 512 samples, you have to do the STFT and all your processing within a single process() call (because VCV Rack calls process() once per sample), which is a lot to do in very little time. Herein lies my dilemma.
Ideally, all the processing would occur in a lower-priority worker thread, but I come from the land of embedded devices and I don’t typically use such features. I’ve seen the thread “How do you buffer data for block processing - #10 by Xenakios”, where @Squinky and @marc_boule discuss the use of “worker threads” that can operate outside of the main process() call. Looking over the “MindMeld EQMaster” source code, I see them being used, but there’s a lot to absorb there and I’d like a softer introduction. Is this just a standard feature of C++11? I understand threads and mutexes conceptually but would like some more guidance on how to use them properly within VCV.
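On the C++11 question: std::thread, std::mutex and std::condition_variable have been part of the C++ standard library since C++11. Here is a minimal sketch of a worker that sleeps until the audio thread asks for work. The Worker class is hypothetical (not taken from Rack, MindMeld or any plugin mentioned here), and doWork() is a placeholder for the FFT or file I/O you would actually run.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical minimal worker: the audio thread calls requestWork(),
// the worker thread wakes up and runs doWork() outside of process().
class Worker {
public:
    Worker() : thread_([this] { run(); }) {}

    ~Worker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        cv_.notify_one();
        thread_.join();  // always join before the object goes away
    }

    // Called from the audio thread; the lock is held only long enough to set a flag.
    void requestWork() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_ = true;
        }
        cv_.notify_one();
    }

    int jobsDone() const { return jobsDone_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (true) {
            // Sleep until there is work or we are asked to quit (predicate guards spurious wakeups).
            cv_.wait(lock, [this] { return pending_ || quit_; });
            if (quit_) return;
            pending_ = false;
            lock.unlock();
            doWork();  // heavy work runs without holding the lock
            lock.lock();
        }
    }

    void doWork() { jobsDone_++; }  // placeholder for FFT / file loading

    std::mutex mutex_;
    std::condition_variable cv_;
    bool pending_ = false;
    bool quit_ = false;
    std::atomic<int> jobsDone_{0};
    std::thread thread_;  // declared last so it starts after the other members exist
};
```

One option in a module is to make such an object a member, so the thread starts with the module and is joined when the module is destroyed. The audio thread still takes a mutex briefly in requestWork(); a lock-free handoff with atomic flags avoids even that.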

Xenakios · December 31, 2021, 2:52am

Even though Rack calls the process() method each sample, that isn’t a hard limit on how long your process method is allowed to run, since everything ultimately runs at the hardware buffer size anyway. Rack just needs to get everything done during the driver callback. So, say the sample rate is 44100 Hz and the buffer size is 512 samples: all the modules need to get their work done in under about 11.6 milliseconds. You could start overthinking all this and involve worker threads and such, but it likely isn’t going to matter much in the end. If you do the threading incorrectly, it might easily lead to worse performance, not better. I am sure people will disagree with this, though.
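The arithmetic behind that budget, as two tiny helpers (hypothetical names): one driver callback must deliver bufferSize frames every bufferSize / sampleRate seconds, and every module in the patch has to fit its work into that window.

```cpp
// Time available per audio-driver callback, in milliseconds.
constexpr double callbackBudgetMs(int bufferSize, double sampleRate) {
    return 1000.0 * bufferSize / sampleRate;
}

// Average time available per sample (one round of process() calls), in microseconds.
constexpr double perSampleBudgetUs(double sampleRate) {
    return 1e6 / sampleRate;
}
```

callbackBudgetMs(512, 44100.0) comes out at about 11.61 ms; with a 128-sample buffer it drops to about 2.90 ms. On average that is about 22.68 µs per sample at 44100 Hz, but since the deadline is per callback, a single process() call may take longer as long as the whole buffer is finished in time.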

jaakjensen · December 31, 2021, 3:08am

Hmm… but how do you manage context switching when Rack calls your process method again?

For example, let’s say I finally get a block of 512 samples and start doing my FFT, or processing my reverb, or whatever. What if, halfway through my FFT, Rack interrupts my processing with another process() call?

Xenakios · December 31, 2021, 3:11am

The process() calls are not interrupts; they are regular function calls that run to the end, and another one can’t start while one is executing.
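To make the “not interrupts” point concrete, here is a toy model (not Rack’s actual engine code) of how a host fills one hardware buffer: it calls process() once per frame, strictly one after another. BlockModule and driverCallback are hypothetical names, and the comment marks where heavy block work would run.

```cpp
#include <cstddef>
#include <vector>

// Toy module that does "block work" every blockSize calls and records
// which driver callback that work landed in.
struct BlockModule {
    explicit BlockModule(std::size_t n) : blockSize(n) {}

    std::size_t blockSize;
    std::size_t count = 0;
    std::vector<int> blockCallbacks;

    void process(int callbackIndex) {
        if (++count == blockSize) {
            count = 0;
            // The heavy block work (FFT, reverb block, ...) would run here,
            // making this one call, and therefore this callback, take longer.
            blockCallbacks.push_back(callbackIndex);
        }
    }
};

// Toy driver callback: one process() call per frame; each returns before the next starts.
void driverCallback(BlockModule& m, std::size_t hwBufferSize, int callbackIndex) {
    for (std::size_t frame = 0; frame < hwBufferSize; ++frame)
        m.process(callbackIndex);
}
```

In this model, with a 128-sample hardware buffer and a 512-sample block, the block work lands in callbacks 3 and 7 of callbacks 0 to 7: every fourth callback carries the whole block’s cost while the other three do almost nothing. With a 64-sample block, every callback runs two blocks.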

Squinky · December 31, 2021, 3:30am

I wouldn’t advise running a big FFT on the audio thread. You will block everyone else from running. I think most of us run the FFT on a worker thread. @Xenakios: what plugins do you have out that do block FFT processing on the audio thread?

jaakjensen · December 31, 2021, 3:32am

Hmmm… well, I was recently working on a reverb module where I tried block-based processing, and I was getting tons of clicks and pops when I made the block size greater than 128 samples. I assumed it was because it was too much for one process() call. Here’s a simplified version of the code, where I use double buffering. Maybe my issue is that I grab the inputs, do the processing (if a block is ready), and then write the output samples?

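void process(const ProcessArgs& args) override {
	// Members declared elsewhere in the module: float doubleBuffer[bufferSize];
	// size_t writePtr, readPtr; bool halfComplete, fullComplete; and reverb.

	// Grab input and store it in the double buffer -> called every sample
	{
		// Scale from +/-5 V to +/-1 and store in buffer
		doubleBuffer[writePtr] = inputs[INL_INPUT].getVoltage() * 0.2f;
		writePtr++;
		if (writePtr == halfBufferSize) {
			halfComplete = true;   // first half [0, halfBufferSize) is full
		} else if (writePtr == bufferSize) {
			fullComplete = true;   // second half [halfBufferSize, bufferSize) is full
		}
		writePtr = writePtr % bufferSize;
	}

	// Process the half that was just filled, in place
	if (halfComplete || fullComplete) {
		size_t offset = 0;
		if (halfComplete) {
			offset = 0;
			halfComplete = false;
		} else {
			offset = halfBufferSize;
			fullComplete = false;
		}
		// Block processing, e.g. Process(float* buffer, size_t bufferSize)
		reverb->Process(&doubleBuffer[offset], halfBufferSize);
	}

	// Set output from double buffer -> called every sample.
	// Read half a buffer away from the write position, so the reader only ever
	// touches the half that has already been processed (at the cost of
	// halfBufferSize - 1 samples of latency). If readPtr simply tracks writePtr,
	// it reads back each input sample right after writing it, i.e. mostly the
	// unprocessed input.
	{
		readPtr = (writePtr + halfBufferSize) % bufferSize;
		outputs[OUTL_OUTPUT].setVoltage(doubleBuffer[readPtr] * 5.0f);
	}
}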

Xenakios · December 31, 2021, 3:40am

Is the reverb::process particularly heavy to run?

Anyway, you may want to go ahead with the worker thread approach. Seems unnecessarily complicated to me, though.

jaakjensen · December 31, 2021, 3:54am

All-pass filters, feedback delays, delay taps, lots of LFOs, lots of interpolation, etc. It works without problems processing one sample at a time (1-3% CPU on my computers), but in theory it should run a little smoother with a block-based approach. But like I said, it starts dropping samples when I increase the block size beyond 128 samples.

I assume the processing for an STFT will be on par or worse, since I plan on processing blocks of 512 or 1024.

almostEric · December 31, 2021, 4:23am

Shrug. I do FFTs on audio threads. As Xenakios mentions, as long as you are not doing an FFT on every sample, it works fine.

Squinky · December 31, 2021, 5:20am

Well, glad that’s working for you guys. I guess it’s a big world out there; there’s more than one way to solve every problem, right?

btw, do you guys have a suggestion for @jaakjensen ? I suggested using a worker thread. Pretty sure that would be one way to solve the problem.

Squinky · December 31, 2021, 5:21am

yeah, could be. btw - what are your plugins? I’m quite familiar with @almostEric 's. They are pretty cool. Don’t know which ones you make, however.

almostEric · December 31, 2021, 5:23am

I basically have a buffer, and when it is full, I do the FFT. Since the buffer is usually at least 512 samples, I only do an FFT every 512 process calls, and that doesn’t seem to stress the CPU too badly.
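A minimal sketch of that accumulate-then-transform pattern, generalized slightly for the STFT case from the original question. FrameCollector is a hypothetical helper: it takes one sample per process() call and, every hopSize samples, hands the latest frameSize samples (oldest first) to a callback where the window and FFT would go. With hopSize equal to frameSize it reduces to “FFT every time the buffer is full”; a smaller hop gives the overlapping frames an STFT pitch shifter typically uses.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical frame collector for block/STFT processing.
class FrameCollector {
public:
    using FrameFn = std::function<void(const float* frame, std::size_t n)>;

    // All allocation happens here, not on the audio thread.
    FrameCollector(std::size_t frameSize, std::size_t hopSize, FrameFn onFrame)
        : ring_(frameSize, 0.0f), frame_(frameSize, 0.0f),
          hop_(hopSize), onFrame_(std::move(onFrame)) {}

    // Call once per sample, e.g. from process().
    void push(float x) {
        ring_[writePos_] = x;
        writePos_ = (writePos_ + 1) % ring_.size();
        if (++sinceLast_ == hop_) {
            sinceLast_ = 0;
            // Unroll the ring buffer so frame_[0] is the oldest sample.
            for (std::size_t i = 0; i < ring_.size(); ++i)
                frame_[i] = ring_[(writePos_ + i) % ring_.size()];
            onFrame_(frame_.data(), frame_.size());  // window + FFT would go here
        }
    }

private:
    std::vector<float> ring_;
    std::vector<float> frame_;
    std::size_t hop_;
    std::size_t writePos_ = 0;
    std::size_t sinceLast_ = 0;
    FrameFn onFrame_;
};
```

The callback runs inside the process() call that completes the hop, so its cost lands in that one call. It could instead just copy the frame out to a worker thread.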

Squinky · December 31, 2021, 5:26am

I don’t understand. I usually set the buffer size of my interface to 128 samples, but sometimes 64. Some people even use 32. Which buffer is usually 512 samples?

synthi · December 31, 2021, 6:05am

I use a worker thread for the FFT when a block is filled and, of course, double buffering for sample accumulation and playback.

Generally I always use 2 static blocks of my defined MAXSIZE, with one pointer for filling and one for FFTing (swapping them).

2 flags per block: readytofft, readytobefilled, IIRC.
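One way to sketch a two-block scheme like this, assuming a single worker thread and in-place block processing. The class name, the method names, and the 2x gain standing in for window + FFT + resynthesis are all placeholders, not synthi’s actual code. The audio thread reads the previous result out of the current block while writing new input into it, and hands the block to the worker when it is full. Here one std::atomic<bool> per block plays the role of both readytofft and readytobefilled, since in this sketch one is always the negation of the other.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical two-block handoff between the audio thread and one worker thread.
class TwoBlockWorker {
public:
    explicit TwoBlockWorker(std::size_t blockSize) {
        for (Block& b : blocks_) b.data.assign(blockSize, 0.0f);  // allocate up front
        worker_ = std::thread([this] { run(); });
    }

    ~TwoBlockWorker() {
        quit_ = true;
        worker_.join();
    }

    // Audio thread, once per sample: never locks, never allocates.
    float process(float in) {
        Block& b = blocks_[fill_];
        if (b.readyToFFT.load(std::memory_order_acquire)) {
            ++underruns_;  // worker is late: output silence, drop this input
            return 0.0f;
        }
        float out = b.data[pos_];  // processed result of this block's previous round
        b.data[pos_] = in;         // new input for the next round
        if (++pos_ == b.data.size()) {
            pos_ = 0;
            b.readyToFFT.store(true, std::memory_order_release);  // hand over to worker
            fill_ ^= 1;                                           // switch to the other block
        }
        return out;
    }

    int blocksProcessed() const { return processed_.load(); }
    int underruns() const { return underruns_; }

private:
    struct Block {
        std::vector<float> data;
        std::atomic<bool> readyToFFT{false};  // true: worker owns it; false: audio thread owns it
    };

    void run() {
        while (!quit_) {
            bool didWork = false;
            for (Block& b : blocks_) {
                if (b.readyToFFT.load(std::memory_order_acquire)) {
                    processBlock(b.data);
                    b.readyToFFT.store(false, std::memory_order_release);  // ready to be filled
                    processed_++;
                    didWork = true;
                }
            }
            if (!didWork)  // simple polling; a condition variable could wake the worker instead
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    static void processBlock(std::vector<float>& d) {
        for (float& x : d) x *= 2.0f;  // placeholder for the real FFT processing
    }

    Block blocks_[2];
    std::size_t pos_ = 0;
    int fill_ = 0;
    int underruns_ = 0;  // only touched by the audio thread
    std::atomic<int> processed_{0};
    std::atomic<bool> quit_{false};
    std::thread worker_;
};
```

The trade-off is latency: with two blocks processed in place, the output lags the input by 2 * blockSize samples, which gives the worker a full block period to finish each block. If it does not finish in time, the audio thread outputs silence and counts an underrun rather than waiting.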

Squinky · December 31, 2021, 6:07am

Yes, I know that you use a separate thread for your FFT stuff. I do the same in Colors, where I use an inverse FFT to make the colored noise. I also use a worker thread in the SFZ player, because I don’t want to do file I/O on the audio thread, although I know there are people who do that.

jaakjensen · December 31, 2021, 6:33am

Nice, ok so it sounds like worker threads are a possibility/good place to start. Thanks to everyone who has chimed in so far.

Does anyone have a good soft start resource on using worker threads? Maybe a particular site that captures most of the things I should investigate?

Or links to repos for other open source modules where people have used them?

jaakjensen · December 31, 2021, 6:34am

Oh nice! Are Colors and SFZ on the squinky labs repo?

Squinky · December 31, 2021, 6:54am

Yes, and they both use the same thread thing. It’s pretty limited, but it works fine.

Xenakios · December 31, 2021, 7:07am

If using a worker thread with FFTs is essential, why doesn’t the VCV Fundamental Noise do that, then?

github.com/VCVRack/Fundamental/blob/45b4a6ebfa6790611ef2b181470b21f910b2a44c/src/Noise.cpp#L31

			if (diff & (1 << i)) {
				values[i] = random::uniform() - 0.5f;
			}
			sum += values[i];
		}
		return sum;
	}
};


struct InverseAWeightingFFTFilter {
	static constexpr int BUFFER_LEN = 1024;

	alignas(16) float inputBuffer[BUFFER_LEN] = {};
	alignas(16) float outputBuffer[BUFFER_LEN] = {};
	int frame = 0;
	dsp::RealFFT fft;

	InverseAWeightingFFTFilter() : fft(BUFFER_LEN) {}

	float process(float deltaTime, float x) {

Squinky · December 31, 2021, 7:10am

It’s not essential, it’s just a good idea. But you would have to ask Andrew. So you’ve never written a module? Why not?