Pink Trombone in VCV - Help Wanted

secretcinema · April 13, 2022, 5:31pm

Nostalgically refreshing, nice work!

k-chaffin · April 13, 2022, 8:19pm

Stitching together phonemes such that the result sounds natural is a very fun but very challenging endeavor. Many (30) years ago I attempted this by doing FFT analysis of each phoneme and then either cubic or quintic polynomial interpolation or a Fourier series synthesis between the two phonemes. I had a lot of fun with this but the results were less than satisfactory.

k-chaffin · April 13, 2022, 10:17pm

I had to go read about Toki Pona. Very interesting, especially for a musical minimalist fan.

k-chaffin · April 13, 2022, 10:37pm

Not to hijack your topic, but here are a couple of my past experiments from 5 years ago in musical phoneme stitching using Yamaha Vocaloid Cyberdiva in conjunction with Ableton Live.

By the way, if anyone is interested, several of the images on my SoundCloud songs come from various R&D projects (some commercial) in A.I., hippocampus simulations, MRI 3D visualization etc. These were mostly CUDA GP-GPU massively parallel 3D simulations rendered in Unity. So, my hippocampus simulation processed incoming audio and video in real time using the CUDA code. The hippocampus neural network was “wired” up procedurally in C# in Unity before being passed on to CUDA and passed back to DirectCompute and eventually rendered in Unity. I used shared textures as data objects as well as display objects. It was a lot of fun.

Vega · April 14, 2022, 12:48am

Just pushed something to clamp a lot of values and internally attenuate CV inputs in a sane way. It should be much easier to use now, and be MUCH harder to crash. It’s still possible, but seems to usually only happen with audio rate mod. If anyone can narrow down the cause of any more of the lock-up states (especially those that output the evil high pitch, +/-10V (20Vpp) square wave) I’d appreciate it, but afaik you have to be almost trying to crash it to do so. Honestly, it might be safe to push it to the library soon if I can get some people to compile it themselves and test it.
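For readers wondering what "clamp and attenuate" can look like in practice, here is a minimal sketch. It is not the module's actual code: the helper name `combineParamAndCv`, the scale factor, and the choice to fall back to the lower bound on a non-finite value are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not from the Pink Trombone module): add an attenuated
// CV voltage to a knob value and clamp the result to a range the DSP code
// is known to handle safely.
inline float combineParamAndCv(float knob, float cvVolts, float cvScale,
                               float lo, float hi)
{
    float v = knob + cvVolts * cvScale;
    // Assumption: treat NaN/inf as "use the lower bound" so that a bad
    // input cannot propagate into the tract simulation.
    if (!std::isfinite(v))
        return lo;
    return std::min(std::max(v, lo), hi);
}
```

With a scale of 0.1 per volt, a +10 V input on top of a knob at 0.5 lands on the upper bound of a 0..1 range instead of pushing the parameter to 1.5.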

Vega · April 14, 2022, 5:47am

Swapped out the unused knob for some LEDs showing state and put the name on the panel.

Unfortunately, CPU usage is awful - I’d sort of been avoiding looking until now:

For comparison’s sake, that’s roughly the same as Frozen Wasteland’s Portland Weather on my system. Unfortunately, optimizing this will probably be a bit beyond me as while I’m not a total DSP noob, the black (pink?) magic behind the trombone is a bit outside my comfort level.

So, after a few people test it and at least confirm it’s not horribly busted (so, compile it, try it, report back here) I’ll send it into the library.

Ahornberg · April 14, 2022, 6:41am

I did a quick performance check on your code and found that both Tract::setRestDiameter() and Tract::setConstriction() have nearly equally high CPU usage. There are 3 for-loops with a long as counter. Maybe take a look at those loops.

Vega · April 14, 2022, 6:55am

That’s falling into the code I didn’t write. I’ll still see if I can come up with anything though.

Ahornberg · April 14, 2022, 7:20am

For optimizing Tract::setRestDiameter() try to figure out what min-value is possible for this->tractProps->bladeStart and what max-value is possible for this->tractProps->lipStart.

For optimizing Tract::setConstriction() try to figure out what max-value is possible for width.

This will give you an overview how often the for-loops are cycled through.

From my perspective, in both loops cos() might be the problem. Maybe rewrite these loops so that you can use simd-instructions. Rack already has float_4 rack::simd::cos(float_4 x) ready to use.

Squinky · April 14, 2022, 2:45pm

Cos inside a loop run every sample sounds like the perfect storm of terrible performance!

Vega · April 14, 2022, 5:50pm

Changed everything to use rack::simd, can’t believe I missed that given I did the same thing in a previous module. That took it down 20% or so, but it’s still pretty bad. I’ll give it a more serious look later.

Xenakios · April 14, 2022, 6:26pm

To get the full benefits of the SIMD library, you would need to actually utilize the parallelism those instructions/functions provide. However, that would make it necessary to do quite big rewrites and the benefits might mostly happen for polyphonic use of the module.

carbon14 · April 14, 2022, 6:30pm

If it’s not polyphonic, then there are fast cos implementations that are probably faster than the simd version.
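As one example of what a "fast cos" can look like, here is a sketch of the well-known parabolic sine approximation with a single refinement step, shifted by a quarter period to give cosine. This is not the implementation anyone in the thread used; the name `fastCos` is made up, and whether roughly 0.001 absolute error is acceptable for the tract model is an assumption that would need listening tests.

```cpp
#include <cmath>

// Hypothetical helper: approximate cos(x) with a refined parabola.
// Absolute error is on the order of 1e-3.
inline float fastCos(float x)
{
    constexpr float kPi = 3.14159265358979f;
    constexpr float kTwoPi = 6.28318530717959f;

    // cos(x) == sin(x + pi/2); wrap the shifted angle into [-pi, pi).
    float a = x + 0.5f * kPi;
    a -= kTwoPi * std::floor((a + kPi) / kTwoPi);

    // Parabola through the zeros and peaks of sin on [-pi, pi].
    float y = (4.f / kPi) * a - (4.f / (kPi * kPi)) * a * std::fabs(a);

    // One refinement step that blends y with y*|y| to reduce the error.
    return 0.225f * (y * std::fabs(y) - y) + y;
}
```

The only library call left is `std::floor` for the range reduction, which can be dropped if the caller guarantees the argument is already within one period.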

Ahornberg · April 14, 2022, 6:32pm

Using simd means making 4 calculations of the same type at once. I did a quick rewrite of the loop in Tract::setRestDiameter() here:

void Tract::setRestDiameter(sample_t tongueIndex, sample_t tongueDiameter)
{
	this->tractProps->tongueIndex = tongueIndex;
	this->tractProps->tongueDiameter = tongueDiameter;
	int count_4 = 0;
	rack::simd::float_4 t_4;
	rack::simd::float_4 cos_4;
	for (long i = this->tractProps->bladeStart; i < this->tractProps->lipStart; i++)
	{
		if (!(count_4 % 4))
		{
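			// calc t for the next 4 indices: lane j belongs to index i + j,
			// so the scalar term (tongueIndex - i) becomes (tongueIndex - (i + j))
			for (auto j = 0; j < 4; ++j)
			{
				t_4[j] = 1.1 * M_PI * (sample_t)(tongueIndex - (i + j)) / (sample_t)(this->tractProps->tipStart - this->tractProps->bladeStart);
			}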
			cos_4 = rack::simd::cos(t_4);
		}
		sample_t fixedTongueDiameter = 2 + (tongueDiameter - 2) / 1.5;
		sample_t curve = (1.5 - fixedTongueDiameter + 1.7) * cos_4[count_4 % 4];
		if (i == this->tractProps->bladeStart - 2 || i == this->tractProps->lipStart - 1)
			curve *= 0.8;
		if (i == this->tractProps->bladeStart || i == this->tractProps->lipStart - 2)
			curve *= 0.94;
		this->restDiameter[i] = 1.5 - curve;
		++count_4;
	}
	for (long i = 0; i < this->tractProps->n; i++)
	{
		this->targetDiameter[i] = this->restDiameter[i];
	}
}

On my system, it gives me about 40% less CPU usage.
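To check that a batch-of-four rewrite produces the same values as a per-index loop, a standalone comparison can help. The sketch below leaves Rack out entirely: a plain `float[4]` stands in for `float_4`, `std::cos` per lane stands in for `rack::simd::cos`, and the function names and the single `scale` parameter are simplifications invented for this example.

```cpp
#include <cmath>
#include <vector>

// Scalar reference: one cos() call per index i.
std::vector<float> curveScalar(long start, long end, float tongueIndex, float scale)
{
    std::vector<float> out;
    for (long i = start; i < end; ++i)
        out.push_back(std::cos(scale * (tongueIndex - i)));
    return out;
}

// Batched variant: fill four lanes, evaluate them together, then consume.
std::vector<float> curveBatched(long start, long end, float tongueIndex, float scale)
{
    std::vector<float> out;
    float t4[4];
    float cos4[4];
    for (long i = start; i < end; i += 4)
    {
        for (int j = 0; j < 4; ++j)
            t4[j] = scale * (tongueIndex - (i + j)); // lane j holds index i + j
        for (int j = 0; j < 4; ++j)
            cos4[j] = std::cos(t4[j]); // stand-in for one SIMD cos over 4 lanes
        for (int j = 0; j < 4 && i + j < end; ++j)
            out.push_back(cos4[j]); // drop unused lanes in the last batch
    }
    return out;
}
```

Comparing the two result vectors element by element for a range whose length is not a multiple of four (for example 10 to 39) exercises both the lane-to-index mapping and the handling of the partial last batch.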

The other loop in Tract::setConstriction() seems to be more difficult to rewrite.

Vega · April 14, 2022, 8:13pm

Yeah, I know. Single Instruction Multiple Data and all that. I’m just glad to reap the benefits of the faster implementations for now at least. I’ll try rolling the for loops later.

edit: like @Ahornberg apparently has done for me!

edit2: Hmm, I’m not seeing the same performance improvement from actually packing the data like @Ahornberg wrote. I’m not sure why.

Ahornberg · April 14, 2022, 8:31pm

Firstly, the performance measurement inside VCV Rack is not that accurate. It depends on what modules are placed in the actual patch.

Then performance improvements using simd also depend on the CPU itself. I’m running Windows 10 on an Intel i7-8700.

Ahornberg · April 15, 2022, 7:18am

That’s right. I tried FastTrigo by Robin Lobel and did a quick hack by replacing all sin(), cos() and sqrt(), which gives a vast performance improvement on the Pink Trombone without using simd.

@Vega I made a pull request on your repo.

Squinky · April 15, 2022, 4:40pm

Yes, this is the best approach - most bang for the buck. Long ago I wrote a mixer “Mixer 8” that was based on the AS Mixer module. The first thing I did was replace all the sin and cos with a lookup table. It made it 8X faster, and took me like an hour.
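A lookup table of the kind described above might look roughly like the following. This is a sketch under assumptions, not the code from Mixer 8 or from any module in this thread: the class name `CosTable`, the table size of 1024, and the use of linear interpolation between entries are all choices made for the example.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

namespace {
constexpr std::size_t kTableSize = 1024; // assumed size; tune for accuracy vs. cache use
constexpr float kTwoPi = 6.28318530717959f;
}

// Hypothetical cosine lookup table with linear interpolation.
class CosTable
{
public:
    CosTable()
    {
        // One extra entry so that table_[idx + 1] is always valid.
        for (std::size_t i = 0; i <= kTableSize; ++i)
            table_[i] = std::cos(kTwoPi * static_cast<float>(i) / static_cast<float>(kTableSize));
    }

    float operator()(float x) const
    {
        // Map the angle to a phase in [0, 1).
        float phase = x / kTwoPi;
        phase -= std::floor(phase);
        float pos = phase * static_cast<float>(kTableSize);
        // Clamp in case rounding pushed phase up to exactly 1.0.
        std::size_t idx = std::min(static_cast<std::size_t>(pos), kTableSize - 1);
        float frac = pos - static_cast<float>(idx);
        return table_[idx] + frac * (table_[idx + 1] - table_[idx]);
    }

private:
    std::array<float, kTableSize + 1> table_;
};
```

The table is built once (for instance as a module member), and each call afterwards costs one `floor`, one multiply, and two array reads instead of a full `cos()` evaluation.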

Vega · April 16, 2022, 3:25am

I replied in the PR, but putting it here for others to comment:

I see FastTrigo depends on <intrin.h>, <xmmintrin.h>, and <pmmintrin.h>. Following the first google result, https://stackoverflow.com/questions/2520683/how-to-cope-with-intrin-h-no-such-file-or-directory, I see this is a MSVC thing, but I’m on Linux and using GCC. I got that working by doing as that answer says and using #include <x86intrin.h>, but the performance improvement seems small. I assume you’re building on Windows and using MinGW as per the normal plugin dev guide - any idea if you can cross compile for Linux with the MSVC headers? I just want to be sure this doesn’t preclude me from getting the Pink Trombone into the library.

Also, I can only assume if we’re using intrinsics it will make it a pain for anyone wanting to compile for a pi or eventually M1 Apple hardware - so that could be an issue from the start.
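One common way to handle the header difference and the non-x86 case at the same time is a guarded include with a plain C++ fallback. The sketch below is an illustration only, not FastTrigo's code: the macro `HAVE_SSE_INTRINSICS` and the function `portableSqrt` are hypothetical names, and the compiler checks shown are assumed to cover MSVC, GCC/MinGW, and Clang.

```cpp
#include <cmath>

// Pick the intrinsics header per compiler; fall back to <cmath> elsewhere
// (e.g. ARM targets such as a Raspberry Pi or Apple M1).
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
  #include <intrin.h>      // MSVC on x86
  #define HAVE_SSE_INTRINSICS 1
#elif defined(__SSE__)
  #include <x86intrin.h>   // GCC / MinGW / Clang on x86 with SSE enabled
  #define HAVE_SSE_INTRINSICS 1
#else
  #define HAVE_SSE_INTRINSICS 0
#endif

// Hypothetical helper: single-float square root using SSE where available.
inline float portableSqrt(float x)
{
#if HAVE_SSE_INTRINSICS
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
#else
    return std::sqrt(x);
#endif
}
```

Code written this way compiles on every target; the intrinsics path simply is not taken where the headers do not exist.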

Ahornberg · April 16, 2022, 6:48am

I removed the dependencies on <intrin.h>, <xmmintrin.h>, <pmmintrin.h> by shrinking down the number of functions provided by FastTrigo to only FTA::sin() and FTA::cos().

On my Windows-system PinkTrombone used around 12.5% CPU before my changes, now it uses around 5.5% CPU (indicated by the VCV Rack performance meter).