The maths here are specifically iterative, and in a sense doubly so. The output is derived from a sinusoidal iterative equation, but I don't want the output to step through each iteration; I want it to follow one specific iteration as that iteration changes over time. And I want this done on a minimum of four independent outputs, in 2D.

As far as I can tell, this simply requires a for loop, with no way around it. I have to build the iterations in a loop, output a specific one (e.g. #8), then step time forward and run the loop again, outputting the new #8. That means running the loop every sample, with 8 or more computation-heavy equations each time (for 4 pairs of outputs). I have tried something more like this in order to build a buffer and therefore do the work less often:
```cpp
void process(const ProcessArgs& args) override {
	// only rebuild the buffer once every sample_build samples
	if (loopCounter % sample_build == 0) {
		build_Samples(args);
	}
	read_to_Output(args); // play back the buffered sample for this step
	loopCounter++;
	loopCounter %= sample_build;
}
```
```cpp
void build_Samples(const ProcessArgs& args) {
	// this should (theoretically, I guess) simply create a "buffer" of
	// "sample_build" samples, already tuned to Rack's sampleRate
	for (int k = 0; k < sample_build; ++k) {
		// set starting conditions for this timestep
		for (int i = 0; i < max_iterates; ++i) {
			// do the iterative maths for each of the outputs
			if (i == desired_iterate) { // do this for each output
				// emplace_back into x and y vectors for this output,
				// or with arrays: arrayX[k] = x, etc.
			}
		}
		increment_Time(); // this is where tuning happens, just like building a phase for a sine wave
	}
}
```
But for reasons I'm sure someone who knows the internal workings of Rack could explain, this doesn't increase performance all that much, and it causes what looks like a general reduction in bit depth, related to the number of samples I buffer.
This plugin, in its current technically-working state, can run 4 2D outputs simultaneously, but it consumes around 30% CPU. I'm certain there are a good number of structural optimizations that could be made (again, I am quite new to this), but I am concerned that this may simply be the wrong kind of math for this environment, being chaotic dynamical systems and all.
If you enclose your code in triple backquotes it will be readable
But on the general question, here's how I think of it.
You have at most 1/48000th of a second to return from process. Rack internally is per-sample; sound cards provide some buffering, but doing n samples' worth of work every n samples won't help that much.
So in your case you just need things to go faster.
A couple of ways you can do that:
If you know anything about the smoothness or functional form of the answer, do the heavy computation less often. For instance, if you can compute the values and their first derivatives, you could do the full computation every fourth sample and fill in the rest with a Taylor series.
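As a rough sketch of that idea (heavyValue() and heavyDerivative() are hypothetical stand-ins for the expensive computation; the real thing would also need the derivative of the chaotic map, which may or may not be available):

```cpp
// Do the full computation every 4th sample; fill the gaps with a
// first-order (value + slope) extrapolation.
float heavyValue();      // expensive: the exact value at the current time
float heavyDerivative(); // expensive: its rate of change per second

struct Extrapolator {
	float y0 = 0.f, dy0 = 0.f;
	int phase = 0; // 0..3: which sub-sample we are on

	float next(float sampleTime) {
		if (phase == 0) {
			y0 = heavyValue();
			dy0 = heavyDerivative();
		}
		float y = y0 + dy0 * (phase * sampleTime); // linear fill-in
		phase = (phase + 1) % 4;
		return y;
	}
};
```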
You could profile your code to see what is actually slow. Can you replace sin with an approximation? Things like that.
Or, in your case, since you are computing 8 fields, you could use SIMD and recast the work as a vector operation. This lets the CPU do 4-wide float ops rather than single ones, but it requires some coding expertise that isn't all that common.
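For instance, something like the sketch below, where the four independent outputs sit in the four lanes of Rack's simd::float_4 and the loop runs over iterations only (the map x <- a * sin(x) + b is a placeholder for the real equations):

```cpp
#include <rack.hpp>
using rack::simd::float_4;

// Each pass of the loop advances all four independent maps at once,
// so the per-sample cost is roughly that of one map instead of four.
float_4 iterate(float_4 x0, float_4 a, float_4 b, int desiredIterate) {
	float_4 x = x0;
	for (int i = 0; i <= desiredIterate; i++) {
		x = a * rack::simd::sin(x) + b; // one 4-wide sin per iteration
	}
	return x; // lane k holds output k's value at iteration #desiredIterate
}
```

The four lanes are loaded once per sample, e.g. float_4 x0(xA, xB, xC, xD), and read back with result[0] through result[3]; no inner loop over the slots is needed.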
In my searching I have come across the use of the SIMD float_4, but mostly in the context of polyphony. If I'm going to be assigning independent values to each of the four slots, would that not still need a loop, and end up essentially the same, time-wise, as the vectors I'm already building? I have indeed not been able to fully understand their usage.

I like the idea of using a faster sin approximation and will do that at the very least (it's chaos; inexactitude is what gives rise to a lot of the interest), but I don't think I'm quite high enough on the calculus ladder to determine whether a Taylor-series approach would even work here, as the points I'm tracking depend both on initial conditions (where time physically moves a starting location) and on which iteration of those conditions, which I can only know the value of by knowing the one before it, etc.
Is there a Rack module function that runs faster than the sample rate that I could use to build samples in the "background", or would that inevitably intrude on other modules' time?
You could spin up a thread and compute things ahead in the background, but that's tricky and a bit antisocial, and often way less performant if you don't know all the tricks.
Sin is insanely slow on some platforms. Don't use it! The approximation in the VCV SDK is not bad, btw.
You can for sure make math-heavy modules, but it requires some programming skill to do it well. For a silly example, my old "LFN" runs white noise through a five-band graphic EQ and uses almost no CPU.
Here is an article I wrote years ago on this that some people like: here
If you want to see some examples (and source code) of these things, my "Basic VCO" has two sines: one fast, using a sine approximation, and one very pure, using a lookup table.
My "Organ 3" sums a sine at a different pitch for each drawbar on a Hammond, so several sine waves per note need to be looped over every sample. It also doesn't use much CPU. It's here.
Your demo modules are in fact how I got started with all this! Right before I finished trying things last night I threw in the Padé approximation that you used in the VCO2, but something went real weird, so I'm going to have to troubleshoot. Is it a problem for these approximations to be taking the sin of large numbers?
Yes, there sure is. Most of the ones I know use something like a Taylor series, and they are only accurate for small values, like plus or minus pi.
Oh, I see the one in VCV isn't quite as restrictive:
" The code is the exact rewriting of the cephes sinf function.
Precision is excellent as long as x < 8192 (I did not bother to
take into account the special handling they have for greater values
ā it does not return garbage for arguments over 8192, though, but
the extra precision is missing)."
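For approximations that are only accurate near zero, the usual fix is to fold the argument back into [-pi, pi) first; a minimal sketch (approxSin stands in for whatever small-range approximation is used):

```cpp
#include <cmath>

// Fold an arbitrary argument into [-pi, pi). One floor-based reduction
// like this is cheap compared to a full-precision library sin call.
inline float wrapPhase(float x) {
	const float PI = 3.14159265358979f;
	return x - 2.f * PI * std::floor((x + PI) / (2.f * PI));
}

// usage: approxSin(wrapPhase(bigArgument))
```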
Indeed. I've spent much of today looking at the Taylor series in more depth, and I am uncertain how, when coded, that many calls to pow, and that many factorial loops (if I want accuracy from -2pi to 2pi), could be faster than a single call to sin, if I were to write out the series manually to an appropriate degree. Am I right in guessing that the approximations I've heard of here are essentially algorithmic approximations of this kind of function?
That is a wonderful amount of precision for the VCV version, though; I'll have to drop that in tonight. In the meantime I'll try finding the cephes function and putting it into my little portable testing program.
Umm... polynomials use exponents? Which, if I don't want to write x * x * x * x etc., or do a recursive loop, is a call to pow, isn't it?
Like, I know I'm new to coding, but... what??
Also, yes, we're talking about sin functions, but I am not producing sine waves (what I'm taking the sin of is another polynomial dependent on previous positions), and besides using the approximations I am in fact largely imitating you structurally for waveform generation, at least from the demo modules' structure. Still, I can smell a dismissal when I read it, so, peace.
x * x * x is not an exponential function, at least as the term is commonly used. Yes, it is x ** 3. But usually exponential means k ** x; often k == e, and "exponential" means e ** x, or "e to the x". x ** e is not exponential in x; e ** x is.
Of course you may define the phrase "exponential function" any way you want - that's fine. You can google "exponential vs polynomial" if you want to learn more about how these terms are commonly used.
Oh, and usually y = x * x * x is much faster than y = pow(x, 3).
I think your question was about how to do math quickly in C++. It is eminently doable, but I'm sorry if this doesn't help. I, too, wish you luck in your endeavors.
There are many ways to calculate a polynomial on a computer without involving pow. VCV includes a couple of them.
The simplest way is to simply keep an "accumulator" for the value of x ^ n, like this function from VCV Rack:
```cpp
template <typename T, size_t N>
T polyDirect(const T (&a)[N], T x) {
	T y = 0;
	T xn = 1; // accumulator: holds x^n for the current term
	for (size_t n = 0; n < N; n++) {
		y += a[n] * xn; // add the term a[n] * x^n
		xn *= x;        // advance to x^(n+1) with a single multiply
	}
	return y;
}
```
Instead of using pow, the previous x ^ n is kept and then multiplied by x. Multiplications are fairly fast, so this simple version is efficient enough for most applications...
Not so much for audio, though, which is why VCV Rack also includes a function using Horner's method (polyHorner) and one using Estrin's method (polyEstrin).
You can see an example of those functions being used in VCV Rack's exp2_taylor5 function; it's an approximation of 2 ^ x using a polynomial evaluated with polyHorner.
"...and that many factorial loops (if I want accuracy from -2pi to 2pi)..."
You pre-calculate the coefficients of the Taylor series polynomial and insert them into the code; that way, you don't have to calculate anything for the coefficients at runtime.
Are the simd::sin() and simd::cos() functions optimized, I mean beyond the handling of 4 floats at a time? Do they use any kind of fast approximation of sine and cosine?
Precalculating coefficients: duh, yeah, I should totally do that, silly me. This is where I start getting lost with code, though; I'm seeing very specific things that imply (to me) specific use cases. Like, 2^x has a lot of good uses, but that function can't give me x^any-real-number, or do fractional powers on a coordinate system, so I have no idea whether it, or an alteration of it, works in my particular very niche case without trying and likely failing repeatedly and tediously, because I don't fully understand the functionality yet. I guess generally (haha, ironic) I am having trouble taking specific functions, generalizing them, and then specializing down to my very different usages. I also have absolutely no "official" learning to speak of, which I'm sure makes me a nuisance to talk to.
I showed exp2_taylor5 only as an example of how to use the polynomial-evaluation functions included in VCV Rack. The idea is that you generate a Taylor polynomial approximation of sin with N coefficients, and use a function like polyHorner to evaluate it.
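Putting those two pieces together, a minimal sketch might look like this (written out by hand rather than through polyHorner, but the shape is the same):

```cpp
// Degree-7 Taylor approximation of sin with the coefficients
// (-1/3!, 1/5!, -1/7!) baked in at compile time, evaluated with
// Horner's method as a polynomial in x^2: no pow, no factorial
// loops at runtime. Accurate near 0; pair it with range reduction
// for larger arguments.
inline float sinTaylor7(float x) {
	const float c3 = -1.f / 6.f;    // -1/3!
	const float c5 = 1.f / 120.f;   //  1/5!
	const float c7 = -1.f / 5040.f; // -1/7!
	float x2 = x * x;
	return x * (1.f + x2 * (c3 + x2 * (c5 + x2 * c7)));
}
```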
Note that there are also many other ways to approximate sin. It all depends on your precision requirements. The VCV VCO does it like this, for example.
It uses this function. I presume it's somewhat optimized, but I don't know if that's fast enough to be used heavily in process.
For types other than float_4, it just aliases std::sin.
Is the formula you are trying to implement something we can look at on GitHub or some such?
If you are doing repeated sin of a polynomial in a chaotic system or some such, then yeah: things like Horner for the polynomial, caching coefficients, never calling pow when you can multiply, and finding a stable approximation for sin will all matter.
But also profile the code and find out where the time is going, probably by running it outside of Rack in a test harness.
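A test harness can be as small as this (iterateOnce is a hypothetical stand-in for one sample's worth of the module's math; compile with the same optimization flags as the plugin, e.g. -O3, so the numbers are comparable):

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>

float iterateOnce(float x) { return std::sin(x) * 1.1f + 0.1f; } // placeholder

int main() {
	const int N = 48000 * 10; // ten seconds' worth of samples
	float x = 0.5f;
	auto t0 = std::chrono::steady_clock::now();
	for (int i = 0; i < N; i++)
		x = iterateOnce(x);
	auto t1 = std::chrono::steady_clock::now();
	double ns = std::chrono::duration<double, std::nano>(t1 - t0).count() / N;
	// print x too, so the compiler can't optimize the loop away
	std::printf("%f ns per sample (result %f)\n", ns, x);
	return 0;
}
```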
std::pow(x, n) is optimal for small integer exponents. The optimized assembly is basically the same, and might actually be faster than my "manual" version.