Double & float

Hello!

I’m pretty new to development. I’ve seen that in some cases doubles are faster than floats due to the way CPUs compute. Do you have a rule of thumb for choosing between double and float? Is there a major difference in CPU load using one or the other?

Have a nice day!

Could you give us some examples with benchmarks?

I’m also curious what examples you’ve seen, since I’ve never seen an example of this in practice.

At worst, floats can have roughly the same instruction latency as doubles, but at best, memory transfers are twice as fast, SIMD instructions can compute twice as many floats as doubles, and special functions like sqrt and exp require fewer iterations to reach ULP accuracy.
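To make the SIMD point concrete, here’s a minimal sketch using SSE2 intrinsics (assuming buffers whose length is a multiple of the vector width): a 128-bit register holds four floats but only two doubles, so the float loop covers the same buffer in half the iterations.

    #include <xmmintrin.h>  // SSE: four floats per 128-bit register
    #include <emmintrin.h>  // SSE2: two doubles per 128-bit register

    // Scale a buffer by a gain, one 128-bit register at a time.
    void scaleFloats(float* buf, int n, float gain) {
        __m128 g = _mm_set1_ps(gain);
        for (int i = 0; i < n; i += 4)  // 4 samples per iteration
            _mm_storeu_ps(buf + i, _mm_mul_ps(_mm_loadu_ps(buf + i), g));
    }

    void scaleDoubles(double* buf, int n, double gain) {
        __m128d g = _mm_set1_pd(gain);
        for (int i = 0; i < n; i += 2)  // only 2 samples per iteration
            _mm_storeu_pd(buf + i, _mm_mul_pd(_mm_loadu_pd(buf + i), g));
    }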

1 Like

The source is this thread from Stack Overflow:

Thanks for this clear answer!

From my experience coding Rack modules, you should only use a double if you need more precision than a float can give you. For me, this happened exactly once: in my module TapeRecorder, I use a double for the inter-sample position of the tape head, to provide exact tape-speed control for tapes longer than about 6 minutes.
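To see why roughly 6 minutes is the limit: a float has a 24-bit significand, so above 2^24 ≈ 16.8 million it can no longer represent every integer, and at 44100 Hz you pass 2^24 samples after about 6.3 minutes. A minimal sketch of the failure:

    #include <cstdio>

    int main() {
        // 2^24 samples at 44100 Hz is about 6.3 minutes of audio.
        float  posF = 16777216.0f;  // 2^24
        double posD = 16777216.0;
        posF += 1.0f;  // rounds back to 2^24: the step is lost
        posD += 1.0;   // double represents integers exactly up to 2^53
        printf("float:  %.1f\n", posF);  // 16777216.0
        printf("double: %.1f\n", posD);  // 16777217.0
    }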

For audio or CV data, using double instead of float is simply a waste of memory. You can’t hear any difference. And keep in mind that nowadays memory is always slower than the CPU, so any time you might win on the CPU will be lost at least twice over on the memory side.

I read the thread on Stack Overflow. Akash Agrawal wrote that in experiments adding 3.3 two billion (2,000,000,000) times, the results were:

Summation time in s: 2.82 summed value: 6.71089e+07 // float
Summation time in s: 2.78585 summed value: 6.6e+09 // double
Summation time in s: 2.76812 summed value: 6.6e+09 // long double

So he says, by adding up 3.3 two billion times, you might win 1.2%. But in audio software you will never do such a calculation. You will write code that does far more complex calculations on a far bigger amount of data. And this data has to be transferred to and from memory.

What he didn’t address is that the float calculation delivers a different result than the double calculation. Because the two operations do not deliver the same result, they can’t seriously be compared.
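The float discrepancy is itself instructive: 6.71089e+07 is exactly 2^26. Adjacent floats at that magnitude are 8 apart, so once the sum reaches 2^26, adding 3.3 (less than half that spacing) rounds back to the same value and the sum stops growing. A minimal sketch reproducing the stall:

    #include <cstdio>

    int main() {
        // The running sum stalls at 2^26 = 67108864: there, sum + 3.3f
        // rounds back to sum, so further additions have no effect.
        float sum = 0.0f;
        for (int i = 0; i < 100000000; i++)  // 1e8 iterations, plenty to stall
            sum += 3.3f;
        printf("%.0f\n", sum);  // prints 67108864, not 3.3e8
    }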

Conclusion: it’s good to ask questions like this, but in the end I go with Kent Beck and “ask the computer”. That means I write code, and if I need performance tuning (because my module is too slow) I check my code and see where I can improve it. Most performance wins come from reworking the algorithm and the memory layout of the data.

5 Likes

For storing individual sample values I would recommend floats.

For certain algorithms, such as IIR filters and sometimes envelope followers, you might want to consider using doubles.

Very small floats turn “denormal”. CPUs handle these much more slowly (think 100×) than normal numbers. I think it’s less of an issue on modern CPUs.

Also, as a bonus thought: Don’t discount integers. Many hardware synths and effects used them (and some still do).

3 Likes

Denormals are disabled on Rack DSP threads. See initMXCSR() in Engine.cpp.
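For readers who haven’t seen it: on x86 this amounts to setting the flush-to-zero and denormals-are-zero bits of the MXCSR register on each DSP thread. A minimal sketch of the idea (the actual initMXCSR() in Engine.cpp may differ in detail):

    #include <pmmintrin.h>  // brings in the FTZ and DAZ mode macros

    void disableDenormals() {
        // Results that would be denormal are flushed to zero...
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        // ...and denormal inputs are treated as zero.
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    }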

5 Likes

Yep. I use them in very few places, and low-frequency IIR filters is one of them. Even there you probably can’t hear it, but you can easily measure it.
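For context, the classic trouble case is a one-pole lowpass with a very low cutoff: the per-sample correction term becomes tiny relative to the state, which is exactly where float precision runs out first. A minimal sketch of the double-state pattern (names are illustrative):

    #include <cmath>

    // One-pole lowpass: y[n] = y[n-1] + a * (x[n] - y[n-1]).
    // At low cutoffs 'a' is tiny, so the state is kept in double
    // to avoid the correction term vanishing in float precision.
    struct OnePoleLP {
        double a = 0.0;  // coefficient, near 0 for low cutoffs
        double y = 0.0;  // filter state in double precision

        void setCutoff(double cutoffHz, double sampleRate) {
            const double twoPi = 6.283185307179586;
            a = 1.0 - std::exp(-twoPi * cutoffHz / sampleRate);
        }

        float process(float x) {
            y += a * (x - y);
            return (float) y;  // float at the interface is plenty
        }
    };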

Being completely naive about DSP, I always wondered why Rack and others use floating point and not integers for audio samples. 32-bit float and integer can represent the same range of values, so are floats faster than ints now? That seems counter-intuitive to me. Or is it just an “industry norm” thing, or…?

1 Like

Floats are a lot easier. With ints you need to keep re-normalizing and worrying about overflow and underflow. I’ve done tons of DSP with ints “back in the day”, and I would not want to go back. Well, on the Disting I used some awful fixed-point thing I came up with, but it wasn’t great.

Also, float actually does represent a much larger dynamic range. Because it’s basically exponential, it can maintain 24-bit S/N over an enormous dynamic range. Sure, they both have 2**32 unique values, but the way they are encoded makes floats better for audio.
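A concrete illustration (numbers are just an example): a signal 120 dB below full scale sits at 1e-6 of the int32 range, leaving only about 11 bits of resolution, while a float at the same level still carries its full 24-bit significand; only the exponent shrank.

    #include <cstdio>
    #include <cstdint>

    int main() {
        double quiet = 1e-6;  // roughly -120 dBFS
        // As int32 (full scale = 2^31 - 1), only ~2147 steps remain:
        int32_t asInt = (int32_t) (quiet * 2147483647.0);
        printf("int32 steps below this peak: %d (~11 bits)\n", (int) asInt);
        // As float, the 24-bit significand is intact at any exponent:
        float asFloat = (float) quiet;
        printf("float: %.9g\n", asFloat);
    }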

3 Likes

Thanks a lot everyone! Everything is much clearer.

2 Likes

Surge uses either for wavetables, and most factory wavetables are int16.

Plaits and other Mutable circuits use int16. You can see it in the macro oscillator, where the value is renormalized by 32k.

Dexed and other DX7 emulations use ints internally.

I think the trick is that most modern audio APIs (Core Audio, WASAPI, etc.), as well as plugin frameworks like JUCE, VST3, and CLAP, are based around ±1 floats, so you end up getting there at the “edge”.

As to the original question: the rule of thumb for a person new to DSP dev is “always use float”. By the time that is false, you won’t be new to DSP dev any more.

7 Likes

Floats and integers actually work really differently.

Specifically in regards to:

  1. Headroom (ints will hard-clip or, worse, wrap around; floats have practically unlimited headroom)
  2. Quantization noise (ints will have quantization noise for very quiet signals)

Ints are traditionally much faster for some instructions (add, mul).

When Steinberg introduced VST back in the day, they used floats. So it’s been the standard for PC-based DSP since then.

Traditional DSP chips (SHARC) used integer/fixed-point, so it’s been a standard on hardware synths and effects for a while. (Probably starting to move to floats with the move to ARM chips.)

As an extra data point: Korg still uses int32_t for the logue-sdk (logue-sdk/README.md at master · korginc/logue-sdk · GitHub).

I love using integers for:

  1. phases for oscillators and LFOs (relying on them wrapping; see the sketch below)
  2. storing sample values as int16_t to minimize memory usage/traffic (good enough for audio in most cases)
  3. delay lengths, or anything that will end up indexing a table/buffer
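
A minimal sketch of the wrapping phase accumulator from point 1 (names are illustrative, and freqHz is assumed to be below the sample rate): the full uint32_t range maps to exactly one cycle, so unsigned overflow is the wrap and no branch or fmod is needed.

    #include <cstdint>

    struct PhaseAcc {
        uint32_t phase = 0;
        uint32_t inc = 0;

        void setFreq(float freqHz, float sampleRate) {
            // Map one cycle onto the full 32-bit range (assumes freqHz < sampleRate).
            inc = (uint32_t) (freqHz / sampleRate * 4294967296.0);  // 2^32
        }

        float next() {
            phase += inc;  // wraps modulo 2^32 for free
            return phase * (1.0f / 4294967296.0f);  // phase in [0, 1)
        }
    };
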
1 Like

Well, that’s true by definition, because in C you can only index with an integer, right?

But I assume your delay lines interpolate, so you probably do what I do: I keep the delay time as a float, and when I use it I convert it to an integer and a floating-point remainder, use the integer to index into the delay line, then use the fraction to interpolate for “in between” values.
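That split might look like this minimal sketch (illustrative names, assuming a power-of-two buffer so the wrap is a mask):

    // Fractional delay read: split the float delay time into an integer
    // index and a fractional remainder, then linearly interpolate.
    float readDelay(const float* buf, int mask /* size - 1 */,
                    int writePos, float delaySamples) {
        int   whole = (int) delaySamples;            // integer part
        float frac  = delaySamples - (float) whole;  // fraction in [0, 1)
        int i0 = (writePos - whole) & mask;          // newer neighbor
        int i1 = (writePos - whole - 1) & mask;      // older neighbor
        return buf[i0] + frac * (buf[i1] - buf[i0]);
    }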

I also do this in all lookup tables: the integer part looks up the value and slope, so my lookup tables can interpolate, too. If the lookup tables didn’t interpolate I couldn’t use them for a lot of things that I do, like the gain lookup in my compressor.
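The value-and-slope layout might look like this minimal sketch (illustrative names; the table is assumed precomputed, with each slope being the difference to the next entry):

    // Interpolating lookup: one multiply-add per lookup, since the
    // slope to the next entry is stored alongside each value.
    struct TableEntry {
        float value;
        float slope;  // table[i + 1].value - table[i].value
    };

    float lookup(const TableEntry* table, int size, float x /* in [0, 1) */) {
        float scaled = x * (float) (size - 1);
        int   i      = (int) scaled;        // integer part picks the entry
        float frac   = scaled - (float) i;  // fraction interpolates
        return table[i].value + frac * table[i].slope;
    }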

I’m guessing you must do something similar.

By the way, there are plenty of FP DSP chips now. I think the current generation of Pro Tools HD uses them? (For the last 15 years, perhaps?) And not just ARM architecture; even old Analog Devices has had FP for a long time: Fixed-Point vs. Floating-Point Digital Signal Processing | Analog Devices

1 Like

Perhaps in this case all doubles is faster than a random mix of floats and doubles? I don’t know/remember how long it takes to convert between floats and doubles…

Also, doesn’t C/C++ store all temporary values as doubles? Like:

void func(float y, float z, float a) {
    float x = y * z + a;
}

An old compiler especially might turn that into:

void func(float y, float z, float a) {
    double temp = double(y) * double(z);
    temp += double(a);
    float x = float(temp);
}

maybe??

I can kind of see how some sloppy code and an old compiler could make mixing floats and doubles slower than all doubles… But in any case, what you say is far more germane and realistic for today in VCV.

I think a compiler, even an old one, would rather do this:

void func(float y, float z, float a) {
    float x = y;
    x *= z;
    x += a;
}

The best approach would be to compile such code and inspect the generated assembly (Kent Beck said: don’t speculate, ask the computer).

1 Like

I think the C standard says that all float intermediates are doubles. I think your version would give the wrong answer, as it would truncate the intermediate values, no?

Maybe you’re right, I don’t know. Do you know a paper (e.g. about the C standard) that could tell us more?

One personal comment on the code example:

I think the following code (see example above) would be skipped by the compiler, because after leaving func, x can’t be accessed any more:

void func(float y, float z, float a) {
    float x = y * z + a;
}

I think we are talking here about the following code:

float func(float y, float z, float a) {
    float x = y * z + a;
    return x;
}

or simply:

float func(float y, float z, float a) {
    return y * z + a;
}

So in this case the compiler knows the type of the return value in advance; why should it use another type (e.g. double) for internal temporary variables? What would be the benefit of doing so, at the cost of one (or more) implicit data-type conversion(s)?

I’m not an expert on compiler flags, but I assume there are ways to tell the compiler exactly how to deal with internal temporary variables.

Oh, for sure my “example” would get optimized away in the real world. When I was working on benchmarking my VCV code years ago, I kept running into the compiler optimizing away all my code because it realized I never used the output! I had to fake the input and consume the output in order to preserve all my DSP code so I could measure it!
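One common trick, for what it’s worth, is to sink the result into a volatile so the optimizer has to keep the computation alive. A minimal sketch (the loop body stands in for real DSP code):

    #include <chrono>
    #include <cstdio>

    int main() {
        volatile float sink = 0.f;  // the optimizer can't prove this is unused
        auto t0 = std::chrono::steady_clock::now();
        float acc = 0.f;
        for (int i = 0; i < 10000000; i++)
            acc += 0.25f * (float) i;  // stand-in for the DSP under test
        sink = acc;  // keeps the loop from being optimized away
        auto t1 = std::chrono::steady_clock::now();
        long long us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        printf("%lld us\n", us);
    }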

I didn’t find much on this topic, but the one article I did find says “it depends”. They actually analyze some code that is similar to my example: Intermediate Floating-Point Precision | Random ASCII – tech blog of Bruce Dawson

TL;DR: the old K&R book said all math is done internally at double precision, but the relevant standards are largely silent on the topic. And these days (and in VCV) we tend to use compiler options to do all math using SSE instructions.
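One nuance worth adding: C99 (and C++, via <cfloat>) exposes the implementation’s choice as FLT_EVAL_METHOD, so you can at least ask the compiler what it does. A minimal check:

    #include <cfloat>
    #include <cstdio>

    int main() {
        // 0: evaluate in the expression's own type (typical with SSE math)
        // 1: evaluate float and double operations in double
        // 2: evaluate everything in long double (classic x87 behavior)
        printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
    }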

The article, while somewhat specific to MS VC++, is interesting.

(Looking at the VCV makefiles, I don’t see where SSE for math is specified, but I think it must be somehow. Directly it’s -mfpmath=sse, but maybe that’s set as a side effect of some other option? I don’t quite know.)

I’m going to stick with my original hunch: there are, or were, some cases where mixed float/double is slower than all double, and these probably don’t occur in VCV plugins. Still just a guess.