Double & float

@Squinky Sorry, I was wrong:

A C++ compiler will generate code for the computation of float x even if it is not used (at least without optimization enabled). Here is an example:

void func(float y, float z, float a, float b) {
    float x = (y + z) * (a + b) * (y + a);
}

Here is the assembly output, generated at https://godbolt.org with x86-64 gcc 8.3:

func(float, float, float, float):
  push rbp
  mov rbp, rsp
  movss DWORD PTR [rbp-20], xmm0
  movss DWORD PTR [rbp-24], xmm1
  movss DWORD PTR [rbp-28], xmm2
  movss DWORD PTR [rbp-32], xmm3
  movss xmm0, DWORD PTR [rbp-20]
  movaps xmm1, xmm0
  addss xmm1, DWORD PTR [rbp-24]
  movss xmm0, DWORD PTR [rbp-28]
  addss xmm0, DWORD PTR [rbp-32]
  mulss xmm1, xmm0
  movss xmm0, DWORD PTR [rbp-20]
  addss xmm0, DWORD PTR [rbp-28]
  mulss xmm0, xmm1
  movss DWORD PTR [rbp-4], xmm0
  nop
  pop rbp
  ret

Even in this more complex expression there is no need for temporary variables, because everything is calculated in CPU registers. And there is no conversion from float to double.

Cool! That assembly looks like SSE. Is that now the default for that compiler, or are those options specified somewhere?

Btw, is it your contention that a double could never ever be faster, in any circumstance or with any compiler? Not sure what we are discussing here any longer.

I think the contention is “if you want to know what the compiler will do, run the compiler,” along with “modern compilers do very smart stuff.”

The only way to answer these questions is with godbolt or the -S flag.

But I have not seen clang, gcc, or msvc introduce pointless intermediate doubles when run at -O3. And the compiler is way, way better at writing assembly than I am.
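
For a quick local check (assuming gcc or clang; both accept these flags), something like

g++ -O3 -S func.cpp

writes the optimized assembly to func.s instead of producing an object file. func.cpp is just a placeholder name here.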

2 Likes

64-bit x86 includes SSE2, so compilers can always use it. It’s also why you don’t need the SSE2 flag in 64-bit builds but do need the AVX flag, etc.

AFAIK it looks like the default.

No, I never said anything like that. I wanted to show that there’s no implicit conversion between float and double in a pure float expression. But if, e.g., the variable inside the function is declared as double x, then there will be a conversion after the calculation is done:

void func(float y, float z, float a, float b) {
    double x = (y + z) * (a + b) * (y + a);
}

The code above compiles to:

func(float, float, float, float):
  push rbp
  mov rbp, rsp
  movss DWORD PTR [rbp-20], xmm0
  movss DWORD PTR [rbp-24], xmm1
  movss DWORD PTR [rbp-28], xmm2
  movss DWORD PTR [rbp-32], xmm3
  movss xmm0, DWORD PTR [rbp-20]
  movaps xmm1, xmm0
  addss xmm1, DWORD PTR [rbp-24]
  movss xmm0, DWORD PTR [rbp-28]
  addss xmm0, DWORD PTR [rbp-32]
  mulss xmm1, xmm0
  movss xmm0, DWORD PTR [rbp-20]
  addss xmm0, DWORD PTR [rbp-28]
  mulss xmm0, xmm1
  cvtss2sd xmm0, xmm0
  movsd QWORD PTR [rbp-8], xmm0
  nop
  pop rbp
  ret

On the other hand, if one of the function parameters is a double, more conversions are needed:

void func(double y, float z, float a, float b) {
    float x = (y + z) * (a + b) * (y + a);
}

The code above compiles to:

func(double, float, float, float):
  push rbp
  mov rbp, rsp
  movsd QWORD PTR [rbp-24], xmm0
  movss DWORD PTR [rbp-28], xmm1
  movss DWORD PTR [rbp-32], xmm2
  movss DWORD PTR [rbp-36], xmm3
  cvtss2sd xmm0, DWORD PTR [rbp-28]
  movapd xmm1, xmm0
  addsd xmm1, QWORD PTR [rbp-24]
  movss xmm0, DWORD PTR [rbp-32]
  addss xmm0, DWORD PTR [rbp-36]
  cvtss2sd xmm0, xmm0
  mulsd xmm1, xmm0
  cvtss2sd xmm0, DWORD PTR [rbp-32]
  addsd xmm0, QWORD PTR [rbp-24]
  mulsd xmm0, xmm1
  cvtsd2ss xmm4, xmm0
  movss DWORD PTR [rbp-4], xmm4
  nop
  pop rbp
  ret

To know whether code using floats is faster or slower than code using doubles, we have to know the time (CPU cycles) each instruction needs. There was a time I knew them by heart for a MOS 6510, but I’m sorry, I have no clue about modern CPUs.
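
One way around that is to stop counting cycles and just measure. Here is a minimal micro-benchmark sketch (the kernel, the iteration count, and the volatile trick are my own arbitrary choices; results will vary a lot with compiler flags and CPU):

#include <chrono>
#include <cstdio>

template <typename T>
double time_kernel() {
    volatile T acc = T(0);  // volatile keeps the compiler from deleting the loop
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < 10000000; ++i)
        acc = acc * T(0.999) + T(0.001);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main() {
    std::printf("float:  %f s\n", time_kernel<float>());
    std::printf("double: %f s\n", time_kernel<double>());
}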

Well, I think we are in violent agreement about 99% of the issues around float vs. double. And I learned something from @baconpaul: that scalar SSE is always used for 64-bit x86, at least with the compilers supported by VCV.

The one minuscule and unimportant 1% where we may disagree is that I think (but can’t prove) that there are, or have been, cases where using a double is faster.

So, how about we agree on the 99%, I’ll keep my unproven suspicion to myself, and we can assume I’m probably wrong about that anyway.

Wow, 6510! My Voyetra-8 synthesizer used the 6502, and yes, it was much simpler (and actually possible) to know how long something took. Maybe that’s why it’s the assembly language of the Terminator?

I never said that using a double could never be faster than using a float.

But for coding VCV Rack modules, I argue for using floats in the first place, until there’s a real need for doubles instead.

The other thing I wanted to bring into this discussion was to stop speculating about what compilers do or don’t do, and to take a closer look at what’s going on in detail.

OT: The 6510 was/is the CPU inside the Commodore 64. Some decades ago I had access to one of these machines, and it was a whole lot of fun.

1 Like

Ah. At work I had a “stack” of Apple IIs that I used for embedded development.

I totally agree with that! I am sure you can construct a case where double is faster than float. My only points were: 1) that would be an oddity (I have yet to see it in code I write), so “use float” is a good rule of thumb, and 2) if you think you have such a case, look at what the compiler did rather than imagining what it did, because we are bad at imagining what compilers do, but they will happily tell us :)

(And also, in audio DSP I’ve only needed to flip to double for precision reasons, not performance ones. But that’s just my experience.)
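
A toy example of the kind of precision problem I mean (the numbers are mine, picked to make the effect obvious): once a float phase accumulator grows large, a tiny per-sample increment falls below the float’s resolution and the accumulator simply stops moving, while a double keeps counting.

#include <cstdio>

int main() {
    const double inc = 0.1 / 44100.0;  // per-sample increment of a 0.1 Hz LFO
    float  pf = 10000.0f;              // pretend the phase is already large
    double pd = 10000.0;
    for (int i = 0; i < 44100; ++i) {  // one second of samples
        pf += (float)inc;              // below half a float ulp at 10000: rounds away
        pd += inc;
    }
    std::printf("float:  %.9f\n", pf); // still 10000.000000000
    std::printf("double: %.9f\n", pd); // about 10000.100000000
}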

2 Likes

Totally agree.

1 Like

For things that end up as indices, such as interpolated delay times in “electric ensemble”, I use fixed point (20:12).
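
For readers unfamiliar with the notation, here is a rough sketch of what a 20:12 fixed-point delay index can look like (my own names and layout, not the actual “electric ensemble” code): 20 integer bits for the sample offset, 12 fractional bits for the interpolation.

#include <cstdint>

using fix2012 = int32_t;                 // 20 integer bits : 12 fractional bits
constexpr int FRAC_BITS = 12;
constexpr int FRAC_MASK = (1 << FRAC_BITS) - 1;

inline fix2012 to_fix(float x)      { return (fix2012)(x * (1 << FRAC_BITS)); }
inline int     int_part(fix2012 v)  { return v >> FRAC_BITS; }
inline float   frac_part(fix2012 v) { return (v & FRAC_MASK) * (1.0f / (1 << FRAC_BITS)); }

// Linearly interpolate a delay buffer at a fixed-point position
inline float read_delay(const float* buf, fix2012 pos) {
    int   i = int_part(pos);
    float f = frac_part(pos);
    return buf[i] + f * (buf[i + 1] - buf[i]);
}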

6502/65C02? I had both and learned its assembly language in my quest to work out how these computer things did anything useful ;)

I think our first proto had two 1 MHz 6502s, but the next one had a single 2 MHz 65C02? We didn’t have a linker, so each code module assembled to 256 bytes, and I manually tracked jump-table addresses to call from one module to another.

As mentioned before, I didn’t have an assembler, so I had to do everything in 6502 machine code on my Apple II+ back around 1980–82. And yet I produced a very sophisticated A.I. puzzle-solving application, which I saved to cassette tapes at first, but finally got a couple of floppy drives. I also had an Epson MX-80 dot-matrix printer, on which I printed out many fan-fold pages of the machine code. Somewhere along the way I got an assembler.

Those were fun days ;)

I had an MX-80 too. I had the stack of three Apples so that one could build, one acted as a terminal for debugging, and the third was for printing.

1 Like

I had a whopping 48K of RAM. How much did you have in each?

When I started to work in the semiconductor industry in 1977 as a wafer-fab photolithography process engineer, our largest memory chip was 4K bits!

Not that much. I think whatever was standard? 16k?

I think the Voyetra-8 started with something like 1K byte of CMOS battery-backed RAM, but I redesigned the CPU card to have four 2K×8 RAMs, for a whopping 8K bytes. That was one of the early updates; it added a sequencer and some other stuff. Maybe that’s when MIDI was added.

1 Like

It is amazing what we could do with a few thousand bytes of machine code though.

3 Likes