Bug report review request. Anyone good at SIMD?

clone45 · March 12, 2026, 1:19am

Hello everyone,

I use AI for development, and sometimes it’s more knowledgeable than I am on certain topics. Today I spent some time hunting down what seems to be a bug in Rack’s implementation of simd::atan2. Although the bug seems legit, the topic of SIMD isn’t very familiar to me.

I’m hoping that someone with SIMD knowledge might be able to read over this and confirm that this is a legitimate bug, and not just some hallucination. I can confirm that the code change recommended by the AI worked for me. I don’t want to take up the Rack team’s time with a bogus bug report.

Thanks in advance, Bret

Gist: https://gist.github.com/clone45/6e07ddf5871dd07fab6b82dd61762cd6 (gistfile1.md)

# Bug Report: `simd::atan2` returns NaN for zero inputs

## Summary

The SSE implementation of `atan2` used by VCV Rack's SIMD library returns NaN when both inputs are zero. The C standard (Annex F, IEC 60559 floating-point) specifies `atan2(+0, +0)` as +0, and the scalar `atan2f(0, 0)` behaves correctly. The SSE version does not, because of what appears to be a copy-paste error in the upstream `sse_mathfun_extension.h` library that has been present since 2016.

This bug is unlikely to affect most modules, but it will silently corrupt any DSP pipeline that processes FFT bins where both the real and imaginary parts happen to land on exactly `0.0f`. Once a NaN enters a processing chain, it propagates through all subsequent arithmetic and typically destroys the output.
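The two claims in this summary can be checked with scalar code alone: `atan2f` handles (0, 0) itself, while dividing first and then taking `atan` (the order the SSE routine uses before its mask selection) yields NaN, and a running sum carries that NaN forward from then on. A minimal sketch, assuming standard IEEE 754 float behavior with no fast-math flags; `scalar_atan2`, `atan_of_ratio`, and `accumulate_phase` are hypothetical names chosen for illustration:

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: the scalar library call, which handles (0, 0) itself.
inline float scalar_atan2(float y, float x) { return std::atan2(y, x); }

// Hypothetical helper: divide first, then atan, the order the SSE routine
// uses before its mask selection. For y == x == 0 the division gives NaN.
inline float atan_of_ratio(float y, float x) { return std::atan(y / x); }

// Hypothetical helper: a running sum of per-frame phase increments.
// Once one increment is NaN, every later sum is NaN as well.
inline std::vector<float> accumulate_phase(const std::vector<float>& increments) {
    std::vector<float> phases;
    float phase = 0.0f;
    for (float inc : increments) {
        phase += inc;
        phases.push_back(phase);
    }
    return phases;
}
```

Feeding `accumulate_phase` a single `atan_of_ratio(0.f, 0.f)` increment turns that entry and every entry after it into NaN, which matches the "output cuts to silence partway through" symptom described later in the report.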

## How I found it

(This file has been truncated.)

clone45 · March 12, 2026, 2:10am

Here’s the duplicate line mentioned in the bug report:

From `sse_mathfun_extension.h` in github.com/to-miz/sse_mathfun_extension (commit 4004c2bda):

```c
    v4sf offset = _mm_andnot_ps( x_lt_0, offset0 );
    offset = _mm_and_ps( x_lt_0, offset1 );

    v4sf arg = _mm_div_ps( y, x );
    v4sf atan_result = atan_ps( arg );
    atan_result = _mm_add_ps( atan_result, offset );

    /* select between zero_result, pio2_result and atan_result */

    v4sf result = _mm_andnot_ps( zero_mask, pio2_result );
    atan_result = _mm_andnot_ps( pio2_mask, atan_result );
    atan_result = _mm_andnot_ps( pio2_mask, atan_result); /* duplicate of the line above */
    result = _mm_or_ps( result, atan_result );
    result = _mm_or_ps( result, pi_result );

    return result;
}
```
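The duplicated line is a no-op because `_mm_andnot_ps(m, a)` computes `~m & a` bit by bit, so applying the same mask a second time clears nothing new. The sketch below contrasts the duplicated pair with the `zero_mask` line the report believes was intended, on a lane holding a NaN like the one `0 / 0` produces. It assumes an x86 target with SSE; `make_mask`, `mask_twice`, and `mask_intended` are hypothetical helpers, not library functions.

```cpp
#include <xmmintrin.h>

// Hypothetical helper: lane i is all one bits if its flag is true, else all zero bits.
// _mm_cmpneq_ps(v, 0) yields all ones exactly where v != 0.
inline __m128 make_mask(bool a, bool b, bool c, bool d) {
    return _mm_cmpneq_ps(_mm_setr_ps(a, b, c, d), _mm_setzero_ps());
}

// The excerpt's two lines: (~pio2_mask & atan_result), applied twice.
inline __m128 mask_twice(__m128 pio2_mask, __m128 atan_result) {
    atan_result = _mm_andnot_ps(pio2_mask, atan_result);
    atan_result = _mm_andnot_ps(pio2_mask, atan_result);  // same mask again: no effect
    return atan_result;
}

// What the second line was presumably meant to do: also clear zero_mask lanes.
inline __m128 mask_intended(__m128 pio2_mask, __m128 zero_mask, __m128 atan_result) {
    atan_result = _mm_andnot_ps(pio2_mask, atan_result);
    atan_result = _mm_andnot_ps(zero_mask, atan_result);  // removes the 0/0 NaN
    return atan_result;
}
```

With `pio2_mask` set only in lane 1 and `zero_mask` set only in lane 0, `mask_twice` leaves a NaN in lane 0 untouched, while `mask_intended` turns it into 0.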

Ohmer · March 12, 2026, 12:29pm

SIMD: the concept isn’t “natural” for my old brain… Implementing SIMD to handle up to 16-voice polyphony for a 6OP-DX synth voice module is a pain (and C++ code using SIMD reads like Chinese to me)!

It is also a dilemma how to handle… the 6 operators with SIMD.

SIMD = 4 for the price of 1…

andreya.ek.frisk · March 12, 2026, 6:55pm

Ok.

It is true the duplicated line does nothing.

It is also true that `atan_result = _mm_andnot_ps(zero_mask, atan_result);` does need to happen somewhere. I would put it right after the offset is added at line 295 of the source (its line numbers are off for some reason), and then just remove the duplicate.

It is also right that the bitmask `x_le_0` will be true when x is zero, but that mask isn’t used anywhere else anyway, so a better fix is to use `<` instead of `<=` in the first place. So line 264 becomes `v4sf x_lt_0 = _mm_cmplt_ps( x, *(v4sf*)_ps_0 );`, and in line 279, change `x_le_0` to `x_lt_0`.
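Putting the two changes from this post together, a self-contained sketch might look like the following. It is a reconstruction modeled on the excerpt and the bug report, not the upstream source: the library's polynomial `atan_ps` is replaced by a per-lane `std::atan` helper, the pi and pi/2 constants are written inline, and the composition of `zero_mask` (covering both (0, 0) and y = 0 with x > 0) is an assumption. `atan2_ps_sketch`, `atan_lanes`, and the `apply_fix` switch are hypothetical; an x86 target with SSE and no fast-math flags is assumed.

```cpp
#include <xmmintrin.h>
#include <cmath>

// Hypothetical stand-in for the library's atan_ps: scalar std::atan per lane.
static inline __m128 atan_lanes(__m128 v) {
    alignas(16) float f[4];
    _mm_store_ps(f, v);
    for (float& lane : f) lane = std::atan(lane);
    return _mm_load_ps(f);
}

// apply_fix == false: x <= 0 pi mask and the duplicated pio2 line (buggy).
// apply_fix == true: x < 0 pi mask, zero_mask applied after the offset,
// duplicate removed (the changes suggested in this post).
static inline __m128 atan2_ps_sketch(__m128 y, __m128 x, bool apply_fix) {
    const __m128 zero = _mm_setzero_ps();
    const __m128 sign = _mm_set1_ps(-0.0f);  // only the sign bit set
    const __m128 pi   = _mm_set1_ps(3.14159265f);
    const __m128 pio2 = _mm_set1_ps(1.57079633f);

    __m128 x_eq_0 = _mm_cmpeq_ps(x, zero);
    __m128 x_gt_0 = _mm_cmpgt_ps(x, zero);
    __m128 x_lt_0 = _mm_cmplt_ps(x, zero);
    __m128 x_le_0 = _mm_cmple_ps(x, zero);
    __m128 y_eq_0 = _mm_cmpeq_ps(y, zero);
    __m128 y_lt_0 = _mm_cmplt_ps(y, zero);

    // Lanes whose answer is 0: (0, 0) and (0, x > 0).
    __m128 zero_mask = _mm_or_ps(_mm_and_ps(x_eq_0, y_eq_0),
                                 _mm_and_ps(y_eq_0, x_gt_0));
    // x == 0, y != 0: +-pi/2 carrying the sign of y.
    __m128 pio2_mask = _mm_andnot_ps(y_eq_0, x_eq_0);
    __m128 pio2_result = _mm_and_ps(pio2_mask,
                                    _mm_xor_ps(pio2, _mm_and_ps(y_lt_0, sign)));
    // y == 0 on the negative x axis: pi. The unfixed mask also fires at x == 0.
    __m128 pi_mask = _mm_and_ps(y_eq_0, apply_fix ? x_lt_0 : x_le_0);
    __m128 pi_result = _mm_and_ps(pi_mask, pi);

    // For x < 0, shift atan(y / x) by +pi, or by -pi when y < 0.
    __m128 offset_sign = _mm_and_ps(_mm_and_ps(x_lt_0, y_lt_0), sign);
    __m128 offset = _mm_and_ps(x_lt_0, _mm_xor_ps(pi, offset_sign));

    // 0 / 0 produces NaN here; the masking below decides whether it escapes.
    __m128 atan_result = _mm_add_ps(atan_lanes(_mm_div_ps(y, x)), offset);
    if (apply_fix)
        atan_result = _mm_andnot_ps(zero_mask, atan_result);  // clear the NaN

    __m128 result = _mm_andnot_ps(zero_mask, pio2_result);
    atan_result = _mm_andnot_ps(pio2_mask, atan_result);
    if (!apply_fix)
        atan_result = _mm_andnot_ps(pio2_mask, atan_result);  // original duplicate, a no-op
    result = _mm_or_ps(result, atan_result);
    return _mm_or_ps(result, pi_result);
}
```

With `apply_fix` false, the (0, 0) lane comes out NaN, because OR-ing the NaN bits with `pi_result` still leaves a NaN. With it true, that lane is 0, and the quadrant and axis cases agree with `std::atan2`. The sketch does not attempt Annex F's signed-zero rules (for example, `atan2(+0, -0)` should be pi), since `_mm_cmpeq_ps` treats -0 and +0 as equal.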

clone45 · March 12, 2026, 6:56pm

Thanks for the confirmation.

clone45 · March 14, 2026, 4:40pm

The original author of the SIMD library confirmed and patched the bug:

github.com/to-miz/sse_mathfun_extension: issue **simd::atan2 returns NaN for zero inputs**, opened by clone45 on 12 Mar 2026 at 07:15 PM UTC and closed on 14 Mar 2026 at 04:18 PM UTC.

Hello! I'm a bit hesitant to submit this bug report, because it was identified by Claude Code during development. My knowledge of SIMD operations is fairly limited, but its assessment and workaround seem sound. I also had this bug report reviewed by another developer, who also thinks it's legitimate.

# Bug Report: `simd::atan2` returns NaN for zero inputs

## Summary

The SSE implementation of `atan2` used by VCV Rack's SIMD library returns NaN when both inputs are zero. The C standard (Annex F, IEC 60559 floating-point) specifies `atan2(+0, +0)` as +0, and the scalar `atan2f(0, 0)` behaves correctly. The SSE version does not, because of what appears to be a copy-paste error in the upstream `sse_mathfun_extension.h` library that has been present since 2016.

This bug is unlikely to affect most modules, but it will silently corrupt any DSP pipeline that processes FFT bins where both the real and imaginary parts happen to land on exactly `0.0f`. Once a NaN enters a processing chain, it propagates through all subsequent arithmetic and typically destroys the output.

## How I found it

I'm building a phase vocoder for time-stretching audio. The algorithm takes FFT frames, computes the phase of each bin using `rack::simd::atan2(im, re)`, propagates the phase across frames, and resynthesizes the signal. For short audio samples it works perfectly. For longer samples (several minutes), the output would cut to silence partway through.

After adding diagnostic logging, I found that millions of output samples were NaN. I traced the first NaN to the `atan2` call, always at FFT bins near the Nyquist frequency (bin 2048 of a 4096-point FFT). These are bins where the signal magnitude is often extremely small, and where floating-point cancellation in the FFT butterfly can produce exactly `0.0f` for both the real and imaginary parts. Longer samples have more FFT frames and therefore more opportunities to hit this condition, which is why shorter samples appeared to work fine.
## The root cause

The function `sse_mathfun_atan2_ps` in `include/simd/sse_mathfun_extension.h` computes `atan2(y, x)` by dividing `y / x` and then calling `sse_mathfun_atan_ps` on the result. When `x = 0` and `y = 0`, the division produces NaN per IEEE 754 (`0 / 0 = NaN`). The function is supposed to detect this case and return 0 instead of the NaN, and the detection logic does correctly identify the zero-zero case through a bitmask called `zero_mask`. However, it never actually uses that mask to suppress the NaN from the `atan` result.

Here is the relevant selection logic in `sse_mathfun_extension.h`:

```c
__m128 result = _mm_andnot_ps(zero_mask, pio2_result);
atan_result = _mm_andnot_ps(pio2_mask, atan_result);
atan_result = _mm_andnot_ps(pio2_mask, atan_result);
result = _mm_or_ps(result, atan_result);
result = _mm_or_ps(result, pi_result);
```

The line `atan_result = _mm_andnot_ps(pio2_mask, atan_result);` is duplicated. It masks `atan_result` by `pio2_mask` a second time, which has no effect because the operation is idempotent. Based on the structure of the surrounding code, it was almost certainly meant to read:

```c
atan_result = _mm_andnot_ps(zero_mask, atan_result); // line 290 corrected
```

This would zero out the NaN-contaminated `atan_result` in any lane where `zero_mask` is set, preventing it from leaking into the final result through the bitwise OR on the next line. Without this fix, the NaN from `atan(0/0)` passes through `pio2_mask` unaffected (because `pio2_mask` is all-zero bits for the zero-zero case), and then gets OR'd into `result`, producing NaN output.

There is also a secondary issue: `pi_result` is non-zero for the `(0, 0)` case because `pi_mask = AND(y_eq_0, x_le_0)` and `0 <= 0` is true. Even with the fix applied, the function would return pi instead of 0 for `atan2(0, 0)`.
A complete source-level fix would add one more line before the return:

```c
result = _mm_andnot_ps(zero_mask, result); // force 0 where zero_mask is set
```

## Origin

This code comes from Tolga Mizrak's `sse_mathfun_extension` library (https://github.com/to-miz/sse_mathfun_extension). VCV Rack's copy notes that the only modifications were making functions `inline` and converting global constants to function-scope locals. The duplicate line exists in the upstream repository, which has had no issues or pull requests filed across its entire history.

## Workaround for plugin authors

If you use `rack::simd::atan2` in your module and there is any possibility that both inputs could be zero, you can sanitize the output with two extra SSE instructions (a compare and a bitwise AND):

```cpp
float_4 phase = rack::simd::atan2(im, re);
phase = phase & (phase == phase); // replace NaN lanes with 0.0
```

This works because NaN is the only float value for which `x == x` evaluates to false. The comparison `phase == phase` maps to `_mm_cmpeq_ps`, which returns all-zero bits for NaN lanes and all-one bits for valid lanes. The bitwise AND then zeroes out only the NaN lanes, leaving all other values untouched.

## Reproducing

The simplest reproduction is to call `rack::simd::atan2` with both arguments set to zero and check the result:

```cpp
float_4 zero(0.f);
float_4 result = rack::simd::atan2(zero, zero);
// result[0] is NaN (expected: 0.0)
```

For comparison, the scalar version behaves correctly:

```cpp
float result = atan2f(0.f, 0.f);
// result is 0.0
```

In addition, another developer commented:

> It is true the duplicated line does nothing.
>
> It is also true that `atan_result = _mm_andnot_ps(zero_mask, atan_result);` does need to happen somewhere. I would put it right after the offset is added at line 295 of the source (its line numbers are off for some reason), and then just remove the duplicate.
>
> It is also right that the bitmask `x_le_0` will be true when x is zero, but that mask isn’t used anywhere else anyway, so a better fix is to use `<` instead of `<=` in the first place. So line 264 becomes `v4sf x_lt_0 = _mm_cmplt_ps( x, *(v4sf*)_ps_0 );`, and in line 279, change `x_le_0` to `x_lt_0`.
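The workaround in this report relies on Rack's `float_4` operators. In code that uses SSE intrinsics directly, the same NaN scrub might be written as below. `zero_nan_lanes` is a hypothetical name, and the sketch assumes an x86 target with SSE and no fast-math flags, since those flags may let a compiler assume NaN never occurs and fold the comparison away.

```cpp
#include <xmmintrin.h>

// Replaces NaN lanes with +0.0f and leaves every other lane untouched.
// _mm_cmpeq_ps(v, v) is all zero bits only where v is NaN, so the AND
// clears exactly those lanes: the raw-SSE form of `phase & (phase == phase)`.
static inline __m128 zero_nan_lanes(__m128 v) {
    return _mm_and_ps(v, _mm_cmpeq_ps(v, v));
}
```

Infinities pass through unchanged because `inf == inf` is true. Like the `float_4` version, this also zeroes NaNs that originate elsewhere in the pipeline, which could hide unrelated bugs.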

I’m drafting an email to VCV Rack right now to report the bug.

andreya.ek.frisk · March 14, 2026, 6:28pm

Great!

clone45 · March 14, 2026, 7:57pm

Thanks for your help. Interesting times we live in, when your programming agent identifies a 10-year-old bug in an SSE2 implementation.