Hello! I'm a bit hesitant to submit this bug report, because it was identified …by Claude Code during development. My knowledge of SIMD operations is fairly limited, but it's assessment and workaround seems sound. I also had this bug report reviewed by another developer who suggests that it's legitimate as well.
# Bug Report: `simd::atan2` returns NaN for zero inputs
## Summary
The SSE implementation of `atan2` used by VCV Rack's SIMD library returns NaN when both inputs are zero. The C standard defines `atan2(0, 0)` as returning 0, and the scalar `atan2f(0, 0)` behaves correctly. The SSE version does not, because of what appears to be a copy-paste error in the upstream `sse_mathfun_extension.h` library that has been present since 2016.
This bug is unlikely to affect most modules, but it will silently corrupt any DSP pipeline that processes FFT bins where both the real and imaginary parts happen to land on exactly `0.0f`. Once a NaN enters a processing chain, it propagates through all subsequent arithmetic and typically destroys the output.
## How I found it
I'm building a phase vocoder for time-stretching audio. The algorithm takes FFT frames, computes the phase of each bin using `rack::simd::atan2(im, re)`, propagates the phase across frames, and resynthesizes the signal. For short audio samples it works perfectly. For longer samples (several minutes), the output would cut to silence partway through.
After adding diagnostic logging, I found that millions of output samples were NaN. I traced the first NaN to the `atan2` call, always at FFT bins near the Nyquist frequency (bin 2048 of a 4096-point FFT). These are bins where the signal magnitude is often extremely small, and where floating-point cancellation in the FFT butterfly can produce exactly `0.0f` for both the real and imaginary parts. Longer samples have more FFT frames, which means more opportunities to hit this condition, which is why shorter samples appeared to work fine.
## The root cause
The function `sse_mathfun_atan2_ps` in `include/simd/sse_mathfun_extension.h` computes `atan2(y, x)` by dividing `y / x` and then calling `sse_mathfun_atan_ps` on the result. When `x = 0` and `y = 0`, the division produces NaN per IEEE 754 (`0 / 0 = NaN`). The function is supposed to detect this case and return 0 instead of the NaN, and the detection logic does correctly identify the zero-zero case through a bitmask called `zero_mask`. However, it never actually uses that mask to suppress the NaN from the `atan` result.
Here is the relevant selection logic, in `sse_mathfun_extension.h`:
```c
__m128 result = _mm_andnot_ps(zero_mask, pio2_result);
atan_result = _mm_andnot_ps(pio2_mask, atan_result);
atan_result = _mm_andnot_ps(pio2_mask, atan_result);
result = _mm_or_ps(result, atan_result);
result = _mm_or_ps(result, pi_result);
```
atan_result = _mm_andnot_ps(pio2_mask, atan_result); is duplicated.
It masks `atan_result` by `pio2_mask` a second time, which has no effect since the operation is idempotent. Based on the structure of the surrounding code, it was almost certainly meant to read:
```c
atan_result = _mm_andnot_ps(zero_mask, atan_result); // line 290 corrected
```
This would zero out the NaN-contaminated `atan_result` in any lane where `zero_mask` is set, preventing it from leaking into the final result through the bitwise OR on the next line.
Without this fix, the NaN from `atan(0/0)` passes through `pio2_mask` unaffected (because `pio2_mask` is all-zero-bits for the zero-zero case), and then gets OR'd into `result`, producing NaN output.
There is also a secondary issue: `pi_result` is non-zero for the `(0, 0)` case because `pi_mask = AND(y_eq_0, x_le_0)` and `0 <= 0` is true. Even with thefix applied, the function would return pi instead of 0 for `atan2(0, 0)`. A complete source-level fix would add one more line before the return:
```c
result = _mm_andnot_ps(zero_mask, result); // force 0 where zero_mask is set
```
## Origin
This code comes from Tolga Mizrak's `sse_mathfun_extension` library (https://github.com/to-miz/sse_mathfun_extension). VCV Rack's copy notes that the only modifications were making functions `inline` and converting global constants to function-scope locals. The duplicate line exists in the upstream repository, which has had no issues or pull requests filed across its entire history.
## Workaround for plugin authors
If you use `rack::simd::atan2` in your module and there is any possibility that both inputs could be zero, you can sanitize the output with one extra SSE instruction:
```cpp
float_4 phase = rack::simd::atan2(im, re);
phase = phase & (phase == phase); // replace NaN lanes with 0.0
```
This works because NaN is the only float where `x == x` evaluates to false. The comparison `phase == phase` maps to `_mm_cmpeq_ps`, which returns all-zero-bits for NaN lanes and all-one-bits for valid lanes. The bitwise AND then zeroes out only the NaN lanes, leaving all other values untouched.
## Reproducing
The simplest reproduction is to call `rack::simd::atan2` with both arguments set to zero and check the result:
```cpp
float_4 zero(0.f);
float_4 result = rack::simd::atan2(zero, zero);
// result[0] is NaN (expected: 0.0)
```
For comparison, the scalar version behaves correctly:
```cpp
float result = atan2f(0.f, 0.f);
// result is 0.0
```
In addition, another developer commented:
> It is true the duplicated line does nothing.
>
> It is also true that atan_result = _mm_andnot_ps(zero_mask, atan_result); does need to happen somewhere. I would put it in after adding the offset at line 295 of the source (its line numbers are off for some reason), and then just remove the duplicate.
>
> It is also right that the bitmask x_le_0 will be true when x is zero, but that mask isn’t used anywhere else anyway so a better fix is to use < instead of <= in the first place. So line 264 becomes: v4sf x_lt_0 = _mm_cmplt_ps( x, *(v4sf*)_ps_0 ); And in line 279 change the x_le_0 to x_lt_0.