Yes and no. With the SSE2 instruction set it’s usually done with a combination of shuffle and add instructions. But it’s still pretty slow. Probably better than your non-vector example, but not good.
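For reference, here is roughly what the shuffle-and-add approach looks like. This is only a sketch: the function name `horizontalAdd` is made up for this post, and it is one common variant of the idea, not the only way to do it.

```cpp
#include <emmintrin.h>  // SSE2 header; also provides the SSE float intrinsics

// Hypothetical helper: sums the four float lanes of v using only
// SSE/SSE2 instructions, so it runs on any x86-64 CPU.
inline float horizontalAdd(__m128 v) {
    // Swap neighbours within each pair: [a b c d] -> [b a d c]
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    // Pairwise sums: [a+b, a+b, c+d, c+d]
    __m128 sums = _mm_add_ps(v, shuf);
    // Bring the upper pair of sums down into the lower two lanes
    shuf = _mm_movehl_ps(shuf, sums);
    // Lane 0 now holds (a+b) + (c+d)
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
}
```

That is two shuffles and two adds to get one float out, which is why it never feels like a win on a single 4-element vector.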
There are various newer Intel instructions/intrinsics for this, like _mm_hadd_ps (which is SSE3), and wider vector instructions in AVX and AVX-512, but the AVX and AVX-512 ones cannot easily be used in VCV, as VCV stuff supports really old CPUs that don’t have any AVX instructions.
If you want to use these newer instructions in Rack you need to detect the CPU type yourself, and provide a fallback implementation so you don’t crash on CPUs that don’t have the instructions. That is not super easy.
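To give an idea of what "detect and fall back" involves, here is a minimal sketch. It assumes GCC or Clang on x86, since `__builtin_cpu_supports` and the `target` attribute are compiler extensions, not standard C++. The function names (`sumScalar`, `sumAvx`, `sumDispatch`) are hypothetical, and this is not something Rack provides for you.

```cpp
#include <cstddef>
#include <immintrin.h>

// Fallback: plain C++, safe on every CPU.
static float sumScalar(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// AVX version. The target attribute (GCC/Clang extension) lets this one
// function use AVX while the rest of the file stays at the baseline.
__attribute__((target("avx")))
static float sumAvx(const float* data, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    // Fold 8 lanes down to 4, then 4 down to 1.
    __m128 v = _mm_add_ps(_mm256_castps256_ps128(acc),
                          _mm256_extractf128_ps(acc, 1));
    v = _mm_hadd_ps(v, v);
    v = _mm_hadd_ps(v, v);
    float total = _mm_cvtss_f32(v);
    for (; i < n; ++i) total += data[i];  // leftover elements
    return total;
}

// Checks the CPU once at run time, then picks an implementation.
float sumDispatch(const float* data, std::size_t n) {
    static const bool hasAvx = __builtin_cpu_supports("avx");
    return hasAvx ? sumAvx(data, n) : sumScalar(data, n);
}
```

Even in this tiny case you end up writing and testing everything twice, and other compilers or non-x86 builds need yet another path.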
Btw, this operation is usually called “horizontal add”, so if you google “sse2 horizontal add” you will get hits on stackoverflow.com. Searching for “vector dot product” will also get some hits, as horizontal add is always an issue with dot products of small vectors.
As Stack Overflow will tell you, even on “old” CPUs, if you have large vectors you can keep adding them together four floats at a time as vector_4, until you are left with a single vector_4 of partial sums. Then a single horizontal add will make it a float. No help if your vectors only have 4 elements, big help if they are big.
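A sketch of that partial-sums idea for a dot product, using only SSE/SSE2. The name `dotProduct` is just for illustration, and it uses unaligned loads so it makes no assumption about how the arrays are allocated.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 header; also provides the SSE float intrinsics

// Hypothetical helper: dot product of two float arrays of length n.
float dotProduct(const float* a, const float* b, std::size_t n) {
    __m128 acc = _mm_setzero_ps();  // four running partial sums
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 prod = _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        acc = _mm_add_ps(acc, prod);  // vertical add, no shuffles needed
    }
    // One horizontal add at the very end: [p0 p1 p2 p3] -> p0+p1+p2+p3
    __m128 shuf = _mm_shuffle_ps(acc, acc, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(acc, shuf);
    shuf = _mm_movehl_ps(shuf, sums);
    sums = _mm_add_ss(sums, shuf);
    float total = _mm_cvtss_f32(sums);
    // Handle the last n % 4 elements one at a time.
    for (; i < n; ++i) total += a[i] * b[i];
    return total;
}
```

The cost of the horizontal add is paid once per call instead of once per 4 elements, so it basically disappears for long arrays. Note that the summation order differs from a plain loop, so results can differ in the last bits.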