Have you run the calibration using the calibration input? You need to calibrate first.
The way it works is like this. It sends various test voltages to the VCO and then listens back to the frequency the VCO outputs. It knows what the frequency should be and calculates the difference between what it expects to hear and what it actually hears. Once calibration is done it then applies a +/- offset to the voltages sent to the VCO to compensate for the difference.
It's best to use a sine/triangle wave output for calibration. Square/saw have too many harmonics, which can confuse the frequency analyser. Once calibrated you can use whatever outputs you like of course.
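To make the idea concrete, here is a rough sketch of that measure-and-offset loop. It is not the module's actual code: it assumes a 1 V/oct VCO with 0 V = C4 (261.6256 Hz), and the names `calibrate`, `correction_volts` and `fake_vco` are made up for illustration. The fake VCO stands in for the real send-voltage/listen-back round trip.

```python
import math

C4_HZ = 261.6256  # assumed reference pitch: 0 V = C4 under 1 V/oct


def expected_hz(volts):
    """Frequency a perfectly tracking 1 V/oct VCO would produce."""
    return C4_HZ * 2.0 ** volts


def correction_volts(volts, measured_hz):
    """Offset (in volts) to add to the sent voltage to cancel the pitch error."""
    return -math.log2(measured_hz / expected_hz(volts))


def calibrate(test_volts, measure):
    """Send each test voltage, 'listen' via measure(), and store the offset."""
    return {v: correction_volts(v, measure(v)) for v in test_volts}


def fake_vco(volts):
    """Hypothetical VCO: tracks at 0.98 V/oct and is tuned 5 cents sharp."""
    return C4_HZ * 2.0 ** (0.98 * volts + 5 / 1200)


table = calibrate([-2.0, -1.0, 0.0, 1.0, 2.0], fake_vco)
for v, offset in table.items():
    print(f"{v:+.1f} V -> offset {offset:+.4f} V")
```

Because this fake VCO also scales the offset itself by 0.98, a single pass only removes most of the error, not all of it. A real calibration routine may well iterate or interpolate between test points, but that part is a guess.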
I would try it just monophonically first, and use Nysthi Hot Tuna before/after calibration to compare the frequency/pitch of notes coming out of the VCO.
You can tell if Merge is causing an issue by just taking it out of the equation and using 4 x VCO-1 instead.
In your image above you would need to run calibration with each of the Voice controllers.
UPDATE: actually there may be something else going on here. The Voice controller is really designed for controlling hardware VCOs - i.e. where VCV is doing the sequencing, that is sent out of the ES-9 to a hardware VCO, and the sound then comes back into VCV.
You are trying to do it the other way round - sequencing VCV from hardware. I don’t think Voice controller will work in this scenario. The Voice controller is generating the frequencies, sending them to VCO-1 and listening to the result, which is going to be perfect because the signal path is all internal to VCV - it is not going through the ES-9 at all.