I would ask the Cardinal developers for a detailed explanation.
From my perspective, there should not be a difference in performance.
1. On each time frame (one sample at a time), all modules are processed and the values of their out-ports are calculated.
2. The values of all out-ports are copied to the connected in-ports (following the cable connections).
3. The process is repeated on the next time frame (sample), starting again at step 1.
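
To illustrate the steps above, here is a toy sketch in Python. It is not the actual VCV Rack or Cardinal engine code; the three-module chain (`source`, `gain`, `sink`) and the function `run_per_sample` are made up for the example. It assumes that a module only reads its own in-ports while it is processed, and that the cables are copied only after all modules are done.

```python
# Toy model of a per-sample engine. All names are hypothetical;
# this is NOT the actual VCV Rack / Cardinal implementation.

def run_per_sample(order, n_frames):
    """Run the chain source -> gain -> sink, processing modules in `order`."""
    # Values currently sitting at each module's in-port.
    inputs = {"gain": 0.0, "sink": 0.0}
    recorded = []
    for frame in range(n_frames):
        outputs = {}
        # Step 1: every module computes its out-port from its own in-port only.
        for name in order:
            if name == "source":
                outputs["source"] = float(frame + 1)  # emits 1, 2, 3, ...
            elif name == "gain":
                outputs["gain"] = 2.0 * inputs["gain"]
            elif name == "sink":
                recorded.append(inputs["sink"])
        # Step 2: copy the out-port values along the cables to the in-ports.
        inputs["gain"] = outputs["source"]
        inputs["sink"] = outputs["gain"]
        # Step 3: the loop continues with the next frame.
    return recorded

forward = run_per_sample(["source", "gain", "sink"], 5)
backward = run_per_sample(["sink", "gain", "source"], 5)
print(forward)   # [0.0, 0.0, 2.0, 4.0, 6.0]
print(backward)  # [0.0, 0.0, 2.0, 4.0, 6.0]
```

In this model both orders give the same result, because no module ever sees a value that another module produced in the same frame. The only visible effect is that each cable adds one sample of delay (two cables here, so the first non-zero value shows up at the third frame).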
If VCV worked on blocks of data (e.g. a block of 64 samples), then the order of processing could make a difference.
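
As a contrast, here is a toy sketch of how a block-based engine could behave. Again, this is purely hypothetical and not taken from any real host: it assumes that each module reads the upstream module's output buffer directly, with no separate copy step, and the names `run_blocks`, `source`, `gain` and `sink` are invented for the example.

```python
# Toy model of a block-based engine. All names are hypothetical;
# VCV Rack / Cardinal are described above as working per sample instead.

def run_blocks(order, n_blocks, block_size=4):
    """Run the chain source -> gain -> sink one block at a time."""
    # Output buffers that downstream modules read directly.
    buffers = {"source": [0.0] * block_size, "gain": [0.0] * block_size}
    recorded = []
    t = 0  # index of the first sample in the current block
    for _ in range(n_blocks):
        for name in order:
            if name == "source":
                buffers["source"] = [float(t + i + 1) for i in range(block_size)]
            elif name == "gain":
                buffers["gain"] = [2.0 * x for x in buffers["source"]]
            elif name == "sink":
                recorded.extend(buffers["gain"])
        t += block_size
    return recorded

forward_blocks = run_blocks(["source", "gain", "sink"], 3)
backward_blocks = run_blocks(["sink", "gain", "source"], 3)
print(forward_blocks[:4])  # [2.0, 4.0, 6.0, 8.0]
print(backward_blocks)     # eight zeros, then 2.0, 4.0, 6.0, 8.0
```

With the modules processed in signal-flow order, the sink gets the current block immediately. In the reverse order every cable costs a whole block of latency, so here the processing order would matter.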