For scalar CPUs, historically CMOV used to be relatively slow on x86, and notabl...

For scalar CPUs, historically CMOV used to be relatively slow on x86, and notably for reliable branching patterns (>75% reliable) branches could be a lot faster.

cmov also has dependencies on all three inputs, so if there's a high level of bias towards the unlikely input having a much higher latency than the likely one a cmov can cost a fair amount of waiting.

Finally cmov were absolutely terrible on P4 (10-ish cycles), and it's likely that a lot of their lore dates back to that.