For scalar CPUs, historically CMOV used to be relatively slow on x86, and notably for reliable branching patterns (>75% reliable) branches could be a lot faster.
cmov also has dependencies on all three inputs, so if there's a high level of bias towards the unlikely input having a much higher latency than the likely one a cmov can cost a fair amount of waiting.
Finally cmov were absolutely terrible on P4 (10-ish cycles), and it's likely that a lot of their lore dates back to that.
cmov also has dependencies on all three inputs, so if there's a high level of bias towards the unlikely input having a much higher latency than the likely one a cmov can cost a fair amount of waiting.
Finally cmov were absolutely terrible on P4 (10-ish cycles), and it's likely that a lot of their lore dates back to that.