Unless we're talking extremes (AVX512...), optimizing for power in modern CPUs is almost always optimizing the race to idle.
I.e. the CPU doesn't use much less power if most ALUs are idle, so you might as well use all of them to compute redundant results -> and then turn ~everything off sooner.
I.e. the CPU doesn't use much less power if most ALUs are idle, so you might as well use all of them to compute redundant results -> and then turn ~everything off sooner.