The days when the floating point instructions themselves were the bottleneck are long gone.

Nowadays, scalar computations on single float or double values held in registers are equally fast.

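The biggest difference comes from memory bandwidth and the ability to vectorize. A double takes twice the space of a float, so the same number of values means twice the memory traffic. With 128-bit SSE registers (a pre-AVX x64 processor), your CPU can process either 4 floats or 2 doubles in one instruction. With 256-bit AVX registers it's 8 floats or 4 doubles.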
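
To make the arithmetic concrete, here is a minimal sketch of where those lane counts and the bandwidth difference come from. The helper names `lanes` and `bytes_for` are made up for illustration, and the checks assume the usual x64 sizes of a 4-byte float and an 8-byte double.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: how many values of type T fit in one SIMD register
// that is register_bits wide (128 for SSE, 256 for AVX).
template <typename T>
constexpr std::size_t lanes(std::size_t register_bits) {
    return register_bits / (sizeof(T) * CHAR_BIT);
}

// Hypothetical helper: how many bytes have to move through memory
// to read n values of type T once.
template <typename T>
constexpr std::size_t bytes_for(std::size_t n) {
    return n * sizeof(T);
}

// Assumption: typical x64 layout, float = 4 bytes, double = 8 bytes.
static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "sketch assumes 4-byte float and 8-byte double");

// Compile-time checks of the numbers quoted in the text.
static_assert(lanes<float>(128) == 4, "SSE: 4 floats per instruction");
static_assert(lanes<double>(128) == 2, "SSE: 2 doubles per instruction");
static_assert(lanes<float>(256) == 8, "AVX: 8 floats per instruction");
static_assert(lanes<double>(256) == 4, "AVX: 4 doubles per instruction");

// Runtime check of the bandwidth side: the same element count costs
// twice the memory traffic when stored as double.
inline void check_memory_traffic(std::size_t n) {
    assert(bytes_for<double>(n) == 2 * bytes_for<float>(n));
}
```

Calling `check_memory_traffic(1000000)` passes because one million floats occupy 4,000,000 bytes while one million doubles occupy 8,000,000 bytes. This only shows the upper bound on the difference. Whether a given loop actually gets vectorized depends on the compiler and the code.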