Only loosely related, but I'm curious: What compiler optimizations have the biggest impact on scientific / floating point computation? What about integer (audio/image) ops? With modern CPUs performing speculative execution, register renaming, and all the other magic they do, the CPU is acting like an optimizer in its own right. x64 is mostly just bytecode that gets JIT-compiled at runtime. I'd be interested in seeing how much specific optimizations (including classic ones) contribute to speeding things up.
This depends entirely on the specific code you're talking about. If the compiler determines that a specific optimization is applicable to a certain block of code, it applies it.
So your question can't really be answered; the only honest answer is "it depends". If you want a guess: anywhere from nothing to an order of magnitude, with most cases probably around 5-20%.
Yeah, I realize it was an open-ended question. I'm just curious where the biggest bang for the buck comes from if one were starting a (C-like) compiler from scratch. It seems like peephole optimizations are already handled at runtime by the CPU itself.
> What compiler optimizations have the biggest impact on scientific / floating point computation?
In my experience, auto-vectorization is the big one. Modern CPUs do 2-, 4-, or 8-wide operations, but it can be hard to convince the compiler to use them. Next is loop tiling, to keep things in cache(s) where possible. These are both hard to do by hand.
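To make that a bit more concrete, here's a rough sketch in plain C (the function names and the TILE size are just illustrative, not from any particular codebase): the first loop usually auto-vectorizes under -O3 once `restrict` rules out aliasing, and the second is the kind of blocked loop that tiling produces.

    #include <stddef.h>

    /* saxpy-style loop: with `restrict` the compiler knows x and y can't
       alias, so gcc/clang at -O3 will typically emit SIMD here. */
    void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] += a * x[i];
    }

    /* Blocked (tiled) transpose: without tiling, either the reads or the
       writes stride through memory and miss in cache on nearly every
       access.  TILE = 64 is a guess and would need tuning per machine. */
    #define TILE 64
    void transpose(size_t n, const float *restrict src, float *restrict dst)
    {
        for (size_t ii = 0; ii < n; ii += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                for (size_t i = ii; i < ii + TILE && i < n; i++)
                    for (size_t j = jj; j < jj + TILE && j < n; j++)
                        dst[j * n + i] = src[i * n + j];
    }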
Loop interchange to get better locality is nice, but can be done by hand without much trouble by someone who understands computers.
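For anyone who hasn't seen the by-hand version, a small sketch (array layout and names are just illustrative): C is row-major, so swapping the loop order turns a strided walk down columns into a sequential walk along rows. Same result, much better locality. Both versions assume col_sum starts zeroed.

    #include <stddef.h>

    /* Strided: the inner loop steps down a column of a row-major array,
       touching a new cache line on (almost) every iteration. */
    void sum_cols_slow(size_t n, const double *a, double *col_sum)
    {
        for (size_t j = 0; j < n; j++)
            for (size_t i = 0; i < n; i++)
                col_sum[j] += a[i * n + j];
    }

    /* Interchanged: the inner loop now runs along a row, so reads of `a`
       are sequential and each cache line is fully used. */
    void sum_cols_fast(size_t n, const double *a, double *col_sum)
    {
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                col_sum[j] += a[i * n + j];
    }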