the most common optimization level for release builds and the one used by Linux distributions is -O2, not -O3. this is justified by real-world measurements, btw; alas, I don't have the article link at hand on my phone. the key takeaway from that article was: measure, and ideally profile, before going beyond -O2.
and to see the size difference, I'd love to see -Os (optimize for size) included in the comparison against -O2/-O3, which unroll loops and inline static functions as they see fit, well beyond the inline keyword (which is a mere hint).
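for anyone curious, a quick way to eyeball this yourself (file and function names made up for illustration): compile something like the following at -Os/-O2/-O3 and compare the object sizes with size(1):

    /* size_demo.c */
    #include <stddef.h>

    /* static helper: -O2/-O3 may inline this even without the
       inline keyword, which is only a hint anyway */
    static int scale(int v) { return v * 3 + 1; }

    int sum_scaled(const int *a, size_t n)
    {
        int s = 0;
        /* a loop that -O3 may unroll and/or vectorize,
           growing the generated code */
        for (size_t i = 0; i < n; i++)
            s += scale(a[i]);
        return s;
    }

    /* e.g.:  cc -Os -c size_demo.c && size size_demo.o
              cc -O2 -c size_demo.c && size size_demo.o
              cc -O3 -c size_demo.c && size size_demo.o */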
another paradoxical effect of growing the generated code with aggressive optimizations is that you may outgrow the instruction caches: if you're unlucky, inner loops start fetching from slow DDR RAM and execution speed decreases.
I'd suggest reading the article with an extra grain of salt.
> [...] Linux distributions is -O2, not -O3. this is justified by real-world measurements [...]
No. It's because Linus fears -O3 for no reason. He even ordered the removal[1] of the -O3 Kconfig flag[0] (CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3) because "-O3 has a *loong* history of generating worse code than -O2"[1], whatever that means.
From [2]:
> Other upstream kernel developers also criticized that higher optimization level over the default -O2 level due to the risks, particularly with older compilers and memories from times when -O3 tended to be more buggy.
In other words: because of bugs in previous compiler versions that have since been fixed, we won't use the feature.
Linus and Co. are burying their heads in the sand regarding -O3. He says he needs evidence that -O3 is good, but doesn't actually provide evidence beyond hearsay that it's bad.
Just because Linus says -O3 is bad doesn't mean he's right.
There is a lot of cargo culting in the -O2 decision.
But it is also true that -O3 enables a lot of loop optimizations that are not particularly relevant for the kernel. The kernel also relies far less on aggressive inlining and interprocedural optimizations than, say, highly abstracted C++ code.
debugging ring-0 code obfuscated by -O3 is another level of fun. ymmv, however the kernel guys are finding plenty of obscure bugs as it is.
and they have been bitten by aggressive "smart" optimizations based on undefined behavior.
for example, testing a pointer for != NULL after dereferencing it makes no sense. if it was NULL, the dereference already segfaulted and the check is never reached, right?
    foo *x = f();
    x->y();               /* after this dereference the compiler may assume x != NULL */
    if (x == NULL) {
        /* unreachable in user space, so the optimizer is allowed to delete
           this whole branch; in the kernel, where address 0 can be mapped,
           that is exactly what you don't want */
        do_sth_about_it();
        ...
    }
I view the null-check stuff as more of an example of compiler authors, and even kernel devs, not bothering to properly express their intent: in this case the standard has wording covering the check, so if you want it to remain in the binary you should express that very explicitly (volatile or similar, although there are limits to what you can ergonomically express in C).
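a minimal sketch of that idea, reusing the hypothetical names from the snippet above; whether a given compiler actually keeps the check is worth verifying in the generated assembly (and iirc the kernel sidesteps this whole class of problem with -fno-delete-null-pointer-checks anyway):

    #include <stddef.h>

    struct foo { void (*y)(void); };

    extern struct foo *f(void);          /* hypothetical producer */
    extern void do_sth_about_it(void);

    void demo(void)
    {
        struct foo *x = f();
        x->y();                          /* compiler may now assume x != NULL */

        struct foo *volatile xv = x;     /* volatile read: its value may not be assumed */
        if (xv == NULL)                  /* so the comparison has to be emitted */
            do_sth_about_it();
    }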
Similarly, strict aliasing can change the behaviour of code, but if you're genuinely relying on that, your code is probably bad - either in the standard's view, or in my view, in that you can write the same code in a manner that won't cause any mischief (i.e. there are standard-friendly ways to do ugly pointer crap, even if they mean memcpying the bytes - which the optimizer will then eliminate anyway).
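a minimal sketch of the standard-friendly version next to the aliasing-violating one (assuming float and uint32_t have the same size on the target):

    #include <stdint.h>
    #include <string.h>

    /* aliasing violation: reinterpreting a float's bytes through a
       uint32_t* is undefined behaviour, and the optimizer may misbehave */
    uint32_t bits_ub(float f)
    {
        return *(uint32_t *)&f;
    }

    /* standard-friendly: memcpy the bytes instead; compilers routinely
       eliminate the copy, so there is no runtime cost */
    uint32_t bits_ok(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }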