I work in compiler development and have worked on a C/C++ compiler. Let me first say that while these side-by-side comparisons are excellent for pointing out the weak points in compilers and showing where extra optimizations could be made, they are, as presented in this article, not very useful.
When's the last time you saw a program that was just a single function? In real production environments, you're going to be running programs with hundreds of modules, thousands of functions, and millions of instructions. Optimizations like inlining, partial inlining, partial redundancy elimination, and inter-procedural analysis become much more important.
When you're a compiler developer looking for more performance, the typical process starts with profiling an important benchmark (like SPEC2006) and looking at where the program spends most of its time. Then you take a look at the offending code and see if it can be distilled down to one of these microbenchmarks, and from there you decide how best to optimize it. Without that context, though, this type of microbenchmark is nearly pointless: it shows some optimization that could be done, but who knows whether it will be worth it, or make any impact at all on the final performance of a production workload.
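To make that concrete, here's a made-up fragment (not from the article) showing why inlining and inter-procedural constant propagation only start to matter once more than one function is in play:

    /* clamp() stands in for a small helper that, in a real program,
     * might live in another module and only become visible through LTO. */
    static inline int clamp(int x, int lo, int hi)
    {
        return x < lo ? lo : (x > hi ? hi : x);
    }

    int answer(void)
    {
        /* After inlining, constant propagation and dead-code elimination
         * reduce this to "return 42", something a single-function
         * microbenchmark can never exercise. */
        return clamp(1000, 0, 42);
    }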
Good point. I've heard rumors that some of the DEC compilers for Alpha were actually advanced enough that they'd perform many complex calculations in the compiler itself. This led to problems with SPEC, where the benchmark would effectively be computed at compile time and the program would basically just print the answer.
Do C/C++ compilers use staged (on-line and off-line) specialisation (removing control flow dependent on run-time constants)? I suppose specialisation is more suited to languages executing on a virtual machine, like Java.
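For concreteness, here's the kind of thing I mean. It's a hand-written sketch with a made-up routine where the channels parameter is a run-time constant; the specialised clone is what I'd hope a compiler (or an off-line specialiser) could produce:

    #include <stddef.h>

    /* Generic version: the branch depends on `channels`, which is
     * constant for the whole run but not known at compile time. */
    void scale_pixels(unsigned char *buf, size_t n, int channels, int factor)
    {
        for (size_t i = 0; i < n; i++) {
            if (channels == 4 && (i % 4) == 3)
                continue;  /* leave the alpha channel alone */
            buf[i] = (unsigned char)(buf[i] * factor / 256);
        }
    }

    /* Specialised clone for channels == 4: the test on the run-time
     * constant is gone, so the loop body is branch-free and far easier
     * to unroll or vectorise. */
    void scale_pixels_rgba(unsigned char *buf, size_t n, int factor)
    {
        for (size_t i = 0; i + 3 < n; i += 4) {
            buf[i + 0] = (unsigned char)(buf[i + 0] * factor / 256);
            buf[i + 1] = (unsigned char)(buf[i + 1] * factor / 256);
            buf[i + 2] = (unsigned char)(buf[i + 2] * factor / 256);
            /* buf[i + 3] is alpha, left untouched */
        }
    }

As far as I know, static C/C++ compilers get part of the way there with inter-procedural constant propagation and function cloning (GCC's -fipa-cp-clone, for instance) when the constant is visible at a call site, but there's no run-time stage the way a JIT has.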
Weighting our benchmark functions using profile data (as discussed in the conclusion of the blog post) should address this problem, at least partially.
Absolutely. I can't talk too much about what we actually benchmark, but real-world applications are always used for benchmarking a compiler, especially if it's a JIT compiler. It is done in a repeatable environment, though, with scripts and such to minimize variation and let us measure more accurately.
Most compilers typically do a good job, but I was surprised to see such a difference on some examples.
On those examples, Clang usually seems to be better than GCC at static code evaluation and local optimizations.
However, GCC beats Clang in examples 2 and 3 due to better loop optimization capabilities (such as unrolling), allowing it to use SSE instructions.
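As an illustration (not one of the article's examples, just the textbook case), a reduction loop like this is where unrolling plus SSE pays off:

    #include <stddef.h>

    /* With -O3 (plus -ffast-math to allow reassociating the additions),
     * GCC will typically unroll this loop and emit packed SSE
     * instructions, handling several floats per iteration instead of one. */
    float sum(const float *a, size_t n)
    {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }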
Example 3 shows that the Intel compiler has even stronger unrolling and vectorization capabilities. Not unexpected, as their compiler has to be especially good at this in order to get good performance on the Itanium architecture (IA-64).
I'm also impressed by the trick used by ICC in example 7 (using a bittest against a statically computed bitmap to implement the comparisons).
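For anyone who hasn't seen the trick: instead of emitting a chain of comparisons, the compiler builds a bitmap of the accepted values at compile time and tests a single bit at run time. A rough hand-written equivalent (my own sketch, not ICC's actual output) looks like this:

    #include <stdbool.h>

    /* What the source says: a chain of comparisons. */
    bool is_sep_naive(unsigned char c)
    {
        return c == ' ' || c == ',' || c == ';' || c == ':' || c == '\t';
    }

    /* What the compiler can emit instead: every accepted character sets
     * one bit in a table computed at compile time, and a single
     * shift-and-test (a bt instruction on x86) replaces the whole chain. */
    bool is_sep_bitmap(unsigned char c)
    {
        static const unsigned long long table[4] = {
            (1ULL << ' ') | (1ULL << ',') | (1ULL << ';') |
            (1ULL << ':') | (1ULL << '\t'),  /* characters 0..63 */
            0, 0, 0                          /* characters 64..255 */
        };
        return (table[c >> 6] >> (c & 63)) & 1;
    }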
It will be quite interesting to see the results the researchers will get on many more examples...
Speaking as someone who finds this kind of optimization story interesting, but for whom optimisation in everyday life means adding an index or caching a query: wouldn't it be equally important to analyse the worst-case results as well, the examples where a compiler does noticeably worse?
Then again, that is probably done as part of regression tests by the compiler makers.