I work in compiler development and have worked on a C/C++ compiler. Let me first say that while these side-by-side comparisons are excellent for pointing out the weak points in compilers and showing where extra optimizations could be made, they are, as presented in this article, not very useful.
When's the last time you saw a program that was just a single function? In real production environments, you're going to be running programs with hundreds of modules, thousands of functions, and millions of instructions. Optimizations like inlining, partial inlining, partial redundancy elimination, and inter-procedural analysis become much more important.
When you're a compiler developer looking for more performance, the typical process starts with profiling an important benchmark (like SPEC2006) and looking at where the program spends most of its time. Then you take a look at the offending code and see if it can be distilled down to one of these microbenchmarks, and from there you decide how best to optimize it. Without that context, though, this type of microbenchmark is nearly pointless: it shows some optimization that could be done, but who knows whether it will be worth it, or make any impact at all on the final performance of a production workload.
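To make that concrete, here's a made-up fragment (not from the article) showing why inlining and inter-procedural constant propagation only start to matter once more than one function is in play:

    /* clamp() stands in for a small helper that, in a real program,
     * might live in another module and only become visible through LTO. */
    static inline int clamp(int x, int lo, int hi)
    {
        return x < lo ? lo : (x > hi ? hi : x);
    }

    int answer(void)
    {
        /* After inlining, constant propagation and dead-code elimination
         * reduce this to "return 42", something a single-function
         * microbenchmark can never exercise. */
        return clamp(1000, 0, 42);
    }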
Good point. I've heard rumors that some of the DEC compilers for Alpha were actually advanced enough that they'd perform many complex calculations in the compiler itself. This led to problems with SPEC, where the benchmark would effectively be computed at compile time and the program would basically just print the answer.
Do C/C++ compilers use staged (on-line and off-line) specialisation (removing control flow dependent on run-time constants)? I suppose specialisation is more suited to languages executing on a virtual machine, like Java.
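For concreteness, here's the kind of thing I mean. It's a hand-written sketch with a made-up routine where the channels parameter is a run-time constant; the specialised clone is what I'd hope a compiler (or an off-line specialiser) could produce:

    #include <stddef.h>

    /* Generic version: the branch depends on `channels`, which is
     * constant for the whole run but not known at compile time. */
    void scale_pixels(unsigned char *buf, size_t n, int channels, int factor)
    {
        for (size_t i = 0; i < n; i++) {
            if (channels == 4 && (i % 4) == 3)
                continue;  /* leave the alpha channel alone */
            buf[i] = (unsigned char)(buf[i] * factor / 256);
        }
    }

    /* Specialised clone for channels == 4: the test on the run-time
     * constant is gone, so the loop body is branch-free and far easier
     * to unroll or vectorise. */
    void scale_pixels_rgba(unsigned char *buf, size_t n, int factor)
    {
        for (size_t i = 0; i + 3 < n; i += 4) {
            buf[i + 0] = (unsigned char)(buf[i + 0] * factor / 256);
            buf[i + 1] = (unsigned char)(buf[i + 1] * factor / 256);
            buf[i + 2] = (unsigned char)(buf[i + 2] * factor / 256);
            /* buf[i + 3] is alpha, left untouched */
        }
    }

As far as I know, static C/C++ compilers get part of the way there with inter-procedural constant propagation and function cloning (GCC's -fipa-cp-clone, for instance) when the constant is visible at a call site, but there's no run-time stage the way a JIT has.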
Weighting our benchmark functions using profile data (as discussed in the conclusion of the blog post) should address this problem, at least partially.
Absolutely. I can't talk too much about what we actually benchmark, but real-world applications are always used for benchmarking a compiler, especially if it's a JIT compiler. It is done in a repeatable environment, though, with scripts and such to minimize variation and let us measure more accurately.
Most compilers typically do a good job, but I was surprised to see such a difference on some examples.
On those examples, Clang usually seems to be better than GCC at static code evaluation and local optimizations.
However, GCC beats Clang in examples 2 and 3 due to better loop optimization capabilities (such as unrolling), allowing it to use SSE instructions.
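As an illustration (not one of the article's examples, just the textbook case), a reduction loop like this is where unrolling plus SSE pays off:

    #include <stddef.h>

    /* With -O3 (plus -ffast-math to allow reassociating the additions),
     * GCC will typically unroll this loop and emit packed SSE
     * instructions, handling several floats per iteration instead of one. */
    float sum(const float *a, size_t n)
    {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }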
Example 3 shows that the Intel compiler has even stronger unrolling and vectorization capabilities. Not unexpected, as their compiler has to be especially good at this in order to get good performance on the Itanium architecture (IA-64).
I'm also impressed by the trick used by ICC in example 7 (using a bittest against a statically computed bitmap to implement the comparisons).
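For anyone who hasn't seen the trick: instead of emitting a chain of comparisons, the compiler builds a bitmap of the accepted values at compile time and tests a single bit at run time. A rough hand-written equivalent (my own sketch, not ICC's actual output) looks like this:

    #include <stdbool.h>

    /* What the source says: a chain of comparisons. */
    bool is_sep_naive(unsigned char c)
    {
        return c == ' ' || c == ',' || c == ';' || c == ':' || c == '\t';
    }

    /* What the compiler can emit instead: every accepted character sets
     * one bit in a table computed at compile time, and a single
     * shift-and-test (a bt instruction on x86) replaces the whole chain. */
    bool is_sep_bitmap(unsigned char c)
    {
        static const unsigned long long table[4] = {
            (1ULL << ' ') | (1ULL << ',') | (1ULL << ';') |
            (1ULL << ':') | (1ULL << '\t'),  /* characters 0..63 */
            0, 0, 0                          /* characters 64..255 */
        };
        return (table[c >> 6] >> (c & 63)) & 1;
    }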
It will be quite interesting to see the results the researchers will get on many more examples...
Speaking as someone who finds this kind of optimization story interesting, but for whom optimisation in everyday life means adding an index or caching a query: wouldn't it be equally important to analyse the worst-case results as well, the examples where a compiler does noticeably worse?
Then again, that is probably done as part of regression tests by the compiler makers.