C++ Compilation Speed (drdobbs.com)
63 points by fogus on Aug 19, 2010 | 45 comments



My guess for most slow compiles (yes, pulled out of my ...hat):

1. Templates

2. No pre-compiled headers

3. Bad header-organization

We have a reasonably sized desktop application, 5 MB (release) / 19 MB (debug), that fully rebuilds in 10 s / 12 s (incremental builds usually under 2 s).

Yes, we have optimized with a RAM disk etc., but the main reasons for the speed are no templates and combining precompiled headers with efficient header organization.
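Not our actual code, but a minimal sketch of the header discipline I mean (names are hypothetical): forward-declare where only pointers or references are used, and pull the heavy #include into the .cpp file.

    // widget.h
    class Renderer;            // forward declaration instead of #include "renderer.h"

    class Widget {
    public:
        explicit Widget(Renderer& r);
        void draw();
    private:
        Renderer* renderer_;   // a pointer member doesn't need the complete type
    };

    // widget.cpp
    #include "widget.h"
    #include "renderer.h"      // the heavy header is included exactly once, here

    Widget::Widget(Renderer& r) : renderer_(&r) {}
    void Widget::draw() { /* uses renderer_ */ }

That way an edit to renderer.h recompiles only widget.cpp, not every file that includes widget.h.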

Some interesting links:

- http://gamesfromwithin.com/physical-structure-and-c-part-2-b...

- http://developers.sun.com/solaris/articles/CC_perf/content.h...


Except that templates give you runtime speed and our customers care about runtime speed, not compilation speed.


What kind of speed increase are you talking about? Your comment's logic seems to suggest that, for instance, C (a language without templates) is inherently slower than C++, which is not the case. Can you back your statements up with evidence?


Speed increases like those explained in http://www.tilander.org/aurora/2007/12/comparing-stdsort-and..., which compares qsort to std::sort.

Because templates are basically automated code copy-paste, the compiler can more easily inline at compile time. Unfortunately, that also translates into a lot more compiled code, which means slower compile times and fatter binaries. Still, writing something like the STL in C with the same performance characteristics would be much more difficult, if not impossible.
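Roughly, the difference looks like this (a minimal sketch, not the benchmark from that link):

    #include <algorithm>
    #include <cstdlib>

    // qsort calls the comparator through a function pointer on every
    // comparison, which the optimizer usually can't see through.
    int cmp_int(const void* a, const void* b) {
        int x = *static_cast<const int*>(a);
        int y = *static_cast<const int*>(b);
        return (x > y) - (x < y);
    }

    void sort_c(int* v, std::size_t n) {
        std::qsort(v, n, sizeof(int), cmp_int);
    }

    // std::sort is instantiated per comparator type, so the comparison
    // can be inlined into the generated sorting code. The price is a
    // separate copy of the algorithm for every instantiation.
    struct LessInt {
        bool operator()(int a, int b) const { return a < b; }
    };

    void sort_cpp(int* v, std::size_t n) {
        std::sort(v, v + n, LessInt());
    }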

I'm not a C++ expert, but that's my understanding of things, at least.


If you are worried about the performance hit of calling through a function pointer, you might want to take a look at http://sglib.sourceforge.net/.


Templates are much more powerful than C with macros.

Templates are more than copy/paste code: the compiler can make strong assumptions about the code path during the optimization phase, because it's actual code.

It's not just a question of inlining calls, but mainly a question of being able to remove branches altogether. Branches are the performance killer, remember?

Examples:

- with template specialization you can make "compile-time" switches (horrible to reproduce with C + macros)

- you can compute values at compile time (not quite possible with C + macros)

- inlining permits copy elision (an optimization that doesn't exist in C, since return by value is a non sequitur there)

- the CRTP is faster than an object with vtables because it avoids indirect calls

- removal of dead branches: by inspecting the template recursion, the compiler can determine which branches will never be taken and doesn't compile them

- probably more!
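To make the first and fourth points concrete, here's a minimal sketch (hypothetical types, nothing from a real codebase):

    #include <cstdio>

    // Compile-time switch via template specialization: the branch not
    // taken is never even compiled, let alone executed at runtime.
    template <bool UseFastPath>
    struct Codec {
        static void run() { std::puts("generic path"); }
    };

    template <>
    struct Codec<true> {
        static void run() { std::puts("fast path"); }
    };

    // CRTP: the base calls into the derived class via static_cast, so
    // there is no vtable and no indirect call to block inlining.
    template <typename Derived>
    struct Shape {
        double area() const {
            return static_cast<const Derived*>(this)->area_impl();
        }
    };

    struct Square : Shape<Square> {
        double side;
        explicit Square(double s) : side(s) {}
        double area_impl() const { return side * side; }
    };

    int main() {
        Codec<true>::run();              // branch chosen at compile time
        Square sq(2.0);
        std::printf("%f\n", sq.area()); // direct, inlinable call
        return 0;
    }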


I think he is talking about the cases where you can use templates to avoid function pointers in C.


I trust the people who care already know about distcc on Linux and Incredibuild on Windows, but I thought I'd mention them just in case.


Don't forget ccache: with the cache stored on a fast disk, it can easily speed up a build by 10x.


I still remember the day I compiled an example program that came with Borland's Turbo Pascal and Turbo C++ compilers. This program was designed to showcase Borland's text-mode user interface library called Turbo Vision, and apart from the programming language used, it ran exactly the same.

Turbo Pascal took just a few seconds to compile the example (no surprises there), but Turbo C++ took several minutes. This was back in the '90s. It seems nothing has changed since then. Sad.


That's because those languages haven't changed, or have only been extended.


Has nobody ever attempted to fix these problems in a non-backwards-compatible way? I'm imagining a new "almost-C++" that ditches the preprocessor for something better and makes a few other changes with the goal of vastly improved compile times. Then it should be possible to produce a GCC or Clang plugin that translates from C++ to "almost-C++" automatically, making it trivial to switch.


Yes, they have. It's called D: http://www.digitalmars.com/d

It doesn't have an automatic translator, but its design does follow a rule of "if a piece of C code is dropped into a D compiler, either it compiles with the same semantics, or it doesn't compile."

BTW, someone needs to fix the CSS for this text-entry box. Someone set the text color to black but forgot to set the background color too, so on my light-on-dark system I get invisible-text syndrome and have to edit in Notepad and copy-paste into the text box. Wheee!


Also note that the author of the article is also the author of D.


I think the automatic translator is the most important part.


C++ isn't without its age spots, but articles like this do nothing to give a balanced view of its usefulness vs. its issues. Many languages have boasted of toppling C++ and it hasn't happened, mostly because C++ does its job well enough and other languages aren't enough better to make migrating large code bases cost-effective.


The article was not intended to give a balanced view of issues vs usefulness. It was attempting to explain why C++ compiles relatively slowly, which is a common question people ask me. Whether or not the speed issues are balanced by other considerations is up to the C++ user.


As far as this specific complaint goes, I've worked on large-scale C++ programs and I've rarely come across anything prohibitive as far as compile times are concerned. Granted, C is faster, as are some other languages, but in terms of overall project development time, compiling is dwarfed by requirements, design, coding, testing, collaboration, etc. It doesn't follow that improving compile time would have any real impact on overall development.


I sat through the pain of configuring/testing Boost with a set of (non-GNU/GCC) dev tools for an embedded platform, and the build times were painfully slow: around 15 minutes was common even for reasonably sized tests. A fresh build of the Boost libraries themselves took hours.


It's unfortunate that Boost is an all-or-nothing set of libraries, and the meta-programming in there can be brutal to compile, so for that kind of platform you'd really need to weigh the pros and cons of what you're using out of Boost vs. the compile times.

However, building the library set and examples and tests is a one-time cost for the non-header-only libraries. And you don't need to compile the examples or the tests. Or the various debug-release combinations.

Plus, why wouldn't you cross-compile, anyway? I do lots of embedded sensor work and I wouldn't think of compiling anything directly on the hardware itself.


This was cross-compiled. We had just implemented the full (as opposed to abridged) C++ libraries, so obviously we decided to make sure that our customers could use them fully; however, given that this was new support, there was a distinct lack of test cases. Someone suggested that Boost exercised this functionality pretty well (some customers had previously complained that they were prevented from using Boost because we only supported the abridged libraries, so this was actually a good idea), and someone decided that I should be in charge of it.

What it amounted to was debugging horrible compiler crashes and broken library behaviour, and spending a long time deciphering huge template names. This caused me to hate C++ more than you can ever imagine. I swear whoever decided to stitch templates together the way Boost does has never had to debug a development compiler; this broke almost every part of the toolchain.

Ugh.


You can compile only what you need. It's much faster that way, but if you do a complete build (debug, static, etc.) it will take hours. It's not that bad, though, and certainly worth it in the end.


This, to me, is one of the best arguments for a new language that takes over the traditional role of C++: writing programs in it is just too slow.

I can't even imagine what it would be like on a 100Hz computer.


D and Go are the two languages poised to take over the roles of C and C++. They both have very fast compilers and numerous other advantages compared with C and/or C++. I've looked at both of them quite carefully and decided that D works better for me (an important factor for me, though, was that D2 recently stabilized with the publication a couple of months ago of Andrei Alexandrescu's excellent new book "The D Programming Language"). There's an interesting recent thread debating the merits of the two languages on the golang-nuts mailing list.


> languages poised to take over the roles of C and C++

D has been around for ages, during which time C++ usage has grown quite a lot, essentially at the expense of C and assembly.

First of all, I don't get the sense people are actually casting about for a C++ replacement. Secondly, garbage collection makes D a non-starter in probably over half the areas C++ is used.


Given that Andrei Alexandrescu, who is one of the best-known C++ programmers in the world, co-designed (with Walter Bright) the new version of D, I would say that there are at least some C++ programmers looking for a better language. I think there are also a lot of C programmers looking for a better language but unwilling to deal with C++ (I'm in that category myself). Garbage collection in D is optional.


I'm not poo-pooing D at all; it's just that it seems to compete more with Java/C# than with C++.

C++ is now really for closer-to-the-metal work, where garbage collection is truly a non-starter and nobody cares that std::string is awkward because they don't use it.


Can you explain why, if garbage collection is a non-starter, _optional_ garbage collection should have any negative impact?


I would say it depends on what the libraries expect, especially the D standard libraries. If they work both with and without garbage collection, it probably doesn't matter so much. If they expect garbage collection, then I suppose D 2.0 will have a harder time luring many C++ guys over.

Does anyone know yet how the new standard libraries they wrote behave here?


Andrei Alexandrescu addresses the issue of programming in D without garbage collection in this interview from yesterday: http://www.informit.com/articles/article.aspx?p=1622265

Notice in particular that he says "Furthermore, you can use malloc() and free(), along with the rest of C's standard library, ..., without any overhead. Then, a D primitive called emplace allows you to construct objects at specified memory locations (a la C++'s placement new operator)".

The emplace function is in the std.conv library and is the basis for constructing objects outside of the control of the garbage collector.
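For the C++ folks, the closest analogy is malloc plus placement new; a sketch of that (in C++, not D):

    #include <cstdlib>
    #include <new>

    struct Point { int x, y; };

    int main() {
        // Raw storage from malloc: outside any garbage collector's reach.
        void* mem = std::malloc(sizeof(Point));
        // Placement new constructs the object at that address, which is
        // essentially what D's std.conv.emplace does for D objects.
        Point* p = new (mem) Point();
        p->x = 1;
        p->y = 2;
        p->~Point();     // manual destruction
        std::free(mem);  // manual deallocation
        return 0;
    }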

My understanding is that most library functions do not care whether you are using garbage collection or not. Library functions are usually written in a parameterized way, such that you can use any appropriate object as long as it has the right methods.

Take a look, for example, at the std.algorithm library: http://www.digitalmars.com/d/2.0/phobos/std_algorithm.html I don't think anything there cares whether you use garbage collection for your types or not.


Speak for yourself. I use std::string frequently and there isn't anything awkward about it.


Manual memory management is always possible in D, and if you really need to, the GC can be ripped out.


I got into D at least 5 years ago because I was a C/C++ user looking for something that has comparable power and, well, wasn't so horrible.


IIRC, D has optional manual memory management.


And Walter, of course, has his answer to this: dmd is very fast (IIRC, according to Walter, faster than the Google Go compiler) thanks to parallel compilation, asynchronous I/O, etc., enabled by the design of D.


To be fair, I can't think of much that wouldn't be painfully slow on a 100Hz CPU.


Just as an example, the original PlayStation ran at 33.8688 MHz.

(100 MHz is actually quite fast.)


100Hz != 100MHz


Oh, I mentally inserted an M.


That, and the fact that concurrency without garbage collection has been shown to cause brain cancer.


If it gets some momentum behind it, then I feel Go could be that language.


Great article. I hope there will be a follow-up article about linking C++ code - now that's something that's hard to parallelize.

The IncrediBuild guys have had some success there, but you have to link from the .obj files, not .lib (okay if your project is set up like that, but it doesn't always work).

We have found that heavy template usage creates lots of symbols that get coalesced later, and this also slows down the linker.


For more about IncrediBuild:

- http://gamesfromwithin.com/how-incredible-is-incredibuild

My summary of the article (I haven't tried it myself):

- can improve full-build compile time

- will probably slow down incremental-build compile time


I have interoperability mode turned on, so I can compile both with IncrediBuild and normally.

But IB does not treat everything in the .vcproj file the way MS does. For example, we used to link statically to certain libs, later added them with the source code, and put the dependencies in the .sln file. Well, we forgot to remove the references, and guess what: MS ends up with one version, IB with the other.

In our case IB was ignoring the .lib files and instead taking the .obj files directly and linking with them (this has other side effects for globally initialized C++ objects).

But it's still a time saver. We often get bad .pdb and .obj files, locking situations, etc.

We even use it for Makefile builds, but again such issues pop up.

It's a clever system that wraps the whole of I/O and runs any process on a different machine, while all I/O is still done on the machine it came from... I wish that were somehow built into modern OSes.


Hm, small point, but the linker may not actually fold identical template instantiations. Visual C++ does; gcc sometimes does not.



