The Little Things: Speeding up C++ compilation (codingnest.com)
133 points by ingve on Sept 20, 2020 | 82 comments



I wish someone would pick up the zapcc project. Compilers do an insane amount of duplicated and redundant work and there's tremendous potential for speeding up C++ builds if you are willing to rethink how the compiler works a little bit. https://github.com/yrnkrn/zapcc


I work on the video game factorio, which is a c++ project. On a 9900k a rebuild takes about a minute, so it's pretty sizeable but not something ridiculous like oracle db or unreal engine. I tried using zapcc on it, and it was a complete failure. I don't have measurements to hand, but iirc it was actually slower than stock clang. I tested it on a threadripper 2950x with 64gb of ram, running Debian.


I just installed the game again to play the v1.0 release.

It looks and sounds very good! Congratulations on making it to the release milestone after many years of hard work.

As a developer, I must say that I'm deeply impressed by the simulation complexity and incredible scalability.

I keep telling people that Factorio is the most complex train simulator game that I've ever played. But it isn't a train simulator. It's a factory game. The trains are optional.

That's crazy to me, that a subset of a game can be more complex than entire games that "specialise" in the topic.


How do I make curved tracks?


The first bit of track you place is always straight; but if you extend the rail (by clicking on the green arrow of the first piece of rail) you can place curves.

https://wiki.factorio.com/Rail_planner


This was a genuine question BTW.


There is a tutorial in-game!

After placing a track, click the end of the track and hold shift to use the "placement planner". Helps work out the curves and things.


I've had good results on some of my own projects. The current implementation certainly has limitations, but it's really early and the potential is huge. For example, because it can cache at a sub-compilation-unit granularity it can make rebuilds of a single file faster, which no distributed build tool or cache can do.


Thank you for Factorio!


Have you tried switching to a unity build? Or removing templates?


We do use a unity build yeah. Without it would be way way longer, maybe 5 minutes+. As for removing templates, there are a few places where we specifically opted to use custom macros instead of std::variant, for compile time reasons, but mostly we do use templates quite a lot.


A unity build gives some of the same build time benefits as zapcc for full rebuilds, but requires more work and compromises in code structure, and also increases your single-file incremental build time. If you're already using a unity build then I'd expect zapcc to not speed up full builds much if at all.

I'd be interested to see if zapcc could speed up your non-unity build to be close to your unity build, without requiring the same compromises. I'd also be interested to see if zapcc would speed up single-file incremental builds in your unity build.

Ultimately I can't recommend zapcc for general use as it's a fork of clang that's not being regularly rebased anymore, so it's getting out of date. That's why I wish someone would pick it up again.


How does ccache compare to it?

https://ccache.dev/


That is what I normally use. ccache is great! With a unity build its usefulness is a bit diminished, but it's still great for switching back and forth between branches.


Is this like llbuild?

https://www.youtube.com/watch?v=b_T-eCToX1I

https://github.com/apple/swift-llbuild

I remember watching the talk a while ago, but haven't kept up with its progress. It's about using LLVM as a library to eliminate duplicate work, as far as I remember.

I think it was for C++ as well as Swift.


> Compilers do an insane amount of duplicated and redundant work

This is one of the problems that should be solved by modules in the near future.


See also:

Are modules fast? https://bfgroup.github.io/cpp_tooling_stats/modules/modules_...

One reason why this might not be so simple: what is often measured as the potential for speed improvement is I/O, as in "lines of code a compiler needs to read to process a translation unit". However, C++ compilers are usually CPU-bound, not I/O-bound. Also, C++, very differently from idiomatic C, Pascal, Modula, and their successors, has no clear separation between the declaration and the use of an interface. Classes, not functions, form the main interface, and a class is defined by its members and the signatures of its member functions. To know the size of a class, the compiler needs to know the size of its members, so it has to recursively read everything the class is defined with. This is a big difference from Modula etc. Of course the reading of data can be cached, but the question is how much of the expansion of the read text can be cached: for example, some headers will have different semantics if the DEBUG macro is set, or some WIN_xyz macro. As far as I understand, compilers supporting C++ modules will only cache the latest expansion.

Also, the build system with modules is likely to become more complex, with additional dependencies. I do not know whether this is still current in every aspect (it could be that what is allowed to go into a module header has been narrowed down a bit), but this post: https://vector-of-bool.github.io/2019/01/27/modules-doa.html explains at least some of the topics involved, and not all of them are trivial.


I hope so but last time I looked it seemed like the preliminary implementations didn't improve build times much in practice.


Real-world usage of Clang modules (by Google) and preliminary/synthetic benchmarks of C++20 modules in GCC both show about 3x speedup. This might not be "much" for everyone, but I will take it.


> benchmarks of C++20 modules in GCC both show about 3x speedup

compared to what? The worst possible case? Code in weakly coupled modules? Using PIMPL idiom? Unity builds? Using ccache in incremental builds?

ccache is an especially relevant comparison, because what interests a developer far more than a full build is an incremental build with a few files changed. Using modules requires scanning the source code in order to determine dependencies, so it can be slower than an incremental build using other tools. In addition, C++ compilation with include headers is an embarrassingly parallel problem, which means speed advantages from multi-core systems will probably be very relevant in the, say, ten years' time in which C++ modules will be somewhat standardized in practice.

Not to say they can't help, but in any other large software project one would need to show that something is useful before adding a big change.


This is an excellent and thorough review of the details.

An almost-should-be-mandatory tool on projects of any decent size is https://ccache.dev/


Bruce Dawson's Chromium build time investigation is very enlightening:

https://randomascii.wordpress.com/2020/03/30/big-project-bui...

The main takeaway:

> That is, the main source files represent just 0.32% of the lines of code being processed.

Chromium itself has about 12 million lines of code, but the compiler needs to process 3.6 billion lines of code because it needs to parse the same headers over and over again.


In my experience, unity builds (a single compilation unit that #includes all source files into one main file) and the use of orthodox C++ (avoidance of templates and fancy features) give good results.


It's reasonable to assume that compilation times can go down if you decide to forego compile-time features (i.e., templates).

However, C++'s standard library is a productivity blessing. I'm not convinced of the advantage of replacing it with ad-hoc components that may or may not be buggy just because you want to shave off a few seconds of each build.

More often than not, all you have to do to cut your compilation time by a double-digit percentage is to modularize your project and follow basic principles like not having includes in your interface headers. This alone, along with PImpl, does far more for your compilation time than well-meaning but poorly thought-out tradeoff decisions like not using the STL.
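For readers who haven't met PImpl, here is a minimal single-file sketch of the idea. The `Widget` class, its members, and the file names in the comments are hypothetical, chosen only for illustration. The point is that the "header" part needs only a forward declaration of `Impl`, so heavy includes such as `<vector>` can stay in the .cpp file.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// ---- widget.h (hypothetical): what clients include ----
// A real header would need only <memory> and <string> here.
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                              // defined where Impl is complete
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;

    void add(int value);
    int total() const;
    const std::string& name() const;

private:
    struct Impl;                            // forward declaration only
    std::unique_ptr<Impl> impl_;
};

// ---- widget.cpp (hypothetical): the only place that sees <vector> ----
struct Widget::Impl {
    std::string name;
    std::vector<int> values;
};

Widget::Widget(std::string name) : impl_(std::make_unique<Impl>()) {
    impl_->name = std::move(name);
}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

void Widget::add(int value) {
    assert(impl_ && "Widget used after being moved from");
    impl_->values.push_back(value);
}

int Widget::total() const {
    assert(impl_ && "Widget used after being moved from");
    int sum = 0;
    for (int v : impl_->values) sum += v;
    return sum;
}

const std::string& Widget::name() const { return impl_->name; }
```

The cost is an extra heap allocation and a pointer indirection per object, which is the usual price of the idiom.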


There is no way to use the STL and other template code and still have reasonably sized projects do full builds in seconds. That's just an empirical fact. The only way to get C-like compile times is to essentially write C, with minor additions.

If you genuinely regard the standard library as a productivity blessing I'm sure this looks like an unreasonable tradeoff. I would be very surprised to see someone advocate writing "Orthodox C++" solely for the compile times. Most of us who write code like this do so because we don't find value in the STL and regard templates as an extremely poor tool for their ostensible purpose. The compile times are just a bonus, albeit a very pleasant one.

In truth, the divide isn't between people who prioritize compile times over any other consideration and people who don't. It's between people who see most of the language features of C++ as valuable and people who see them as mostly unhelpful. Personally, I like operator overloading for math and SIMD types, function overloading for less verbosity, namespaces and a bit of constexpr for convenience. The rest I'm quite happy to do without, so there isn't much of a tradeoff to make.


Using the STL is a non-issue for me as it is almost never used in the video games industry, mostly for performance reasons.

In my opinion the STL is still a big, bloated, unreadable piece of code with a few badly implemented good ideas (containers)

You should read this article: https://zeux.io/2019/01/17/is-c-fast/


I read the article. The author noticed that MSVC's STL has slower iterators in debug mode, because it does extra checks. Instead of reading the manual and turning off the extra checks and/or reading the manual and turning on some optimizations in debug, he re-wrote it in C, which doesn't have the extra checks to begin with, and was happy because it compiled faster. All the optimizations in release mode came from better algorithms or allocators.

I can't say I'm particularly convinced. Yes, C++ debug builds can be slow. For most of us, that's ok. If it's not, make your "debug" build configuration a release build with the optimizer turned down (but not off) and incremental build enabled. Preprocessor definitions, symbol generation, optimization, and incremental build are all independent settings - if all you need from debug is symbols, incremental builds, and the inliner turned down (so stacks are retained) - then configure your compiler that way.


You probably also noticed that the code ended-up being faster, not much bigger and in my opinion easier to read/modify.


"Orthodox C++" without templates, language features, or the standard library is also known as "C".

I'd go one step further. Skip the compiler and just write in assembly. Your builds will be lightning fast especially if you use one single source file with no macros and static linking. You'll never ship, but you'll never ship really fast.


Except people do in fact ship large C projects all the time. The productivity gain associated with moving from assembly to C is undisputed and vastly exceeds any alleged productivity gain associated with moving from C to modern C++. Pretending otherwise is just facetious.


You'd be surprised by how many "C++" projects are in fact mostly C with a few C++ features in the videogames industry.


> in the videogames industry.

Or anywhere else. C++ is such a beast, you just can't expect your coworkers to memorize the whole standard, so it is quite common that teams agree (or the team-leader agrees for them) upon a subset.

In the past that was due to limited availability of experienced C++ programmers and inconsistent support from compilers, these days it's due to the evolving nature of the language. Quick: in which standard was the variant class template introduced?


That sounds like it breaks all incremental builds (correct me if I'm wrong) for relatively little benefit (skipping linking? less header parsing?). Are there scenarios where unity builds are faster than incremental builds? Or is the intended use-case only for from-scratch builds?


In my experience, you can just do full from-scratch builds in under 10 seconds for any sanely sized codebase (say, under 1M LoC conservatively). This obviously assumes the "orthodox C++" part, so YMMV wildly depending on how far you stray from that. I usually have compile times under 0.5s for all personal projects that don't use CUDA, but I basically write C with a bit of function/operator overloading, absolutely no STL or templates at all. Consequently, I have no use for incremental (or parallel, for that matter) builds whatsoever.


Only an extraordinarily ambitious personal project would experience a significant amount of build latency on modern compilers and CPUs, no matter what approach is used. I don't think you can reason from "it works for personal projects" to "it's fast."

One thing I can tell you immediately about this unity build idea that is suboptimal for many codebases: modern compilers do not have much intra translation unit parallelism. If there's more in your project than one CPU core can compile quickly (i.e. most shipping products), it's going to be a serious bottleneck.


> One thing I can tell you immediately about this unity build idea that is suboptimal for many codebases

To give you a data point: in my case, https://github.com/OSSIA/score (fairly mundane C++ project totaling ~360kloc in roughly 1300 .cpp files / 1800 .hpp files, split across ~15 libraries, using boost, Qt and a few other common libs), unity builds (one .cpp per library) divide the build time by 5 compared to PCH.


360kloc certainly meets my definition of extraordinarily ambitious for a personal project. :) Well done you!

"Unity" builds certainly can speed up project compilation when used judiciously. This is easily shown by imagining an absurd C++ codebase where each method or function is in its own module. This would increase the overhead of parsing .h files. Any two functions that could be combined into one while keeping the h files included the same would almost certainly reduce this overhead. But I doubt your project would compile faster if all of the cc files were combined into one.


> This is easily shown by imagining an absurd C++ codebase where each method or function is in its own module

That's the musl codebase, in C. It does this in order to be able to remove unused code in a static build.

It compiles ridiculously fast. So the problem does not seem to be the amount of headers.


> Only an extraordinarily ambitious personal project would experience a significant amount of build latency on modern compilers and CPUs, no matter what approach is used.

If only that were true. If you write modern C++, use the STL and popular libraries, even very small projects can take minutes to compile. IIRC, last time I tried this approach, a ~200 LoC throwaway project using the STL, Eigen and spdlog (which includes fmt) took almost a minute to compile an optimized build. I realize we may be using different definitions of "significant" here, but to me that's just unacceptable.

And, as others have already indicated, if your project really is large enough that you need parallelism, just have as many translation units as you have cores. It doesn't impede the idea of the unity build at all.


I am not talking about hobby projects here, but codebases with more than 100K lines of code.

I agree with your objection, scaling up to 2-8 compilation units to use several cores is something I also plan to test/benchmark.

And to be honest, I don't completely understand why the classic way of compiling (makefiles + dozens of separate files) is so slow, but there is clearly something fishy as it is very often unbearable.

Slow compilation time can be very detrimental for productivity as it creates a way for the mind to wander and lose focus.


There doesn't need to be exactly one unity translation unit. In Firefox, the build system groups source files by component, and then into "unity" batches containing 5-20 source files each.

If a build spends a lot of time processing code in headers, and source files in the same component are likely to include very similar sets of headers, this kind of configuration gives decent benefits, while not making incremental compilation much more expensive (and possibly cheaper, if you edited a header affecting many files in a single component).


Yes, absolutely, that's the trick with unity builds. It is supposed to be slower because you're asking the compiler to rebuild everything from scratch every time.

But the simplicity of the single compilation unit leads to very sizeable gains.

I plan to write an article on this very subject and post a few benchmarks.

https://en.wikipedia.org/wiki/Unity_build


One of (many) other ways to speed up C++ (or C) builds is to use multi-processing in your make file - https://stackoverflow.com/questions/414714/compiling-with-g-... - I typically get about a 20% speed up, but obviously YMMV.


This post here recommends using ninja, which does this automatically.


Personally I like Tup, which also does this, and has no dependencies, but it has some quirks.


Careful, this can break certain makefiles that are not written to take this into account.


Perhaps post a link to this problem?


You should be able to find a fair amount of published literature about it if you search "make parallel build issues".


Another tip to keep compilation fast: don’t use Eigen. Non-trivial usage absolutely obliterates compile times and blows up the memory consumption by an insane amount.


Eigen is an absurdly template-intensive library, and people use it because so far there is nothing better or even remotely comparable to it.

If you use Eigen but somehow feel that compilation time is your focus instead of actually doing linear algebra or solving systems of linear equations then you have a few best practices that you can follow and techniques at your disposal to lower compilation times.


Such as? (genuinely curious) We've been trying hard to get memory consumption down because we get OOMs in our CI frequently. (Our largest compilation unit compiles for 3 minutes and peaks at just under 6GB memory)


Hmm... What about the pImpl idiom? That should also help (and also brings its own costs).


Builds are just like any other software: if it takes too long, profile it. Use bazel build --profile or whatever equivalent your build system offers.


Modules are supposed to make much of this now quite useful info irrelevant.

Unfortunately modules are several years in the future, both because implementations are still lacking and because few projects are being written in C++20 yet. I also harbour a suspicion that tricks like these will still be useful with modules.


Last time I checked somebody working on modules said something along the lines of "don't expect modules to speed things up". The goals were different, apparently (stop #defines from leaking everywhere, large-project management, libraries). Has this changed, are there any benchmarks available?


If it's true that speeding up compilation was not a goal then the committee is out of its mind. Speeding up compilation would have more impact on productivity and the environment than almost anything else that could be done with the language.


The committee was always out of its mind. In the past years they have neglected compile times, debug build performance, error messages or sometimes even performance.


Modules contain an optimization (less include leaking) and a pessimization (more serialized build graph). Only the former applies to CPU time, so modules should lead to less cpu time, but it's unclear exactly how that translates to wall time.


I still haven't seen a module in the wild yet. They should've been standardized 20 years ago (D has had modules from day 1, and there are more people on the C++ committee than working on D at all). I hope they get some traction, but I just can't see them helping any but the newest codebases.


As far as I understand the issue is basically in the hands of the compiler vendors. They need to have a solution that they can agree on before the c++ committee finalizes it.


What I've found wreaks the most havoc on build times is having too many -iquote and -isystem flags. Last time I profiled TensorFlow compiles using strace, about half the gcc wall time was spent inside all the stat() system calls those flags generate.


One thing about forward declarations: they violate DRY.

Now, most of the time, the trade off of repeating

  class Thing;
is more than offset by faster build times.

But if

  class Thing;
is actually

  namespace Bob
  {
    namespace Dave
    {
      template <typename T>
      class ThingTrait;

      template <typename T>
      class Thing : BaseClass<T, allocator<T>, trait<ThingTrait<T> >;
    }
  }
you really want to put that in its own header, typically "Thing_fwd.h"

(apologies for any syntax errors, been a while since I've written C++)
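A compilable take on the snippet above, as a sketch rather than the poster's exact code. A forward declaration cannot carry a base-class list, so this version moves the complexity into default template arguments instead. Everything beyond the names `Bob`, `Dave`, `Thing` and `ThingTrait` is made up for illustration, and the header boundaries are marked with comments so it fits in one file.

```cpp
#include <cassert>

// ---- Thing_fwd.h: the one place the declaration is spelled out ----
namespace Bob {
namespace Dave {
template <typename T>
class ThingTrait;

// Default arguments go on the first declaration and are not repeated later.
template <typename T, typename Trait = ThingTrait<T> >
class Thing;
}  // namespace Dave
}  // namespace Bob

// ---- user.h (hypothetical): only names the type, so Thing_fwd.h is enough ----
struct User {
    Bob::Dave::Thing<int>* thing = nullptr;   // pointer to an incomplete type
};
int read(const Bob::Dave::Thing<int>& t);     // references in declarations are fine too

// ---- Thing.h: full definitions, included only where members are used ----
namespace Bob {
namespace Dave {
template <typename T>
class ThingTrait {
public:
    static T initial() { return T{}; }
};

template <typename T, typename Trait>
class Thing {
public:
    T value = Trait::initial();
};
}  // namespace Dave
}  // namespace Bob

// ---- user.cpp (hypothetical): includes Thing.h because it touches members ----
int read(const Bob::Dave::Thing<int>& t) { return t.value; }

// Small self-check tying the pieces together.
inline void self_check() {
    Bob::Dave::Thing<int> t;
    User u;
    u.thing = &t;
    assert(read(*u.thing) == 0);
}
```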


You're either gonna write

    #include "thing.h"
or

    class Thing;
Either way, as a practical matter, you're saying, "There's something called Thing I'm going to use." C++ forces you to say that before use in each translation unit. It's not really "repeating" yourself to write it, since you haven't "said" it yet in this TU.


You are repeating yourself if you write

  class Thing;
in every file that uses a Thing. And as I said, that's generally fine.

The issue arises when the declaration of Thing includes namespaces, template params, etc. Rewriting that is tedious and error prone.

Look at something like std::string. That's actually

  namespace std {
  template< class T > struct allocator;
  template<class charT> struct char_traits;
  template<> struct char_traits<char>;
  template<class CharT, class Traits = char_traits<CharT>,
           class Allocator = allocator<CharT>>
    class basic_string;
  using string    = basic_string<char>;
  }
No-one is forward declaring that.


I prefer _fwd headers for a different reason: it's easier to modify without breaking dependencies.

If you change Thing to an alias to a class template specialization then all of the "class Thing;" forward declarations break.
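A small sketch of that refactor, assuming a hypothetical class template `BasicThing` and made-up file names. Clients that included the `_fwd` header keep compiling, while a hand-written `class Thing;` would now be rejected.

```cpp
#include <cassert>
#include <type_traits>

// ---- thing_fwd.h, after the refactor ----
template <typename T>
class BasicThing;                   // hypothetical class template
using Thing = BasicThing<int>;      // Thing is now an alias, not a class

// A stray hand-written forward declaration elsewhere would now be ill-formed:
//   class Thing;   // error: Thing already names an alias, not a class

// ---- client.h (hypothetical): unchanged, because it included thing_fwd.h ----
int count_of(const Thing& t);

// ---- thing.h ----
template <typename T>
class BasicThing {
public:
    T count{};
};

// ---- client.cpp (hypothetical) ----
int count_of(const Thing& t) { return t.count; }

// Small self-check: the alias and the specialization are the same type.
inline void self_check() {
    static_assert(std::is_same<Thing, BasicThing<int> >::value,
                  "Thing should alias BasicThing<int>");
    Thing t;
    assert(count_of(t) == 0);
}
```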


There's a lot of money to be made by someone who writes a tool that takes in a C++ codebase and suggests all the improvements that can be made to speed the build up.


Just use a build system in which reliable incremental builds are possible and are the default. Your compilation will take seconds most of the time.


If you can get it to work (not guaranteed) icecream can make a very large difference.


This appears to be referring to a distcc derivative https://github.com/icecc/icecream

Note that distcc doesn't really solve the header problem, it just throws more compute at it.


Yup, some people use spare machines lying around the office, some people buy a big-ass server that can be shared between the developers, some people (like me) buy a bunch of used desktop-class quad-core machines and run a little build cluster.

Very useful if you are working on very large codebases (I hack on LibreOffice, 10M LOC, from time to time).


How does the <algorithm> header compare in that list? That's the one I always try and avoid in my projects.


> Remember that std::vector consists of three pointers to a chunk of dynamically allocated memory.

Note that this is not required, and most std::string implementations inline small strings anyways.


Was your `std::string` SBO comment meant to relate to the `std::vector` quote? `std::vector` can't use SBO bc. `std::swap` on `std::vector`s can't invalidate pointers to elements.


I swapped std::string and std::vector when writing that comment :/ But still, there's no need for a vector to necessarily keep three pointers, it could use a pointer and two sizes for example.


In Rust they call it "three pointers" because a size and a pointer are either both 32 bits or both 64 bits


I’ve never heard this term used, to be honest. The type name is “usize.”


Whether std::vector holds three pointers or a pointer and two sizes is entirely irrelevant. Either way sizeof(vector<T>) does not vary depending on T.
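To make the two layouts concrete, here is an illustrative sketch. Both structs are hypothetical stand-ins, not any standard library's actual implementation. On mainstream platforms, where object pointers and `std::size_t` have the same width, they occupy the same space, and the size does not depend on `T` in either case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Layout A: three pointers; sizes are computed by subtraction.
template <typename T>
struct ThreePointers {
    T* begin;
    T* end;
    T* end_of_storage;
    std::size_t size() const { return static_cast<std::size_t>(end - begin); }
    std::size_t capacity() const { return static_cast<std::size_t>(end_of_storage - begin); }
};

// Layout B: one pointer and two sizes; the end pointer is computed by addition.
template <typename T>
struct PointerAndTwoSizes {
    T* data;
    std::size_t size_;
    std::size_t capacity_;
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    T* end() const { return data + size_; }
};

// Small self-check: both layouts describe the same buffer equally well.
inline void self_check() {
    int buffer[4] = {1, 2, 3, 4};
    ThreePointers<int> a{buffer, buffer + 2, buffer + 4};
    PointerAndTwoSizes<int> b{buffer, 2, 4};
    assert(a.size() == b.size());
    assert(a.capacity() == b.capacity());
    assert(a.end == b.end());
    // Assumption: pointers and std::size_t have equal width on this platform.
    assert(sizeof(a) == sizeof(b));
}
```

The same property can be observed for the real container by comparing, say, `sizeof(std::vector<int>)` with `sizeof(std::vector<std::string>)`; they match on the common standard library implementations, although the standard itself does not promise it.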


It is extremely relevant when responding to the claim "std::vector consists of three pointers".


Only to pedants. Everyone knows that adding a pointer to a ptrdiff_t calculated from the same array gives you another pointer.

If your real life benchmark heavily penalises this addition (and not the complementary subtraction when computing a size or a capacity) I ask you to volunteer it.


It is not pedantic to reply to "std::vector is three pointers" with "actually, it might use a size instead of a pointer". Replying to "std::vector is often three pointers, or something functionally equivalent" with that would just be wrong. I have not made any claims about performance, but if I was asked to I would say that the difference is likely so minor that it doesn't matter.


It's not simply pedantic: you would be interested in knowing that if you are interested in the implementation of vector-like data structures or if you want to implement one yourself, and the only motivation for that isn't that you are penalized in some benchmark. In an article about the details of C++, and especially in a footnote, it's relevant IMO.

It does happen that the only thing the point in the article relies on is `sizeof`, for sure.


Great write up. Thanks for sharing.



