Not deeply familiar with C++ modules, but I've built and maintained fairly large build systems for other languages (and written my fair share of C++), and from this article I'm not quite sure where the intractable problems lie. It seems like the .bmi files are effectively an optimization that allows for fast incremental compilation, but a compiler doesn't actually need them to run compilation from scratch: it knows how to generate them, so if they're missing it can fall back to the old, slower #include-style compile-every-file behavior, generating the .bmi files as it goes. It doesn't seem like they add new slow paths that you can't already construct today with macros and #include, so it's hard to see why they'd be DOA: first time compilation should be no slower, but incremental compilation should be much faster thanks to interface stability.
Maybe I'm missing something?
It's not like C++ modules were designed by random nobodies, though; this has been worked on by build infra engineers at major companies with enormous C++ codebases like Facebook, and compiler maintainers e.g. the Clang maintainers. It's possible they completely forgot to think about parallel builds, but that seems at least a little unlikely.
But you can't just compile-every-file. Each file can depend on the outputs of compiling some unknown set of other files. The compiler needs to become a build system, or the build system needs to become a compiler.
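To make the ordering constraint concrete, here is a minimal two-file sketch of a C++20 named module and an importer (the file names, module name, and function are made up for illustration; exact compiler invocations vary by toolchain):

```cpp
// math.ixx -- module interface unit (hypothetical file name).
// Compiling this produces the BMI that importers consume,
// so it must be compiled before anything that imports it.
export module math;
export int square(int x) { return x * x; }

// main.cpp -- cannot even begin compilation until the BMI
// for "math" exists, unlike an #include-based translation unit.
import math;
int main() { return square(3) == 9 ? 0 : 1; }
```

With #include, both translation units could be compiled in parallel from scratch; with modules, the build system has to know about the math → main edge before scheduling anything.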
The clang modules proposal had the concept of mapping files, mapping module names to file names.
Companies like Facebook will presumably use proper build systems that already encode the dependency information in the build files rather than try to autodetect it. In that kind of an environment this proposal probably isn't particularly painful.
The compiler will not become a build system because this is out of scope for C++. With or without modules, C++ will continue to rely on an external dependency management tool, such as a Makefile. The introduction of modules will not change anything in this respect.
Indeed. You've now taken one solution off the table. The other one is for the build system to become a compiler, which is equally unacceptable. That leaves you with manually encoding all dependency information in the build files. Which most people aren't doing (the exception being Bazel-like build systems which enforce that).
That seems to leave us with just one conclusion: the article is right, and most of the ecosystem will never migrate to modules, leaving us with the worst of both worlds.
> parsing C++ is mostly equivalent to becoming a C++ compiler.
It really isn't. Parsing a language just means validating its correctness with respect to a grammar and extracting some information in the process. Parsing is just the first of many stages required to map C++ source code to valid binaries.
The presence of template specialisations and constexpr functions means that the GP is right here; you cannot decide whether an arbitrary piece of C++ is syntactically valid without instantiating templates and/or interpreting constexpr functions. Consider
    template <int>
    struct foo {
        template <int>
        static int bar(int);
    };

    template <>
    struct foo<8> {
        static const int bar = 99;
    };

    constexpr int some_function() { return (int) sizeof(void*); }
Now given the snippet
    foo<some_function()>::bar<0>(1);
then if some_function() returns something other than 8, we use the primary template and foo<N>::bar<0>(1) is a call to a static template member function.
But if some_function() does return 8, we use the specialisation, and foo<8>::bar is an int with value 99; the relational operators are left-associative, so the tokens parse as the comparison (foo<8>::bar < 0) > (1), i.e. first 99 < 0 (false, promoted to the int 0), then 0 > 1.
That is, there are two entirely different but valid parses depending on whether we are compiling on a 32- or 64-bit system.
You only need to parse the "module <module name>" and "import <module name>" statements. No need to parse all of C++ for that. You could probably even do that with a regex.
It also has to do all the preprocessing to see which import statements get hit. I don't think templates could control at compile time which module to import, at least I hope not.
You are misrepresenting the concept of undecidable. If the compiler can say if the program compiles or not, then it is most certainly decidable. What you want to say is that it cannot be determined without full parsing, so no preprocessing is possible.
No, it's actually undecidable. C++ templates have been shown to be Turing-complete, which means that template instantiations can encode the halting problem. Determining whether a program compiles or not therefore requires solving the halting problem.
In practice, compilers work around this by limiting template instantiation depth.
I gave an example of a template program to show the general method. Obviously, primality is decidable, but there exist candidate C++ programs whose parse tree is undecidable. The trick would be to encode your parser in a template, run it on the undecidable program (i.e., itself), and create a contrary result. Does this have any effect on practical C++ builds? I honestly have no idea.
They could just specify that the module/import statements need to be at the top of the file (excluding comments). Most people will do this anyway. Then the build system only needs to parse comments and module statements, which should be fast and easy.
So in reality build systems will be required to invoke at least the preprocessor to extract dependency information.
AFAIK the modules support in the Build2 build system does exactly this, and in fact caches the entire preprocessed file to pass to the compiler proper later.
Having the compiler produce header dependency information is possible, since the dependencies are just an optimization. If there's no dependency information available, you can just compile all of the files in an arbitrary order, and you get both the object file and a dep file. And then on further runs you use the old dep files to skip unnecessary recompilations.
With modules, you can't compile the files in an arbitrary order: if A uses a module defined in B, B must be compiled first. So you need to have the dependency information available up front even for the first build. And since it needs to be available up front, it can't be generated by the compiler. It must either be produced by the build tool which becomes vastly more complicated, or manually by humans.
This is no different from a situation where C++ compilation has a binary dependency on other modules. The best-known example is a static library (.a file): the project cannot be built if a static library is missing. With modules, one cannot compile the project with missing modules either, so the build system will have to provide this information.
The C standard (and I presume the C++ standard) has very carefully avoided the idea of the preprocessor being a separate program at all. The wording was chosen so that a separate preprocessor is never necessary, because most C compilers do not have one. It is only Unix-heritage compilers that really have a separate preprocessor, and even they're not consistent about it.
Your conclusion is incorrect. Most people with simple projects will use simple techniques to make modules work without worrying about the preprocessor. Large companies will create their own tooling to use modules in their own way. My point is that this is how C++ has been used since its inception. C++ users are already aware that the language needs external building support and modules cannot change this reality. But modules will certainly improve how the language is used.
The compiler won't be a build system? I'm not quite so sure. We already have -MD in gcc to emit Makefile rules for the dependencies of the current file. It's not much of a stretch to propose a similar flag to emit a list of required modules. In fact the very same flag could emit a foo.bmi target requirement when you "import foo" and your Makefile should have foo.bmi as one of the products of compiling the foo module. You could also have a similar flag that tells you what modules are built from the current cpp file given some compiler options.
What I gathered is that module compilation is intended to be safe from preprocessor actions defined outside of the module. So the code you would generate with #include-style compilation and the code you would generate by compiling modules in the intended fashion aren't guaranteed to be the same. It seems as though this would mean that projects involving modules simply couldn't be compiled in the previous fashion.
First-time compilation can actually be slower: where before you could compile 8 files at a time, now maybe only 3 can start, because the others depend on those 3. Once they finish, perhaps 6 can run in parallel because the rest of the code depends on those 6 modules, and so on. The dependency chains serialize parts of the build.