Just for comparison, Rust solves this in a very simple way: if you are in a module foo and have a mod bar; statement, the compiler searches for bar.rs and for bar/mod.rs. If neither is found, it reports an error. There is only one place the compiler starts the search from: the foo/ directory (note that the foo module itself can be declared either inside that directory as foo/mod.rs or next to it as foo.rs).
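The lookup rule can be sketched in a few lines. This is only an illustration of the rule described above, not rustc's actual implementation: the helper names candidate_paths and resolve_module are hypothetical, the exists callback stands in for a file system check, and #[path] attributes are ignored. The sketch also treats the case where both files exist as an error, which matches what rustc does.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the two files that may back a `mod <name>;`
/// declaration inside a module whose submodules live in `module_dir`.
fn candidate_paths(module_dir: &Path, name: &str) -> [PathBuf; 2] {
    [
        module_dir.join(format!("{}.rs", name)), // foo/bar.rs
        module_dir.join(name).join("mod.rs"),    // foo/bar/mod.rs
    ]
}

/// Hypothetical helper: picks the file for `mod <name>;`.
/// `exists` is an assumed stand-in for a file system lookup.
fn resolve_module(
    module_dir: &Path,
    name: &str,
    exists: impl Fn(&Path) -> bool,
) -> Result<PathBuf, String> {
    let [flat, nested] = candidate_paths(module_dir, name);
    match (exists(&flat), exists(&nested)) {
        (true, false) => Ok(flat),
        (false, true) => Ok(nested),
        // Neither candidate exists: the module cannot be found.
        (false, false) => Err(format!("file not found for module `{}`", name)),
        // Both candidates exist: the declaration is ambiguous.
        (true, true) => Err(format!("module `{}` found at both locations", name)),
    }
}
```

Calling resolve_module(Path::new("src/foo"), "bar", |p| p.exists()) would check src/foo/bar.rs and src/foo/bar/mod.rs and nothing else, which is the whole point: there is exactly one directory to look in.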
Sometimes C++ can use its age as an excuse for being super complicated, but not here: the modules implementation of C++ is younger than Rust's.
In Rust's crate compilation model, there's certainly some unexploited parallelism. Often as much as half of the compilation time is spent in the LLVM phases. By then, all the MIR is already available and just sitting there, waiting for LLVM to finish. Downstream crates could already start their compilation with the MIR data alone. Only the LLVM phases of the downstream crates need the LLVM data of the upstream crates. Assuming that half of the time is spent up to and including MIR and half in LLVM, you could roughly double your parallelism, or nearly halve the length of the critical path through the compilation graph.
Yes, crates are often compiled in parallel, but there are frequently tight spots in the compilation graph where only one crate is compiling because all later crates depend on it.
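To make the halving estimate concrete, here is a toy model of the worst kind of tight spot: a linear chain of crates where each depends on the previous one. The CrateCost type, the function names, and the unit costs are all assumptions made for illustration; real builds have branching graphs and uneven phase times. The pipelined variant follows the rule stated above: a downstream crate's MIR phase waits only for the upstream MIR phase, while its LLVM phase waits for both its own MIR and the upstream LLVM output.

```rust
/// Assumed per-crate cost, split into the phase up to MIR and the LLVM phase.
#[derive(Clone, Copy)]
struct CrateCost {
    frontend: u32, // time until MIR is available
    backend: u32,  // time spent in LLVM
}

/// Crate-level scheduling: each crate in the chain starts only after
/// the previous crate has finished completely.
fn chain_serial(chain: &[CrateCost]) -> u32 {
    chain.iter().map(|c| c.frontend + c.backend).sum()
}

/// Pipelined scheduling: the frontend of a crate starts as soon as the
/// upstream frontend is done; the backend starts once the crate's own
/// frontend and the upstream backend are both done.
fn chain_pipelined(chain: &[CrateCost]) -> u32 {
    let mut frontend_done: u32 = 0;
    let mut backend_done: u32 = 0;
    for c in chain {
        frontend_done += c.frontend;
        backend_done = frontend_done.max(backend_done) + c.backend;
    }
    backend_done
}
```

For a chain of four crates that each cost one unit per phase, chain_serial gives 8 and chain_pipelined gives 5. In general, n such crates take 2n units serially and n + 1 units pipelined, so the critical path approaches half its original length as the chain grows.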