That's fairly interesting about the lld speedup. What I gather is that it's faster if you build lld itself with mimalloc, and that they chose mimalloc largely because it was easy, or at least possible, to integrate with the project on supported platforms, while jemalloc and tcmalloc were harder. Which makes me wonder: will distro packages build it with mimalloc? Do they have to do something, or is it on by default?
It appears that simply preloading either mimalloc or tcmalloc gives a significant speedup for clang-16, and for clang-15 for that matter. I wish these Linux systems came with better defaults!
With mimalloc, a medium-sized C++ test consistently takes 43.7s to build and link on this system, while with the default allocator it's all over the place, from 46s to 49s. I would loosely characterize that as a free 10%.
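If anyone wants to try the same comparison, here is a rough sketch of how I'd script it: run the same build command several times with and without `LD_PRELOAD` set, then compare the spread. The function names are made up for illustration, and the library path and build command are assumptions you would replace with whatever your system actually has.

```python
import os
import statistics
import subprocess
import sys
import time


def preload_env(lib_path=None, base_env=None):
    """Return a copy of the environment, with LD_PRELOAD set if lib_path is given."""
    env = dict(os.environ if base_env is None else base_env)
    if lib_path:
        existing = env.get("LD_PRELOAD")
        # Put our allocator first, keeping anything that was already preloaded.
        env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


def time_command(cmd, runs=5, lib_path=None):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    env = preload_env(lib_path)
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            env=env,
            check=True,  # stop if the build itself fails
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Collapse a list of timings into min, max, mean, and max-min spread."""
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.mean(durations),
        "spread": max(durations) - min(durations),
    }


def report(label, durations, out=sys.stdout):
    """Print a one-line summary for a set of timings."""
    s = summarize(durations)
    out.write(
        f"{label}: min {s['min']:.1f}s  max {s['max']:.1f}s  "
        f"mean {s['mean']:.1f}s  spread {s['spread']:.1f}s\n"
    )
```

Usage would look something like `report("default", time_command(["make", "-B"]))` followed by `report("mimalloc", time_command(["make", "-B"], lib_path="/usr/lib/x86_64-linux-gnu/libmimalloc.so"))`, where both the `make -B` command and the `.so` path are placeholders. Check that the path really exists first, since a wrong path would leave you timing the default allocator twice.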