> Memory allocators have to be thread safe and – in order to be performant – need to be able to serve a large number of concurrent requests from different threads.
Another approach is to create a bunch of individual allocators, each holding onto a big chunk of memory that only a small set of threads (or even a single thread) uses. These allocators only contend for locks with each other when they need to fetch more memory from the OS. Otherwise, each one holds onto the big slab it has allocated, hands pieces of it out when needed, and when memory is "deallocated" it keeps holding on instead of returning it to the OS, on the expectation that the thread (or small pool of threads) it manages memory for will need more again soon. A sketch of the idea follows below.
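In code, the pattern looks something like this hypothetical per-thread arena (a bare-bones sketch, not how any production allocator lays out its data structures; ThreadArena, kSlabSize, and fetch_slab are made-up names, and plain malloc stands in for the OS memory call):

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

// Hypothetical per-thread arena: carve small allocations out of big
// slabs, and never hand memory back on "free".
class ThreadArena {
    static constexpr std::size_t kSlabSize = 1 << 20;  // 1 MiB slabs

    std::vector<void*> slabs_;   // every slab we've fetched so far
    char* cursor_ = nullptr;     // bump pointer into the current slab
    std::size_t remaining_ = 0;

    // The only point of cross-thread contention: fetching a new slab.
    // A real allocator would mmap here; malloc stands in for the OS.
    static void* fetch_slab() {
        static std::mutex os_lock;
        std::lock_guard<std::mutex> guard(os_lock);
        return std::malloc(kSlabSize);
    }

public:
    void* allocate(std::size_t size) {
        size = (size + 15) & ~std::size_t{15};  // keep 16-byte alignment
        if (size > kSlabSize)
            return std::malloc(size);           // oversized: bypass the arena
        if (size > remaining_) {
            slabs_.push_back(fetch_slab());
            cursor_ = static_cast<char*>(slabs_.back());
            remaining_ = kSlabSize;
        }
        void* p = cursor_;
        cursor_ += size;
        remaining_ -= size;
        return p;
    }

    // Deliberately a no-op: the arena keeps the slab for the next
    // request instead of returning it to the OS.
    void deallocate(void*) {}

    ~ThreadArena() {
        for (void* s : slabs_) std::free(s);
    }
};

// One arena per thread: the fast path above touches only
// thread-local state and takes no lock at all.
thread_local ThreadArena arena;
```

Note the no-op deallocate: that's exactly where the downside described next comes from.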
The main downside to this approach is that over time, the app ends up holding onto tons of memory it's not using right now, on the assumption it will be needed in the future.
Ever notice how Chrome eats memory like it's a three-year-old at an ice cream buffet? That performance ain't free.
I think you're misunderstanding the context and the way the allocator works. Take a look at the docs/papers on jemalloc in general to get a sense of it, but it's already doing what you describe (this is the thread cache, or tcache, which you can tune). This reduces the need for locking in exchange for lower memory efficiency; what this post is talking about is efforts to improve the efficiency of the remaining synchronization.
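For a concrete sense of that tuning, jemalloc exposes the thread cache through its mallctl API. A minimal sketch, assuming a stock unprefixed jemalloc build (some builds expose the symbol as je_mallctl instead):

```cpp
#include <jemalloc/jemalloc.h>
#include <cstdio>

int main() {
    // Read whether the calling thread's cache (tcache) is enabled.
    bool enabled = false;
    size_t len = sizeof(enabled);
    if (mallctl("thread.tcache.enabled", &enabled, &len, nullptr, 0) == 0)
        std::printf("tcache enabled: %d\n", enabled);

    // Flip the trade-off: disabling the tcache sends every allocation
    // through the shared arenas -- more lock traffic, less cached memory.
    bool off = false;
    mallctl("thread.tcache.enabled", nullptr, nullptr, &off, sizeof(off));
    return 0;
}
```

The same knob is reachable at startup via the MALLOC_CONF environment variable (e.g. MALLOC_CONF="tcache:false"), which is the blunt way to see how much the per-thread caching is actually buying you.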
Any thread-safe allocator is constantly fighting to make its synchronization as efficient as possible, because that reduces how much memory it has to trade away for concurrency/efficiency.
...and while Chrome's PartitionAlloc is different from Firefox's modified jemalloc (mozjemalloc), there's a lot of similarity in how the two allocators try to manage concurrency.
You can see the relevant code from the links. It's all in memory/build/Mutex.h. On Windows it uses critical sections, and on other non-Darwin platforms it uses pthread mutexes.
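Stripped down, that pattern is roughly the following (a sketch of the shape, not Mozilla's actual Mutex.h; the Darwin-specific lock path is left out):

```cpp
// Roughly the shape of a cross-platform allocator mutex: one tiny
// interface over whichever native lock is cheapest on each platform.
#if defined(_WIN32)
#  include <windows.h>
#else
#  include <pthread.h>
#endif

class Mutex {
#if defined(_WIN32)
    CRITICAL_SECTION lock_;
public:
    Mutex()  { InitializeCriticalSection(&lock_); }
    ~Mutex() { DeleteCriticalSection(&lock_); }
    void Lock()   { EnterCriticalSection(&lock_); }
    void Unlock() { LeaveCriticalSection(&lock_); }
#else
    pthread_mutex_t lock_;
public:
    Mutex()  { pthread_mutex_init(&lock_, nullptr); }
    ~Mutex() { pthread_mutex_destroy(&lock_); }
    void Lock()   { pthread_mutex_lock(&lock_); }
    void Unlock() { pthread_mutex_unlock(&lock_); }
#endif
    Mutex(const Mutex&) = delete;
    Mutex& operator=(const Mutex&) = delete;
};
```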