Only if you insist on clinging to a general-purpose, system-level perspective on memory allocation. No thinking or reasoning about these issues is wasted.
While OP might've phrased it more politely, he's got a point - allocators that aren't designed for multithreaded use are ultimately oversimplified toy projects.
It's really not that much of a challenge to knock together a (slab + heap + free list) allocator that performs really well on a single thread. However, it will be nearly impossible to adapt it to a multithreaded context. That is a considerably more complex task, and the end result will look like a rocket ship next to which even the best single-threaded allocator looks like a simpleton.
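For anyone following along, here is a minimal sketch of the free-list part of such a single-threaded allocator. It is an illustration only, not anyone's actual implementation: the names `pool`, `pool_init`, `pool_alloc` and `pool_free`, and the block size and count, are all made up for the example.

```c
#include <stddef.h>

#define POOL_BLOCK_SIZE  64   /* hypothetical fixed block size in bytes */
#define POOL_BLOCK_COUNT 128  /* hypothetical number of blocks in the pool */

/* A block is either handed out (payload in use) or free (next pointer in use). */
typedef union pool_block {
    union pool_block *next;                 /* valid only while the block is free */
    unsigned char payload[POOL_BLOCK_SIZE];
} pool_block;

typedef struct pool {
    pool_block blocks[POOL_BLOCK_COUNT];
    pool_block *free_list;                  /* head of the singly linked free list */
} pool;

/* Thread every block onto the free list, lowest address first. */
static void pool_init(pool *p) {
    p->free_list = NULL;
    for (size_t i = POOL_BLOCK_COUNT; i > 0; i--) {
        p->blocks[i - 1].next = p->free_list;
        p->free_list = &p->blocks[i - 1];
    }
}

/* Pop one block off the free list; returns NULL when the pool is exhausted. */
static void *pool_alloc(pool *p) {
    pool_block *b = p->free_list;
    if (b == NULL) {
        return NULL;
    }
    p->free_list = b->next;
    return b;
}

/* Push a block back; ptr must have come from pool_alloc on the same pool. */
static void pool_free(pool *p, void *ptr) {
    pool_block *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}
```

Both operations are a couple of pointer moves with no locking, which is why this kind of thing is fast on one thread. Nothing here is safe if two threads touch the same `pool` at once.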
Not if they're meant for specific uses in limited parts of your application; one allocator per thread, for instance. Why are you clinging so hard to the system-level, general-purpose perspective, given that the problems it comes with don't really have any good answers? How is that supposed to lead us forward?
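To make "one allocator per thread" concrete, here is a rough sketch using C11 thread-local storage and a bump arena. The names `arena`, `tls_arena`, `arena_alloc` and `arena_reset`, and the 4096-byte size, are assumptions for the example; it also assumes memory never needs to be freed from a thread other than the one that allocated it.

```c
#include <stddef.h>

#define ARENA_BYTES 4096  /* hypothetical per-thread capacity */

typedef struct arena {
    _Alignas(max_align_t) unsigned char buf[ARENA_BYTES];
    size_t used;          /* bytes handed out so far, always a multiple of the alignment */
} arena;

/* Each thread gets its own zero-initialised arena, so no locking is needed. */
static _Thread_local arena tls_arena;

/* Bump-allocate n bytes from the calling thread's arena; NULL if it does not fit. */
static void *arena_alloc(size_t n) {
    const size_t align = _Alignof(max_align_t);
    size_t rounded = (n + align - 1) & ~(align - 1);  /* round up to the alignment */
    if (rounded < n || rounded > ARENA_BYTES - tls_arena.used) {
        return NULL;                                  /* overflow or out of space */
    }
    void *p = tls_arena.buf + tls_arena.used;
    tls_arena.used += rounded;
    return p;
}

/* Release everything the calling thread allocated, all at once. */
static void arena_reset(void) {
    tls_arena.used = 0;
}
```

The trade-off is visible in the interface: there is no per-object free, only a per-thread reset, which is fine for a limited part of an application and not a replacement for a general-purpose `malloc`.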
Doesn't the benchmark also just allocate and then immediately free? I think I could specialise for that particular usage pattern and make it very fast just for the benchmarks. Is there some reason that wouldn't work? A better benchmark might be a replayed set of allocate and free operations recorded from a large real application.
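A replay harness along those lines could look roughly like this. It is only a sketch: the trace format (`trace_op` with a slot index and a size), the `allocator_vtable` struct and `replay_trace` are invented for illustration, and recording the trace from a real application and timing the run are left out.

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum { OP_ALLOC, OP_FREE } op_kind;

/* One recorded operation. "slot" identifies the object across its lifetime. */
typedef struct trace_op {
    op_kind kind;
    size_t slot;
    size_t size;  /* bytes requested; ignored for OP_FREE */
} trace_op;

/* The allocator under test, assumed malloc-compatible (release(NULL) is a no-op). */
typedef struct allocator_vtable {
    void *(*alloc)(size_t);
    void (*release)(void *);
} allocator_vtable;

/* Replays ops in order and returns the number of failed allocations,
   or (size_t)-1 if the slot table itself could not be allocated. */
static size_t replay_trace(const allocator_vtable *a, const trace_op *ops,
                           size_t n_ops, size_t n_slots) {
    void **live = malloc(n_slots * sizeof *live);
    if (live == NULL) {
        return (size_t)-1;
    }
    for (size_t s = 0; s < n_slots; s++) {
        live[s] = NULL;
    }
    size_t failures = 0;
    for (size_t i = 0; i < n_ops; i++) {
        const trace_op *op = &ops[i];
        if (op->slot >= n_slots) {
            continue;                      /* skip malformed entries */
        }
        if (op->kind == OP_ALLOC) {
            a->release(live[op->slot]);    /* avoid leaking if the slot is reused */
            live[op->slot] = a->alloc(op->size);
            if (live[op->slot] == NULL) {
                failures++;
            }
        } else {
            a->release(live[op->slot]);
            live[op->slot] = NULL;
        }
    }
    for (size_t s = 0; s < n_slots; s++) {
        a->release(live[s]);               /* clean up whatever the trace left live */
    }
    free(live);
    return failures;
}
```

Because the sizes and the interleaving of allocs and frees come from the trace, an allocator tuned only for allocate-then-immediately-free gets no special advantage. Passing `{ malloc, free }` as the vtable gives a baseline to compare against.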