Makes perfect sense to me to have the updates happen atomically and avoid lock contention, even if that makes the loader more allocation-happy than it'd be otherwise. I've done similar things before.
What about the query path? Your post talked about a 10% improvement in response latency from changing memory allocators. That could be due to something like one allocator making it easier to use huge pages and thus vastly decreasing TLB pressure...but it could also be due to heavy malloc/free cycles during the query getting sped up. Is that happening, and if so, why are those allocations necessary? Ignoring the tone, I think this is more what akira2501 was getting at. My inclination would be to explore using per-request arenas.
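To make "per-request arena" concrete, here's a minimal sketch of what I have in mind, not anything from your codebase. It assumes a single-threaded request handler and a fixed capacity chosen up front; the names (arena_t, arena_init, arena_alloc, arena_reset, arena_destroy) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-request bump arena. All names are illustrative only. */
typedef struct {
    unsigned char *base; /* one backing block, obtained once */
    size_t cap;          /* total bytes in the block */
    size_t used;         /* bytes handed out so far */
} arena_t;

/* Allocate the backing block. Returns 1 on success, 0 on failure. */
static int arena_init(arena_t *a, size_t cap) {
    assert(a != NULL);
    a->base = malloc(cap);
    a->cap = cap;
    a->used = 0;
    return a->base != NULL;
}

/* Bump-allocate n bytes, aligned for any standard type.
 * Returns NULL when the arena is full (assumption: the caller then
 * falls back to malloc or fails the request). */
static void *arena_alloc(arena_t *a, size_t n) {
    assert(a != NULL);
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || n > a->cap - start) {
        return NULL;
    }
    a->used = start + n;
    return a->base + start;
}

/* End of request: drop every allocation at once, keep the block. */
static void arena_reset(arena_t *a) {
    assert(a != NULL);
    a->used = 0;
}

/* Shutdown: return the block to the system allocator. */
static void arena_destroy(arena_t *a) {
    assert(a != NULL);
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}
```

The idea is that each query calls arena_alloc for its scratch data and the handler calls arena_reset once when the response is sent, so the query path never touches malloc/free individually. Whether that would help here depends on whether those allocations are actually what the allocator swap sped up.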
Per-request arenas are a nice idea, yeah - we haven't pursued further optimizations because we're satisfied with the current speed, but there are certainly some things that could be improved.