
Right, OK... What I don't get is that it mentions profiling the advantage of TCMalloc, which is thread-caching malloc. Is that going to be realistic when the replayer is a single thread? (I could be missing something.)



Short answer: it depends on what you care about.

Long answer: if the allocator is poorly designed, a lot of time will be spent traversing its free list/tree/whatever looking for a block that fits your size requirement. That lookup time gets worse if your heap is badly fragmented, or if the allocator does a poor job of coalescing freed blocks. You can end up spending a lot of time in malloc just hunting for a suitably sized block.
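
To make that concrete, here's a toy first-fit sketch in C. It isn't any real allocator's code, just an illustration of why a long, fragmented free list turns every malloc into a linear scan:

    #include <stdio.h>
    #include <stddef.h>

    struct free_block {
        size_t size;
        struct free_block *next;
    };

    /* First-fit: walk the list until a block is big enough. */
    static struct free_block *first_fit(struct free_block *head, size_t want)
    {
        for (struct free_block *b = head; b != NULL; b = b->next)
            if (b->size >= want)
                return b;      /* may leave an awkward remainder behind */
        return NULL;           /* no fit: the allocator must grow the heap */
    }

    int main(void)
    {
        /* A fragmented free list: several small blocks before a big one. */
        struct free_block blocks[4] = {
            { 16, &blocks[1] }, { 24, &blocks[2] },
            { 32, &blocks[3] }, { 4096, NULL },
        };
        struct free_block *hit = first_fit(&blocks[0], 1024);
        printf("found a %zu-byte block after walking the list\n",
               hit ? hit->size : 0);
        return 0;
    }

The more fragments sitting ahead of a usable block, the more pointer chasing each allocation does.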

Also, regarding heap fragmentation: long-running processes that do lots of allocations/frees can fragment the heap, again depending on the design of the allocator. If there is a lot of fragmentation, you can see substantial bloat in the process's memory footprint.

So profiling your process for those two things (time spent in the allocator, and fragmentation-driven bloat) can be valuable.
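
For example, one crude way to measure the time side on Linux is an LD_PRELOAD shim that counts calls and wall time spent in the underlying malloc. This is a minimal sketch of that technique, not our actual tooling; a robust interposer also needs a static bootstrap buffer in case dlsym itself allocates before the real malloc is resolved:

    /* Build:  gcc -shared -fPIC -o libmprof.so mprof.c -ldl
     * Run:    LD_PRELOAD=./libmprof.so ./your_program            */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <time.h>

    static void *(*real_malloc)(size_t);
    static unsigned long long calls;   /* number of malloc() calls        */
    static unsigned long long nanos;   /* time spent inside real malloc() */

    void *malloc(size_t size)
    {
        if (!real_malloc)   /* lazy lookup of the next malloc in the chain */
            real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        void *p = real_malloc(size);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        long long dt = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                     + (t1.tv_nsec - t0.tv_nsec);
        calls++;            /* note: counters are not thread-safe; fine for */
        nanos += dt;        /* a single-threaded replay                     */
        return p;
    }

    __attribute__((destructor))
    static void report(void)
    {
        fprintf(stderr, "malloc calls: %llu, time in malloc: %llu ns\n",
                calls, nanos);
    }

Interposing free/calloc/realloc the same way gives you the full picture; this only wraps malloc to keep the sketch short.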

What you say is true; the major gain from TCMalloc is in multi-(native)-threaded apps.

Perhaps the next version of malloc_wrap will support multiple threads.

In either case, we have not yet finished collecting data on the different allocators, so I am not currently in a position to say which is better for our use case.

I just wanted a tool that lets me replay a fixed set of allocation patterns against different allocators, to find out whether swapping out libc's malloc makes a difference for us, and that is precisely what malloc_wrap is.
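
Roughly this idea, though the following is just a sketch with a made-up trace format, not malloc_wrap itself: record your app's allocations, then replay the same trace once per allocator (relinked or LD_PRELOADed) and compare wall time and RSS.

    /* Trace lines: "m <id> <size>" for an allocation, "f <id>" for a free. */
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_IDS 1000000          /* assumed upper bound on live ids */

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s trace-file\n", argv[0]);
            return 1;
        }
        FILE *f = fopen(argv[1], "r");
        if (!f) { perror("fopen"); return 1; }

        static void *slot[MAX_IDS];  /* id -> live pointer */
        char op;
        unsigned long id, size;

        while (fscanf(f, " %c %lu", &op, &id) == 2) {
            if (op == 'm') {
                if (fscanf(f, "%lu", &size) != 1)
                    break;                   /* malformed trace line */
                if (id < MAX_IDS)
                    slot[id] = malloc(size); /* whichever allocator is linked in */
            } else if (op == 'f' && id < MAX_IDS) {
                free(slot[id]);
                slot[id] = NULL;
            }
        }
        fclose(f);
        return 0;
    }

The value is that the allocation sequence is held constant, so any difference you measure comes from the allocator, not from run-to-run variation in the app.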


"What you say is true; the major gain from TCMalloc is in multi-(native)-threaded apps."

I think this is pretty key, because otherwise TCMalloc is somewhat of an overhead. Depending on your platform, the standard malloc will pull ahead (it depends on how favourable the locking is, but that's the case on OS X, anyway).

A multi-threaded instance sounds interesting, but I'm guessing it would be a challenge to get a representative sample.


You might be reading the article too literally: you can test more than just tcmalloc, of course (nedmalloc, ptmalloc*, libumem, etc.). It is -very- possible that one of these allocators will handle our memory footprint more gracefully than, say, libc's. There is only one way to find out: A/B testing.

I think the important thing to keep in mind is that assertions like:

"I think this is pretty key, because otherwise TCMalloc is somewhat of an overhead."

are a bit subjective, IMHO. Allocators differ from one another, and of course they react differently to a given series of allocations/deallocations. We're trying to find out whether the way we use our heap is better suited to another allocator like tcmalloc, or nedmalloc, or whatever.

And re: multi-threading, I don't believe it will be particularly difficult to get a representative sample, but working on that isn't very high on my list right now.




