A/B test mallocs against your memory footprint

jwilliams · on March 19, 2009

I'm a bit lost - what does this provide above existing tools - e.g. MallocDebug+Shark on OSX or Valgrind/heap/leaks/etc on Linux?

ice799 · on March 19, 2009

the shim outputs a log of the memory allocation functions called (malloc,realloc,calloc,free) that can be replayed by the replayer program.

you can then replay the exact same allocation pattern against different allocators to determine which is best for you.

to my knowledge, the tools you mention do not provide similar functionality.
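
As a rough illustration of the replay idea, here is a minimal sketch in C. The log format (one operation per line, with small integer slot numbers standing in for pointers) and the names replay_line, replay_log, and MAX_SLOTS are assumptions made for this example; they are not taken from malloc_wrap, whose actual log format may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_SLOTS 1024 /* hypothetical cap on slot numbers in a log */

/* Apply one logged operation to the slot table. Hypothetical line formats:
 *   m <slot> <size>           malloc into an empty slot
 *   c <slot> <nmemb> <size>   calloc into an empty slot
 *   r <slot> <size>           realloc the block held in the slot
 *   f <slot>                  free the block held in the slot
 * Sizes must be nonzero in this sketch.
 * Returns 0 on success, -1 on a malformed line or a failed allocation. */
int replay_line(const char *line, void **slots, size_t nslots)
{
    char op = 0;
    unsigned long slot = 0, a = 0, b = 0;
    void *p;
    int n;

    assert(line != NULL && slots != NULL);
    n = sscanf(line, " %c %lu %lu %lu", &op, &slot, &a, &b);
    if (n < 2 || slot >= nslots)
        return -1;

    switch (op) {
    case 'm':
        if (n != 3 || a == 0 || slots[slot] != NULL)
            return -1;
        slots[slot] = malloc(a);
        return slots[slot] != NULL ? 0 : -1;
    case 'c':
        if (n != 4 || a == 0 || b == 0 || slots[slot] != NULL)
            return -1;
        slots[slot] = calloc(a, b);
        return slots[slot] != NULL ? 0 : -1;
    case 'r':
        if (n != 3 || a == 0)
            return -1;
        p = realloc(slots[slot], a);
        if (p == NULL)
            return -1;
        slots[slot] = p;
        return 0;
    case 'f':
        if (n != 2)
            return -1;
        free(slots[slot]);
        slots[slot] = NULL;
        return 0;
    default:
        return -1;
    }
}

/* Replay a whole log. Returns the number of operations applied,
 * or -1 if any line was rejected. Blocks still live at the end are freed. */
long replay_log(FILE *log)
{
    void *slots[MAX_SLOTS] = {0};
    char line[128];
    long ops = 0;
    size_t i;

    while (fgets(line, sizeof line, log) != NULL) {
        if (replay_line(line, slots, MAX_SLOTS) != 0) {
            ops = -1;
            break;
        }
        ops++;
    }
    for (i = 0; i < MAX_SLOTS; i++)
        free(slots[i]);
    return ops;
}
```

Built once and then run with a different allocator linked in or preloaded each time, a program like this would push the identical sequence of calls through each allocator. Like the replayer discussed here, the sketch is single-threaded.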

jwilliams · on March 19, 2009

Right ok.... What I don't get is that it mentions profiling the advantage of TCMalloc, which is thread-cached malloc. Is this going to be realistic when the replayer is a single thread? (I could be missing something)

ice799 · on March 19, 2009

Short answer: depends what you care about.

Long answer: if the allocator is poorly designed, a lot of time will be spent traversing its free list/tree/whatever looking for a block to fit your size requirements. this lookup time can be exacerbated if your heap is badly fragmented, or if the allocator does a poor job coalescing freed blocks. you could end up spending lots of time in malloc looking for a nicely sized block.

also, with regard to heap fragmentation - long running processes which do lots of allocations/frees can fragment the heap, again depending on the design of the allocator. if the heap is badly fragmented, the process's memory footprint can bloat substantially.

so profiling your process for those two items can be valuable.
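
To make the lookup cost concrete, here is a toy first-fit search over a singly linked free list. It is only a sketch: the struct and function names are made up for illustration, and real allocators (libc's malloc, TCMalloc, and so on) use more elaborate structures such as size-class bins.

```c
#include <assert.h>
#include <stddef.h>

/* One node of a toy free list (hypothetical layout for illustration). */
struct free_block {
    size_t size;              /* usable bytes in this free block */
    struct free_block *next;  /* next free block, or NULL */
};

/* First-fit: walk the list until a block of at least `want` bytes turns up.
 * Stores the number of nodes visited in *visited.
 * Returns the matching block, or NULL if no single block is big enough. */
struct free_block *first_fit(struct free_block *head, size_t want,
                             size_t *visited)
{
    struct free_block *b;
    size_t n = 0;

    assert(visited != NULL);
    for (b = head; b != NULL; b = b->next) {
        n++;
        if (b->size >= want)
            break;
    }
    *visited = n;
    return b;
}

/* Sum of all free bytes, regardless of how they are split into blocks. */
size_t total_free(const struct free_block *head)
{
    size_t sum = 0;

    for (; head != NULL; head = head->next)
        sum += head->size;
    return sum;
}
```

With eight 16-byte blocks sitting ahead of one 256-byte block, a 64-byte request visits all nine nodes before it finds a fit, and a 300-byte request fails even though 384 bytes are free in total. An allocator in that state has to grow the heap, which is where the bloat comes from.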

what you say is true; the major gain for TCMalloc is in multi-(native)-threaded apps.

perhaps the next version of malloc_wrap will support multiple threads.

in either case, we have not yet finished collecting data about the different allocators, so I am not currently in a position to say which is better for our use case.

i just wanted a tool to let me replay a constant set of allocation patterns against different allocators to find out if swapping out libc's malloc made a difference for us and that is precisely what malloc_wrap is.
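
One way such a comparison might be measured, assuming a POSIX system: time a fixed workload and read the peak resident set size afterwards, then repeat the run under each allocator. The names measure, run_stats, and churn are hypothetical, and churn is a stand-in workload rather than a real allocation trace. Note that ru_maxrss is reported in kilobytes on Linux and in bytes on OS X.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <time.h>

struct run_stats {
    double cpu_seconds; /* CPU time consumed while the workload ran */
    long max_rss;       /* peak resident set size, in platform units */
};

/* Run `workload` once and record CPU time and peak RSS for the process.
 * Returns 0 on success, -1 if a measurement call failed. */
int measure(void (*workload)(void), struct run_stats *out)
{
    struct rusage ru;
    clock_t start, end;

    assert(workload != NULL && out != NULL);
    start = clock();
    workload();
    end = clock();
    if (start == (clock_t)-1 || end == (clock_t)-1)
        return -1;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    out->cpu_seconds = (double)(end - start) / CLOCKS_PER_SEC;
    out->max_rss = ru.ru_maxrss;
    return 0;
}

/* Stand-in workload: allocate mixed sizes, then free the even-indexed
 * blocks before the odd-indexed ones so frees do not mirror allocations. */
void churn(void)
{
    enum { N = 1000, ROUNDS = 100 };
    void *p[N];
    int round, i;

    for (round = 0; round < ROUNDS; round++) {
        for (i = 0; i < N; i++) {
            p[i] = malloc((size_t)(16 + (i % 64) * 8));
            if (p[i] != NULL)
                *(volatile char *)p[i] = 1; /* touch the block */
        }
        for (i = 0; i < N; i += 2)
            free(p[i]);
        for (i = 1; i < N; i += 2)
            free(p[i]);
    }
}
```

Calling measure(churn, &stats) in the same binary under two allocators gives two pairs of numbers to compare. Peak RSS is a process-wide high-water mark, so each allocator needs its own fresh process.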

jwilliams · on March 19, 2009

"what you say is true; the major gain for TCMalloc is in multi-(native)-threaded apps."

I think this is pretty key, because otherwise TCMalloc is somewhat of an overhead. Depending on your platform, a standard malloc will pull ahead (it depends on how favourable locking is, but that is the case on OS X anyway).

A multi-threaded instance sounds interesting, but - I'm guessing it would be a challenge to get a representative sample.

ice799 · on March 19, 2009

You might be reading the article too literally -- you can test more than just tcmalloc, of course (nedmalloc, ptmalloc*, libumem, etc). It is -very- possible that one of these allocators will handle our memory footprint more gracefully than, say, libc. There is only one way to find out: A/B testing.

I think the important thing to keep in mind is that assertions like:

"I think this is pretty key, because otherwise TCMalloc is somewhat of an overhead."

are a bit subjective, IMHO. Allocators are different from one another, and of course they react to a series of allocations/deallocations differently. We're trying to find out if the way we use our heap is better suited to another allocator like tcmalloc, or nedmalloc, or whatever.

And RE: multi-threaded - I don't believe it will be particularly difficult to get a representative sample, but working on that isn't very high on my list right now.

tmm1 · on March 19, 2009

http://goog-perftools.sourceforge.net/doc/tcmalloc-opspersec...