As far as I understand it (I could be wrong; I've pieced this together from multiple sources), forking still reserves the same amount of memory the parent process is using without actually copying the data, since the child could very well end up using all that memory eventually.
The wrench is that Linux has a feature called memory overcommit (which I haven't been able to decipher completely). Supposedly it causes forking to not actually reserve that much space, but by default it's in a "heuristic" mode so it may or may not take effect.
These are the best resources I could find on what happens when you fork:

http://developers.sun.com/solaris/articles/subprocess/subpro...
http://lxr.linux.no/linux/Documentation/vm/overcommit-accoun...
I'm not anywhere near being a Java expert but I've seen similar behavior with a non-Hadoop workload.
In my case I had several virtual machines hosting Java VMs with 2+GB heaps, running an app that liked to fork and run external programs for short periods of time. If the entire heap size was, say, 3GB and it forked twice, Linux would act like it needed 9GB. The Linux overcommit heuristic regularly got things wrong and wanted RAM that it would never actually use. This usually resulted in the JVM failing in new and interesting ways.
The workaround is to allocate a crapload of swap (namely, more than heap size times a guesstimate of the number of concurrent forks). It will never actually USE the swap, but having it there seems to keep the overcommit heuristic happy.
Yay for cargo cult server tuning. I've never figured out how to get the kernel to not be so pessimistic and I can't modify the Java code in question so... eh.
I think there's some confusion here between the copy-on-write system that makes fork() so fast under Unix, and Linux's memory overcommitting, which allows malloc() to return success for a memory allocation that will in fact only be backed as necessary (e.g. you can allocate 32GB using malloc, but as long as you don't write anything there, the kernel isn't really allocating that much memory).
The latter causes much trouble because people tend to assume that, say, a database can allocate a certain amount of memory and be sure it will never run out as long as it does not explicitly allocate more. This is not true on Linux at all, and a process can be terminated at any time (if I am not mistaken, with SIGKILL and an OOM-killer message written to the system log) if the system is running out of memory.
I suspect the diagnostics in the article may be wrong, and there is something else causing the problem there. My polite guess would be something with the JVM (you are all having problems with the JVM... hmmm...). Merely forking a process is a tiny operation under Linux and I don't believe memory overcommitting is relevant at all here. And in any case you can simply turn off overcommits with a sysctl to verify the theory.
We got bitten by the same bug shelling out from the JVM. We ended up running socat in a separate process and bouncing all our external processes off it instead. But it's insane that you even have to think about this kind of thing.
Still, it does make you step back and wonder why they put together Hadoop nodes and intentionally ran them without swap. The resulting OOM is not surprising.