As far as I understand it (I could be wrong; I've pieced this together from multiple sources), forking still reserves the same amount of memory the parent process is using without actually copying the data, since the child could very well end up using all that memory eventually.
The wrench is that Linux has a feature called memory overcommit (which I haven't been able to decipher completely). Supposedly it causes forking to not actually reserve that much space, but by default it's in a "heuristic" mode so it may or may not take effect.
These are the best resources I could find on what happens when you fork:

http://developers.sun.com/solaris/articles/subprocess/subpro...
http://lxr.linux.no/linux/Documentation/vm/overcommit-accoun...
I'm not anywhere near being a Java expert but I've seen similar behavior with a non-Hadoop workload.
In my case I had several virtual machines hosting Java VMs with 2+GB heaps, running an app that liked to fork and run external programs for short periods of time. If the entire heap size was, say, 3GB and it forked twice, Linux would act like it needed 9GB. The Linux overcommit heuristic regularly got things wrong and wanted RAM that it would never actually use. This usually resulted in the JVM failing in new and interesting ways.
The workaround is to allocate a crapload of swap (namely, more than heap size times a guesstimate of the number of concurrent forks). It will never actually USE the swap, but having it there seems to keep the overcommit heuristic happy.
Yay for cargo cult server tuning. I've never figured out how to get the kernel to not be so pessimistic and I can't modify the Java code in question so... eh.
I think there's some confusion here between the copy-on-write system that makes fork() so fast under Unix, and Linux's memory overcommitting, which allows malloc() to return success for a memory allocation that will in fact only be backed as necessary (e.g. you can allocate 32GB using malloc, but as long as you don't write anything there, the kernel isn't really allocating that much memory).
The latter causes much trouble because people tend to assume that, say, a database can allocate a certain amount of memory and be sure it will never run out as long as it does not explicitly allocate more. This is not true on Linux at all, and a process can be terminated at any time (if I am not mistaken, with SIGKILL and an OOM-killer message written to the system log) if the system is running out of memory.
I suspect the diagnostics in the article may be wrong, and there is something else causing the problem there. My polite guess would be something with the JVM (you are all having problems with the JVM... hmmm...). Merely forking a process is a tiny operation under Linux and I don't believe memory overcommitting is relevant at all here. And in any case you can simply turn off overcommits with a sysctl to verify the theory.
We got bitten by the same bug shelling out from the JVM. We ended up running socat in a separate process and bouncing all our external processes off it instead. But it's insane that you even have to think about this kind of thing.
Still, it does make you step back and wonder why they put together Hadoop nodes and intentionally ran them without swap. The resulting OOM is not surprising.