One thing many people are not aware of: statically linked programs run much faster than dynamically linked ones!
Or to be specific, they start up much faster: fork()/exec() is much slower for dynamically linked programs, while for statically linked programs it is much faster than most people think.
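If you want to check the startup claim yourself, here's a rough sketch of a timing harness (the binary path and iteration count are just placeholders; point it at a hello-world built once with -static and once without):

    /* Rough sketch: time N fork()/exec()/wait cycles of whatever binary
     * you pass as argv[1], to compare static vs dynamic startup cost.
     * Build: cc -o spawntime spawntime.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s /path/to/binary [iterations]\n", argv[0]);
            return EXIT_FAILURE;
        }
        long n = (argc > 2) ? atol(argv[2]) : 1000;

        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);

        for (long i = 0; i < n; i++) {
            pid_t pid = fork();
            if (pid == 0) {
                /* Child: exec the target binary. */
                execl(argv[1], argv[1], (char *)NULL);
                _exit(127);   /* exec failed */
            }
            waitpid(pid, NULL, 0);
        }

        clock_gettime(CLOCK_MONOTONIC, &end);
        double secs = (end.tv_sec - start.tv_sec) +
                      (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("%ld spawns in %.3f s (%.3f ms each)\n", n, secs, secs * 1000 / n);
        return EXIT_SUCCESS;
    }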
There is a myth that forking is slow, and that caused people to abandon very simple, elegant and unixy solutions like CGI.
I wrote a whole web "framework" in rc shell (http://werc.cat-v.org), and the reality is that if you statically link your programs you can use shell scripts that do dozens of forks per request and still provide better performance than something like PHP with FastCGI.
(Another great thing is that shell scripts and pipes naturally and automagically take advantage of multi-core systems, Unix once again beautifully shows how simple and beautiful concepts like fork and pipes have unforeseen benefits many decades after they were invented.)
On the contrary, fork() has scalability trouble as memory grows. Even with copy on write, copying page tables is still O(n) with respect to address space size (granted, with a large constant divisor). This overhead becomes apparent as programs grow to gigabyte size: a fork which before took microseconds can begin to take milliseconds. Forking is slow in many situations.
The issue described above can be avoided by using posix_spawn(3), which on Linux uses vfork(2).
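A minimal sketch of what that looks like when all you need is a plain fork()/exec() replacement (/bin/echo and its arguments are just placeholders):

    /* Minimal sketch: replace fork()/exec() with posix_spawn(3).
     * Build: cc -o spawn_demo spawn_demo.c */
    #include <spawn.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>

    extern char **environ;

    int main(void)
    {
        char *argv[] = { "echo", "hello from the child", NULL };
        pid_t pid;

        /* posix_spawn returns an errno value instead of setting errno. */
        int err = posix_spawn(&pid, "/bin/echo", NULL, NULL, argv, environ);
        if (err != 0) {
            fprintf(stderr, "posix_spawn: %s\n", strerror(err));
            return EXIT_FAILURE;
        }

        /* Reap the child as usual. */
        if (waitpid(pid, NULL, 0) < 0) {
            perror("waitpid");
            return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }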
To the contrary, using dynamic libs allows the OS to share a single in-memory copy of commonly-used libs. Almost every program uses libc. When the dynamic loader resolves your dependencies as part of loading the ELF binary, the OS has the option of mapping an already-cached copy of each library, possibly without even allocating an extra page of physical memory for it.
Not using shared objects has a number of pitfalls: wasteful duplication on your hard disk, wasteful copying of a bigger ELF binary into memory when you could instead tap the OS library cache for your dependencies (including the extra page allocations), and the inability to upgrade a dependency of a binary without relinking the binary.
If you're fork/exec'ing a program you've run before (which you have, if you're in an environment where this matters), the binary is already cached in memory by the filesystem cache anyway. But you avoid the processing overhead of dynamic linking and possible relocation, and you don't pay the indirection overhead of calling functions in a shared library. And if the library has to be relocated, you don't even save memory.
For very large programs, e.g. statically linking everything in an X11 environment, it might matter.
> Unix once again beautifully shows how simple and beautiful concepts like fork and pipes have unforeseen benefits many decades after they were invented
Multiprocessing, and using multiple processes (instead of threads) to take advantage of it, predates UNIX by a lot.
That's not the point. Unix invented the idea of a simple system call to create a process by forking, and (more importantly) the idea of a "pipe" syntax in the shell to connect data streams between processes in a natural and intuitive way. These were usability and elegance enhancements, not performance things.
I'd recommend it too. I used to hang out on comp.compilers back when John was drafting it, so thanks to vetting the AIX bits I got my name in the final book's acknowledgements (alongside dozens of others). Small thing, I know, but it still pleases me when I think of it. :-)
Very nice article and a clear explanation, but if I may suggest something: please mention the user-space trick LD_PRELOAD, so users could see what changes when they actually point to another libc version. Not 100% necessary, but still fun, and it would complement this article nicely.
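Something like this is the kind of demo I mean: a minimal, hypothetical shim that intercepts puts() so you can see the preloaded object being picked up instead of libc's copy (it only affects dynamically linked programs that actually call puts()):

    /* Minimal LD_PRELOAD shim: tags every puts() call so you can see that
     * the preloaded object is used instead of libc's implementation.
     * Build: cc -shared -fPIC -o shim.so shim.c -ldl
     * Use:   LD_PRELOAD=./shim.so ./some_dynamically_linked_hello
     * Has no effect on statically linked binaries, since ld.so never runs. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    int puts(const char *s)
    {
        /* Find the next puts() in the lookup order, i.e. the real libc one. */
        int (*real_puts)(const char *) =
            (int (*)(const char *))dlsym(RTLD_NEXT, "puts");

        fputs("[shim] ", stdout);
        return real_puts(s);
    }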
One thing would be worth mentioning: sys_execve, the syscall entry point, is essentially the most commonly known wrapper around do_execve.
I'm happy to see this type of article, very useful indeed. I can't wait for the follow-up.
LD_PRELOAD belongs in the follow-up, which discusses dynamically linked programs. It's a variable used by the dynamic linker, which isn't invoked for statically linked programs.
But I gave up on static linking on Linux after trying to statically link a distributed network client we built in a university project that makes heavy use of C++11 std::thread and Boost. It compiles fine, but segfaults on startup. This is a known issue[1], but I did not investigate further.
It appears to me, as a complete outsider to glibc/gcc Linux development, that static linking is discouraged on Linux[2].
Static linking has its problems, especially when shared data and multithreading are involved. Or memory management. Dynamic linking has its problems though (DLL/DSO hell is just one). It's a tradeoff. But yes, it is also my understanding that dynamic linking is preferred. It's also what I usually do unless I have a good reason not to.