One thing many people are not aware of: statically linked programs run much faster than dynamically linked ones!
Or to be specific, they start up much faster: fork()/exec() is much slower for dynamically linked programs, while for statically linked programs it is much faster than most people think.
There is a myth that forking is slow, and that caused people to abandon very simple, elegant and unixy solutions like CGI.
I wrote a whole web "framework" in rc shell (http://werc.cat-v.org), and the reality is that if you statically link your programs, you can use shell scripts that do dozens of forks per request and still get better performance than something like PHP with FastCGI.
(Another great thing is that shell scripts and pipes naturally and automagically take advantage of multi-core systems; Unix once again beautifully shows how simple and beautiful concepts like fork and pipes have unforeseen benefits many decades after they were invented.)
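As a rough sketch of what the shell is doing for you here (plain C, with "ls" and "wc" standing in for whatever filters a script actually runs), a two-stage pipeline is just one fork per stage plus a pipe between them, and the kernel is free to run both stages on different cores at once:

    /* Rough equivalent of the shell pipeline "ls | wc -l": one process per
     * stage, connected by a pipe.  The kernel schedules the two children
     * independently, so on a multi-core machine the stages run in parallel
     * without any extra effort from the script author. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int fd[2];
        if (pipe(fd) < 0) { perror("pipe"); exit(1); }

        pid_t producer = fork();
        if (producer == 0) {            /* first stage: writes into the pipe */
            dup2(fd[1], STDOUT_FILENO);
            close(fd[0]); close(fd[1]);
            execlp("ls", "ls", (char *)NULL);
            _exit(127);
        }

        pid_t consumer = fork();
        if (consumer == 0) {            /* second stage: reads from the pipe */
            dup2(fd[0], STDIN_FILENO);
            close(fd[0]); close(fd[1]);
            execlp("wc", "wc", "-l", (char *)NULL);
            _exit(127);
        }

        close(fd[0]); close(fd[1]);     /* parent keeps no pipe ends open */
        waitpid(producer, NULL, 0);
        waitpid(consumer, NULL, 0);
        return 0;
    }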
On the contrary, fork() has scalability trouble as memory grows. Even with copy-on-write, copying the page tables is still O(n) with respect to address-space size (granted, with a significant divisor). This overhead becomes apparent as programs grow to gigabyte size: a fork that once took microseconds can begin to take milliseconds. Forking is slow in many situations.
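If you want to see this for yourself, a back-of-the-envelope sketch along these lines (Linux; the 1 GiB steps and the loop bound are arbitrary, and only the parent's side of fork() is timed) makes the trend visible:

    /* Fault in an increasingly large heap, then time how long fork() takes
     * to return in the parent.  Even with copy-on-write, the page-table
     * copy grows with the amount of mapped memory. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/wait.h>

    static double fork_ms(void) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                               /* child does nothing */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        waitpid(pid, NULL, 0);
        return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    }

    int main(void) {
        for (size_t gib = 0; gib <= 4; gib++) {
            if (gib > 0) {                          /* grow by 1 GiB per step */
                char *p = malloc((size_t)1 << 30);
                if (!p) break;
                memset(p, 1, (size_t)1 << 30);      /* fault the pages in */
            }
            printf("~%zu GiB resident: fork() took %.3f ms\n", gib, fork_ms());
        }
        return 0;
    }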
The issue described above can be avoided by using posix_spawn(3), which on Linux is implemented with vfork(2) semantics, so the parent's page tables are never copied.
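A minimal sketch of that route (the spawned program here is just a placeholder), using posix_spawnp(3) from <spawn.h>:

    /* Create a new process image without duplicating the parent's page
     * tables, which is exactly the cost described in the comment above.
     * On glibc/Linux this goes through a vfork()-style clone internally. */
    #include <spawn.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    extern char **environ;

    int main(void) {
        pid_t pid;
        char *argv[] = { "echo", "spawned without copying page tables", NULL };

        int err = posix_spawnp(&pid, "echo", NULL, NULL, argv, environ);
        if (err != 0) {
            fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
            return 1;
        }
        waitpid(pid, NULL, 0);          /* reap the child as usual */
        return 0;
    }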
To the contrary, using dynamic libs allows the OS to share commonly-used libs. Almost every program uses libc. When the dynamic loader maps your dependencies as part of loading the ELF binary, the OS can reuse the already-cached copy of each library, possibly without allocating a single extra page of memory for its code.
Not using shared objects has a number of pitfalls: wasteful duplication of library code on your hard disk, wasteful copying of the (larger) ELF binary into memory when you could instead share the OS's cached copies of your dependencies (extra page allocations included), and the inability to upgrade a dependency without recompiling the binary.
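One crude way to see the sharing (a Linux-specific sketch; it just greps the process's own memory map): build the program below twice, once normally and once with -static. The dynamic build shows libc.so mapped into the process, pages that every other dynamically linked process on the machine is sharing; the static build shows no such mapping, because the library code is baked into the binary itself.

    /* Print the lines of /proc/self/maps that mention libc. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *maps = fopen("/proc/self/maps", "r");
        if (!maps) {
            perror("fopen");
            return 1;
        }
        char line[512];
        while (fgets(line, sizeof line, maps)) {
            if (strstr(line, "libc"))      /* shared mapping of the C library */
                fputs(line, stdout);
        }
        fclose(maps);
        return 0;
    }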
If you're fork/exec'ing a program you've run before (which you have, if you're in an environment where this would matter), the binary is already cached in memory by the filesystem cache anyway. But you don't have the processing overhead of doing the dynamic linking and possible relocation, nor do you pay the overhead of calling functions through a shared library's indirection. And if the library is relocated, you don't even save memory, because the relocated pages can no longer be shared.
For very large programs, e.g. statically linking everything in an X11 environment, it might matter.
> Unix once again beautifully shows how simple and beautiful concepts like fork and pipes have unforeseen benefits many decades after they were invented
Multiprocessing hardware, and the use of multiple processes (rather than threads) to take advantage of it, predates UNIX by a lot.
That's not the point. Unix invented the idea of a simple system call to create a process by forking, and (more importantly) the idea of a "pipe" syntax in the shell to connect data streams between processes in a natural and intuitive way. These were usability and elegance enhancements, not performance things.