Inlining saves the time lost to call and return jumps, but it can cost time when it replicates code (as loop unrolling does), because the hot code can grow larger than the smallest cache.
So the arguments against inlining apply even more strongly to statically linking every program: the same code (the standard library) will exist in memory in many places, and will get evicted and reloaded into L2/L3 on every process switch. Nothing is slower than having to wait for something to be faulted in.
And sufficiently aggressive inlining will increase the program size further. This might or might not be compensated for by the increase in instruction-pointer locality.
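To put rough numbers on the replication cost, here is a back-of-envelope sketch. Every figure in it is an assumption chosen for illustration (a 32 KiB L1 instruction cache, a 5-byte call instruction, a 400-byte helper called from 100 hot sites), and the function names are hypothetical. It is not a measurement of any real program.

```python
# Back-of-envelope model; every number below is a made-up assumption.
L1I_BYTES = 32 * 1024   # assumed L1 instruction cache size
CALL_BYTES = 5          # assumed size of one call instruction


def out_of_line_bytes(body_bytes, call_sites):
    # One shared copy of the body plus a call instruction at each site.
    return body_bytes + call_sites * CALL_BYTES


def inlined_bytes(body_bytes, call_sites):
    # The body is replicated at every call site; no call instructions remain.
    return body_bytes * call_sites


def fits_in_l1i(code_bytes):
    return code_bytes <= L1I_BYTES


# Hypothetical helper: 400 bytes, called from 100 hot sites.
body, sites = 400, 100
shared = out_of_line_bytes(body, sites)
replicated = inlined_bytes(body, sites)
print(shared, fits_in_l1i(shared))          # 900 True
print(replicated, fits_in_l1i(replicated))  # 40000 False
```

The model is deliberately crude: it ignores that an inlined body often shrinks once the compiler specializes it for the call site, and that not every call site is hot at the same time. That is where the "might or might not be compensated" part comes in.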