The main supposed benefit is not disk or memory economy, but central patching.
The theory is as follows: If a security flaw (or other serious bug) is found, then you can fix it centrally if you use shared libraries, as opposed to finding every application that uses the library and updating each separately.
In practice this doesn't work because each application has to be tested against the new version of the library.
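For anyone unfamiliar with the mechanism being argued about, here's a minimal sketch (hypothetical library and file names, typical GCC/Linux invocations):

```c
/* libgreet.c -- a hypothetical shared library */
#include <stdio.h>
void greet(void) { puts("hello"); }

/* app.c -- one of many programs linked against it */
void greet(void);
int main(void) { greet(); return 0; }

/*
 * Build:
 *   gcc -shared -fPIC libgreet.c -o libgreet.so
 *   gcc app.c -L. -lgreet -Wl,-rpath,'$ORIGIN' -o app
 *
 * If a bug is found in greet(), only libgreet.so has to be rebuilt and
 * swapped in; every program linked against it picks up the fix on its
 * next launch with no relinking -- provided the ABI didn't change,
 * which is exactly the assumption the testing objection is about.
 */
```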
> then you can fix it centrally if you use shared libraries, as opposed to finding every application that uses the library and updating each separately.
This comes up a lot, but how often do you end up in a scenario where there's a critical security hole and you _can't_ patch it because one program somewhere is incompatible with the new version? Maybe even a program that isn't security critical.
Plus what you mentioned about testing. If you update each application individually, you can do it piecemeal. You can test your critical security software and roll out a new version immediately, and then you can test the less critical software second. In some systems you can also do partial testing, and then do full testing later because you're confident that you have the ability to roll back just one or two pieces of software if they break.
It's the same amount of work, but you don't have to wait for literally the entire system to be stable before you start rolling out patches.
I don't think it's 100% a bad idea to use shared libraries; there are instances where it does make sense, and centralized interfaces/dependencies have some advantages, particularly for very core, obviously shared, extremely stable interfaces like the ones Linus is talking about here. But these aren't strict advantages, and in many instances I suspect that central fixes can actually be a downside. You don't want to be waiting on your entire system to get tested before you roll out a security fix for a dependency in your web server.
> This comes up a lot, but how often do you end up in a scenario where there's a critical security hole and you _can't_ patch it because one program somewhere is incompatible with the new version? Maybe even a program that isn't security critical?
Talking about user use cases: every time I play a game on Steam. At the very least there are GnuTLS versioning problems. That's why Steam packages its own library system containing multiple versions of the same library -- thus rendering all of the benefits of shared libraries completely null.
One day, game developers will package static binaries, the compiler will be able to rip out all of the functions they don't use, and I won't have to have 5-20 copies of the same library on my system just sitting around -- worse if you have multiple Steam installs in the same /home partition because you're a distro hopper.
There's a big difference between production installs and personal or hobbyist systems. Think of all the businesses running COBOL on mainframes, the machine shops running on Windows XP, the big companies still running Java 8 and Python 2. When you have a system that is basically frozen, you end up with a catastrophic failure mode where to upgrade X you need to upgrade Y, which requires upgrading Z, and so on. You'd be surprised what even big-name companies are running in their datacenters: stuff that has to work, is expensive to upgrade, and by virtue of being expensive to upgrade ends up not being upgraded, to the point where any upgrade becomes a breaking change. And at the rate technology changes, even a five-year-old working system quickly becomes hopelessly out of date. Now imagine a 30-year-old system in some telco.
These are such different use cases that I think completely different standards and processes as well as build systems are going to become the norm for big critical infrastructure versus what is running on your favorite laptop.
Well, not really. The compiler is able to optimize the contents of the library and integrate it with the program; i.e., some functions will just be inlined, and that means those functions won't exist in the same form after other optimizations are applied (e.g., the square root function has specific object code, but after inlining the compiler can use the surrounding context to simplify and transform it further).
Yes, but LTO doesn't apply across shared object libraries. Suppose I write a video game that uses DirectX for graphics, but I don't use the DirectX Raytracing feature at all. Because of DLL hell, I'm going to be shipping my own version of the DirectX libraries, ones that I know my video game is compatible with. Those are going to be complete DirectX libraries, including the Raytracing component, even though I don't use it at all in my game. No amount of LTO can remove it, because theoretically that same library could be used by other programs.
On the other hand, if I am static linking, then there are no other programs that could use the static library. (Or, rather, if they do, they have their own embedded copy.) The LTO is free to remove any functions that I don't need, reducing the total amount that I need to ship.
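A rough sketch of what that looks like, with hypothetical names and a typical GCC toolchain (illustrative, not a recipe):

```c
/* gamelib.c -- stand-in for a privately embedded static library */
int render_frame(int x) { return x * 2; }  /* used by the game */
int trace_rays(int x)   { return x * 3; }  /* never called     */

/* game.c */
int render_frame(int x);
int main(void) { return render_frame(21); }

/*
 * Build with LTO (gcc-ar keeps the LTO information usable in the archive):
 *   gcc -O2 -flto -c gamelib.c game.c
 *   gcc-ar rcs libgamelib.a gamelib.o
 *   gcc -O2 -flto game.o -L. -lgamelib -o game
 *
 * `nm game | grep trace_rays` should come back empty on a typical setup:
 * since no other program can link against this private copy, the
 * link-time optimizer is free to drop the unused code entirely.
 */
```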
Good point (and shows that I am not a video game developer). I had tried to pick DirectX as something that would follow fao_'s example of game developers. The point still holds in the general case, though as you pointed out, not in the case of DirectX in particular.
Even without LTO, the linker will discard object files that aren't used (on Linux a static library is just an ar archive of object files). It's just a different level of granularity.
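For example (hypothetical files, no LTO involved):

```c
/* sqrt_util.c -- goes into libutil.a */
double half(double x) { return x / 2.0; }

/* matrix_util.c -- also in libutil.a, never referenced by the program */
double triple(double x) { return x * 3.0; }

/* main.c */
double half(double x);
int main(void) { return (int)half(8.0); }

/*
 *   gcc -c sqrt_util.c matrix_util.c main.c
 *   ar rcs libutil.a sqrt_util.o matrix_util.o
 *   gcc main.o -L. -lutil -o app
 *
 * The linker extracts only sqrt_util.o from the archive to resolve half();
 * matrix_util.o never reaches the binary. The granularity is the archive
 * member, not the individual function (that finer level is what LTO or
 * -ffunction-sections plus --gc-sections buys you).
 */
```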
I doubt that you did since OpenGL implementations are hardware-specific. Perhaps you mean utility libraries building on top of OpenGL such as GLEW or GLUT.
For some libraries (OpenGL, Vulkan, ALSA, ...) the shared library provides the lowest stable cross-hardware interface there is, so statically linking the library makes no sense.
> This comes up a lot, but how often do you end up in a scenario where there's a critical security hole and you _can't_ patch it because one program somewhere is incompatible with the new version? Maybe even a program that isn't security critical.
That’s not the point. The point is having to find and patch multiple copies of a library in case of a vulnerability, instead of just one.
Giving up the policy of enforcing shared libraries would just make the work of security teams much harder.
In practice this works really well as long as the change is to eliminate an unwanted side effect rather than to change a previously documented behavior.
But it doesn't really matter. What matters is that whatever system is in use needs to have a control system that can quickly and reliably tell you everything that uses a vulnerable library version, and can then apply the fixes where available and remind you of the deficiencies.
That metadata and dependency checking can be handled in any number of ways, but if it's not done, you are guaranteed not to know what's going on.
If a library is used inside a unikernel, inside a container, inside a virtual machine, inside venvs or stows or pex's or bundles, the person responsible for runtime operations needs to be able to ask what is using this library, and what version, and how can it be patched. Getting an incomplete answer is bad.
I strongly agree that the reporting and visibility you're talking about are important.
But there's one other advantage of the shared library thing, which is that when you need to react fast (to a critical security vulnerability, for example), it is possible to do it without coordinating with N number of project/package maintainers and getting them all to rebuild.
You still do want to coordinate (at least for testing purposes), but maybe in emergencies it's more important to get a fix in place ASAP.
> In practice this works really well as long as the change is to eliminate an unwanted side effect rather than to change a previously documented behavior.
...and then you deploy and discover that somebody was depending on that "unwanted" side effect.
> In practice this works really well as long as the change is to eliminate an unwanted side effect rather than to change a previously documented behavior.
I fully agree with that, didn't understand the rest.
Let's say you are building a web-based file upload/download service. You're going to write some code yourself, but most components will come from open-source projects. You pick a base operating system, a web server, a user management system, a remote storage system, a database, a monitoring system and a logging system. Everything works!
Now it's a month later. What do you need in ongoing operations, assuming you want to keep providing reasonable security?
You need to know when any of your dependencies makes a security-related change, and then you need to evaluate whether it affects you.
You need to know which systems in your service are running which versions of that dependency.
You need to be able to test the new version.
You need to be able to deploy the new version.
It doesn't matter what your underlying paradigm is. Microservices, unikernels, "serverless", monoliths, packages, virtual machines, containers, Kubernetes, OpenStack, blah blah blah. Whatever you've got, it needs to fulfill those functions in a way which is effective and efficient.
The problem is that relatively few such systems do, and of those that do, some of them don't cooperate well with each other.
It's plausible that you have operating system packages with a great upstream security team, so you get new releases promptly... and at the same time, you use a bunch of Python packages that are not packaged by your OS, so you need to subscribe individually to each of their changefeeds and pay attention.
The whole "shared libraries are better for security" idea is basically stuck in a world where we don't really have proper dependency management and everyone just depends on "version whatever".
This is interestingly also holding back a lot of ABI breaking changes in C++ because people are afraid it'll break the world... which, tbf, it will in a shared library world. If dependencies are managed properly... not so much.
I wish distros could just go back to managing applications rather than worrying so much about libraries.
EDIT: There are advantages to deduplicating the memory for identical libraries, etc., but I don't see that as a major concern in today's world unless you're working with seriously memory-constrained systems. Even just checksumming memory pages and deduplicating that way might be good enough.
So if there is 'proper dependency management' (what do you propose? Are we too strict with versioning, or too loose?), how will you fix the next Heartbleed? Pushing updates to every single program that uses OpenSSL is a lot more cumbersome (and more likely to go wrong, because there is some program somewhere that did not get updated) than simply replacing the .so/.dll file and fixing the issue for every program on the system.
And in case your definition of proper dependency management is 'stricter', then you simply state that you depend on a vulnerable version, and fixing the issue will be far more cumbersome as it requires manual intervention as well, instead of an automated update and rebuild.
If it is looser, then it will also be far more cumbersome, as you have to watch out for breakage when trying to rebuild, and you need to update your program for the new API of the library before you can even fix the issue at all.
No, it is not cumbersome to reinstall every program that relies on OpenSSL. My /usr/bin directory is only 633 MB. I can download that in less than a minute. The build is handled by my distro's build farm and it would have no problem building and distributing statically linked binaries if they ever became the norm.
That is going back to the same issues with containers, where everything works just fine... as long as you build it from your own statically-configured repo and you rebuild the whole system every update. It's useless once you try to install any binaries from an external package source. And IMO, a world where nobody ever sends anyone else a binary is not a practical or useful one.
Yes? Rebuilding (and retesting!) the system on every major update is not a bad idea at all. I rarely install binaries from out-of-repo sources, so that is not a great problem for me. And those I do install tend to be statically linked anyway.
> In practice this doesn't work because each application has to be tested against the new version of the library.
In practice it works: see autopkgtest and https://ci.debian.net, which reruns the tests for each reverse dependency of a package, every time a library gets updated.
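(For anyone unfamiliar: a package opts in by shipping tests under debian/tests/. A minimal, hypothetical debian/tests/control looks roughly like this, with debian/tests/smoke being an executable that exercises the installed package and exits non-zero on failure.)

```
# debian/tests/control for a hypothetical package
Tests: smoke
Depends: @
```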
I know for a fact that other commercial, corporate-backed distributions are far away from that sophistication, though.
> I know for a fact that other commercial, corporate-backed distributions are far away from that sophistication, though.
No, they're not. Both Fedora/RHEL and openSUSE/SLE do similar checks. Every update submission goes through these kinds of checks.
Heck, the guy who created autopkgtest works at Red Hat and helped design the testing system used in Fedora and RHEL now. openSUSE has been doing dependency checks and tests for almost a decade now, with the combination of the openSUSE Build Service and OpenQA.
Having worked with both systems, I'd say Debian CI does not focus on system integration. Fedora CI and Debian CI are more similar than different, but Fedora also has an OpenQA instance for doing the kind of system integration testing that openSUSE does. openSUSE's main weakness is that they don't do deeper inspections of RPM artifacts, the dependency graph, etc. They don't feel they need it because OBS auto-rebuilds everything on every submission anyway. The Fedora CI tooling absolutely does this, since automatic rebuilds aren't a thing there, and it's done on PRs as well as update submissions.
If you can retest everything, you can rebuild everything. But that doesn't help the network bandwidth issue. Debian doesn't have incremental binary diff packages (delta debs) in the default repos anyway, so there's room for improvement there.
> If you can retest everything, you can rebuild everything
No, rebuilding is way heavier.
Even worse, some languages insist on building against fixed versions of their dependencies.
It forces distributions to keep multiple versions of the same library, and this leads to a combinatorial explosion of packages to fix, backport, compile and test.
It's simply unsustainable and it's hurting distributions already.
So if the CVE says that you need to update library L, and program A uses the thing that's broken and is OK to update, but program B doesn't use the broken thing yet is broken by the L upgrade, CI would let you know that you're screwed before you upgrade... but you're still screwed.
It's worse if the program that needs the update also is broken by the update, of course.
Now it's your choice... you either lose B but protect the rest of the infrastructure from hackers, or you decide the CVE doesn't apply to your use case (internal thing on a testing server) and don't upgrade L, to keep B working.
You can also install both versions of L. You can also patch the broken part of L out at the old version, if it's not mission critical. There's a lot of things you can do.
Having one giant binary file with everything statically compiled in is worse in every way, except for distribution-as-a-single-file (but you can already do that now by putting the binary and the libraries in a single zip, dumping everything in /opt/foo, and letting the user find the vulnerable library manually... which, again, sucks).
If it were static libraries, you'd upgrade the package for A (which would need to be recompiled with updated L) and leave B alone. As a low priority followup, fix either B or L so they can work together again (or wait for someone else to fix and release).
Installing both versions of L is usually hard. It's one thing if it's OpenSSL[1] 1.1 vs 1.0, but if 1.0.0e is needed for security and 1.0.0d is needed for other applications, how do you make that work in Debian (or any other system that's at least somewhat mainstream)?
[1] Not to pick on OpenSSL, but it's kind of the poster child for updates that are important to pick up yet also break things; at least they provide security updates across branches that try to stay compatible.
For a rough measure of how many packages will break if you perform a minor update to a dependency, try searching “Dependabot compatibility score” on GitHub (or just author:dependabot I suppose); each PR has a badge showing how many public repos’ CI flows were successful after attempting to apply the patch. By my eye it seems to be generally anywhere from 90-100%. So the question is: would you be happy if every time a security vulnerability came out approximately 1 in 20 of your packages broke but you were “instantly” protected, or would you rather wait for the maintainers to get around to testing and shipping the patch themselves? Given that the vast majority of these vulnerabilities only apply in specific circumstances, and the vast majority of my packages are only used in narrow circumstances with no external input, I’d take stability over security.
Security patches are typically much smaller scoped than other library updates. Also, Dependabot does not support C or C++ packages so its stats are not that useful for shared libraries, which are most commonly written in C.
Rebuilding and running tests on that number of packages every time there's a security update in a dependency is completely unsustainable for Linux distributions, as well as for the internal distributions in large companies.
I've worked on these systems in various large organizations and distros. We did the math.
On top of that, delivering frequent very large updates to deployed systems is very difficult in a lot of environments.
Works the exact same way without shared libraries, just at the source level instead of the binary level. "Finding every application" is simple. The problem is the compile time and added network load. Both are solvable issues, via fast compiling and binary patching, but the motivation isn't there as shared libraries do an OK job of mitigating them.
> In practice this doesn't work because each application has to be tested against the new version of the library.
Debian's security team issues thousands of patched shared libraries every year without testing them against every program, and without causing failures. They do that by backporting the security fix to the version of the library Debian uses.
I gather you are a developer (as am I), and I'm guessing that scenario didn't occur to you as no developer would do it. But without it Debian possibly wouldn't be sustainable. There are 21,000 packages linked against libc in Debian. Recompiling all of them when a security problem happens in libc may well be beyond the resources Debian has available.
In fact, while it's true backward compatibility can't be trusted for some libraries, it can be for many. That's easily verified: Debian ships about 70,000 packages, yet typically ships just one version of each library. Again the standout example is libc, which is why Debian can fearlessly link 21,000 packages against the same version.
I'd guess most of the people here criticising shared libraries are developers, and it's true shared libraries don't offer application developers much. But they weren't created by application developers or for application developers. They were created by the likes of Sun and Microsoft, who wanted to ship one WIN32.DLL so they could update just that when they found a defect in it. In Microsoft's case, recompiling every program that depended on it was literally impossible.
But is this still as important if the executable in question is part of the Linux distribution? In theory, Fedora knows that Clang depends on LLVM and could automatically rebuild it if there was a change in LLVM.
To me that is an argument that doesn't make any sense, at least on Linux. It could make sense if we were talking about Windows or macOS, where you typically install software by downloading it from a website and have to update it manually.
On Linux, the only thing it should change is that if a vulnerability is discovered in, say, OpenSSL, all the software that depends on OpenSSL must be updated, and that could potentially be half the binaries on your system. It's a matter of download size, which in reality doesn't matter that much (and in theory can be optimized with package managers that apply binary patches to packages).
It's the maintainers of the distribution who note the vulnerability in OpenSSL and decide to rebuild all packages that are statically linked against the vulnerable library. For the end user, the only thing to do is still an `apt upgrade` or `pacman -Syu` or whatever, and they would still get all the software patched.
That's on the assumption that all software on Linux comes through the official repos of the distribution. I would bet that there are almost no systems where this holds entirely true, as I've seen numerous software packages whose installation instructions for Linux are either `curl | sudo bash` or `add this repo, update, install`.
Actually now that I think about it, building by yourself might put you at a disadvantage here, as you'd have to remember to rebuild everything. I'm kinda lazy when it comes to updates so not sure if I like the idea anymore with having to rebuild all the software I built myself lol, but it probably could be solved by also shipping .so versions of libraries.
Automatic updates are themselves a security risk, which is something that I rarely hear talked about. For example, the FBI's 2015/2016 dispute with Apple about unlocking phones. The FBI's position relied on the fact that Apple was technically capable of making a modified binary, then pushing it to the phones through automatic updates. If Apple were not capable of doing so (e.g. if updates needed to be approved by a logged-in user), then that vector of the FBI's attack wouldn't be possible.
I don't have the best solution for it, but the current trend I see on Hacker News of supporting automatic updates everywhere, sometimes without even giving users an opt-out let alone an opt-in, is rather alarming.
I'm not arguing for automatic updates. It's pretty much what we already have, except that instead of updating a single library, you'd update every package that depends on that library.
I'm just throwing ideas around so you should definitely take what I'm saying with a grain of salt. It just would be interesting to see a distro like that and see what the downsides of this solution are. Chances are that there probably already is something like this and I'm just not aware of it and I'm reinventing the wheel.
Ah, got it. Sorry, I misinterpreted "update everything regularly" to imply developers forcing automatic updates on every user.
I'm in the same boat, as somebody who isn't in the security field. I try to keep up with it, but will occasionally comment on things that I don't understand.
Sure, some packages in some of the popular distros are indeed like that. If the package is important enough (like Firefox) and the required dependencies are a bit out of step with what's currently used by the rest of the distribution, you will sometimes see that for at least some of the dependencies.
Most distros dislike doing so, and scale it back as soon as it becomes technically possible.
But they just dislike packages requiring different versions of libraries, right? My point is to do literally everything as it is right now, but simply link statically. You can still have the same version of a library across all packages; you just build a static library instead of a dynamic one.
This is odd to me, because surely they have to maintain that list anyway so they know which dependent packages need to be tested before releasing bugfixes?
Or is that step just not happening?
I just feel like, one of the big points of a package manager is that I can look up "what is program X's dependencies, and what packages rely on it?"
Another issue with security patches is that specific CVEs are not of equal severity for all applications. This likewise changes the cost/benefit ratio.
You also get bugs introduced into a program by changes to shared libraries. I've even seen a vulnerability introduced when glibc was upgraded (glibc changed the copy direction of memcpy, and the program was using memcpy on overlapping memory).
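The pattern in question boils down to something like this (a simplified reconstruction, not the actual code):

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    char buf[] = "abcdef";

    /* Overlapping source and destination is undefined behavior for memcpy.
     * An old glibc that copied low-to-high happened to do what the author
     * expected; a newer memcpy that copies in a different order (or in
     * wider chunks) silently corrupts the buffer instead.
     *
     *   memcpy(buf + 1, buf, 5);    <-- the buggy call
     */
    memmove(buf + 1, buf, 5);        /* memmove is defined for overlap */
    printf("%s\n", buf);             /* prints "aabcde" */
    return 0;
}
```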
The memcpy API says that is undefined behavior; that program was never valid. Not much different from bit-banging specific virtual addresses and expecting them never to change.
The C language makes a distinction between Undefined Behavior and Implementation-Defined Behavior. In this case it's the former (n1570, section 7.24.2.1).
Any code that invokes UB is not a valid C program, regardless of implementation.
More practically, ignoring the bug that did happen: libc also has multiple implementations of these functions and picks one based on the HW it is running on. So even a statically linked glibc could behave differently on different HW. Always read the docs; this is well defined in the standard.
The source code may have had UB, but the compiled program could nevertheless have been bug-free.
> Any code that invokes UB is not a valid C program, regardless of implemention.
I disagree. UB is defined (3.4.3) merely as behavior the standard imposes no requirements upon. The definition does not preclude an implementation from having reasonable behavior for situations the standard considers undefined.
This nuance is very important for the topic at hand because many programs are written for specific implementations, not specs, and doing this is completely reasonable.
> You also get bugs introduced into a program by changes to shared libraries. I've even seen a vulnerability introduced when glibc was upgraded (glibc changed the copy direction of memcpy, and the program was using memcpy on overlapping memory).
Did glibc change the behavior of memcpy (the ABI symbol) or memcpy (the C API symbol) which currently maps to the memcpy@GLIBC_2.14 ABI symbol?
Central patching had greater utility before operating systems had centralized build/packaging systems connected via the internet. When you had to manually get a patch and install it via a tape archive or floppy drive, or manually copy files on a lan, only having to update one package to patch a libc issue was a big help. Today it would require a single command either way, and so doesn't really have any significant benefit.