Excellent succinct breakdown of the xz mess, from an OpenBSD developer (marc.info)
107 points by signa11 7 months ago | 50 comments



> Liblzma ends up dynamically linked to sshd because of a systemd-related extension added by many Linux packagers that pulls in liblzma as an unrelated dependency.

Lots of dependencies -> bad. Reducing dependencies will sometimes necessitate open-coding things, which too can be bad.

But what's interesting here is that if you're trying to find a good place to insert a backdoor, all you need to do is analyze the dependency graphs of targets of interest looking for ones which might be vulnerable to compromise via social engineering or other vectors. We should do our own such analysis looking for places that need to be watched better or removed.
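As a rough illustration of that kind of analysis, here is a minimal sketch that walks a dependency graph and lists everything a target pulls in transitively. The graph is hand-written for illustration and only contains the sshd -> libsystemd -> liblzma/liblz4/libzstd edges mentioned in this thread; the function names are made up, and a real audit would have to extract the edges from the actual binaries.

```python
from collections import deque

# Illustrative direct-dependency edges, limited to the ones named in this
# thread. A real graph would be extracted from the binaries themselves.
DIRECT_DEPS = {
    "sshd": ["libsystemd"],
    "libsystemd": ["liblzma", "liblz4", "libzstd"],
}


def transitive_deps(graph, root):
    """Return every library reachable from root through one or more edges."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        lib = queue.popleft()
        if lib in seen:
            continue
        seen.add(lib)
        queue.extend(graph.get(lib, []))
    return seen


def exposed_targets(graph, library):
    """Return the graph entries whose transitive dependencies include library."""
    return {t for t in graph if library in transitive_deps(graph, t)}


if __name__ == "__main__":
    print(sorted(transitive_deps(DIRECT_DEPS, "sshd")))
    print(sorted(exposed_targets(DIRECT_DEPS, "liblzma")))
```

Running `exposed_targets` for each small, lightly maintained library is one way to rank which ones deserve closer watching.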



> Liblzma ends up dynamically linked to sshd because of a systemd-related extension added by many Linux packagers that pulls in liblzma as an unrelated dependency

kinda wish this was unpacked a bit more, why exactly is a service executable dynamically linking to a library without using any of its symbols or functions, because of systemd.

and a follow up if some openbsd folks can comment. over the years i've read about various unique security capabilities in openbsd, it seems natural to ask, what kernel or OS capabilities does openbsd provide to thwart the stage 2 efforts for this class of injection techniques?


> kinda wish this was unpacked a bit more, why exactly is a service executable dynamically linking to a library without using any of its symbols or functions, because of systemd.

If I recall correctly, based on what I've read about this (I believe from the original mailing list post that noticed the vulnerability), it's because under certain circumstances you might want sshd to be able to talk to systemd in order to enable certain functionality, so distros often patch sshd with code to do that. But obviously you need a library to implement actually speaking systemd's protocol, and as it happens, the easiest way to do that is to include the entirety of libsystemd, since it has functions for doing that, even though there are at least two other libraries that implement just the communication functionality and are actually designed for non-systemd programs to use. The problem with that idea is that libsystemd, being a whole standard library for all systemd-related functionality that is probably mostly designed for use by programs in the tightly integrated systemd family and as a reference implementation, also includes a lot of other code, including compression-related code that depends on liblzma, even though none of that is ever used by sshd, because it only uses the small subsection of the library it needs.


> But obviously you need a library to implement actually speaking systemd's protocol

That is an overstatement. The docs have a basic self-contained example of how to implement the notification without libraries; it's 50 lines, the majority of which is just error handling:

https://www.freedesktop.org/software/systemd/man/devel/sd_no...


The example was only recently added (like 1-2 days ago). Before that, the protocol was only described and said to be stable (and guaranteed as a stable API).


Oh, cool! I was just trying to give the maximum benefit of the doubt, but this is good to know. The systemd hate never seems as justified as it'd like to be...


> even though there are at least two other libraries that implement just the communication functionality and are actually designed for non-systemd programs to use.

The important question is: if one of those libraries is used, and then something else pulls in `libsystemd`, will they conflict?


Why would they? There isn't any magic in libsystemd, it's just a normal C lib


A lot of systemd-replacement shims try to be transparent, which means exporting the same symbols as the real systemd libs and thus causing weird conflicts if you link to the real systemd libs too.


AFAICT libsystemd is the "blessed" way of a daemon interacting with systemd and its notification protocol. libsystemd is something of a kitchen sink library now, it wasn't always so: there used to be separate libsystemd-daemon that had the functions a daemon might use (and not the ones it won't, like compression wrappers for functions in external libraries). https://lwn.net/Articles/587373/

A lot of things (e.g. OpenSSH, Apache) don't use it natively, you won't tend to see it in the official sources of non-Linux specific software. Package maintainers who choose to foist^H^H^H^H^Huse systemd add it to those packages. It's quite fun working out why Apache hangs when you port a config and don't realise it needs mod_systemd loaded to stop systemd stamping on it.

There's an extra security wrinkle too: lazy symbol resolution is not the done thing now, it's preferred that symbols are resolved at loading time so important chunks of memory can be protected from other types of attacks (more clarity in one of the preceding articles: https://research.swtch.com/xz-script ).


Libsystemd is a grabbag utility library, and completely optional:

> The libsystemd library provides functions that allow interacting with various interfaces provided by the systemd(1) service manager, as well as various other functions and constants useful for implementing services in general.

https://www.freedesktop.org/software/systemd/man/latest/libs...

E.g. the service readiness protocol is pretty trivial to implement yourself if you don't want to pull libsystemd as dep.
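To give a sense of how small that protocol is, here is a minimal sketch in Python. It assumes the documented behaviour that the service manager passes the socket address in the `NOTIFY_SOCKET` environment variable and expects a `READY=1` datagram; the function name `notify_ready` is made up for this example, and this is not the code that OpenSSH or any distro ships.

```python
import os
import socket


def notify_ready(state: bytes = b"READY=1") -> bool:
    """Send one readiness datagram to the socket named in $NOTIFY_SOCKET.

    Returns False when the variable is unset, i.e. when the process was not
    started by a service manager that asked for notifications.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket, which is
        # addressed with a leading NUL byte instead.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.sendall(state)
    return True
```

A daemon would call `notify_ready()` once, right after its listening socket is open.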


The problem was that SSH linked to libsystemd. Libsystemd linked to xz. So the library got pulled in transitively. Here are two recent commits that fix the problem by reducing dependencies:

https://salsa.debian.org/ssh-team/openssh/-/commit/cc5f37cb8...

https://github.com/systemd/systemd/pull/31550


There's very little you can do to limit the scope of a backdoor in sshd. If sshd can't do anything, you wouldn't be able to do anything after logging in either, which wouldn't be very useful.


Systemd provides via libsystemd a call sd_notify() that tells systemd the daemon is not only started but ready to accept connections. libsystemd, being a kitchen-sink like everything else systemd, has a bunch of unrelated functionality including one that pulled liblzma as a dependency.

The backdoored liblzma relies on a misfeature of glibc called ifuncs, where a library can override a function by calling a special init function in the library. This is so that, for instance, if you have a version of a function optimized for AVX512, one for AVX2 and one not optimized for either, the init function can check which features the CPU supports and pick the best one. Seemingly ifuncs doesn't check that the function being overridden is in the same library, and it was replacing OpenSSH's RSA auth function.

OpenSSH added a clean-room libsystemd-free implementation of sd_notify() after this fiasco, in the hope that Linux distros will stop linking against libsystemd. This will appear in OpenSSH 9.8:

https://bugzilla.mindrot.org/show_bug.cgi?id=2641


sshd is started by systemd.

systemd has several ways of starting programs and waiting until they're "ready" before starting other programs that depend on them: Type=oneshot, simple, exec, forking, dbus, notify, ...

A while back, several distro maintainers found problems with using Type=exec (?) and chose Type=notify instead. When sshd is ready, it notifies systemd. How do you notify systemd? You send a datagram to systemd's unix domain socket. That's about 10 lines of C code. But to make life even simpler, systemd's developers also provided the one-line sd_notify() call, which is in libsystemd.so. This library is so other programmers can easily integrate with systemd.

So the distro maintainers patched sshd to use the sd_notify() function from libsystemd.so

What else is in libsystemd.so? That's right, systemd also does logging. All the logging functions are in there, so user programs can do logging the systemd way. You can even _read_ logs, using the functions in libsystemd.so. For example, sd_journal_open_files().

By the way... systemd supports the environment variable SYSTEMD_JOURNAL_COMPRESS which can be LZ4, XZ or ZSTD, to allow systemd log files to be compressed.

So, if you're a client program, that needs to read systemd logs, you'll call sd_journal_open_files() in libsystemd.so, which may then need liblz4, liblzma or libzstd functions.

These compression libraries could be dynamically loaded, should sd_journal_open_files() need them - which is what https://github.com/systemd/systemd/pull/31550 submitted on the 29th February this year did. But clearly that's not in common use. No, right now, most libsystemd.so libraries have headers saying "you'll need to load liblz4.so, liblzma.so and libzstd before you can load me!", so liblzma.so gets loaded for the logging functions that sshd doesn't use, so the distro maintainers of sshd can add 1 line instead of 10 to notify systemd that sshd is ready.
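The load-on-demand idea can be sketched as follows. This is a Python analogy using ctypes, not the code from the systemd pull request: the helper names `load_on_demand` and `lzma_version` are made up for illustration, and the only liblzma call used is `lzma_version_string()`. The point is that a process that never takes the compression code path never maps the library at all.

```python
import ctypes
import ctypes.util
from functools import lru_cache


@lru_cache(maxsize=None)
def load_on_demand(name):
    """Load a shared library the first time it is requested.

    Returns None when the library cannot be found or loaded, so callers can
    degrade gracefully instead of failing at process start.
    """
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


def lzma_version():
    """Hypothetical caller: only this code path ever touches liblzma."""
    lib = load_on_demand("lzma")
    if lib is None:
        return None
    lib.lzma_version_string.restype = ctypes.c_char_p
    return lib.lzma_version_string().decode()
```

The trade-off is that a missing library now shows up as a runtime error on the affected code path rather than as a load-time failure.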


This seems like a clear case of premature optimization. During the three decades sshd has existed I have never seen a real world situation where the equivalent of Type=exec was not enough.

The time window where sshd is started and not yet ready to receive connections is short, and clients will have a connection timeout orders of magnitude larger. The notify functionality is more relevant for things like Java middleware processes and clients that lack the functionality to poll and wait. Under most situations none of this is relevant for sshd. These patches solve a problem very few people have.

If you really have this problem, the systemd readiness is far from enough to solve the problem. The readiness is sent too early and there could still be permission problems that would cause sshd to be ready but the connection to fail. Even more relevant are local firewall rules that are completely out of scope for a readiness check!

Polling for readiness is the only robust way.


> The readiness is sent too early

What makes you say that? The readiness notification is sent after the sshd has opened the listen socket, it literally is accepting connections at that point.


Accepting connections, yes, but for a client that is dependent on being able to establish ssh connections that is not enough. They would want to be notified on the ability to make successful connections.

Keep in mind that this is only a problem in special situations, almost no regular Linux servers carry any services that care about being able to establish ssh connections, that cannot reconnect and use appropriate socket options. For other services than ssh, such as databases, this is much more common. And for those, it is not enough that the server has opened a listening socket.

When building distributed systems, this is something you need to think about. Not so much with local systems. But the same principles apply. And the only robust way is to poll for readiness. Signalling readiness is both complex, when the dependency chains are non-trivial, and prone to error, when readiness signals arrive out of order or are dropped or fail for some reason. This could be because of an operations failure, but it makes for hard-to-debug cases. Dependency chains that mysteriously stop because of permission problems with out-of-band traffic are both a classic and an unnecessary problem.

All of this complexity goes away when polling. This is why you should adopt this design in the somewhat unlikely case you have clients that depend on being able to make ssh connections.
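A minimal sketch of such a polling client, assuming the only signal needed is that the server answers with its SSH identification banner (which begins with `SSH-`). The function name and defaults are hypothetical, and a banner does not prove that authentication will succeed; a stricter check would attempt a real login.

```python
import socket
import time


def wait_for_ssh(host: str, port: int = 22,
                 timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until the server sends an SSH banner or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval) as conn:
                conn.settimeout(interval)
                banner = conn.recv(64)
                if banner.startswith(b"SSH-"):
                    return True
        except OSError:
            # Refused, filtered, timed out: all mean "not ready yet".
            pass
        time.sleep(interval)
    return False
```

Because the client tests the full network path itself, firewall rules and late-starting listeners are covered without any out-of-band signal.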


There really is an exception to every rule. "Do not attribute to malice that which can be adequately explained by neglect, ignorance or incompetence" and then you come across:

> The stage 0 shell snippet looks at first glance like a plausible part of the poorly readable autoconf/automake tooling.


Also, given the context it's clearly malice.

If there was one commit with a stray period that disabled the sandbox, sure, might be a typo.

But the other stuff where they're unpacking specific byte ranges from multiple places in the test files, that's not an accident.

Viewed as a whole, it would be extremely hard to view this as "adequately explained by neglect, ignorance or incompetence"


None of those adequately explain these commits. This is no exception to Hanlon's razor.


Hanlon's razor is useful to curb one's paranoia, but it is far from being a universal rule.

In fact, malice and incompetence are not necessarily mutually exclusive.

This very incident shows several instances where "Jia Tan" is being arguably incompetent, in addition to being clearly malicious: unintended breakage by adding extra space between "return" and "is_arch_extension_supported"; several redundant checks for `uname` == "Linux"; botched payload, so "test files" had to be replaced, with pretty fishy explanation; rather inefficient/slow GOT parsing, list goes on...


Hanlon's razor lets every single first degree murderer off the hook, or any other of the many crimes where you have to establish intent to convict.


It is a rule-of-thumb heuristic, not a law.

Even if this weren't the case, your claim would remain specious. "Establish intent" and "justify attribution to malice" are the same thing said two ways.


> It does not plausibly pass for a typo because no typical editing glitch will leave a '.' character there.

I would certainly attribute that to a typo if I was reviewing the code.


The point is that under casual review & in isolation, it's not hard to accept as a typo, but under closer review and knowing what we know now, it's almost certainly part of the attack.


We only know it's malicious in hindsight. If I were Lasse (the original xz maintainer) I could easily have thought it was an innocent typo, and that's the point. If it were something more convoluted, like a zero-width Unicode character or a misspelled variable name, it would be less plausible IMO.


The thing is, if anyone (other than Jia) had actually noticed it, it would hopefully have been fixed. Like... no one would just see that, say "well, must have been a typo" and leave it.


You're missing the years of reputation Jia built so that he'd be given the benefit of the doubt. I don't think anyone would have been too suspicious.


I am not saying people would have said Jia is acting maliciously, just that no one saw this to begin with, or else it would have been fixed in a separate patch (before the recent emergency fix by Lasse).


XZ is just a really boring project. It seems complete for the most part, so I'm not surprised by the lack of scrutiny. It's one of these projects that I would just assume is well maintained.


> The syntax error is a single period '.' as the first character on an otherwise empty line of C code.

I believe the point is that given this context, it could/should not be construed as a typo.


Very common if you copy-paste (using the mouse buffer) from the Midnight Commander text editor or any other editor that highlights trailing spaces as dots.


For those using editors with mouse, it's very easy to add a character anywhere. Doesn't matter the place.


If you received this patch from someone you trusted, and noticed the period, would you call it a typo or would you assume it is malicious?

Or is this just an argument over the use of the word "typo" instead of "mistake"? That wouldn't be very useful, IMO.


I take it they're not forcing contributors to lint code, then?


Does your linter look for syntax errors in the content of strings? If so, how would you design a unit test meant to capture syntax errors?


Have you tried running lint on a configure.ac or CMakeLists.txt file?


"." repeats the last command in vim, so I'd consider it quite plausible as a typo, if we didn't know about the rest of the stuff.


ed man! Ed uses a single . line to end input. You actually can't enter a single '.' (because that would end your input) but "I was using vi and must have dropped into ex mode or something" on the surface sounds reasonable.


> the build-to-host.m4 macro package that ships as part of GNU gettext was replaced by a modified version that was copied into the release tarball

Am I correct in concluding that this backdoor might have been picked up sooner if the practice of shipping tarballs of source releases was dropped in favour of running builds straight off a repo tag?

I've set up and worked with many build pipelines in the .NET world over nearly 20 years and we manage fine without the concept of a source release tarball, but I realise C dev is more complex so I'm probably missing something?


> Am I correct in concluding that this backdoor might have been picked up sooner if the practice of shipping tarballs of source releases was dropped in favour of running builds straight off a repo tag?

I don't think so. The git repo did have everything needed. The release tarball thing was just an added benefit to the attackers, but not critical. IMO.

> I've set up and worked with many build pipelines in the .NET world over nearly 20 years and we manage fine without the concept of a source release tarball, but I realise C dev is more complex so I'm probably missing something?

.Net, but, portable .Net?

The issue isn't the language but what things you need to detect presence of in the target host when building, and if you're doing systems programming you're likely to need to detect many such things, and if you're doing more high level programming (e.g., an HTTP app that uses a DB) then you're not likely to need to detect many such things.

The real issue is the variance from standards (e.g., POSIX) and the fact that the standards (e.g., POSIX) are so far behind because they really just standardize the lowest common denominator.


> The release tarball thing was just an added benefit to the attackers, but not critical. IMO.

What I mean is that Debian and Fedora presumably download the tarball and build from there, rather than cloning a repo and building from the repo.

If they did the latter, it would have been much more difficult to manipulate the build process to inject the backdoor that lived inside the testcase binaries, because that change to the build scripts would be out in the open.

But I could be mistaken...


In .NET specifically it's not an issue because taking on a dependency means you always get the source code in the form of IL assemblies with (usually) symbols (either way these trivially decompile back to readable-ish C#).

The actual final product is what you get when publishing (building) the application itself, similar to the way crates participate in the build process in Rust.

Nuget packages still can and do ship pre-compiled native dependencies but ideally you want to either provide your own (think sourcing libsodium separately and just getting the bindings from nuget) or instrument the corresponding .NET project to build a native dependency alongside it (very easy with <Exec Command="..." /> properties in csproj).


> I don't think so. The git repo did have everything needed.

build-to-host.m4 is not checked into the repo. Without the modified build-to-host.m4, there is nothing to kick off the exploit.


Kind of funny the developer dismisses Linux as a "Unix-like operating system".


Yes, the concluding paragraph is the most succinct (and fitting) here, there are better explanations as for the technical aspects. I know it's not related, but for the Germans around, this is just great: https://inka.de/angebot/

A community ISP. Last changed 02.05.2001, in case you missed it. ;) Millennium blues. I wasn't aware about the possibilities back then, but it's just before my time.


> much of the emerging narrative about this backdoor that you can read all over the net is based on idle speculation

...which sums up the situation quite nicely.

This was a pretty bad attack on a certain ecosystem. The ecosystem will recover, or not, regardless of your feelings. Unless you're in a position to truly make a difference, just sit back and enjoy the ride...


There is some protection to be had in using non-mainstream platforms (in my case, Alpine Linux, OpenBSD and Illumos, though I do have some Ubuntu or Pop!_OS laptops).



