Given the long history of request parsing vulnerabilities in HTTP/1.1 servers and proxies, is HTTP/2 actually worse, or have most of the HTTP/1.1 bugs just been fixed already?
These vulnerabilities are all from badly-written HTTP/2 → HTTP/1.1 translations. Most of them come from simple carelessness, rookie errors that should never have been made: dumping untrusted bytes from an HTTP/2 field value into the HTTP/1.1 byte stream. This is security 101, straightforward injection attacks with absolutely nothing HTTP-specific about them.
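To make the "security 101" point concrete, here's a toy Python sketch of the difference between the careless downgrade and the validated one. All names here are invented for illustration; this is not any real proxy's code.

```python
# Hypothetical sketch: serializing HTTP/2 field values into an HTTP/1.1
# request. HPACK can deliver arbitrary bytes in a value, so copying them
# verbatim into the text protocol is a classic injection bug.

def serialize_naive(method: str, path: str, headers: dict) -> bytes:
    """Vulnerable: forwards untrusted HTTP/2 field values verbatim."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode()

def serialize_validated(method: str, path: str, headers: dict) -> bytes:
    """Rejects CR, LF, and NUL in any field before serializing."""
    for part in (method, path, *headers.keys(), *headers.values()):
        if any(c in part for c in ("\r", "\n", "\0")):
            raise ValueError(f"forbidden byte in field: {part!r}")
    return serialize_naive(method, path, headers)

# An attacker can transmit a value like this over HTTP/2; a downgrading
# proxy must refuse to forward it as HTTP/1.1:
evil = {"x-info": "a\r\nHost: attacker.example\r\n\r\nGET /admin HTTP/1.1"}
smuggled = serialize_naive("GET", "/", evil)
assert b"GET /admin" in smuggled  # a second request got injected
try:
    serialize_validated("GET", "/", evil)
except ValueError:
    pass  # correctly rejected
```

The entire fix is one loop over the fields before serialization, which is why skipping it reads as carelessness rather than a subtle design flaw.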
Some of them are a little more complex, requiring actual HTTP/2 and HTTP/1.1 knowledge (largely meaning HTTP/2 framing and the content-length and transfer-encoding headers), but not most of them.
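For the framing/content-length/transfer-encoding class, the core issue is that HTTP/2 frames already carry an authoritative body length, so any length-related metadata arriving in the headers is attacker-supplied and must be dropped or recomputed before the downgrade. A hedged, invented sketch of what that looks like:

```python
# Illustrative sketch (not from any real proxy): HTTP/2 has no chunked
# encoding and its DATA frames define the body length, so forwarding a
# transfer-encoding or content-length field verbatim lets the HTTP/1.1
# backend disagree with the front-end about where the body ends
# (request smuggling).

CONNECTION_SPECIFIC = {"transfer-encoding", "connection", "keep-alive",
                       "upgrade", "proxy-connection"}

def downgrade_headers(h2_fields: list, body: bytes) -> list:
    """Drop connection-specific fields and recompute content-length
    from the actual DATA-frame payload length."""
    out = [(n, v) for n, v in h2_fields
           if n.lower() not in CONNECTION_SPECIFIC
           and n.lower() != "content-length"]
    out.append(("content-length", str(len(body))))
    return out

fields = [("host", "example.com"),
          ("transfer-encoding", "chunked"),  # lies about the body framing
          ("content-length", "5")]           # lies about the body length
body = b"GET /admin HTTP/1.1\r\n\r\n"        # 23 bytes of "body"
safe = downgrade_headers(fields, body)
assert ("transfer-encoding", "chunked") not in safe
assert ("content-length", "23") in safe
```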
Is HTTP/2 actually worse? Not in the slightest; HTTP/1.1 is the problem here. This is growing pains from compatibility measures as part of removing the problems of an unstructured text protocol. If you have a pure-HTTP/2 system and don’t ever do the downgrade, you’re in a better position.
I'd agree that HTTP/1 deserves a significant portion of the blame.
On the other hand, one maxim I've learned from my time bug hunting is that nobody ever validates strings in binary protocols. As such, I'm utterly unsurprised there are so many implementations with these kinds of bugs, and I'd say they could have been predicted in advance.
In fact… let's see… yep, they were predicted. Some of them, at least. In the HTTP/2 RFC, under Security Considerations, 10.3 'Intermediary Encapsulation Attacks' describes one of the attack classes from the blog post, the one involving stuffing newlines into header names.
Does that mean something could have been done about it? Perhaps not. The ideal solution would be to somehow design the HTTP/2 protocol itself to be resistant to misimplementation, but that seems pretty much impossible. The spec already bans colons and newlines in header names, but there's no way to be sure implementations won't allow them anyway, short of actually making them a delimiter like HTTP/1 did – in other words, reverting to a text-based protocol. But a text-based protocol would come with its own share of misimplementation risks, the same ones that HTTP/1 has.
On the other hand, perhaps the bug classes could have been mitigated if someone designed test cases to trigger them, and either included them in conformance tests (apparently there was an official HTTP/2 test suite [2] though it doesn't seem to have been very popular), or set up some kind of bot to try them on the entire web. In principle you could blame the authors of HTTP/2 collectively for the fact that nobody did this. But I admit that's pretty handwavey.
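A sketch of what such conformance test vectors might have looked like — this list is my own illustration, not anything from the official suite:

```python
# Hand-rolled negative test vectors: (name, value) pairs a compliant
# HTTP/2 endpoint must refuse to accept or forward downstream.
MALICIOUS_FIELDS = [
    ("x-a\r\nx-b", "1"),               # CRLF in header name
    ("x-a", "1\r\nx-b: 2"),            # CRLF in header value
    ("x-a:x-b", "1"),                  # colon in header name
    ("transfer-encoding", "chunked"),  # connection-specific field over h2
    ("x-a", "1\0"),                    # NUL in value
]

def probe(endpoint_accepts) -> list:
    """Return the vectors a given implementation wrongly accepts."""
    return [f for f in MALICIOUS_FIELDS if endpoint_accepts(*f)]

# A toy "implementation" that validates values but forgets header names:
leaky = lambda name, value: not any(c in value for c in ("\r", "\n", "\0"))
assert ("x-a\r\nx-b", "1") in probe(leaky)  # caught by the probe
```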
> On the other hand, one maxim I've learned from my time bug hunting is that nobody ever validates strings in binary protocols.
I wonder how much this has to do with the way strings need to be handled in the programming languages these protocols are implemented in. If dealing with strings is something that seems to be even more of a danger (if done incorrectly) you might just not do it.
It's tough to say that something is a "rookie error" when basically every serious professional team makes the same mistake. This apparently broke every AWS ALB, for instance.
I am genuinely astonished at the number of implementations and major players that are experiencing problems here. I’ve done plenty of HTTP/1 parsing (most significantly in Rust circa 2014) and some HTTP/2 parsing in its earlier draft days, and I can confidently and earnestly state that my code (then and now) would never under any circumstances be vulnerable to the ones I’m calling rookie errors, because I’m always going to validate the user input properly, including doing any subsequent validation necessary in the translation layer due to incompatibilities between the versions, because I know it’ll blow up on me if I don’t do these things. Especially when all of this stuff has already been pointed out in the HTTP/2 RFC’s Security Considerations section, which you’re a fool to ignore when implementing an IETF protocol. I’m not quite so confident about the attacks that depend on content-length and transfer-encoding, though I believe that any of my code that I wrote then or that I would write now will be safe.
It’s quite possible that my attitude to these sorts of things has been warped by using Rust, which both encourages proper validation and makes it easier and more natural than it tends to be in languages like C or C++. I’d be curious to see figures of these sorts of vulnerabilities in varying languages—I strongly suspect that they occur vastly less in Rust code than in C or C++ code, even when they’re not directly anything to do with memory safety.
No, that doesn't make sense. The errors that trip seasoned pros up are very likely to trip rookies up as well. Words mean things; rookie mistakes are the mistakes that don't trip up the pros.
I would bet that a lot of these are not rookie errors; they are more akin to Spectre or Meltdown: inherently unsafe code that was considered an acceptable risk in exchange for performance.
In general, when writing a high performance middle box, you want to touch the data as little as possible: ideally, the CPU wouldn't even see most of the bytes in the message, they would just be DMA'd from the external NIC to the internal NIC. This is probably not doable for HTTP2->HTTP1, but the general principle applies. In high-performance code, you don't want to go matching strings any more than you think is strictly necessary (e.g. matching the host or path to know where to actually send the packet).
Which is not to say that it wasn't a mistake to assume you can get away with this trade-off. But it's not a rookie error.
No, as I said most of these are absolutely trivial injection attacks from not validating untrusted inputs, being used to trigger a class of vulnerability that has been well-documented since at least 2005.
My point is that the code is doing the most performant thing: sending the values from A to B with as little bit twiddling as possible. They almost certainly failed to even consider that there are different restrictions between the 2 protocols that could pose security issues.
Is a new bucket leaking in a dozen places worse than an old one with all leaks fixed? I would say yes, until the holes in the new one are also fixed.
When I implemented an HTTP2 server several years ago it was all of the "fun" of HTTP 1.1 parsing and semantics plus the extra challenges of the HTTP2 optimizations: HPACK, mapping the abbreviated headers to cached in-memory representations, stream management, and Push Promises too if you supported those.