Enlightening article, but I did get a little hung up on this statement.
> I hate it when people say that H2 and H3 allow you to send multiple resources in parallel, as it’s not really parallel at all! H1 is much more parallel than H2/3, as there you do have 6 independent connections. At best, the H2/3 data is interleaved or multiplexed on the wire
More TCP connections don't mean that stuff is happening in parallel; after all, it's all going down the same wire. You're just letting a different layer do the multiplexing.
So saying
> H1 is much more parallel than H2/3
Is just wrong. It's no more or less parallel than H2/3, if we're being pedantic. It's just that H1 leaves the interleaving to your kernel, as it chooses which TCP packet to send next, whereas H2/3 implement their own interleaving within a single TCP connection, allowing the application to use higher-level context to decide how to multiplex the resources together.
Also with H2/3 there's the additional benefit of not having to pay the TCP slow-start cost for every "parallel" resource you request. And the ability to multiplex an unlimited number of resource streams, without either the client or server kernel complaining about socket/file descriptor exhaustion.
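To make that concrete, here's a minimal Go sketch (https://example.com is just a stand-in URL): Go's default transport negotiates H2 over TLS where the server supports it, so all of these concurrent requests ride one TCP connection as separate streams rather than opening fifty sockets.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	client := &http.Client{} // Go's default transport speaks HTTP/2 over TLS
	var wg sync.WaitGroup
	// Fire 50 requests concurrently; over H2 they are multiplexed as
	// streams on ONE TCP connection instead of 50 (or 6) separate ones.
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			resp, err := client.Get("https://example.com/")
			if err != nil {
				fmt.Println("request failed:", err)
				return
			}
			resp.Body.Close()
			// resp.Proto reports "HTTP/2.0" when H2 was negotiated.
			fmt.Printf("request %d: %s via %s\n", i, resp.Status, resp.Proto)
		}(i)
	}
	wg.Wait()
}
```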
The exact same applies to UDP. Your connection is (usually) a single full duplex stream of Ethernet packets, sending and receiving packets one at a time, providing concurrency across the many open connections through time sharing, not parallelism.
The crucial thing that QUIC being UDP enabled (it would also have worked if you could just create a new transport protocol on top of IP, but you can't, because that extension point is rusted shut on the public Internet) was avoiding TCP head-of-line blocking.
Suppose for some crazy reason, the fifth of ten packets with the JPEG of the cute puppy inside it gets lost on the network. It's sent, but it's never received.
With HTTP/1 the TCP session that was being used to send the puppy JPEG is stalled until the loss is detected and the packet is resent, which will be about one full round trip of latency and maybe more. But it's usual to have several parallel connections, so the one we're using to fetch a JPEG of a kitten continues as normal.
With HTTP/2 the whole HTTP session is stalled, not just the puppy: the JPEG of a kitten you asked for is also stalled, as is the font we were going to use to render placeholder text.
With HTTP/3 we're back to just the puppy JPEG being stalled; the kitten, the font etc. are unaffected even though they're in the same HTTP session.
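To put rough numbers on it, here's a toy model (the 10 ms, 200 ms and ten-packets-per-image figures are all invented, purely for illustration):

```go
package main

import "fmt"

// Crude timing model, made-up numbers: packets land every 10 ms and a
// lost packet takes an extra 200 ms to be detected and retransmitted.
func main() {
	const (
		perPacket = 10  // ms between packet arrivals
		retrans   = 200 // ms penalty for the lost packet
	)
	allSent := 10 * perPacket // ten packets per image

	// The puppy lost its fifth packet, so it finishes late either way.
	fmt.Printf("puppy JPEG:          %d ms (H2 and H3 alike)\n", allSent+retrans)

	// H2: one TCP byte stream. Bytes after the hole can't be delivered
	// to the application until the hole is filled, so the kitten (and
	// the font) wait on the puppy's retransmit too.
	fmt.Printf("kitten JPEG over H2: %d ms\n", allSent+retrans)

	// H3: QUIC orders bytes per stream, so the kitten's stream never
	// notices the puppy's lost packet.
	fmt.Printf("kitten JPEG over H3: %d ms\n", allSent)
}
```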
> It’s just that H1 leaves the interleaving to your kernel,
You've made the assumption that bandwidth is the limit. That's pretty much a false assumption on most residential and many mobile connections.
The concurrency hitting the target server does matter, as the site can then process the requests in parallel (either using multiple cores on the same machine or multiple servers, depending on the site's size).
So it isn't "kernel choosing which packet out of bunch H1 connections to send", it near always have enough bandwidth to have all of them on the wire, it's "just" slow start making it slower
Don't the bits come off the wire one at a time at the server as well? Any ability to read() from multiple sockets coming over the same interface is enabled by the kernel reading the data serially and placing it in buffers.
> So it isn't "kernel choosing which packet out of bunch H1 connections to send", it near always have enough bandwidth to have all of them on the wire, it's "just" slow start making it slower
I mean, it is. Ethernet is a serial bus; you can only send one packet/Ethernet frame at a time. Even if the server has some seriously fancy hardware that's capable of doing parallel DMA reads from main memory, directly onto multiple parallel network buses, it's basically guaranteed that somewhere between you and the server all of the data is gonna be read and copied one byte at a time (just incredibly fast).
As a general rule, computers don't do anything in parallel, because the complexity of coordination and synchronisation is so high, and so expensive, that it just isn't faster than doing stuff in a purely serial manner.
Now obviously there are exceptions to the above rule: multi-core CPUs, dedicated network devices etc. But even then, only the computation is done in parallel. Actually shuffling the data in and out of the machine is almost certainly going to be done over a purely serial bus. Unless you're moving gigabits a second or more (such as between a CPU and RAM, between a CPU and GPU, or between network queues in a switch), doing it in parallel isn't worth the extra complexity.
Six connections with H1 means the server is only processing up to six of your requests at a time. You also can't send any additional requests until one of those six ongoing requests has finished. So at the very least, you're paying a full round-trip cost per request during which no useful work is happening.
Let's say the server makes a 250ms request to an auth service on each request. For every six requests, you're sitting and waiting a quarter second for the server to sit and wait. With H2, you essentially pay that 250ms cost once, because ~every request happens concurrently. With H1, you pay it once per request on each connection.
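Back-of-the-envelope with those numbers, assuming 24 requests and perfect overlap on H2 (both assumptions are mine, just to show the shape of it):

```go
package main

import "fmt"

func main() {
	const (
		requests = 24
		authWait = 250 // ms the server spends waiting per request
		h1Conns  = 6
	)
	// H1: each connection handles its requests one after another,
	// so every request on a connection pays the auth wait back to back.
	h1 := (requests / h1Conns) * authWait
	// H2: all requests go out at once on one connection and the
	// server can overlap the waits, so you pay it roughly once.
	h2 := authWait
	fmt.Printf("H1, 6 connections: ~%d ms of server-side waiting\n", h1) // ~1000 ms
	fmt.Printf("H2, multiplexed:   ~%d ms\n", h2)                        // ~250 ms
}
```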
It's also the case that many servers stream their responses. If I make 12 requests and the storage that the server pulls the data from responds more slowly than the server can send it to the client, H2 means that I get twelve half-finished requests instead of six half-finished requests and six requests that never even got sent. For content that doesn't require the full file to be loaded, that's a win.
H2 also has header compression. Sending HTTP requests on H1 often takes multiple packets, especially if you have more than one cookie. HPACK can eliminate much of the overhead of actually sending the request, saving multiple TCP round trips, which means the server can start processing more requests in parallel sooner because it isn't waiting as long for them to arrive.
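You can see the effect with the hpack package from golang.org/x/net (the header values below are invented): encoding the same headers a second time emits mostly one-byte indices into HPACK's dynamic table instead of the literal bytes.

```go
package main

import (
	"bytes"
	"fmt"

	"golang.org/x/net/http2/hpack"
)

func main() {
	headers := []hpack.HeaderField{
		{Name: ":method", Value: "GET"},
		{Name: ":path", Value: "/api/items"},
		{Name: "user-agent", Value: "Mozilla/5.0 (X11; Linux x86_64)"},
		{Name: "cookie", Value: "session=abcdef0123456789; theme=dark"},
	}

	var buf bytes.Buffer
	enc := hpack.NewEncoder(&buf)

	// First request: names and values are sent literally and inserted
	// into the dynamic table.
	for _, h := range headers {
		enc.WriteField(h)
	}
	fmt.Println("first request: ", buf.Len(), "bytes")

	// Second request with identical headers: each field is emitted as a
	// short index into the dynamic table instead of being resent.
	buf.Reset()
	for _, h := range headers {
		enc.WriteField(h)
	}
	fmt.Println("second request:", buf.Len(), "bytes")
}
```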
H3 eliminates many of the problems caused by head-of-line blocking. If you have packet loss on one of your connections, whatever was being transmitted is now blocked until TCP sorts out the missing packet. H3 limits the stall to the individual request impacted by the lost packet rather than all of the requests being made over that connection.
In any case, it's not "just" TCP slow start that makes H2/H3 better than H1 in regards to concurrency.
> Six connections with H1 means the server is only processing up to six of your requests at a time. You also can't send any additional requests until one of those six ongoing requests has finished. So at the very least, you're paying a full round-trip cost per request during which no useful work is happening.
Browsers disable pipelining on H1 now, but it used to be a real feature to avoid this problem.
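For the curious, pipelining just meant writing several requests before reading any responses, which then have to come back in order (that ordering requirement is itself a head-of-line block). A minimal sketch against a hypothetical HTTP/1.1 server at example.com; many real servers handle this badly, which is exactly why browsers gave up on it:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"net/http"
)

func main() {
	conn, err := net.Dial("tcp", "example.com:80")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Write both requests up front, before reading anything back.
	for _, path := range []string{"/", "/favicon.ico"} {
		fmt.Fprintf(conn, "GET %s HTTP/1.1\r\nHost: example.com\r\n\r\n", path)
	}

	// Responses arrive strictly in request order.
	br := bufio.NewReader(conn)
	for i := 0; i < 2; i++ {
		resp, err := http.ReadResponse(br, nil)
		if err != nil {
			panic(err)
		}
		fmt.Println("response", i, resp.Status)
		io.Copy(io.Discard, resp.Body) // drain so the next response can be parsed
		resp.Body.Close()
	}
}
```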
Well, to be fair, it was disabled because it mostly didn't work correctly in a wide range of circumstances, because a lot of HTTP/1.1 servers didn't support it correctly. Not because people didn't want better performance. It was bolted on, and HTTP/1.x lets you get away with an incredible amount of stuff[1] while continuing to "work". It is also another thing that introduces head-of-line blocking, and that isn't fixable even if the servers were.
Moving multiplexing directly into the "data path" of the protocol was the correct move in the case of HTTP/2 and later, because you can't get this part wrong and still have things work; multiplexing is simply a core feature of the system. This is true of many other things as well.
[1] Stuff like this is guaranteed to happen IMO because HTTP/1.1 is a perfect example of "deceptively complex." Everyone says that HTTP/2 got complicated, but HTTP/1.1 was already incredibly complicated to get working reliably and efficiently. The difference is you can't write an "HTTP/2 server" in under 200 lines of code that ignores 99.9% of the whole specification, while you could get away with this in HTTP/1.1. The /2 spec was, in this sense, largely an admission of the complexity of the problem, which protocol developers were already acutely aware of.
Nothing. It would be a property of the webserver whether multiple requests can be processed in parallel. And an HTTP/3 server which can't would most likely be broken. If it would not process other QUIC packets while handling one request, it could not even process ACKs and keepalive packets, and would thereby just time out the connection.
TIL about fetchpriority="high". Seems like it has a good chance of being abused. I can imagine tutorials and stackoverflow posts telling you to routinely add it.
Priority can’t be self-declared unless there’s an economic cost attached to it. FedEx envelopes cost a lot more than regular mail. People who classify an email as high priority aren’t gaining much of an advantage as recipients are already jaded by people who mark every single outgoing email as high priority. Emergency vehicles can only make up a tiny fraction of total road traffic. Microsoft Windows and ISPs reset packet DSCP bits to 0, otherwise every program would request EF handling.
All of this complexity (with "browsers also differ quite a bit in the importance they assign to different types of resources and loading methods," etc.) suggests one take-home message. It is very expensive to architect your web services with the optimal combination of performance across browsers. Unless you have a huge budget, your most practical option is to be performant only for the browser with the largest market share. (Or hope that you can outsource to someone who understands all of this and yet remains affordable.)
I don't understand why HTTP/3 is better than HTTP/1. Are there real benchmarks that show that it's better? It seems like an awful lot of reimplementation and potential security flaws just so that multiple TCP connections don't have to be made. And all over UDP which is blocked on a lot of firewalls.