The article talks about several concepts as if they're new things and/or can't be done just as efficiently outside the kernel. For example...
> Normally load balancers reverse proxy to the worker that is most likely to be ready
This really is a problem, especially for apps that have long-running requests. However, both HAProxy and Apache can be configured to load balance to a worker that is guaranteed to be ready. Why they don't do so by default is beyond me. HAProxy's config option is 'maxconn 1' on each server line.
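A minimal sketch of that HAProxy setup; the backend name, ports, and worker count are made up for illustration:

    backend app_workers
        # 'maxconn 1' caps each worker at one in-flight request, so
        # HAProxy queues excess requests itself and only dispatches to
        # a worker that is guaranteed to be free
        server worker1 127.0.0.1:8001 maxconn 1
        server worker2 127.0.0.1:8002 maxconn 1
        server worker3 127.0.0.1:8003 maxconn 1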
Phusion Passenger also load balances like this with its Global Queuing feature. Since version 3.0 that's on by default.
In short: load balancing like this doesn't require the kernel to do the job.
> Instead of stacking people up in long queues behind some guy whose connection fails miserably, Unicorn lets that one guy fail.
From what I have read and experienced, I believe Reddit uses a similar system. Sometimes, when the wireless connection I'm on is weak and takes too long to send the request after establishing a connection, Reddit gives me a "timeout" message rather than serving the page.
Timing out connections is pretty standard practice, but that's not what the article is referring to. Imagine requests being handled serially by the web server. If two clients are queued up, [A, B] with B first in line, and B happens to be on a flaky wireless connection that takes 2 minutes to complete its request, then A doesn't get its turn until 2 minutes later.
To fight this undesirable effect, you want to handle all connections simultaneously so that no user has to wait for another. However, many web apps tend to be serial, with limited concurrency, because of the way they're written.
One way to solve this problem is by buffering the request and the response at the reverse proxy level, where the reverse proxy can, in principle, handle an unlimited number of connections concurrently, or at least a very large number of them. The reverse proxy doesn't pass the request to the web app until the request has been completely received. Likewise, the reverse proxy receives the complete response from the web app, thereby immediately freeing the web app's limited concurrency resources, and then sends the buffered response to the client, no matter how long that takes. Both the receive and send steps at the reverse proxy can be combined with a timeout; that timeout is what you're referring to in your reply, but it's only part of the total picture.
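As a concrete illustration, here is roughly what that looks like in an Nginx config; the upstream name and port are assumptions, not anything from the article:

    # a buffering reverse proxy in front of a serial app server
    upstream app_server {
        server 127.0.0.1:8080;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://app_server;
            # buffer the app's response so the worker is freed
            # immediately, even if the client reads it slowly
            proxy_buffering on;
            # buffer the client's request body before it reaches the app
            client_body_buffer_size 128k;
            # the timeout mentioned above, applied to the app side
            proxy_read_timeout 60s;
        }
    }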
However, all of this is pretty standard practice for many web servers, and has been for a while. Phusion Passenger has done this since version 2.0 (2 years ago). Before Unicorn, traditional setups with Nginx + mongrel_cluster, Nginx + Thin, Apache + mongrel_cluster, and Apache + Thin did exactly the same thing.
To be clear: Unicorn is not responsible for handling slow clients. It's fairly unique in its policy of explicitly not handling slow clients, requiring the user to put it behind a buffering reverse proxy instead. This makes sense to people who are intimately familiar with network server architectures, but it can be confusing to newbies who just want to set up a server and expect it to work; such a newbie will be in for a nasty surprise.
Thin, for example, does not require a buffering reverse proxy: it buffers internally. Mongrel can handle a certain number of slow clients because it's multithreaded, at least insofar as a request hasn't hit the app's lock yet.
> Before Unicorn, traditional setups with Nginx + mongrel_cluster, Nginx + Thin, Apache + mongrel_cluster, and Apache + Thin did exactly the same thing.
Actually, this isn't entirely correct. Traditional setups with nginx have needed HAProxy in the middle to perform the same balancing that Unicorn provides.
For any non-trivial app, you will need some form of request balancing to avoid the 'single request hurts everyone' problem. Unicorn does this in the cleanest way: the process count is defined in a single location with Unicorn, versus at least 2 places with HAProxy + <Thin or Mongrel>. See the sketch below.
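For illustration, a minimal Unicorn config sketch; the worker count, socket path, and timeout are values I made up:

    # config/unicorn.rb -- the single place the process count lives.
    # All workers accept() on the same listen socket, so the kernel
    # hands each request to whichever worker is actually free.
    worker_processes 4
    listen "/tmp/unicorn.sock", :backlog => 64
    timeout 30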
We tried testing Phusion Passenger 2.x for our workload, and found that it often ran more processes than we had configured it for, and could take out the entire server by overcommitting memory. That then needed yet more config in our monitoring solution, and it became more trouble than it was worth.
For ease of configuration and actually working as advertised, I would recommend Unicorn to anyone running a Ruby web server.
I'm in the process of building out a web services architecture written in Rails and served up by Unicorn. It's working great so far. All of the clients hitting my service are high-end servers on a local network, so I don't have to worry about network latency, client buffering, etc.