Reverse proxies won't work for HN, because requests for the same resource from different users can't share the same response. Not only are certain bits of the page customized for the user (like your name/link at the top), but even the comments and links themselves are rendered per user.
Things like a user's showdead setting, as well as whether the user's account is dead, can drastically change the output of each page. E.g., comments by a dead user won't show as dead to that user, but they will for everyone else...
There's cookie-based caching in Varnish (and in some other proxy caches too). Essentially, the cache key becomes the usual hash (URL + host) plus the user's cookie, like this:
sub vcl_hash {
  # Append the Cookie header to the default hash (URL + Host),
  # so each logged-in user gets their own copy of every page.
  # (Varnish 2 syntax; Varnish 3+ uses hash_data() instead.)
  set req.hash += req.http.cookie;
}
What this means is that the cache is per-logged-in-user and pretty much personalized. The server's going to need a lot more RAM than usual. You can set a low TTL on the cache entries so they're flushed and not kept in memory indefinitely. But the performance boost is great.
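A minimal sketch of such a short TTL, assuming Varnish 2.1-style VCL (the 120s value is an arbitrary example, not a recommendation):

sub vcl_fetch {
  # Short TTL so per-user objects get evicted quickly
  # instead of piling up in RAM.
  set beresp.ttl = 120s;
  return (deliver);
}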
This is not recommended as an always-on measure. We wrote an entry about accomplishing something similar with Python & Varnish. Here it is if you're interested in reading about it: http://blog.unixy.net/2010/11/3-state-throttle-web-server/
Of course it will work. The whole point of a reverse proxy is to buffer slow requests and then send them fast over the LAN to your back-end servers, which can't handle high concurrency efficiently.
FreeBSD's accept_filter(), used by Rtm, does more or less that (you can think of it as a reverse proxy in the kernel), but it only works for plain HTTP and the HEAD/GET methods.
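In Varnish terms, the reverse-proxy side of that is just a backend declaration: the proxy absorbs the slow client connections and talks to the app server over the fast local network. The host/port here are placeholders:

backend app {
  # App server on the LAN; Varnish handles the slow client side.
  .host = "10.0.0.2";
  .port = "8080";
}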
Except they can't, for the reasons I mentioned above. E.g., if my account is dead, a thread with one of my own comments looks different to me than it does to someone else viewing that same thread, and different again depending on whether the viewer has showdead checked in their profile.
It's not as straightforward as you'd like it to be.
Special cookies could be set for dead users and users who enable showdead to bypass the cache.
For example, one of the sites I run gets about 50K pageviews/day from logged-in users and another 600K pageviews/day from anonymous users arriving via referrals or search engines. Logged-in users have similar customization options, so we bypass the cache for them by detecting a cookie.
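A rough vcl_recv combining both ideas — a special cookie for dead/showdead users, plus cookie detection for ordinary logins. The cookie names ("flags", "user") are made up for illustration, and this assumes Varnish 2.1-style VCL:

sub vcl_recv {
  # Hypothetical "flags" cookie marks dead accounts and
  # users with showdead enabled: always skip the cache.
  if (req.http.Cookie ~ "flags=") {
    return (pass);
  }
  # Hypothetical "user" cookie marks any logged-in session:
  # bypass the cache for personalized pages.
  if (req.http.Cookie ~ "user=") {
    return (pass);
  }
  # Anonymous traffic: strip cookies so every anonymous
  # visitor shares one cached copy per URL.
  unset req.http.Cookie;
  return (lookup);
}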
Obviously, going the cache route would require some changes to how things are set up; it's not a turn-key solution. But the fairly small amount of change is well worth it for most content sites. For a user-generated-content site like HN, it would also depend on how the TTLs and cache purging are set up.
The majority of requests probably come from live accounts in good standing or from people who aren't logged in at all, so most requests could still be cached.