Hacker News

The key thing I got a bit wrong was distinguishing high-traffic, mostly static "content" from high-traffic "very custom" content, which is a) what I was used to at envato and b) what I thought we had on our hands at goodfilms.

I think in a growing site with a lot of custom per-user content (like a social network) the extra complexity of a cache layer and managing expiry is more pain than it's worth while iterating the product quickly. If you're mostly a content site, it's definitely the #1 thing you should be doing.

Realising that we're both at the same time, depending on the user or page, means that sometimes the cache stuff is the right thing to do, and sometimes not. I was leaning too far towards not, and happy now with the balance we've picked.




Absolutely. At theconversation.edu.au we run a content site—we publish the news, which is the same for everyone. This means we can cache the front page and all the articles as static HTML, and then annotate the page with user info for signed-in commenters, editorial controls for signed-in editors, and so on.

(We have a separate cookie that is present for signed-in users, so the frontend knows whether it should fire the annotation request.)

The result is that we can serve a sudden influx of unauthenticated users (e.g. from Google News or StumbleUpon) from nginx alone, which gives us massive scale from very little hardware. It's likely that the network is actually the bottleneck in this case, and not nginx.
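A rough sketch of what that nginx setup could look like, assuming pre-rendered pages are written to disk (the paths, cache directory, and upstream name here are illustrative, not the site's real config):

```nginx
# Serve pre-rendered article HTML straight from disk; only cache
# misses fall through to the app server.
location / {
    root /var/cache/static-html;           # assumed snapshot directory
    try_files $uri $uri/index.html @app;   # serve the snapshot if it exists
}

location @app {
    proxy_pass http://app_backend;         # assumed upstream name
}
```

With this shape, an influx of anonymous readers never touches the app server at all, which is what lets a small box absorb a traffic spike.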


I'm interested in what you mean by annotating the page after caching it. Do you have any more info on this?


The cached page contains content suitable for everyone, so it looks as if the user is logged out.

An extra AJAX request grabs the user's logged-in status, CSRF token, and similar data as JSON, and then modifies the page so the user sees what they expect (a logout button, a comment form, etc).
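A minimal sketch of the cookie check plus annotation request described above. The cookie name `signed_in`, the endpoint `/user_status.json`, and the CSS selectors are assumptions for illustration, not the site's real names:

```javascript
// Pure helper: does a document.cookie-style string contain the
// signed-in marker cookie?
function hasSignedInCookie(cookieString) {
  return cookieString
    .split(';')
    .some(function (pair) {
      return pair.trim().split('=')[0] === 'signed_in';
    });
}

// Only fire the annotation request when the marker cookie is present,
// so anonymous visitors stay entirely on the cached path.
if (typeof document !== 'undefined' && hasSignedInCookie(document.cookie)) {
  fetch('/user_status.json', { credentials: 'same-origin' })
    .then(function (res) { return res.json(); })
    .then(function (user) {
      // Swap the generic header chrome for the signed-in version.
      document.querySelector('.login-link').hidden = true;
      document.querySelector('.logout-button').hidden = false;
      // Inject the CSRF token into the comment form.
      document.querySelector('input[name=csrf_token]').value = user.csrfToken;
    });
}
```

The important property is that the cached HTML is byte-identical for every visitor; all per-user state arrives in that one small JSON response.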


Doesn't that cause visible content movement (layout shift) after the page loads?


You're essentially talking about edge side includes [http://en.wikipedia.org/wiki/Edge_Side_Includes].
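For reference, an ESI fragment looks roughly like this (the fragment URL is illustrative). The edge cache serves the surrounding page from cache and fetches only the included fragment per request:

```html
<!-- Cached page shell: everything here is shared between users. -->
<div id="user-nav">
  <!-- An ESI-capable cache (e.g. Varnish with ESI enabled) replaces
       this tag with the per-user fragment on each request. -->
  <esi:include src="/fragments/user_nav" />
</div>
```

The difference from the AJAX approach above is that the stitching happens server-side at the cache layer, so there's no client-side content movement.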



If you're running Ruby on Rails, record-cache will do it all for you (I've yet to run into any issues; the one I thought was an issue was actually a pgpool-II bug).

https://github.com/orslumen/record-cache

If you're on Django, I think cache-machine does the same thing. There are some things you won't get cached this way that you could cache manually (functions and procedures), but I think both libraries are conservative enough that you won't return stale resources.





