
Indeed. It boils down to their need for a durable cache. It's simply too expensive to try to cache every comment tree in RAM, and Cassandra's data model and on-disk storage layout are a really good fit for the structure of their data.



You don't need every comment tree in RAM, just the last few days' worth plus a few older threads that get linked back to. They are currently using 200 machines, so let's say 10 of them are used to cache one week's comments. 30 GB of RAM * 10 machines = 300 GB of cache. I would be very surprised if they generate 200 GB/week, or roughly 10 TB of comment data a year.
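A quick sketch of that back-of-envelope estimate, in Python. Every figure here is a guess taken from the comment above (10 machines, 30 GB each, a 200 GB/week ceiling), not a measured Reddit number, and the function names are made up for illustration.

```python
# Back-of-envelope check of the cache sizing argument.
# All inputs are the comment's guesses, not real Reddit measurements.
CACHE_MACHINES = 10            # machines set aside for caching (assumed)
RAM_PER_MACHINE_GB = 30        # usable RAM per machine (assumed)
WEEKLY_COMMENT_DATA_GB = 200   # pessimistic ceiling on weekly comment data (assumed)


def cache_capacity_gb(machines: int, ram_gb: int) -> int:
    """Total RAM available for caching across the cache machines."""
    return machines * ram_gb


def yearly_volume_tb(weekly_gb: float) -> float:
    """Yearly data volume in TB given a weekly volume in GB (52 weeks, 1 TB = 1000 GB)."""
    return weekly_gb * 52 / 1000


capacity = cache_capacity_gb(CACHE_MACHINES, RAM_PER_MACHINE_GB)
yearly = yearly_volume_tb(WEEKLY_COMMENT_DATA_GB)
weeks_in_ram = capacity / WEEKLY_COMMENT_DATA_GB

print(f"cache capacity: {capacity} GB")
print(f"yearly volume at the ceiling: {yearly:.1f} TB")
print(f"weeks of comments that fit in RAM: {weeks_in_ram:.1f}")
```

Even at the pessimistic 200 GB/week ceiling, the 300 GB cache would hold about a week and a half of comments.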

Edit: For comparison, Slashdot ran for a long time on just 6 less powerful machines vs. the 200+ Reddit is using. Reddit may have more traffic, but not 40x as much. And last I heard, HN just uses one machine.

PS: The average comment is small, and they can compress most comments after a day or so. They can probably get away with storing a second copy of most old threads as a single blob of data in case people actually open them, which costs a little space but cuts down on processing time.
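A minimal sketch of the "old thread as one compressed blob" idea, using Python's standard json and zlib modules. The comment layout (id, parent, body) and the helper names pack_thread and unpack_thread are hypothetical, chosen only for illustration; nothing here reflects how Reddit actually stores threads.

```python
import json
import zlib


def pack_thread(comments: list) -> bytes:
    """Serialize a whole thread (a list of comment dicts) into one compressed blob."""
    raw = json.dumps(comments, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 9)


def unpack_thread(blob: bytes) -> list:
    """Recover the list of comment dicts from a blob made by pack_thread."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical old thread: 200 short comments, each replying to the previous one.
thread = [
    {"id": i, "parent": i - 1 if i else None, "body": "This is a typical short comment."}
    for i in range(200)
]

blob = pack_thread(thread)
raw_size = len(json.dumps(thread).encode("utf-8"))
print(f"uncompressed: {raw_size} bytes, compressed blob: {len(blob)} bytes")
```

Serving an old thread then becomes one fetch plus one decompress, rather than reassembling the tree from individual comment rows. This sample is highly repetitive, so real comment text would compress less well than it does here.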


Please. Reddit does 2 billion page views per month.


Yeah, and Cassandra was built for a company serving 500 times that.



