It adds some complexity, but consider that we know very well how to scale this type of service: e-mail + reflectors (mailing lists), and we know very well how to do parallel mass delivery for the small proportion of accounts with huge numbers of followers.

Scaling this is easily done with decomposition and sharding coupled with a suitable key->value mapping of external id to current shard. I first sharded e-mail delivery and storage for millions of users 23 years ago. It was neither hard nor novel to do then, with hardware slower than my current laptop handling hundreds of thousands of users each.
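
Roughly speaking, the lookup itself is trivial (a minimal sketch in Python; the shard count, the hash choice and the override table are illustrative assumptions, not a description of any particular system):

    import hashlib

    SHARD_COUNT = 1024   # illustrative; picked per capacity plan in practice
    moved = {}           # external id -> shard, for accounts migrated off their default

    def shard_for(external_id: str) -> int:
        # Explicit overrides win, so individual accounts can be rebalanced
        # without rehashing everyone else.
        if external_id in moved:
            return moved[external_id]
        digest = hashlib.sha1(external_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % SHARD_COUNT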

Those models are predicated on every user having an ‘inbox’

Do you believe that every Twitter user has an inbox stored on disk somewhere that just contains every tweet posted by someone they follow?


I have no idea if that is how Twitter ended up doing it. But building it that way is vastly easier to scale than trying to do some variation of joining the timelines of everyone you follow "live" at retrieval time, because in models like this the volume of reads tends to massively dominate.
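
To be concrete about "building it that way", fan-out on write is essentially just this loop (a sketch; followers_of and timeline_store are placeholder names I'm inventing for illustration):

    def fan_out(tweet_id, author_id, followers_of, timeline_store):
        # On write: push the new tweet id onto every follower's materialized
        # timeline, so a read becomes a single cheap per-user lookup instead
        # of a join over everyone they follow.
        for follower_id in followers_of(author_id):
            timeline_store.append(follower_id, tweet_id)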

You also don't need to store every tweet, only the ids of the tweets (a KV store mapping tweet id to full tweet is also easy to shard), and since they're reasonably chronological the ids can be compressed fairly efficiently (quite a few leading digits of tweet ids are chronological).
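
E.g. a timeline of ids can be stored as deltas rather than full 64-bit values (a sketch; the exact encoding is my assumption, with varint packing or similar layered on top):

    def encode_deltas(tweet_ids):
        # Sorted, roughly chronological ids give small deltas, which a
        # varint-style encoding can then pack into very few bytes.
        ids = sorted(tweet_ids)
        if not ids:
            return []
        return [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]

    def decode_deltas(deltas):
        # Inverse: a running sum restores the original ids.
        out, total = [], 0
        for d in deltas:
            total += d
            out.append(total)
        return out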

You also have straightforward options for "hybrid" solutions, such as e.g. dealing with extreme outliers. Have someone followed by more than X% of the total userbase? Cache the most recent N tweets from that small set of accounts on your frontends, and do joins over those few timelines for users who follow them.
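
The read path for that hybrid looks something like this (a sketch; timeline_store, celebrity_cache and outliers_followed_by are invented names, and it assumes ids sort roughly by time, newest first):

    import heapq, itertools

    def read_timeline(user_id, timeline_store, celebrity_cache,
                      outliers_followed_by, limit=50):
        # The materialized timeline covers the "normal" accounts this user
        # follows; the few outlier accounts get merged in at read time from
        # a shared cache of their most recent tweet ids.
        own = timeline_store.recent(user_id, limit)        # newest-first ids
        hot = [celebrity_cache.recent(a, limit)
               for a in outliers_followed_by(user_id)]
        merged = heapq.merge(own, *hot, reverse=True)      # descending id order
        return list(itertools.islice(merged, limit))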

Most importantly, it's a pattern that has been extensively tested over decades, in a multitude of systems with follower/following graphs where consumers/reads dominate. The behaviours and failure modes are well understood, with straightforward, well tested solutions for most challenges you'll run into, which matters in the context of whether it'd be possible to build this with a small team.

Put another way: I know from first-hand experience that you can scale this to millions of users per server on modern hardware, so the number of shards you'd need to manage for Twitter-level volume is lower than the number of servers I've had ops teams manage. (You'd need more servers in total, because the read load means you'd want extensive caching, plus storage systems for e.g. images and the like - there's lots of other complexity, but scaling the core timeline functionality is not a complex problem.)


It seems likely.