Hacker News

>It's because "how to make a website scale" depends heavily upon which website, what it does, and how big it needs to be. Making a messaging queue like Twitter or G+ scale is very different from making Google Search scale. Hell, making the indexing system of Google search scale is a very different problem from making the serving system scale.

For most websites it's not THAT different.

Actually, most have pretty similar needs, and you can sum those up in 3-5 different website architectural styles anyway.

There is far more duplication of work and ad-hoc solutions to the SAME problems than there are "heavily different" needs.




What would be those 3-5 different website architectural styles?


News/Magazine/Portal-like (read-heavy), Game site (evented, many concurrent users, game-engine computations), Social Platform (read-write heavy), etc.

Most needs are bog standard. If you really look at most successful sites, they might use same-ish architectures, only with different components/languages/libs.

Basically all high-volume sites use something like the notions behind Google App Engine and the services it offers. The various AWS tools are also similar (S3, the table store they offer, etc.).


I think you're missing a lot of complexity of the considerations that actually go into implementing any of the above. I can think of 3 subsystems within Reddit alone (reading, voting, and messages) that all have different usage patterns and (if they're doing it right) require different approaches to scaling.

Where's e-commerce on your list? The approaches for scaling eBay are completely different than for scaling Reddit or YouTube, because eBay can't afford eventual consistency. You can't rely on caching to show a buyer a page whose price is an hour out-of-date.
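The consistency point can be made concrete with a toy sketch (class and method names are made up for illustration, not any real eBay code): a cache-aside read path is fine for a browse page, but checkout has to go back to the source of truth, or a buyer sees an hour-old price.

```python
import time

class PriceCache:
    """Toy cache-aside store: acceptable for read-heavy browse
    pages, risky for checkout, where a stale price costs money."""

    def __init__(self, db, ttl_seconds=3600):
        self.db = db            # authoritative store (a plain dict here)
        self.ttl = ttl_seconds
        self._cache = {}        # item_id -> (price, cached_at)

    def browse_price(self, item_id):
        # Eventually consistent: may serve a price up to `ttl` old.
        hit = self._cache.get(item_id)
        if hit and time.monotonic() - hit[1] < self.ttl:
            return hit[0]
        price = self.db[item_id]
        self._cache[item_id] = (price, time.monotonic())
        return price

    def checkout_price(self, item_id):
        # Strongly consistent: always hit the source of truth.
        return self.db[item_id]

db = {"widget": 100}
store = PriceCache(db)
store.browse_price("widget")                  # warms the cache at 100
db["widget"] = 120                            # seller raises the price
assert store.browse_price("widget") == 100    # stale for up to an hour
assert store.checkout_price("widget") == 120  # checkout must not lie
```

A Reddit-style listing can shrug off that hour of staleness; an order confirmation cannot, which is why the two sites cache in very different places.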

Here's something else to think about: why do (the now-defunct) Google real-time search and GMail Chat have completely different architectures, despite both of them having the same basic structure of "a message comes in, and gets displayed in a scrolling window on the screen"? The answer is latency. With real-time search, a latency of 30 seconds is acceptable, since you aren't going to know when the tweet was posted in the first place. With GChat, it has to be immediate, because it's frequently used in the context of someone verbally saying "I'll ping you this link" and it's kinda embarrassing if the link doesn't arrive for 30 seconds. Real-time search also has to run much more computationally-intensive algorithms to determine relevance & ranking than GChat does.

I've personally worked on Google Search, Google+, and Google Fiber. I can tell you that they do not all use something like the notions behind Google AppEngine. There's no way you could build Google Search on AppEngine, and G+ would be a stretch.


>I can think of 3 subsystems within Reddit alone (reading, voting, and messages) that all have different usage patterns and (if they're doing it right) require different approaches to scaling.

Yes, and those would be the same across all social-bookmarking-type sites, and in the similar (voting, etc.) components of social sites a la Facebook, G+, etc.

>The answer is latency. With real-time search, a latency of 30 seconds is acceptable, since you aren't going to know when the tweet was posted in the first place. With GChat, it has to be immediate, because it's frequently used in the context of someone verbally saying "I'll ping you this link" and it's kinda embarrassing if the link doesn't arrive for 30 seconds.

Yes, so that's one use case for one architecture (a low-latency message queue), and the other use case calls for another. I gave the example of a Game site, which has similar latency concerns.
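For illustration, the low-latency fan-out notion can be sketched in a few lines (a toy in-process version with made-up names, not how GChat or any real system is built): each subscriber gets its own queue, so delivery latency is one put/get, not a periodic indexing pass.

```python
import queue
import threading

class ChatRoom:
    """Minimal in-process pub/sub fan-out: one queue per
    connected subscriber, delivered immediately on publish."""

    def __init__(self):
        self._subscribers = []
        self._lock = threading.Lock()

    def subscribe(self):
        # Each subscriber reads from its own private queue.
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def publish(self, message):
        # Push to every subscriber right away; no batching delay.
        with self._lock:
            for q in self._subscribers:
                q.put(message)

room = ChatRoom()
inbox = room.subscribe()
room.publish("I'll ping you this link")
assert inbox.get(timeout=1) == "I'll ping you this link"
```

A real-time search pipeline inverts this trade-off: it batches messages so it can afford heavy ranking work, accepting the 30-second lag.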

>Where's e-commerce on your list? The approaches for scaling eBay are completely different than for scaling Reddit or YouTube, because eBay can't afford eventual consistency.

My list wasn't supposed to be exhaustive -- I spoke of 3-5 common styles and only mentioned a few. That said, the approaches behind eBay might not resemble Reddit or YouTube, but they will resemble others like Amazon, Etsy, etc.

>I've personally worked on Google Search, Google+, and Google Fiber. I can tell you that they do not all use something like the notions behind Google AppEngine.

I don't think we mean the same things. For one, hardly anybody builds a search engine or a Google competitor, so what it takes to do Google Search is a moot point when discussing common architectural patterns behind big sites.

I meant high-level stuff on one hand, like denormalized data, map-reduce, workers, shared-nothing, sharding and such, and common infrastructure on the other hand, like the relational DB, memcached, a BigTable-like datastore, an abstract filesystem (S3, BlobStore, GridFS, etc.), ElasticSearch, Hadoop, Node, Redis, message queues, etc.

Google Search or Facebook might have needs way beyond those, but the above are shared by 99% of big sites out there.
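As a concrete instance of one of those shared notions, here is a minimal hash-sharding sketch (function names and shard count are invented for illustration, assuming hash-based placement rather than range partitioning): every app server maps a key to the same shard with no coordination, which is the core trick behind most of the sharded datastores above.

```python
import hashlib

NUM_SHARDS = 8

def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user id to a shard deterministically; every app
    server computes the same answer with no coordination."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# All of a user's rows live on one shard, so single-user
# reads never fan out across the whole fleet.
assert shard_for("alice") == shard_for("alice")
assert 0 <= shard_for("bob") < NUM_SHARDS
```

The differences between big sites are mostly in which such building blocks they pick and tune, not in the blocks themselves.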

A common system to build on top of them that is higher level than Heroku (and more expansive and accommodating than GAE) should exist, and it would cater to more than 80% of big-website needs. Of course each site will need some custom stuff, but not 80% custom stuff.





