High Scalability - All Time Favorites

3amOpsGuy · on Oct 29, 2012

The POF one tickles me; I love the whole approach.

The genius in it is not spending untold time and money building the latest whizz-bang responsive GUI, or never getting launched because the "web scale" back end isn't the world's most perfect design yet. It's all just hacked together organically, with the simplest, path-of-least-resistance choice taken at every stage.

I love that this succeeds, I love that it's not overambitious, I love that it costs the owner minimal maintenance time.

Hats off to that man!

bmelton · on Oct 29, 2012

What I found most interesting was in the Update to the POF article where Jeff Atwood compares the cost of scaling up (as POF did) vs. scaling out (as Atwood does) and notes the comparison of 1 heavy DB server vs its equivalent cost of approximately 83 commodity boxes.

The article factors in hardware and software licensing, but completely overlooks the cost of system administration. With the base cost of $100,000 used in the comparison, hardware is only a minor part of the total compared with hiring the head count needed to administer 83 commodity servers, which I could reasonably see requiring a few FTEs if they performed their own hardware support.
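
A back-of-the-envelope sketch of that point in Python. Only the $100,000 figure comes from the comparison; the per-FTE cost, the head counts (1 vs. 3), and the 3-year horizon are placeholder assumptions for illustration, not real data.

```python
def total_cost(hardware_usd: int, admin_ftes: int, fte_cost_usd: int, years: int) -> int:
    """Hardware/licensing paid once, plus admin salaries paid every year."""
    return hardware_usd + admin_ftes * fte_cost_usd * years


HARDWARE_USD = 100_000   # base cost used in the comparison
FTE_COST_USD = 120_000   # ASSUMPTION: loaded yearly cost of one admin
YEARS = 3                # ASSUMPTION: planning horizon

# ASSUMPTION: one admin for the single big box, three for 83 commodity boxes.
scale_up = total_cost(HARDWARE_USD, 1, FTE_COST_USD, YEARS)
scale_out = total_cost(HARDWARE_USD, 3, FTE_COST_USD, YEARS)

print(f"scale up:  ${scale_up:,}")
print(f"scale out: ${scale_out:,}")
```

With any plausible salary figure, the admin term dominates the hardware term after the first year, which is the part the comparison leaves out.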

pron · on Oct 29, 2012

That's a long list. Here are my picks (in no particular order):

* Are Cloud Based Memory Architectures The Next Big Thing? - http://highscalability.com/blog/2009/3/16/are-cloud-based-me...

* Why Are Facebook, Digg, And Twitter So Hard To Scale? - http://highscalability.com/blog/2009/10/13/why-are-facebook-...

* Paper: The End Of An Architectural Era (It’s Time For A Complete Rewrite) - http://highscalability.com/blog/2009/4/16/paper-the-end-of-a...

* Ask For Forgiveness Programming - Or How We'll Program 1000 Cores - http://highscalability.com/blog/2012/3/6/ask-for-forgiveness...

* The Performance Of Distributed Data-Structures Running On A "Cache-Coherent" In-Memory Data Grid [disclosure: I wrote that post] - http://highscalability.com/blog/2012/8/20/the-performance-of...

* Big Data Counting: How To Count A Billion Distinct Objects Using Only 1.5KB Of Memory - http://highscalability.com/blog/2012/4/5/big-data-counting-h...

* MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) And Familiar (SQL) - http://highscalability.com/blog/2012/8/14/memsql-architectur...

* Think Of Latency As A Pseudo-Permanent Network Partition - http://highscalability.com/blog/2010/8/12/think-of-latency-a...

* How Will Memristors Change Everything? - http://highscalability.com/blog/2010/5/5/how-will-memristors...

lifeisstillgood · on Oct 29, 2012

My big question, and my big unsolved question, for all distributed, scalable systems, is managing internal authorisation and authentication.

If you are calling across HTTP, what do you use both to ensure the calling server/service is one of your own and to ensure the call originates from a user authorised to do what you are asking?

Does OAuth cut it? Do all these folks run their own internal OAuth networks, and just not mention it? SSL client certificates? Kerberos? I have never found a satisfactory "framework" for this for myself.

Anyone?

buro9 · on Oct 29, 2012

OAuth does not cut it.

Sure, it works... you can do it. But wow on the headaches you're going to get down the line and the volume of HTTP chatter.

You generally end up adding the ability for servers to impersonate other users, so that a server process speaks to an OAuth authority to determine whether it is allowed to generate a user token and perform some task in their name.

All kinds of hell lies that way.

Maybe there's someone from the BBC here? They have a nice system using server certificates. Every application runs in its own user process and identity, and every user is authenticated by certificates. The different servers have knowledge of who is allowed to do what, and users (applications) that try things they're not allowed to do get refused and reported.

I don't know the nitty-gritty of their system, but the gist is "certificates everywhere". It's proven and it works.
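
To make "certificates everywhere" concrete, here is a minimal sketch using Python's standard `ssl` module. It is not the BBC's implementation, which I haven't seen; the file paths, the `PERMISSIONS` table, and the service names are all hypothetical. The server demands a client certificate, then looks up the certificate's common name to decide what that caller may do.

```python
import ssl

# HYPOTHETICAL policy table: certificate common name -> allowed actions.
PERMISSIONS = {
    "billing-service": {"read_invoice", "create_invoice"},
    "search-service": {"read_invoice"},
}


def make_server_context(cert_file=None, key_file=None, ca_file=None):
    """TLS server context that refuses clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # mutual TLS: client must present a cert
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # your internal CA
    return ctx


def common_name(peer_cert):
    """Extract commonName from the dict returned by SSLSocket.getpeercert()."""
    for rdn in peer_cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def is_allowed(peer_cert, action):
    """Authorise by certificate identity; unknown callers get nothing."""
    return action in PERMISSIONS.get(common_name(peer_cert), set())
```

After the handshake you would call `is_allowed(conn.getpeercert(), "create_invoice")` on the accepted connection, and refuse and log the request if it returns `False`.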

antihero · on Oct 29, 2012

You start by having internal sources talk to each other on an internal network that isn't accessible to the outside.

ericcholis · on Oct 29, 2012

The way AWS handles permissions between servers works pretty well also.

However, I think he was referring to APIs in part. I personally haven't met a scheme that I really liked. With that being said, I'm generally a fan of limiting access by IP, assigning API keys or tokens, access over SSL, and simple hashing to verify message integrity. Obviously this doesn't work for all things, but you get the picture.
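
A minimal sketch of the key-plus-hash part, in Python. It uses HMAC-SHA256 (a keyed hash) rather than a plain hash, because a bare hash can be recomputed by anyone who tampers with the message. The message layout and function names are assumptions for illustration, and the secret is assumed to be shared out of band alongside the API key.

```python
import hashlib
import hmac


def sign(secret: bytes, method: str, path: str, body: bytes) -> str:
    """Return a hex HMAC-SHA256 over the parts of the request we care about."""
    message = method.encode() + b"\n" + path.encode() + b"\n" + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, method: str, path: str, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign(secret, method, path, body)
    return hmac.compare_digest(expected, signature)


# Caller side: compute the signature and send it in a header of your choosing.
secret = b"per-client shared secret"  # placeholder value
sig = sign(secret, "POST", "/orders", b'{"qty": 2}')

# Receiver side: look up the secret by API key, then check the signature.
print(verify(secret, "POST", "/orders", b'{"qty": 2}', sig))   # True
print(verify(secret, "POST", "/orders", b'{"qty": 20}', sig))  # False
```

This covers integrity and authenticity only. It does nothing against replayed requests unless you also sign a timestamp or nonce, and it still wants SSL underneath for confidentiality.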

Overall, I doubt that there is a one-size-fits-all solution. Security is generally many parts working together to limit penetration as well as ensure message integrity and authenticity.

rubyrescue · on Oct 29, 2012

One thing that really sticks out: there's a LOT of MySQL on here, and very little Mongo and Riak. That's not to say those aren't useful technologies, but it's clear SQL scales.

bmelton · on Oct 29, 2012

Not disagreeing with you -- there have obviously been sites like Slashdot and Reddit that scaled on MySQL for a long time before the current crop of NoSQL solutions. But it could also indicate that scaling MySQL is non-trivial, and hence that tried-and-true solutions are worth writing up for others to emulate.

showerst · on Oct 29, 2012

It's worth noting that the current crop of NoSQL databases isn't _that_ old, and many of these sites were designed and built years before Mongo or Riak were really production-ready.

I'd bet that in 3-5 years this list will have more nontraditional databases on it.

drwl · on Oct 30, 2012

Great read -- thanks for the share!