High Scalability - All Time Favorites (highscalability.com)
125 points by zerop on Oct 29, 2012 | 11 comments



The POF one tickles me; I love the whole approach.

The genius in it is not spending untold time and money building the latest whizz-bang responsive GUI, or never getting launched because the "web scale" back end isn't the world's most perfect design yet. It's all just hacked together organically, with the simplest, path-of-least-resistance choices taken at every stage.

I love that this succeeds, I love that it's not overambitious, I love that it costs the owner minimal maintenance time.

Hats off to that man!


What I found most interesting was in the Update to the POF article where Jeff Atwood compares the cost of scaling up (as POF did) vs. scaling out (as Atwood does) and notes the comparison of 1 heavy DB server vs its equivalent cost of approximately 83 commodity boxes.

The article factors in hardware and software licensing, but completely overlooks the cost of system administration. With the base cost of $100,000 used in the comparison, hardware and licensing are only a minor part of the total compared with hiring the head count needed to administer 83 commodity servers, which I could reasonably see requiring a few FTEs if they performed their own hardware support.
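
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The $100,000 base cost and the 83 boxes come from the comparison above; the per-FTE cost and the head counts are purely hypothetical figures chosen for illustration, not numbers from the article.

```python
# Back-of-the-envelope comparison of scale-up vs. scale-out cost.
# From the thread: ~$100,000 buys one heavy DB server or ~83 commodity boxes.
# ASSUMPTIONS (hypothetical, illustration only): FTE cost and head counts.

BASE_COST = 100_000        # hardware + licensing, from the comparison
COMMODITY_BOXES = 83       # equivalent number of commodity servers
ASSUMED_FTE_COST = 100_000 # hypothetical fully loaded yearly cost per admin


def per_box_cost(base_cost: float, boxes: int) -> float:
    """Implied hardware + licensing budget per commodity box."""
    return base_cost / boxes


def first_year_cost(base_cost: float, ftes: float, fte_cost: float) -> float:
    """Base cost plus one year of administration head count."""
    return base_cost + ftes * fte_cost


# Hypothetical staffing: half an admin for one big box, three for 83 boxes.
scale_up = first_year_cost(BASE_COST, ftes=0.5, fte_cost=ASSUMED_FTE_COST)
scale_out = first_year_cost(BASE_COST, ftes=3, fte_cost=ASSUMED_FTE_COST)

print(f"per commodity box:   ${per_box_cost(BASE_COST, COMMODITY_BOXES):,.0f}")
print(f"scale up, year one:  ${scale_up:,.0f}")
print(f"scale out, year one: ${scale_out:,.0f}")
```

Under those assumed staffing numbers, the head count outweighs the $100,000 base cost within the first year, which is the point: the comparison changes a lot once administration is included.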


That's a long list. Here are my picks (in no particular order):

* Are Cloud Based Memory Architectures The Next Big Thing? - http://highscalability.com/blog/2009/3/16/are-cloud-based-me...

* Why Are Facebook, Digg, And Twitter So Hard To Scale? - http://highscalability.com/blog/2009/10/13/why-are-facebook-...

* Paper: The End Of An Architectural Era (It’s Time For A Complete Rewrite) - http://highscalability.com/blog/2009/4/16/paper-the-end-of-a...

* Ask For Forgiveness Programming - Or How We'll Program 1000 Cores - http://highscalability.com/blog/2012/3/6/ask-for-forgiveness...

* The Performance Of Distributed Data-Structures Running On A "Cache-Coherent" In-Memory Data Grid [disclosure: I wrote that post] - http://highscalability.com/blog/2012/8/20/the-performance-of...

* Big Data Counting: How To Count A Billion Distinct Objects Using Only 1.5KB Of Memory - http://highscalability.com/blog/2012/4/5/big-data-counting-h...

* MemSQL Architecture - The Fast (MVCC, InMem, LockFree, CodeGen) And Familiar (SQL) - http://highscalability.com/blog/2012/8/14/memsql-architectur...

* Think Of Latency As A Pseudo-Permanent Network Partition - http://highscalability.com/blog/2010/8/12/think-of-latency-a...

* How Will Memristors Change Everything? - http://highscalability.com/blog/2010/5/5/how-will-memristors...


My big unsolved question for all distributed, scalable systems is managing internal authorisation and authentication.

If you are calling across HTTP, what do you use both to ensure the calling server/service is one of your own and to ensure the call originates from a user authorised to do what you are asking?

Does OAuth cut it? Do all these folks run their own internal OAuth networks, and just not mention it? SSL client certificates? Kerberos? I have never found a satisfactory "framework" for this for myself.

Anyone?


OAuth does not cut it.

Sure, it works... you can do it. But wow on the headaches you're going to get down the line and the volume of HTTP chatter.

You generally end up adding the ability for servers to impersonate other users, so that a server process speaks to an OAuth authority to determine whether it is allowed to generate a user token and perform some task in their name.

All kinds of hell lies that way.

Maybe there's someone from the BBC here? They have a nice system using server certificates. Every application runs in its own user process and identity, and every user is authenticated by certificates. The different servers have knowledge of who is allowed to do what, and users (applications) that try things they're not allowed to do get refused and reported.

I don't know the nitty-gritty of their system, but the gist is "certificates everywhere"; it's proven and it works.
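
For anyone who wants to picture the general shape of "certificates everywhere", here is a minimal sketch using Python's standard `ssl` module. To be clear, this is not the BBC's system (I don't know its details); the service names, actions, and the `ACL` table are all hypothetical. The server requires a client certificate signed by an internal CA, then maps the certificate's common name to the actions that identity may perform.

```python
# Sketch of certificate-based service identity and authorisation.
# ASSUMPTIONS: service names, actions, and ACL contents are hypothetical.
import ssl

# Hypothetical table of which service identity may perform which actions.
ACL = {
    "svc-frontend": {"read-profile"},
    "svc-billing": {"read-profile", "charge-card"},
}


def make_server_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    """Server-side TLS context that refuses clients without a valid cert."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.verify_mode = ssl.CERT_REQUIRED  # client must present a cert from our CA
    ctx.load_cert_chain(cert_file, key_file)
    return ctx


def peer_common_name(cert: dict):
    """Extract commonName from the dict returned by SSLSocket.getpeercert()."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


def is_allowed(cert: dict, action: str, acl: dict = ACL) -> bool:
    """True only if the certificate's identity is listed for this action."""
    name = peer_common_name(cert)
    return name is not None and action in acl.get(name, set())
```

In a real handler you would call `is_allowed(conn.getpeercert(), "charge-card")` on the accepted TLS connection and log every refusal, which matches the "refused and reported" behaviour described above.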


You start by having internal sources talk to each other on an internal network that isn't accessible to the outside.


The way AWS handles permissions between servers works pretty well also.

However, I think he was referring to APIs in part. I personally haven't met a scheme that I really liked. With that being said, I'm generally a fan of limiting access by IP, assigning API keys or tokens, access over SSL, and simple hashing to verify message integrity. Obviously this doesn't work for all things, but you get the picture.

Overall, I doubt that there is a one-size-fits-all solution. Security is generally many parts working together to limit penetration as well as ensure message integrity and authenticity.
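
As a rough illustration of the IP allowlist plus API key plus hashing combination, here is a small Python sketch. The network range, key ID, and secret are made up for the example. Note that it uses an HMAC (a keyed hash) rather than a bare hash, since anyone who alters a message can simply recompute a plain hash; transport over SSL is assumed and not shown.

```python
# Sketch: IP allowlist + API key lookup + HMAC-SHA256 message signature.
# ASSUMPTIONS: the network range, key ID, and secret below are hypothetical.
import hashlib
import hmac
import ipaddress

ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]  # internal range only
API_KEYS = {"svc-billing": b"example-shared-secret"}      # key ID -> secret


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the request body, keyed by the caller's secret."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(client_ip: str, key_id: str, body: bytes, signature: str) -> bool:
    """Accept a request only if the IP, key, and signature all check out."""
    ip = ipaddress.ip_address(client_ip)
    if not any(ip in net for net in ALLOWED_NETWORKS):
        return False
    secret = API_KEYS.get(key_id)
    if secret is None:
        return False
    expected = sign(secret, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The caller computes `sign(secret, body)` and sends the result alongside its key ID; the receiver runs `verify` before doing any work. A timestamp or nonce in the signed body would be needed to stop replays, which this sketch leaves out.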


One thing that really sticks out: there's a LOT of MySQL on here, and very little Mongo and Riak. That's not to say those aren't useful technologies, but it's clear SQL scales.


Not disagreeing with you. There have obviously been sites like Slashdot and Reddit that scaled on MySQL for a long time before the current crop of NoSQL solutions, but the list could also indicate that scaling MySQL is non-trivial, and hence that tried-and-true solutions are worth writing up for others to emulate.


It's worth noting that the current crop of NoSQL databases isn't _that_ old, and many of these sites were designed and built years before Mongo or Riak were really production ready.

I'd bet that in 3-5 years this list will have more nontraditional databases on it.


Great read -- thanks for the share!



