
So this is what, a 2000 request/sec peak? Over 11 servers, that's like 200 requests/sec peak per frontend?

The problem with scale-up is that if you actually have to get a few times larger, it becomes super expensive. But fortunately hardware is improving so quickly that you can probably just get away with it now. There's probably a crossover point we're rapidly approaching where even global-scale sites can just do all their transactions in RAM and keep it there (replicated). I know that's what VoltDB is counting on.




Peak is more like 2600-3000 requests/sec on most weekdays. Remember that programming is a profession, which means our weekdays are significantly busier than weekends (as you can see here: https://www.quantcast.com/p-c1rF4kxgLUzNc).

It's almost all over 9 servers, because 10 and 11 are only for meta.stackexchange.com, meta.stackoverflow.com, and the development tier. Those servers also run at around 10-20% CPU, which means we have quite a bit of headroom available. Here's a screenshot of our dashboard taken just now: http://i.stack.imgur.com/HPdtl.png We can currently handle the full load of all sites (including Stack Overflow) on 2 servers...not 1 though; that ends badly with thread exhaustion.

We could add web servers pretty cheaply; these servers are approaching 4 years old and weren't even close to top-of-the-line back then. Even current generation replacements would be several times more powerful, if we needed to go that route.

Honestly the only scale-up problem we have is SSD space on the SQL boxes, due to the growth pattern of reliability vs. capacity in the non-consumer space. By that I mean drives that have capacitors for power loss and such. I actually just wrote a lengthy email about what we're planning for storage on one of our SQL clusters...perhaps I should echo it verbatim as a blog post? I'm not sure how many people care about that sort of stuff outside our teams.

Nick Craver - Stack Exchange Sysadmin & Developer


>I actually just wrote a lengthy email about what we're planning for storage on one of our SQL clusters...perhaps I should echo it verbatim as a blog post? I'm not sure how many people care about that sort of stuff outside our teams.

I'm sure some DBAs and devs here would find it interesting.


This is definitely the case. I read every writeup from those in the trenches and share it with coworkers.


  I actually just wrote a lengthy email about what we're planning for storage on one of our SQL clusters...perhaps I should echo it verbatim as a blog post? I'm not sure how many people care about that sort of stuff outside our teams.
Yes! Please, yes. The occasional real-world gem like that is what makes scrolling past a thousand javascript-framework/obscure-language/alternative-datastore posts here worth it (for me).


I'm most impressed that all your traffic is passing through a single active HAproxy which is also terminating SSL?

TLS Session IDs and tickets of course are absolutely essential, but I'd be curious how many peak TPS (number of full handshakes / sec) you are seeing on HAproxy.

The alternative, fanning out your SSL termination to your IIS endpoints, unfortunately means running HAproxy in plain TCP mode (L4), so you lose all your nice HTTP logging.


I'm not sure how many full handshakes/sec we're running - the logging isn't as readily available there because we're explicitly not logging on the listeners involved (they're on separate proc binds). We are logging the traffic on the port 80 front end they feed. We've tuned performance there following best practices and taking into account very recent HAProxy changes for various SSL windows, added by Willy and Emeric with help from Ilya Grigorik. They're all awesome and tremendously helpful. We are on a version slightly newer than 1.5.2 (we build straight from the repo).

Even with all that SSL we're only running around 15% CPU at peak, so it's not having any trouble. Most of that CPU usage does come from the SSL termination though - it ran around 3-5% CPU before. We're also working on much larger infrastructure changes that will put SSL termination much closer to our users, which means the HAProxy load will drop to just about nothing again. I'm working on a followup to my previous SSL post: http://nickcraver.com/blog/2013/04/23/stackoverflow-com-the-... that will explain why it's not already on for all hits...I think Hacker News will have fun with that one.

All that being said, there's no reason we can't forward that syslog traffic from the listeners to our logging infrastructure to get at least a counter out of it. If you're curious how we log, I'll explain a bit.

We feed it into a logging display called Realog built by Kyle Brandt and Matt Jibson in Go. Here are a few snapshots of the dashboard: http://i.stack.imgur.com/OIqhm.png http://i.stack.imgur.com/iQOP9.png http://i.stack.imgur.com/bfpUb.png http://i.stack.imgur.com/JZNy6.png

This lets us forward to logging the custom headers that the web servers send back. They tell us how much time we spent in SQL, redis, etc. We can graph it (bottom of the first link) and identify the root cause of any issues faster. Realog also handles parsing that traffic out of the syslog entry, so we use it as a single logging point from HAProxy, and it handles the JSON generation to forward that traffic data into logstash (a 300 TB cluster we're working on setting up right now).
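
To make that concrete with a made-up example (the header name, format, and log layout here are hypothetical, not our actual ones): suppose the web tier returns a single timing header and HAProxy captures it into the syslog line; a few lines of code are enough to turn it into a JSON document for logstash.

  # Hypothetical sketch, not Realog: parse a captured timing header out of
  # an HAProxy syslog line and emit JSON suitable for shipping to logstash.
  import json
  import re

  # Assume the app returns something like "X-Timings: sql=12;redis=3;asp=48"
  # and HAProxy's header capture places it between braces in the log line.
  TIMINGS = re.compile(r"\{sql=(?P<sql>\d+);redis=(?P<redis>\d+);asp=(?P<asp>\d+)\}")

  def to_logstash_doc(syslog_line):
      """Return a JSON string of per-request timings in ms, or None."""
      m = TIMINGS.search(syslog_line)
      if m is None:
          return None
      doc = {name: int(ms) for name, ms in m.groupdict().items()}
      doc["total_ms"] = sum(doc.values())
      return json.dumps(doc)

  # e.g. to_logstash_doc('... ny-web09 {sql=12;redis=3;asp=48} 200 ...')
  #      -> '{"sql": 12, "redis": 3, "asp": 48, "total_ms": 63}'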

As soon as we get the logstash piece working well and dogfooded thoroughly, I'll poke Kyle to open source Realog so hopefully some others can get use from it.
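
As for the handshake counter mentioned above: even something this small sitting on the forwarded syslog stream would give a rough full-handshakes/sec figure. A made-up sketch, not anything we actually run:

  # Made-up sketch: count syslog datagrams forwarded from the SSL listeners,
  # bucketed per second, as a rough full-handshakes/sec number.
  import socket
  import time
  from collections import Counter

  sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  sock.bind(("0.0.0.0", 5514))       # wherever the listeners' syslog is pointed
  per_second = Counter()
  while True:
      sock.recv(8192)                    # one datagram per log line
      per_second[int(time.time())] += 1  # in practice, ship these buckets to metrics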

Nick Craver - Stack Exchange Sysadmin & Developer


My team would definitely be interested to read that. Please do!


yes please


Yes, the idea is to scale up as far as it makes economic sense and only then scale out, but thanks to Moore's law we're still scaling up.

Not everyone has Google-like problems that are better solved by a battery of cheap boxes.


Actually, most websites today are better served by a battery of cheap boxes.

You simply put a load balancer (Nginx/ELB/HAProxy, etc.) in front of a fleet of smaller web/application servers that scale dynamically with traffic. That way it's cost-effective, far more reliable, easier to scale, and better able to tolerate DC outages.


As with most systems, the data store is the hard thing to scale out. That requires intelligence at the application level, and that might include CAP issues.

So for us, the SQL data store is the real “up” part of the equation. We have a fair amount of headroom there, so if we can keep sharding (and other data strategies) out of the codebase, so much the better.

(Load-balancing HTTP requests “out” is not a big deal and we are doing that.)
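
To illustrate what keeping sharding out of the codebase buys us, here is a made-up sketch (not our code) of the routing that application-level sharding forces into every data-access path:

  # Made-up illustration of application-level sharding, not Stack Exchange code:
  # every query path has to pick the right database before it can do anything.
  SHARD_CONNECTIONS = ["sql-shard-01", "sql-shard-02", "sql-shard-03"]

  def connection_for(user_id):
      """Route a user's data to a shard by hashing the id."""
      return SHARD_CONNECTIONS[hash(user_id) % len(SHARD_CONNECTIONS)]

  # Cross-shard questions ("top users across the whole site?") then need
  # fan-out queries and merging in the application - exactly the complexity
  # we'd rather avoid while a single scaled-up SQL box still has headroom.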


Marco/Stack Overflow here.

Simply put, this is not true at all. Most websites today run happily on a single box, and the choice of architecture is frankly irrelevant because of the abysmal amount of traffic they get.

Large sites, and by large I mean the top 100, are typically complex systems. Each one is different and bespoke. Making broad generalizations doesn't really work, because they typically have parts that need to be scaled up, parts that need to be scaled out, etc.

Finally, if you read the article you would know that we already use a load balancer with multiple web servers. What we are discussing here is whether it's better to have 100s of cheap/cloud boxes versus a few big ones.

In our use case (we want fast response times) the second solution is demonstrably better.


Not quite sure how you derive a peak figure from a monthly average, but on average it's more like 560,000,000 / (30 * 24 * 3600) ≈ 216 req/s.


I took 4 hours a day over 20 weekdays to estimate the peak, figuring it probably won't be worse than that. The flat average over the month isn't a number you'll see in practice. Most perf engineering work seems to go toward keeping things working nicely while at peak (and making sure you have good 95th-percentile times).
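
For reference, the back-of-envelope arithmetic behind both figures, using the 560,000,000 monthly number from the parent comment:

  # Flat monthly average (the parent comment's figure):
  monthly_requests = 560_000_000
  avg_rps = monthly_requests / (30 * 24 * 3600)   # ~216 req/s

  # Peak estimate: assume the traffic mostly lands in a 4-hour window
  # on each of ~20 weekdays (the assumption described above):
  peak_rps = monthly_requests / (20 * 4 * 3600)   # ~1944 req/s, i.e. roughly "2000/sec"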



