Hacker News

I can boil water for tea in my oven, but should I?

This is totally nonstandard; it will be a nightmare to document and difficult to hand off to a new teammate.




Really? If you can't understand this, or why it's better than almost all other architectures for massive horizontal scaling, you're not thinking it through.

I've run similar setups using ECMP -> HAProxy -> content servers that scaled into the multi-terabit range.
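To make the ECMP hop concrete: real routers hash each flow's 5-tuple in hardware to pick one of the equal-cost next hops, so a given TCP connection always lands on the same HAProxy box while connections as a whole spread across all of them. This is a minimal illustrative sketch of that idea, not from the thread; the addresses are placeholders.

```python
import hashlib

# Hypothetical set of HAProxy load balancers announcing the same VIP.
HAPROXY_HOPS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

def ecmp_next_hop(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Hash the flow 5-tuple to choose a next hop deterministically:
    the same flow always maps to the same HAProxy instance, while
    distinct flows spread roughly evenly across all of them."""
    key = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{proto}".encode()
    idx = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(HAPROXY_HOPS)
    return HAPROXY_HOPS[idx]
```

The per-flow stickiness is why this works without shared connection state: packets of one TCP session never flip between balancers unless the set of next hops changes.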

My junior-level sysadmins understood how it worked, and it's a hell of a lot nicer to be able to run L3 to every access port and not deal with epic hacks like direct server return (DSR) and other extremely hard-to-troubleshoot stuff on L2.

It can be explained basically as "see this process 'bgpd' running? That is what tells the traffic to come to this load balancer - kill it and the traffic goes away, start it back up and it comes back". From there, the config stuff is trivial and it's just another HAProxy instance.
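As a concrete sketch of that mental model, a minimal Quagga/FRR-style bgpd.conf might look like the following. The ASNs, router ID, neighbor address, and VIP are all hypothetical placeholders, not taken from the thread:

```
! Hypothetical placeholders throughout: ASNs, router-id, addresses.
! While bgpd runs, it announces the VIP to the upstream router;
! kill bgpd and the route is withdrawn, so traffic moves elsewhere.
router bgp 65001
 bgp router-id 10.0.0.11
 ! the upstream router performing the ECMP
 neighbor 10.0.0.1 remote-as 65000
 ! the load-balanced service VIP this balancer attracts
 network 192.0.2.10/32
```

With every balancer announcing the same /32, the upstream router sees equal-cost paths and spreads flows across them; stopping bgpd on one box withdraws only that box's path.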

The hardest part of implementing such a solution is coming up with sane service-checking scripts. You want to down a single HAProxy instance should it be having issues, but you certainly don't want to down every single one should all your webservers alert at the same time (e.g. a failed application update, or whatever). We had ours set up with very basic healthcheck scripts for BGP (is haproxy alive? is it answering requests? stay up!), and then much more complex checks that HAProxy itself did on the webservers - with paths of last resort and the like.
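A sketch of the basic "is haproxy alive? is it answering requests?" check described above might look like this: exit 0 keeps the BGP announcement up, non-zero withdraws it. The host, port, and path are hypothetical placeholders, and this is one possible shape for such a script, not the one from the thread.

```python
#!/usr/bin/env python3
import http.client
import sys

CHECK_HOST = "127.0.0.1"   # HAProxy listener on this load balancer
CHECK_PORT = 8080          # hypothetical health/stats port
CHECK_PATH = "/healthz"    # hypothetical health endpoint

def haproxy_healthy(host=CHECK_HOST, port=CHECK_PORT, path=CHECK_PATH, timeout=2.0):
    """Return True iff the local HAProxy answers the health URL with 2xx.
    Any connection error, timeout, or bad status counts as unhealthy."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return 200 <= conn.getresponse().status < 300
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()

if __name__ == "__main__":
    # Non-zero exit tells the BGP daemon's check hook to withdraw the VIP.
    sys.exit(0 if haproxy_healthy() else 1)
```

Keeping this check deliberately dumb (local process answering or not) is what avoids the failure mode the comment warns about: backend-wide problems are handled by HAProxy's own server checks, not by withdrawing every balancer at once.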

This architecture also scales great when you put your big boy pants on and need to start anycasting. You pretty much already have the architecture set up for it; you just need to change some IPs and how you do route aggregation in each PoP for your anycast space. It's a great feeling when you can down an entire PoP and traffic instantly moves over to the next closest, then comes right back after maintenance.

I have yet to see a simpler, more concise, or more reliable architecture for serving up massive amounts of HTTP. Once you get into the 100gbps+ range, the usual vendor offerings are laughable considering the costs. Based on most vendor demos we did, the bgpd+HAProxy solution was far easier to understand and administer at large scale.

DNS round-robin as a way to scale horizontally needs to finally die off. ECMP is a great way to retain full control over your traffic flow, and it's essentially "free" on any modern networking gear you already have.


As you engineer bigger and bigger systems, you'll discover that you sometimes need a complex solution to a big problem.

The sticking point here for some people is that it requires you to hire actually talented people, rather than rely on trendy methodologies focused on getting acceptable work out of a larger pool of mediocre developers and sysadmins.


Why would this be a nightmare to document? This is an approach that has been successfully implemented and maintained in many companies. I think it's more a matter of whether this approach works for your team than whether it can be documented (which it can be).


I'm not a fan of this methodology myself, but BGP is an incredibly standard technology. You're just not used to it in the areas you tend to work in. Which is fine, and it's also fine if you don't want to hire a guy with a networking skillset to work on your network. I just ask that you're a bit more realistic with your reasoning against it.


This is becoming increasingly standard, and is quite easy to document.


+1


Hard to hand off. Isn't that the point?



