I'm really reaching back into the depths of my memory, but I've implemented this in the past. It's not quite as simple as they make it sound - there are a lot of sticky edge cases that crop up here (some of which have no doubt been addressed in subsequent years).
- It heavily limits the number of nodes you can have - that is something the article does say, but I want to highlight here. It strikes me as a really bad strategy for scale-out.
- I've run into weirdness on a variety of router platforms (Linux, Cisco, Foundry) when you withdraw and re-announce BGP routes over and over and over again (i.e. you have a flapping/semi-available service).
- It is true that when a node goes down, BGP dead peer detection will kick in and remove the node. However, the time to remove it varies, and usually requires tuning on the router/switch side of things.
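To put numbers on that last point: on Cisco IOS, for example, the default BGP keepalive/hold timers are 60/180 seconds, so a dead node can keep receiving traffic for up to three minutes. A sketch of the kind of tuning I mean (neighbor address and ASN are illustrative):

```
! Cisco IOS sketch - addresses/ASN are made up for illustration
router bgp 64513
 neighbor 10.0.0.2 timers 5 15     ! 5s keepalive, 15s hold instead of 60/180
 neighbor 10.0.0.2 fall-over bfd   ! or use BFD for sub-second failure detection
```

Aggressive timers trade faster failover for more churn if the peer is merely slow, which feeds directly into the flapping problem above.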
This is a fairly crude implement to swing - a machete rather than a scalpel. You lose a lot of the flexibility load balancers give you, and depend a lot more on software stacks you have less insight and visibility into (routers/switches) and that aren't designed to do this.
My suggestion would be that this is a great way to scale across multiple load balancers/haproxy nodes. Use BGP to load balance across individual haproxy nodes - that keeps the neighbor count low, minimizes flapping scenarios, and you get to keep all the flexibility a real load balancer gives you.
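As a concrete sketch of the per-haproxy-node side, here's roughly what a health-check process for a BGP daemon like ExaBGP could look like - ExaBGP runs a child process and reads announce/withdraw commands from its stdout. The VIP, probe address, and interval are all assumptions for illustration, not anything from the article:

```python
#!/usr/bin/env python3
"""Illustrative ExaBGP health-check process: announce a service VIP only
while the local haproxy frontend accepts connections. All addresses and
intervals here are made-up example values."""
import socket
import sys
import time

VIP = "203.0.113.10/32"    # assumed anycast/service address bound on this node
CHECK = ("127.0.0.1", 80)  # assumed local haproxy frontend to probe
INTERVAL = 2               # seconds between probes


def haproxy_up(addr, timeout=1.0):
    """Consider the node healthy if haproxy accepts a TCP connection."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def command(up, vip=VIP):
    """Build the ExaBGP API line that announces or withdraws the VIP."""
    verb = "announce" if up else "withdraw"
    return f"{verb} route {vip} next-hop self"


def run():
    """Loop forever; only emit a command when health state changes,
    so a stable node doesn't generate any BGP churn."""
    announced = None
    while True:
        up = haproxy_up(CHECK)
        if up != announced:
            sys.stdout.write(command(up) + "\n")
            sys.stdout.flush()
            announced = up
        time.sleep(INTERVAL)

# ExaBGP would be configured to run this script as a `process` and
# speak to the upstream router itself; run() is not called here.
```

With equal-cost routes from several such nodes, the router ECMPs across whichever haproxy instances are currently announcing - which is exactly the "BGP in front of a few load balancers, not in front of every server" split I'm suggesting.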
One last note - the OP doesn't talk about this, but the trick I used back in the day was that I actually advertised a /24 (or /22, maybe?) from my nodes to my router, which then propagated it to a decent chunk of the Internet. This is good for doing CloudFlare-style datacenter distribution, but has the added benefit that if all of your nodes go down, the BGP route will be withdrawn automatically, and traffic will stop flowing to that datacenter. Also makes maintenance a lot easier.
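For flavor, the shape of that per-datacenter announcement, sketched as a BIRD-style config (BIRD 2.x syntax assumed; prefixes and ASNs are invented): each node originates the whole service prefix, and the prefix disappears from the router once the last node stops peering.

```
# bird.conf sketch - prefixes/ASNs are illustrative, not real allocations
protocol static service_prefix {
    ipv4;
    route 198.51.100.0/24 via "lo";   # the whole /24 this node serves
}

protocol bgp to_router {
    local as 64512;                    # assumed private ASN for the nodes
    neighbor 198.51.100.1 as 64513;    # the datacenter router
    ipv4 {
        export where net = 198.51.100.0/24;  # announce only the service prefix
        import none;
    };
}
```

The same config on every node in the datacenter gives you the "all nodes down means route withdrawn, traffic moves to another site" behavior for free.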
> My suggestion would be that this is a great way to scale across multiple load balancers/haproxy nodes. Use BGP to load balance across individual haproxy nodes
Exactly. BGP, while it may work as the OP described, was not meant to live this close to the actual server nodes.
You could push BGP even further away. In a more traditional model, it's meant to be used to switch (or load balance) between geographically separated datacenters.