Hacker News

ECMP is old; we've been pulling this stunt for a long time for the right applications (e.g., UDP, short-lived TCP). Adding or removing routes in ECMP can cause flows to hash to a new destination. The connection tracking and consistent hashing in Maglev are important for dealing with faults without disrupting unrelated flows. Still nothing new.
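To make the rehashing point concrete, here is a minimal Python sketch of naive ECMP that picks a next hop as hash(5-tuple) modulo the number of paths. The SHA-256-based `flow_hash`, the `lb0` to `lb7` names, and the synthetic flows are assumptions for illustration; real switches use their own hardware hash functions.

```python
import hashlib

def flow_hash(five_tuple: tuple) -> int:
    """Deterministic 64-bit hash of a flow's 5-tuple (illustrative stand-in for a switch's hash)."""
    key = "|".join(map(str, five_tuple)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def ecmp_next_hop(five_tuple: tuple, next_hops: list) -> str:
    """Naive ECMP: index the next-hop list by hash modulo its length."""
    return next_hops[flow_hash(five_tuple) % len(next_hops)]

def fraction_moved(flows: list, before: list, after: list) -> float:
    """Share of flows whose next hop differs between two next-hop sets."""
    moved = sum(ecmp_next_hop(f, before) != ecmp_next_hop(f, after) for f in flows)
    return moved / len(flows)

if __name__ == "__main__":
    flows = [(f"10.0.0.{i % 250}", 40000 + i, "192.0.2.1", 443, "tcp") for i in range(10000)]
    lbs = [f"lb{i}" for i in range(8)]
    # Withdraw lb7: only lb7's flows had to move, but most flows change next hop.
    print(f"moved: {fraction_moved(flows, lbs, lbs[:7]):.1%}")
```

Withdrawing one of eight next hops should only force the 1/8 of flows on that next hop to move, but with modulo hashing roughly 7/8 of all flows end up on a different next hop, which is why consistent or resilient schemes matter here.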



(Tedious disclaimer: my opinion only, not speaking for anybody else. I'm an SRE at Google, and I'm oncall for this service.)

Well, it's "not new" in the sense that this system's been running google.com for about 8 years now ;)

This is just the first time that we've published how it works.


We (Cumulus Networks) support a feature called "resilient hashing" that ensures that if a software LB fails, only the flows going to that LB are redistributed to the remaining LBs.

You still lose some connections when an LB fails, but only the ones going through the failed LB. Unrelated flows to other LBs are not impacted.
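Resilient hashing is commonly built on a fixed-size bucket table rather than hashing directly over the list of live next hops. A minimal Python sketch of that idea, assuming 64 buckets, a round-robin initial fill, and least-loaded reassignment of orphaned buckets; `ResilientEcmpGroup` is a hypothetical name and this is not Cumulus's implementation.

```python
import hashlib

def flow_hash(five_tuple: tuple) -> int:
    """Deterministic 64-bit hash of a flow's 5-tuple (illustrative)."""
    key = "|".join(map(str, five_tuple)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

class ResilientEcmpGroup:
    """Hypothetical resilient-hashing ECMP group: a fixed bucket table maps hash -> next hop."""

    def __init__(self, next_hops: list, num_buckets: int = 64):
        self.num_buckets = num_buckets
        # Fill buckets round-robin so each next hop owns roughly the same share.
        self.buckets = [next_hops[i % len(next_hops)] for i in range(num_buckets)]

    def lookup(self, five_tuple: tuple) -> str:
        # The bucket count never changes, so a flow always lands in the same bucket.
        return self.buckets[flow_hash(five_tuple) % self.num_buckets]

    def remove(self, dead: str) -> None:
        survivors = sorted(set(self.buckets) - {dead})
        if not survivors:
            raise ValueError("no next hops left")
        # Track bucket counts so orphaned buckets go to the least-loaded survivors.
        load = {hop: self.buckets.count(hop) for hop in survivors}
        for i, hop in enumerate(self.buckets):
            if hop == dead:
                target = min(survivors, key=lambda h: load[h])
                self.buckets[i] = target
                load[target] += 1
```

Because the number of buckets is fixed, a flow's bucket never changes; only buckets that pointed at the dead next hop get a new owner, so flows to the surviving LBs keep their path.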

We've got multiple customers doing variants of the LB architecture Google talks about here.


I didn't know Cumulus had an LB product. Any more details you can share?


You can turn any switch into a ghetto load balancer by running BGP on some hosts and advertising a /32 into your switches.


You want BGP on your edge router, where your transit connections are; that router does the ECMP toward your LBs, which speak iBGP. I'm not sure where you would use a cheap switch in this setup, certainly not at the edge.


Many recent data centers run eBGP between all switches. I can't explain the rest without diagrams.


Huh? Then you must not understand what you are talking about very well. Are you talking about running BGP to ToR switches? What does that have to do with ECMP or load balancers? It's just an L3 design, and it's not new.


I think they mean the hashing algorithm Cumulus uses for ECMP routing toward downstream LBs.


But this resiliency has nothing to do with the hashing per se, correct? Once the route is withdrawn via BGP it is no longer a viable path, so traffic would never be routed, and by extension "hashed" (source/dest etc.), to it. Or am I misunderstanding what you are saying?


The problem that resilient hashing solves is that if you have 8 LBs in an ECMP group, and one dies and gets withdrawn, a naive hash function would redistribute all flows randomly, meaning every active connection would break.

Resilient hashing means that only flows going to the dead LB will get rehashed to other LBs. Those flows would break anyway, but the remaining flows are OK.


Such a "flow-aware" bucket resiliency might work if a single LB forwards to all backend servers. But, as suggested in Maglev, if ECMP is used toward multiple such Maglev LBs (each forwarding to a set of backend servers), then we cannot pursue (distributed) "flow-aware" resiliency.

In such cases, consistent hashing or Maglev hashing might be the only option.
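A sketch of the table-population idea from the Maglev paper, which is what lets every LB behind ECMP pick the same backend for a flow without sharing per-flow state: each backend derives a permutation of the table slots from an offset and a skip, and backends take turns claiming their next preferred empty slot. The SHA-256 salts standing in for the paper's two hash functions, the function names, and the default table size are assumptions for illustration.

```python
import hashlib

def _h(name: str, salt: bytes) -> int:
    """Salted 64-bit hash; two salts stand in for two independent hash functions."""
    return int.from_bytes(hashlib.sha256(salt + name.encode()).digest()[:8], "big")

def maglev_table(backends: list, m: int = 65537) -> list:
    """Fill a lookup table of prime size m by letting each backend claim slots in turn."""
    if not backends:
        raise ValueError("need at least one backend")
    n = len(backends)
    offsets = [_h(b, b"offset") % m for b in backends]
    # skip is in [1, m-1]; with m prime, each backend's walk visits every slot.
    skips = [_h(b, b"skip") % (m - 1) + 1 for b in backends]
    next_idx = [0] * n
    table = [None] * m
    filled = 0
    while True:
        for i in range(n):
            # Walk backend i's permutation (offset + j*skip) mod m to its next empty slot.
            c = (offsets[i] + next_idx[i] * skips[i]) % m
            while table[c] is not None:
                next_idx[i] += 1
                c = (offsets[i] + next_idx[i] * skips[i]) % m
            table[c] = backends[i]
            next_idx[i] += 1
            filled += 1
            if filled == m:
                return table

def maglev_lookup(table: list, five_tuple: tuple) -> str:
    """Map a flow to a backend by hashing its 5-tuple into the table."""
    key = "|".join(map(str, five_tuple)).encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return table[h % len(table)]
```

Because the table depends only on the backend list and the table size, two LBs that receive the same flow via ECMP compute the same backend. Each backend ends up with either floor(m/n) or ceil(m/n) slots, and removing a backend reassigns its slots while leaving most other slots where they were, though not strictly all of them.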


Usually, for ECMP across your LBs, you run a routing daemon such as Quagga or BIRD on each LB box, and it speaks BGP to the edge. If an LB fails or goes away, the route is withdrawn from the BGP peer, and that withdrawal is how faults are dealt with. Can you or someone else elaborate on how Maglev adds to or differs from that? Unfortunately I cannot get to Google at the moment.
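Route withdrawal handles an entire LB going away. The connection tracking mentioned at the top of the thread works at a different layer: inside one LB, flows already in progress stay pinned to their backend even when the backend set changes. A minimal sketch, assuming a per-LB dict keyed by 5-tuple; the `ConnTracker` class and the plain modulo chooser for new flows are hypothetical (in the Maglev paper, new flows go through the consistent-hash table instead).

```python
import hashlib
from typing import Callable, Dict, Sequence, Tuple

FiveTuple = Tuple[str, int, str, int, str]

def hash_choose(flow: FiveTuple, backends: Sequence[str]) -> str:
    """Hypothetical chooser for new flows: plain hash modulo backend count."""
    key = "|".join(map(str, flow)).encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[h % len(backends)]

class ConnTracker:
    """Hypothetical per-LB connection table pinning each flow to the backend it first used."""

    def __init__(self, backends: Sequence[str],
                 choose: Callable[[FiveTuple, Sequence[str]], str] = hash_choose):
        self.backends = list(backends)
        self.choose = choose
        self.conns: Dict[FiveTuple, str] = {}

    def set_backends(self, backends: Sequence[str]) -> None:
        self.backends = list(backends)

    def forward(self, flow: FiveTuple) -> str:
        pinned = self.conns.get(flow)
        if pinned is not None and pinned in self.backends:
            return pinned  # existing connection: keep using its backend
        backend = self.choose(flow, self.backends)  # new flow, or its backend is gone
        self.conns[flow] = backend
        return backend
```

This table is local to one LB, so if ECMP moves an existing flow to a different LB, that LB has no entry for it; that gap is the case consistent hashing is meant to cover.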



