This is a great introduction to a lot of the problems Envoy is trying to solve. If you're interested in learning more, the rest of the Envoy blog [0] is great. Coincidentally, I also spent most of last week compiling [1] all the posts we could find on Envoy, smart proxies, and modern load balancing.
The thing I miss most from messing with proxies in the past is the "stop, hold, upgrade" game that web servers could play with proxies.
This means you can have your proxy hold requests for a few seconds while your system restarts in the background, then slowly release them to the backend on a "one more request per second" ramp-up.
This sort of thing would work great with a proxy that actually understands something like a Kubernetes scheduler & knows when a node is failing/restarting, etc.
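Here's a rough sketch of the hold-and-ramp idea in Go. The /hold and /release endpoints are hypothetical stand-ins for a real deploy hook, and the fixed one-request-per-second drip is a simplification of the "one more request per second" ramp:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync"
	"time"
)

// gate implements the "stop, hold, upgrade" trick: while the backend
// restarts, requests park here instead of failing; afterwards they are
// released one per second so the cold backend isn't flooded.
type gate struct {
	mu       sync.Mutex
	released chan struct{}    // closed when the hold is lifted
	drip     <-chan time.Time // non-nil while ramping up
}

func newGate() *gate {
	g := &gate{released: make(chan struct{})}
	close(g.released) // start in the "open" state
	return g
}

// hold starts parking requests. (A real implementation would guard
// against being called twice; this sketch does not.)
func (g *gate) hold() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.released = make(chan struct{})
	g.drip = nil
}

// release lets parked requests through at roughly one per second,
// then returns to full speed after a fixed ramp window.
func (g *gate) release() {
	g.mu.Lock()
	g.drip = time.Tick(time.Second)
	close(g.released)
	g.mu.Unlock()
	go func() {
		time.Sleep(10 * time.Second) // hypothetical ramp window
		g.mu.Lock()
		g.drip = nil // back to full speed
		g.mu.Unlock()
	}()
}

func (g *gate) wait() {
	g.mu.Lock()
	released := g.released
	g.mu.Unlock()
	<-released // park here until the upgrade finishes
	g.mu.Lock()
	drip := g.drip
	g.mu.Unlock()
	if drip != nil {
		<-drip // then proceed single file, one per tick
	}
}

func main() {
	backend, _ := url.Parse("http://127.0.0.1:8080") // hypothetical backend
	proxy := httputil.NewSingleHostReverseProxy(backend)
	g := newGate()

	http.HandleFunc("/hold", func(w http.ResponseWriter, r *http.Request) { g.hold() })
	http.HandleFunc("/release", func(w http.ResponseWriter, r *http.Request) { g.release() })
	http.Handle("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		g.wait()
		proxy.ServeHTTP(w, r)
	}))
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```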
It may take some years for mid-to-large corporations and enterprises to transition from F5 & A10 L7 hardware load balancers (and the like) to routing/load balancing as a service, or even to software-based solutions.
I'm not saying there's never a use case for a proxy to do health checks, TLS termination, load balancing, sticky sessions, etc., but I don't see how that can compete with ECMP using hash-based flows and TLS termination at the service endpoints.
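Roughly, hash-based flow selection looks like this toy model; the FNV hash and modulo are stand-ins for whatever hashing a real router does, and the addresses are made up:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fiveTuple identifies a flow; routers hash these fields so every
// packet of a given TCP connection takes the same next hop.
type fiveTuple struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
	proto            uint8
}

// pickNextHop models ECMP flow hashing as hash-modulo over the hop
// list. Note the fragility: if the list changes, most flows rehash
// to a different endpoint.
func pickNextHop(t fiveTuple, hops []string) string {
	h := fnv.New32a()
	fmt.Fprintf(h, "%s|%s|%d|%d|%d", t.srcIP, t.dstIP, t.srcPort, t.dstPort, t.proto)
	return hops[h.Sum32()%uint32(len(hops))]
}

func main() {
	hops := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"} // hypothetical endpoints
	t := fiveTuple{"192.0.2.7", "198.51.100.9", 51515, 443, 6}
	fmt.Println(pickNextHop(t, hops)) // the same tuple always maps to the same hop
}
```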
* non-equal-cost load balancing (give the Skylake boxes 25% more traffic than the Broadwell boxes); see the sketch after this list
* ECMP doesn't know anything about service health, so what happens when one app server out of 10 gets wedged and stops responding to requests?
* TLS termination on the proxy means you limit what machines need to hold your private keys.
* what if you want to load-balance or direct traffic based on something other than a 5-tuple hash?
* ECMP doesn't scale very well. Not all that long ago, at $DAYJOB, we had scalability problems because a certain large network vendor couldn't do more than 8 next hops for a single prefix.
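To make the first two bullets concrete, here's a sketch of a selection policy a plain 5-tuple hash can't express: weighted random over healthy backends only. The names, weights, and the weighted-random policy itself are illustrative, not Envoy's actual implementation:

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// backend carries the two things ECMP has no notion of: a relative
// weight (Skylake vs. Broadwell) and a health bit.
type backend struct {
	addr    string
	weight  int  // relative capacity
	healthy bool // flipped by an active health checker, omitted here
}

// pick does weighted random selection over healthy backends only.
func pick(backends []backend) (string, error) {
	total := 0
	for _, b := range backends {
		if b.healthy {
			total += b.weight
		}
	}
	if total == 0 {
		return "", errors.New("no healthy backends")
	}
	n := rand.Intn(total)
	for _, b := range backends {
		if !b.healthy {
			continue
		}
		if n < b.weight {
			return b.addr, nil
		}
		n -= b.weight
	}
	return "", errors.New("unreachable")
}

func main() {
	pool := []backend{
		{"skylake-1:443", 125, true}, // 25% more traffic than the Broadwells
		{"broadwell-1:443", 100, true},
		{"broadwell-2:443", 100, false}, // wedged: the health checker took it out
	}
	addr, err := pick(pool)
	fmt.Println(addr, err)
}
```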
I agree with all that you said except the scaling limits of ECMP. We have boxes that support 64 ECMP destinations, and I've seen others mention 256: https://youtu.be/ciClZdwHelU
Nice to hear things have gotten better. 8 and 16 next-hop limits were very common a few years ago, especially on devices that you'd use as a top-of-rack switch.
ECMP still has some scaling challenges, IMHO, since each destination host still needs to peer with the upstream switch over BGP (unless you do some cleverness).
I don't understand: Google uses ECMP in Maglev [1]. Is there something specific about Maglev that makes their usage of ECMP viable, while the scenarios you encountered were poor fits?
Maglev is a network load balancer that operates at L3/L4. ECMP on the upstream routers spreads traffic for a VIP across a pool of Maglevs, but the Maglev itself does more complicated backend selection.
Envoy is an L7 proxy, so it's the moral equivalent of the Google GFE (which, conveniently enough, is the next layer of load balancing behind Maglev for most Google services).
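For the curious, the interesting part of that "more complicated backend selection" is Maglev's consistent-hash lookup table. Here's a compact sketch of the table-population step described in the Maglev paper, with FNV standing in for the paper's hash functions:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// populate builds a Maglev-style lookup table: each backend fills
// slots in its own pseudo-random permutation order, taking turns, so
// backends get nearly equal shares and adding or removing a backend
// disturbs only a small fraction of slots.
func populate(backends []string, m uint32) []int {
	n := len(backends)
	offset := make([]uint32, n)
	skip := make([]uint32, n)
	for i, b := range backends {
		h1 := fnv.New32()
		h1.Write([]byte(b))
		offset[i] = h1.Sum32() % m
		h2 := fnv.New32a()
		h2.Write([]byte(b))
		skip[i] = h2.Sum32()%(m-1) + 1 // in [1, m-1], so the walk covers every slot
	}
	entry := make([]int, m)
	for j := range entry {
		entry[j] = -1 // slot not yet claimed
	}
	next := make([]uint32, n)
	for filled := uint32(0); filled < m; {
		for i := 0; i < n && filled < m; i++ {
			// walk backend i's permutation until a free slot appears
			c := (offset[i] + next[i]*skip[i]) % m
			for entry[c] >= 0 {
				next[i]++
				c = (offset[i] + next[i]*skip[i]) % m
			}
			entry[c] = i
			next[i]++
			filled++
		}
	}
	return entry
}

func main() {
	// m should be a prime much larger than the backend count; 13 keeps
	// this demo readable.
	table := populate([]string{"be-1", "be-2", "be-3"}, 13)
	fmt.Println(table) // a packet's hash % 13 indexes this table
}
```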
[0] https://blog.envoyproxy.io
[1] https://www.turbinelabs.io/resources