Load balancing and its different types (wisdomgeek.com)
86 points by saranshk on Jan 26, 2021 | 29 comments



Load balancing as a strategy is used in far more than just web applications!

This article only discusses web-based load balancing, which is absolutely important, but it doesn't cover supercomputer scheduling and load balancing. It's arguably a different subject... but the concept is the same.

When you have 4000 nodes on a supercomputer, how do you distribute the problem such that all the nodes have something to do? Supercomputer workloads are sometimes predictable (e.g., matrix multiplications), and you can sometimes "perfectly load balance without communication".

But in the case of web applications, there's probably no way to really predict the "cost of performance" before you start processing the service request (what if it's a Facebook request for a really old photograph? Facebook may have to pull it out of long-term storage before it can service that request. There's no real way to know at the load balancer whether a picture request would be in the cache or not... at least, not before you process the request to begin with!)

-----------

In any case, I think "predict the computational cost, track the costs you've already distributed to each node, and then send the new load to the node with the lowest total cost so far" is a good method that works in some applications. (All blocks in a dense matrix multiplication have the same cost, so just keep handing out sub-blocks to all nodes as you work through the problem.)
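
As a rough illustration of that idea, here is a minimal sketch in plain Python (all names made up): keep a running cost total per node and hand each new block of work to the node with the lowest total so far.

    # Sketch: assign each task to the node with the lowest accumulated
    # estimated cost so far. All names here are illustrative.
    import heapq

    def distribute(tasks, node_ids, estimate_cost):
        heap = [(0.0, n) for n in node_ids]      # (accumulated cost, node)
        heapq.heapify(heap)
        assignment = {n: [] for n in node_ids}
        for task in tasks:
            cost, node = heapq.heappop(heap)     # node with lowest cost so far
            assignment[node].append(task)
            heapq.heappush(heap, (cost + estimate_cost(task), node))
        return assignment

    # Dense matrix multiply: every sub-block costs the same, so this
    # degenerates to handing blocks out evenly (effectively round robin).
    blocks = [("block", i) for i in range(16)]
    print(distribute(blocks, ["node0", "node1", "node2", "node3"], lambda b: 1.0))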


Arguably, one of the most important characteristics of a load balancer is extremely low latency: if you're balancing loads, you want to be very quick about making a decision. Generating predictions about computational cost is itself a computation, and that overhead can become non-negligible.

Inherently, the idea you're describing boils down to characterizing the request flows in such a way that they can be evenly distributed. The ideal way to characterize them, then, would be to know this information beforehand so that no computation at all is needed to normalize the costs. As such, the best strategy would be to segregate traffic flows so that they're forwarded to "dumb" load balancers that use one of the strategies from TFA, like weighted round robin.
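
For reference, weighted round robin itself is only a few lines. Here's a naive sketch (not any particular product's implementation) that just expands each server by its weight and cycles through the result:

    # Naive weighted round robin: expand each server by its weight and
    # cycle through the expanded list forever.
    import itertools

    def weighted_round_robin(servers):
        # servers: list of (name, weight) pairs
        expanded = [name for name, weight in servers for _ in range(weight)]
        return itertools.cycle(expanded)

    lb = weighted_round_robin([("a", 3), ("b", 1)])
    print([next(lb) for _ in range(8)])  # ['a', 'a', 'a', 'b', 'a', 'a', 'a', 'b']

Real implementations tend to interleave the picks more smoothly instead of emitting bursts, but the principle is the same.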

Of course, there are many such optimizations available, but TFA seems to be targeting a beginner-level introduction to a rather complex topic. As you describe, load balancing and scheduling algorithms have a pretty high overlap in their theoretical foundations, and these concepts manifest themselves throughout any large-scale system.


> But in the case of web applications, there's probably no way to really predict the "cost of performance" before you start processing the service request (what if it's a Facebook request for a really old photograph? Facebook may have to pull it out of long-term storage before it can service that request. There's no real way to know at the load balancer whether a picture request would be in the cache or not... at least, not before you process the request to begin with!)

I worked at Facebook, but didn't touch anything related to this; so this is all conjecture.

The load balancer can't (or shouldn't) predict the cost to service a request, but the thing that generated the URL for the image could, and that prediction could be passed in the URL for the balancer to act on.

If you really need to balance by performance, it's probably simpler and accurate enough to provide frequent load feedback to the balancer. As long as you have a lot of requests, simple things work pretty well.
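
A rough sketch of that feedback approach (hypothetical names, assuming servers push periodic load reports to the balancer):

    # Sketch: servers periodically report their load; the balancer just
    # picks whichever server most recently reported the lowest load.
    class FeedbackBalancer:
        def __init__(self, servers):
            self.load = {s: 0.0 for s in servers}

        def report(self, server, load):
            # called periodically by each server (or a health-check poller)
            self.load[server] = load

        def pick(self):
            return min(self.load, key=self.load.get)

    lb = FeedbackBalancer(["web1", "web2", "web3"])
    lb.report("web1", 0.9); lb.report("web2", 0.2); lb.report("web3", 0.4)
    print(lb.pick())  # web2

The feedback is always a little stale, so in practice you'd add some randomness (or the two-random-choices trick mentioned elsewhere in this thread) to avoid stampeding whichever server last reported the lowest load.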


Concepts are definitely transferable across domains, and then adapted according to the desired outputs. Cost of performance is interesting and somewhat similar to the least-time approach, but adapted to the supercomputing domain.


Software load balancing solutions now have more algorithms, such as least time (NGINX Plus, for example). And yes, some ISPs cache DNS entries for a long time... but DNS load balancing should only be used in disaster scenarios, as a mitigation.


DNS load balancing works well enough if you have smart enough clients (not web browsers) and your pool of server IPs is fairly static. If you can select randomly from a list of names and then try several of the A/AAAA records from that result, you may see some delay if you pull a dead server from a cached record, but it won't be too bad. SRV records and really smart clients should work pretty well too, but not a lot of people have really smart clients.
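
A "smart enough client" in that sense can be pretty small; here's a sketch of the resolve-shuffle-and-retry idea using nothing but the standard library:

    # Sketch: resolve all A/AAAA records, shuffle them, and try each
    # address until one accepts a connection.
    import random, socket

    def connect_any(hostname, port, timeout=2.0):
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
        random.shuffle(infos)
        for family, socktype, proto, _canonname, sockaddr in infos:
            try:
                return socket.create_connection(sockaddr[:2], timeout=timeout)
            except OSError:
                continue  # dead server from a stale cached record: try the next
        raise OSError("no reachable address for %s:%s" % (hostname, port))

    # sock = connect_any("example.com", 443)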

The vast majority of ISP caches won't keep your low-TTL records in cache for years, but some do; that's also a problem if you ever have to move your load balancers, though.

It depends on how stable your servers are versus your load balancers, how many connections you need, and whether you have enough IP addresses to give your servers public IPs. Also, if you absolutely need to control the load precisely, DNS is never going to give you that.


Added least time to the post as well. Thank you!


Decent summary, but a little outdated on DNS load balancing.

Major cloud services like AWS support health/status checks through DNS these days: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/re...

It's also trivial to get around the client caching issue: just set a low TTL. Perhaps in the olden days providers had stricter limits on the minimum TTL you could set, but these days you can set it practically as low as you want.

EDIT: as a few commenters have fairly pointed out, TTL can easily be ignored by poorly-behaved ISPs and clients, so I'll admit calling it "trivial" to get around is not exactly accurate.


Having executed several "no-downtime" cutovers between systems via DNS updates, I will warn you that a surprising number of clients never re-resolve DNS, so the TTL is effectively "forever" from their point of view.

For the rare case of lift-and-shift-ing for a system upgrade I felt morally okay about eventually pulling the plug on them, but I'd hesitate to design a system that relied on well-behaved DNS clients if I had a reasonable alternative.


Another gotcha would be UDP-based services. Since UDP is packet-oriented rather than connection-oriented, when should a client re-resolve? Most will not until the application is restarted.


When I last updated a domain most clients saw the change within the TTL (1 hour)... except for my cable ISP at home. It took them the better part of a week.


Moving by DNS change isn't usually that bad. The old system (load balancer) can proxy requests to the new system. Most clients will follow DNS, and the laggards won't have too much trouble. That assumes the service already works behind a load balancer, of course, which is usually not something that can be fork-lifted in.


Except it's not trivial at all, because ISP resolvers will just disregard your low TTL.


TTL is difficult in practice due to client implementations and other issues like that. Be careful using DNS for anything like this; DNS was not designed for changes to propagate immediately. That's why IPs are mostly used.


Many applications do not refresh their DNS with every connection either. Take, for example, an Apache reverse proxy that's reusing long-lived connections. So updating DNS may still require restarting/reloading many upstream services.

https://stackoverflow.com/questions/52032150/apache-force-dn...


I knew the caching issue was a little trivial, but it was worth mentioning. I should have mentioned the low-TTL piece, though; I will add that to the post, and I'll add the health check part too once I've read up a bit about it. Thanks for the information!


I like this intro!

Good to see 'random' redefined as "pick two at random, then assign to the one with the fewest connections", which works strictly better than either random or least connections alone.
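
The whole trick fits in a couple of lines; a sketch, assuming the balancer can see current connection counts:

    # "Power of two choices": sample two backends at random and send the
    # request to whichever currently has fewer connections.
    import random

    def pick(connections):
        # connections: dict of server name -> active connection count
        a, b = random.sample(list(connections), 2)
        return a if connections[a] <= connections[b] else b

    print(pick({"web1": 12, "web2": 3, "web3": 7}))  # "web2" or "web3"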

It misses a couple of categories that may be relevant: least-hops or best-transit network-mapped balancing to reach the ideal set of servers globally, as well as a technique that (not so simply) connects the user to the geography with the fastest response for them at that moment.

While the article notes the value of fewer connections for a stream, you can afford to take a bit longer during setup of a stream to get it right, since you and the viewer will pay the price for longer if you get it wrong.

All this gets much more complicated when balancing very large objects, as you have to consider content availability and cache bin packing among the servers you balance to.


Thank you. I am not sure how to introduce transit and the concept of hops in an introductory post without explaining the networking side of it in depth. Maybe you could help me out with that?


I’d think it’s fine to be hand-wavy:

Load balancing isn’t just about server load or congestion, it’s also about network load and congestion. If a web page or video takes longer for a user to download, it ties up the server longer too.[1]

Load balancing algorithms can also consider network paths or round trip times between the user and a server to give users a faster web download or video stream. To do this, they may use information from network routing topology, such as how many “hops” or routers between the user and the server, or may even triangulate actual network performance by assessing measurements from multiple data centers and load balancing to the most responsive.

1. See “snoshy” comment on latency in these comments: https://news.ycombinator.com/item?id=25920284 — roughly, you aim to avoid queuing or connection creep, as you mentioned in the intro, and speed of opening, transmitting data over, then closing the connection, can make a huge difference.


Speaking of, seems like wisdomgeek.com needs some load balancing right now...


Ha, I know. It's a single VM instance right now. I have been thinking of migrating to Gatsby for quite some time now. This unexpected traffic and the server limitations give me a reason to get working on it!


Or you could serve a fully static site with vanilla JS.


Vanilla JS might be too much to maintain. I have already started working on the Gatsby version and will try to accelerate that.


I mean generating your HTML on the server. If you use HTML as a semantic markup language, it works pretty well. Then use vanilla JS for the interactivity. I doubt it's "too much to maintain": what is harder in vanilla JS than in Gatsby?

You can write much of your site in raw HTML. For example, your right panel: write it in raw HTML, annotate it with some tags, and pull it in as an iframe (just use CSS to get rid of the borders) or via an XMLHttpRequest. As long as you make good use of CSS, you can have a few tags and it'll work.


Nice succinct writeup. Here's a great deep dive on "Practical Load Balancing with Consistent Hashing" from a Vimeo engineer (2017) [0].

[0] https://www.youtube.com/watch?v=jk6oiBJxcaA
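
For anyone who hasn't seen consistent hashing before, a bare-bones ring (without the refinements the talk goes into) looks roughly like this:

    # Bare-bones consistent hash ring: each server gets several points on a
    # ring, and a key maps to the first server point at or after its hash,
    # so adding or removing a server only remaps a small fraction of keys.
    import bisect, hashlib

    class HashRing:
        def __init__(self, servers, replicas=100):
            self.ring = sorted(
                (self._hash("%s-%d" % (s, i)), s)
                for s in servers for i in range(replicas)
            )
            self.hashes = [h for h, _ in self.ring]

        @staticmethod
        def _hash(value):
            return int(hashlib.md5(value.encode()).hexdigest(), 16)

        def lookup(self, key):
            idx = bisect.bisect(self.hashes, self._hash(key)) % len(self.ring)
            return self.ring[idx][1]

    ring = HashRing(["cache1", "cache2", "cache3"])
    print(ring.lookup("/videos/12345.mp4"))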


I was explaining the DNS system and load balancing to someone today and, based on this wonderful link, I realize I kind of mixed it all up. Thanks, I will share this with that person to undo the damage I might have done.


We all learn every day. I am learning from the comments here as well. The best we can do is accept we were wrong and correct it.


I would have liked to see details of layer 4 vs. layer 7 load balancing. The latter involves terminating a TCP session and initiating a new one to the backend.


Layer 4 load balancing can be a huge reduction in work for the load balancer, especially in a Direct Server Return configuration, where the load balancer only sees incoming packets and response packets go directly from the server to the client.

The downside is that you lose any ability to balance based on details of the application protocol, it requires some specific network setup, and it's hard to find a DSR load balancer in managed hosting or the cloud. I'm not sure if there's off-the-shelf software to manage DSR either (the basic pieces are there in most firewalls, but the management isn't).



