The Thundering Herd Problem (encore.dev)
80 points by ingve 11 months ago | 37 comments



As someone who has worked on very large networks for 15 years, the author left out some important bits about using caching with thundering herd problems.

You really need to make sure that your cache settings will not generate a ton of requests to the origin when new content is requested.

If you have a bunch of clients that are going to request content at nearly the exact same time (this happens a lot with live video streaming, but can also happen when your clients are configured to download something as soon as it becomes available, or on a schedule, etc.), you need to make sure your caches are set up so that the second request will wait for the first request to fill from origin rather than send its own cache fill request. Ideally, if 1000 clients all request the same content from your cache, only ONE request should go back to the origin, and the other 999 should wait and use the content retrieved by the first request. Otherwise, your cache isn't going to do anything, since every request is going to require pulling from origin anyway.

In nginx, you do this with something like the proxy_cache_lock setting.
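
For reference, a minimal sketch of what that looks like in an nginx config (the cache zone name, timings, and upstream are placeholders, not a recommendation):

    proxy_cache_path /var/cache/nginx keys_zone=content:10m;

    server {
        location / {
            proxy_cache              content;
            proxy_cache_lock         on;   # only one request per key fills from origin
            proxy_cache_lock_timeout 5s;   # others wait up to 5s for that fill
            proxy_cache_valid        200 1m;
            proxy_pass               http://origin_backend;
        }
    }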


Caching is generally way harder to get right than a lot of people probably think. Having delved into building caching systems that get the behavior and needs of the system just right, I found it took way more effort and research than I thought it would. You start with something simple, and by the time you have chased down all the edge cases and tuned the cache behavior to your situation, it is often not so simple after all.


Yep, I've found that, generally, caching logic should be a separate flow from business logic. E.g. wrap your RPC logic such that an in-memory cache transparently sits in your RPC stack. This lets you implement locking on the cache in a single place, and makes it easy to do things like eager prefetching or offloading cache logic to another thread.
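
A rough Go sketch of that shape, assuming a hypothetical CachedClient wrapper around a single RPC-style fetch (no TTL or eviction here, just the per-key, fill-once locking idea):

    // Sketch: a cache layer wrapping an "RPC" call, with per-key locking so
    // concurrent callers for the same key trigger only one upstream fetch.
    // Names (CachedClient, the fetch func) are made up for illustration.
    package main

    import (
        "fmt"
        "sync"
    )

    type entry struct {
        once sync.Once
        val  string
        err  error
    }

    type CachedClient struct {
        mu      sync.Mutex
        entries map[string]*entry
        fetch   func(key string) (string, error) // the "real" RPC call
    }

    func NewCachedClient(fetch func(string) (string, error)) *CachedClient {
        return &CachedClient{entries: map[string]*entry{}, fetch: fetch}
    }

    // Get returns the value for key, calling fetch at most once per key.
    func (c *CachedClient) Get(key string) (string, error) {
        c.mu.Lock()
        e, ok := c.entries[key]
        if !ok {
            e = &entry{}
            c.entries[key] = e
        }
        c.mu.Unlock()

        // All concurrent callers block here; only one runs the fetch.
        e.once.Do(func() {
            e.val, e.err = c.fetch(key)
        })
        return e.val, e.err
    }

    func main() {
        client := NewCachedClient(func(key string) (string, error) {
            fmt.Println("fetching", key, "from origin") // prints once per key
            return "value-for-" + key, nil
        })

        var wg sync.WaitGroup
        for i := 0; i < 10; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                _, _ = client.Get("user:42")
            }()
        }
        wg.Wait()
    }

Real code would also need expiry and an error-caching policy, but the structure stays the same.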


I recall seeing that called "Request coalescing" in Faster Than Lime's article https://fasterthanli.me/articles/request-coalescing-in-async...


It is also sometimes called request collapsing.


It's also sometimes called request deduplication
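
In Go, for instance, this is exactly what golang.org/x/sync/singleflight packages up. A rough sketch (fetchFromOrigin is just a stand-in for whatever expensive call you're protecting):

    package main

    import (
        "fmt"

        "golang.org/x/sync/singleflight"
    )

    var group singleflight.Group

    func fetchFromOrigin(key string) (string, error) {
        fmt.Println("origin hit for", key) // should run once per key under concurrency
        return "value-for-" + key, nil
    }

    func get(key string) (string, error) {
        // Concurrent calls with the same key share a single fetchFromOrigin call.
        v, err, _ := group.Do(key, func() (interface{}, error) {
            return fetchFromOrigin(key)
        })
        if err != nil {
            return "", err
        }
        return v.(string), nil
    }

    func main() {
        v, _ := get("user:42")
        fmt.Println(v)
    }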


Adding a little bit of random delay to any scheduled / triggered actions is always a good idea, and is usually very cheap to implement.

If you're on OpenBSD, their cron supports randomized ranges using the ~ operator (see <https://man.openbsd.org/crontab.5>). Otherwise, you can use something like 'sleep $(($RANDOM % 60)) && some-task', but beware that $RANDOM has a range of 0-65535; you won't get the full, uniform 86400s range for daily jobs.
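
If GNU coreutils' shuf is available, you can get the full uniform range; a sketch (the range and 'some-task' are just examples):

    # sleep a uniformly random number of seconds in [0, 86399] before the daily job
    sleep "$(shuf -i 0-86399 -n 1)" && some-task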

If you are unable to add a different random delay on every invocation (e.g. you have an RSS reader that only allows you to specify an exact interval), pick some nearby prime number, e.g. 37 instead of 30 or 45, or 71 instead of 60 (71-60=11 being a prime is also great).

If you're deploying a whole fleet at once, you can also vary the exact timings of cron.hourly, cron.daily, etc between machines.


For those who don't have the weird systemd allergy, systemd has many great features, including random scheduling, see `RandomizedDelaySec`.
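
For example, a minimal timer unit along those lines might look like this (unit name and timings are placeholders; it pairs with a matching some-task.service):

    # some-task.timer
    [Unit]
    Description=Run some-task daily with up to an hour of jitter

    [Timer]
    OnCalendar=daily
    RandomizedDelaySec=1h
    Persistent=true

    [Install]
    WantedBy=timers.target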


Oh yes, systemd in general and timers in particular do have a whole bunch of awesome stuff - I just tend to forget any of it exists, it's my coping mechanism for dealing with "heterogeneous" environments. I still have CentOS6 machines in prod...

https://www.freedesktop.org/software/systemd/man/latest/syst...


Yeah, a lot of those nice things like random delays, backoffs, even using timezones are not available, even in the CentOS 7 systemd.


Once again, what TFA describes is NOT the thundering herd problem.

https://en.wikipedia.org/wiki/Thundering_herd_problem

> In computer science, the thundering herd problem occurs when a large number of processes or threads waiting for an event are awoken when that event occurs, but only one process is able to handle the event.

(emphasis mine)


Thundering herd has evolved beyond the original definition; that happens. Sometimes terms even flip their meaning (like how "one bad apple" has flipped its meaning, or how "premature optimization is the root of all evil" as used these days isn't what the original author actually meant), but in this case I think most of the new usages fit the basic idea, which is a large amount of unusual traffic arriving outside of normal usage patterns.


Fair point that terms like this can change meanings.

But this version absolutely does not fit the original, which had nothing to do with amounts of traffic.

It was a problem with kernel APIs: if N threads sleep waiting for an event on a socket (i.e. a thread pool), there was no way to wake up only one of them when a single message arrived. They would all wake, all check if there was work to do, then all (but one) would go back to sleep. This is a behavioral flaw even in the case of next to no traffic (though it may not have a material effect).

The problem has been solved with better kernel APIs.


I initially heard of "thundering herd" in reference to the pattern where a server handling N requests for the same data may only need to run the data handling logic once to fulfill all of those requests.

For example, if I have a server with an endpoint that needs to make a request to a different service for some data, I don't want to make that request 10 times when my server receives 10 requests while the first request is being handled; all 10 of those incoming requests can be fulfilled by 1 outgoing request to the secondary service.

In that sense, it's very similar to what you described, but it's still likely one process handling the requests.

I'll admit that the author seemed to use "thundering herd" in reference to your server just suddenly receiving a lot of traffic, which is also different from the usage I was familiar with.


Okay but that's actually not the most general definition I'm familiar with.

It's a synchronization and resource constraint issue with various symptoms depending on domain specifics.

Let's discuss specifics about how the post isn't discussing a thundering herd problem in the general sense.


Please read the comment from user gilbetron appearing really close to yours, and my reply.


Ah sorry, didn't realize thundering herd was solved by newer kernels. I can remove all my exponential-backoff-with-jitter retry strategies.


Your exponential backoffs were not a solution to the problem identified and named in the mid-90s. They might be valuable for some other problem.


Good thing the problem is solved then.


Once had to interface with an external API for many of our client requests (booking requests for transportation). Due to certain combinations of queries, it could sometimes be slow to get a response if the requested journey would contain multiple legs and combinations. So we needed to have a high timeout for our requests to that system. Which was fine most of the time.

Unfortunately, if that system started to struggle, all our calls would be slow, or even time out. And of course us continuously spamming them wouldn't help (we were like 95% of their traffic). That didn't just wreak havoc on their side; it also completely exhausted all our incoming connections, thread pools, etc., just waiting with long timeouts for responses that would never come.

A circuit breaker here was golden. When too many requests in a row were slow, we would break and stop requests from our side. That would allow them to recover on their side, and things on our side to fail fast and let other services still work.

A key here was that the circuit breaker would still allow a small percentage through. How else would you detect that things are back up again? And then slowly let more and more through if things are going fine (so not just open completely all at once).
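
A stripped-down sketch of that kind of breaker in Go (thresholds, the probe percentage, and the wrapped call are all made up; a real implementation would track timeouts explicitly and ramp back up gradually, as described above):

    // Minimal circuit-breaker sketch: open after N consecutive failures,
    // then let a small fraction of requests through to probe for recovery.
    package main

    import (
        "errors"
        "math/rand"
        "sync"
    )

    var ErrOpen = errors.New("circuit open: failing fast")

    type Breaker struct {
        mu           sync.Mutex
        consecutive  int     // consecutive failures (timeouts count as failures)
        maxFailures  int     // open the breaker after this many in a row
        probePercent float64 // fraction of requests allowed through while open
    }

    func (b *Breaker) Call(fn func() error) error {
        b.mu.Lock()
        open := b.consecutive >= b.maxFailures
        b.mu.Unlock()

        // While open, fail fast except for a small probe percentage.
        if open && rand.Float64() > b.probePercent {
            return ErrOpen
        }

        err := fn()

        b.mu.Lock()
        if err != nil {
            b.consecutive++
        } else {
            // A successful probe closes the breaker again.
            // (A real breaker would ramp back up gradually instead.)
            b.consecutive = 0
        }
        b.mu.Unlock()
        return err
    }

    func main() {
        b := &Breaker{maxFailures: 5, probePercent: 0.05}
        _ = b.Call(func() error {
            // the slow upstream call would go here (hypothetical)
            return nil
        })
    }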


The thundering herd problem comes from throughput decreasing due to overhead when you need to scale up capacity (traditionally due to increased context switching, but these days often from autoscaling activity). While rate-limiting, circuit-breaking, load-shedding, etc. are decent band-aids for the problem, the better solution tends to be finding ways to increase amortization of request-processing overhead as request load increases. Caching is a way to do it, but if done improperly it can exacerbate problems. Another common approach, which I didn't see mentioned in the article, is microbatching, where you pay your overhead costs on a per-batch basis (often using streaming interfaces like Kafka). As you fall behind, your batch sizes increase, which increases the number of requests you process per batch, spreading the batch-processing overhead over more and more requests.
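
A rough Go sketch of that micro-batching loop (names and sizes are illustrative, not a real pipeline): drain whatever has queued up, up to a cap, and pay the per-batch overhead once for the whole batch, so batches naturally grow as the consumer falls behind.

    package main

    import "fmt"

    const maxBatch = 100

    func processBatch(batch []string) {
        // Imagine one DB round-trip / one downstream call for the whole batch.
        fmt.Printf("processing %d requests in one batch\n", len(batch))
    }

    func consume(requests <-chan string) {
        for req := range requests {
            batch := []string{req}
            // Greedily drain anything else already waiting, up to maxBatch.
        drain:
            for len(batch) < maxBatch {
                select {
                case r, ok := <-requests:
                    if !ok {
                        break drain
                    }
                    batch = append(batch, r)
                default:
                    break drain
                }
            }
            processBatch(batch)
        }
    }

    func main() {
        ch := make(chan string, 1000)
        for i := 0; i < 250; i++ {
            ch <- fmt.Sprintf("req-%d", i)
        }
        close(ch)
        consume(ch) // prints batches of 100, 100, 50
    }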


The single biggest way to mitigate this is to switch from "push" to "pull" architecture where possible. Let the central servers reach out to the "clients" at their own honest pace. Not every process can be modeled this way, but a surprising amount can, particularly internal processes where you control all of the agents.


Ironically, stream processing, which we tend to call "push" architecture, is largely the same solution.


Do you mean something like "at most once" delivery over UDP?


No, that's more like traditional load shedding. I'm talking more like Kafka/NATS/0mq/etc.


Sure, but that's just moving most of the chaos from your app to a message bus. Which is almost always correct, assuming you are actually at the scale where you need that level of architectural complexity.


I wouldn't agree that it is moving reliability issues to the message bus. This is more a scaling/availability problem than a reliability problem, but the message bus is definitely helping to solve scaling/availability problems.

While Kafka can indeed be pretty complex, streaming architectures aren't necessarily complex at all. For example, having an AWS Kinesis stream sitting in front of an AWS Lambda is pretty simple, and provides efficiency benefits even at comparatively small scale.


Philosophically, the "thundering herd problem" actually CAUSES congestion.

There are 2 ways to handle it.

1. Stop the thundering herd (make all the clients do something different). That may make things worse. Congestion in networks is usually exponential: you can't fulfill a request, so you repeat the request; it can't be fulfilled, so it repeats. You can add a random delay at the client end, but that is just the US government's answer to the debt problem: it kicks the can down the road in the short term, but it will almost certainly come back and bite you. Mathematically, it is very easy for this scenario to become catastrophically exponential once a threshold is reached.

2. Stop the congestion (make the handling process faster or add processes).

The system already has a cache to handle this, but if the content is not in the cache it doesn't help. There needs to be an extra request cache exclusively for congestion scenarios. The existing cache request is already doing some processing, so extend that to route thundering-herd request processing. This second cache does a bit more processing as well. As each new request is routed to it, it checks itself to see if this requestor is in the cache and removes it or overwrites it. It should never contain more than one entry per client.

When no more additions are made to this congestion cache (or the rate has slowed significantly), then the requests can be forwarded and processed via the original cache system.

Under this configuration, the congestion does not become exponential and should only delay the thundering herd requests. All other requests will be handled as per normal.

Once the original cache has the information there is no need for any thundering herd requests to be routed to the congestion cache.

Some clients will encounter some delays but not all and only on the "thundering herd process".


When working on an embedded product, we used the last few digits of the MAC address to control timing of API accesses. Obviously doesn't work unless you control the whole stack, but for us, it was an easy fix since we had our own OUI.
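
Something along these lines, as a hypothetical Go sketch (the example address and the window size are assumptions):

    // Derive a per-device offset from the MAC address, so a fleet of devices
    // spreads its API calls instead of hitting the server in lockstep.
    package main

    import (
        "fmt"
        "net"
        "time"
    )

    func offsetFromMAC(mac net.HardwareAddr, window time.Duration) time.Duration {
        if len(mac) < 2 {
            return 0
        }
        // Use the last two octets (the device-specific part) to pick a slot.
        n := int(mac[len(mac)-2])<<8 | int(mac[len(mac)-1])
        return time.Duration(n%int(window.Seconds())) * time.Second
    }

    func main() {
        mac, _ := net.ParseMAC("00:1a:2b:3c:4d:5e") // example address
        delay := offsetFromMAC(mac, time.Hour)
        fmt.Println("this device phones home", delay, "after the top of the hour")
    }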


Wouldn't the obvious precaution here be to roll out the new feature to just a portion of the waitlist?

I guess the risk here is that you'd never hit the same thresholds as when letting it all rip at once, but wouldn't it be a bit more graceful and let you work out at least some of the problems in front of a smaller audience?


Props to Encore for writing interesting blogs. Also, I just want to say I tried the Encore product and it's awesome (I only used it for a hobby project, but still).


Shameless plug: for protection against thundering herd, cache stampede, and various transient failures on .NET, I created FusionCache.

Hope this may be of help to someone.

https://github.com/ZiggyCreatures/FusionCache


> user journeys

meaning, they tested that if the user does everything the way they're "supposed to" it works.

That explains a lot.


Hey - just wanted to give props to the author. I always enjoy Encore content when it pops up here.

Brand is cool too


Although I just checked your pricing on a whim, and it's the most expensive per-member service I've ever seen for devs. By a LOT.

I mean I still like the brand but wow.


Encore CEO here. Thanks for the feedback.

We sell developer productivity and devops automation, and compared to hiring additional engineers, Encore is very cheap.

We’ve tried to align our incentives with the needs of our customers, so there are no usage-based or surprise fees when using Encore. The per-seat price may be higher but it’s transparent and predictable.


My experience is that too many people don't even know what this problem actually is, and yet it exists in many places, just in different shapes.



