The 5-Hour CDN (fly.io)
417 points by robfig on Aug 3, 2021 | 85 comments



This article touches on "Request Coalescing" which is a super important concept - I've also seen this called "dog-pile prevention" in the past.

Varnish has this built in - good to see it's easy to configure with NGINX too.

One of my favourite caching proxy tricks is to run a cache with a very short timeout, but with dog-pile prevention baked in.

This can be amazing for protecting against sudden unexpected traffic spikes. Even a cache timeout of 5 seconds will provide robust protection against tens of thousands of hits per second, because request coalescing/dog-pile prevention will ensure that your CDN host only sends a request to the origin a maximum of once every five seconds.
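
For anyone curious what that looks like in nginx, it's roughly this (the directive names are real nginx ones; the zone name, paths, timings, and the "origin" upstream are placeholders):

    proxy_cache_path /var/cache/nginx keys_zone=microcache:10m max_size=1g;

    server {
        location / {
            proxy_cache       microcache;
            proxy_cache_valid 200 5s;     # cache OK responses for just 5 seconds
            proxy_cache_lock  on;         # coalesce concurrent misses into one origin fetch
            proxy_cache_lock_timeout 10s;
            proxy_pass        http://origin;
        }
    }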

I've used this on high traffic sites and seen it robustly absorb any amount of unauthenticated (hence no variety on a per-cookie basis) traffic.


Back when I was just getting started, we were doing a lot of WordPress stuff. A client contacted us, "oh yeah, later today we're probably going to have 1000x the traffic because of a popular promotion". I had no idea what to do so I thought, I'll just set the varnish cache to 1 second, that way WordPress will only get a maximum of 60 requests per second. It worked pretty much flawlessly, and taught me a lot about the importance of request coalescing and how caches work.


I'll echo what Simon said; we share some experiences here. There's a potential footgun, though, that anyone getting started with this should know about:

Request coalescing can be incredibly beneficial for cacheable content, but for uncacheable content you need to turn it off! Otherwise you'll cause your cache server to serialize requests to your backend for it. Let's imagine a piece of uncacheable content takes one second for your backend to generate. What happens if your users request it at a rate of twice a second? Those requests are going to start piling up, breaking page loads for your users while your backend servers sit idle.

If you are using Varnish, the hit-for-miss concept addresses this. However, it's easy to implement wrong when you start writing your own VCL. Be sure to read https://info.varnish-software.com/blog/hit-for-miss-and-why-... and related posts. My general answer to getting your VCL correct is writing tests, but this is a tricky behavior to validate.
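
For reference, the hit-for-miss behavior in Varnish's built-in VCL boils down to roughly this (a simplified sketch, not the complete built-in logic):

    sub vcl_backend_response {
        if (beresp.ttl <= 0s || beresp.http.Set-Cookie || beresp.http.Vary == "*") {
            # "Hit-for-miss": cache the decision not to cache, so later requests
            # for this object go to the backend in parallel instead of queuing
            # up behind each other on the waiting list.
            set beresp.uncacheable = true;
            set beresp.ttl = 120s;
            return (deliver);
        }
    }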

I'm unsure how nginx's caching handles this, which would make me nervous about using the proxy_cache_lock directive for locations with a mix of cacheable and uncacheable content.


And to add the last big one from the trifecta:

Know how to deal with cacheable data. Know how to deal with uncacheable data. But above all, know how to keep them apart.

Accidentally caching uncacheable data has led to some of the ugliest and most avoidable data leaks and compromises in recent times.

If you go down the "route everything through a CDN" route (that can be as easy as ticking a box in the Google Cloud Platform backend), make extra sure to flag authenticated data as Cache-Control: private / no-cache.


no-cache does not mean content must not be cached - in fact, it specifies the opposite!

no-cache means that the response may be stored in any cache, but cached content MUST be revalidated before use.

public means that the response may be cached in any cache even if the response was not normally cacheable, while private restricts this to only the user agent's cache.

no-store specifies that this response must not be stored in any cache. Note that this does not prevent previously cached responses from being used.

max-age=0 can be added to no-store to also invalidate old cached responses, should one have accidentally sent a cacheable response for this resource. No other directives have any effect when using no-store.
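
Putting a few of those together as illustrative header lines (the values are just examples):

    Cache-Control: private, no-cache      (only the browser may store it, and it must revalidate before reuse)
    Cache-Control: no-store, max-age=0    (never store it, and stop using anything cached earlier)
    Cache-Control: public, max-age=300    (any cache may serve it for up to 5 minutes without revalidating)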


That’s the best synopsis of the cache options I’ve ever read. It’s one of those things I have to pull documentation on every time I use it, but the way you just explained it makes so much sense that I might just memorize it now.

Edit: And now I see that you just copied bits from the Moz Dev page. I'll have to start using those more. I think the MS docs always come up first in Google.


MDN docs are quite good at times. And yes, certain parts were copy pasted in, as I didn't want to accidentally end up spreading misinformation.

Also note that I only mentioned the usual suspects - there are many more options, like must-revalidate.


Speaking of non-cacheable data:

https://arstechnica.com/gaming/2015/12/valve-explains-ddos-i...

Caching is HARD.


In Varnish, if you have some requirements flexibility, you can enable grace mode to serve stale responses while refreshing from the origin in the background, and avoid a slow origin-bound request every [5] seconds.
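
A minimal VCL sketch of that (the numbers are arbitrary):

    sub vcl_backend_response {
        set beresp.ttl = 5s;     # fresh for 5 seconds
        set beresp.grace = 1h;   # after that, serve the stale copy for up to an hour
                                 # while a background fetch refreshes the object
    }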

Not quite the same layer, but in node.js I’m a fan of the memoize(fn)->promise pattern where you wrap a promise-returning function to return the _same_ promise for any callers passing the same arguments. It’s a fairly simple caching mechanism that coalesces requests and the promise resolves/rejects for all callers at once.
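
Roughly like this in TypeScript (a simplified sketch that coalesces on a single string key; the names are mine, not from any particular library):

    // Wrap an async function so concurrent calls with the same key share one
    // in-flight promise, i.e. the underlying requests are coalesced.
    function coalesce<T>(fn: (key: string) => Promise<T>): (key: string) => Promise<T> {
      const inflight = new Map<string, Promise<T>>();
      return (key: string) => {
        const existing = inflight.get(key);
        if (existing) return existing;
        const p = fn(key).finally(() => inflight.delete(key));
        inflight.set(key, p);
        return p;
      };
    }

    // Every concurrent fetchUser("42") now shares one underlying request.
    const fetchUser = coalesce((id: string) => fetch(`/api/users/${id}`).then(r => r.json()));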


I've implemented this manually in some golang web applications I've written. It really helps when you have an expensive cache-miss operation, as it can stack the specific requests so that once the original request is served, all of the stacked requests are served with the cached copy.


"Thundering herd" problem is how I have always heard it called.


The thundering herd problem isn't really about high levels of traffic. To the extent that that's a problem, it's just an ordinary DOS.

The thundering herd problem specifically refers to what happens if you coordinate things so that all your incoming requests occur simultaneously. Imagine that over the course of a week, you tell everyone who needs something from you "I'm busy right now; please come back next Tuesday at 11:28 am". You'll be overwhelmed on Tuesday at 11:28 am regardless of whether your average weekly workload is high or low, because you concentrated your entire weekly workload into the same one minute. You solve the thundering herd problem by not giving out the same retry time to everyone who contacts you while you're busy.


Hmm. I think of thundering herd as being about retries.

All your failing requests batch up when your retry strategy sucks; then you end up with really high traffic on every retry, and very little in between.


Retries without jitter are indeed a common source of thundering herd problems. Even with exponential backoff, if all the clients are retrying simultaneously, they'll hammer your servers over and over. By adding jitter (a random amount of extra delay that's different for every client and retry), the retries get staggered and the requests are spread out.
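
The usual approach is "full jitter": pick a uniformly random delay under an exponentially growing cap. Roughly (a sketch, not any particular library's API):

    // Exponential backoff with full jitter: a random delay between 0 and an
    // exponentially growing (but capped) ceiling, so retries spread out.
    function retryDelayMs(attempt: number, baseMs = 100, capMs = 30_000): number {
      const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
      return Math.random() * ceiling;
    }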


What do you do when you’re an API SaaS, and it’s your clients’ apps that are making thundering-herd requests?

Imagine you’re a service like Feedly, and one of your “direct customer” API clients — some feed-reader mobile client — has coded their apps such that all of their connected clients will re-request the specific user’s unique feed at exact, crontab-like 5-minute offsets from the start of the hour. So every five minutes, you get a huge burst of traffic, from all these clients—and it’s all different traffic, with nothing coalesceable.

You don’t control the client in this case, but nor can you simply ban them—they’re your paying customers! (Yes, you can “fire your customer”, but this would be most of your customers…)

And certainly, you can try to teach the devs of your client how to write their own jitter logic—but that rarely works out, as often it’s junior frontend devs who wrote the client-side code, and it’s hard to have a non-intermediated conversation with them.


If you have no control at all over the client, then ultimately, you have to just take it and build your service to handle that amount of traffic. Adding jitter is a technique that you use when writing clients. That's why I mentioned it in the context of retries. If you are writing a CDN per the article, at some point your CDN has to make requests back to the origin. If one of those requests fails and you retry, you add jitter there to avoid DoSing yourself. If you are working in a microservices architecture, you add jitter on retries between your services.

The best you can do with clients that are out of your control is to publish a client library/SDK for your API that is convenient for your customers to use and implements best practices like exponential backoff, jitter, etc. If you have documentation with code snippets that junior devs are likely to copy and paste, include it in those.

If you've painted yourself into a corner like you describe and are seeing extremely regular traffic patterns, you might be able to pre-cache. Ie, it's 12:01 and you know that a barrage is coming at 12:05. Start going down the list of clients/feeds that you know are likely to be requested based on recent traffic patterns and generate the response, putting it in your cache/CDN with a five minute TTL. Then at least a good portion of the requests should be served straight from there and not add load to the origin. There are obviously drawbacks/risks to that approach, but it might be all you can really do.


If you're extremely desperate, you can start adding conditional jitter (somewhere within 5ms - 200 ms) to your load balancer/reverse proxy, such as your NGINX/Envoy/Apache box, which sits in front of your API. You can make the jitter conditional on count of concurrent requests or on latency spikes. It's an extreme last resort, and may require a bit of custom work via custom module or extension, but it is possible.

In general, try to avoid having no control over the client. If you must lack control over the client (such as when you're a pure SaaS company selling a public API), you can apply jitter based on the API key, in addition to the other metrics I mentioned above.

As better engineers than I used to say at a previous engagement: "if it's not in the SLA, it's an opportunity for optimization"


I like the “jitter based on API key” idea.

It’s somewhat hard in our case, as our direct customers (like the mobile app I mentioned) have API keys with us, but they don’t tell us about which user of theirs is making the request. And often they’ll run an HTTP gateway (in part so that they don’t have to embed their API key for our service in their client app), so we don’t even get to see the originating user IPs for these requests, either. We just get these huge spikes of periodic traffic, all from the same IP, all with the same API key, all about different things, and all delivered over a bunch of independent, concurrent TCP connections.

I’ve been considering a few options:

- Require users that have such a “multiple users behind an API gateway” setup, to tag their proxied requests with per-user API sub-keys, so we can jitter/schedule based on those.

- Since these customers like API gateways so much, we could just build a better API gateway for them to run; one that benefits us. (E.g. by Nagle-ing requests together into fewer, larger batch requests.) Requests that come as a single large batch request, could be scheduled by our backend at an optimal concurrency level, rather than trying to deal with huge concurrency bursts as we are now.

- Force users to rewrite their software to “play nice”, by introducing heavy-handed rate-limiting. Try to tune it so that the only possible way to avoid 429s is to either do gateway-side request queuing, or to introduce per-client schedule offsets (i.e. placing users on a hash ring by their ID, so for a periodic-per-5-minutes request, equal numbers of client apps are set to make the request at T+0, vs. T+2.5.) (A small sketch of this hash-based offset idea follows the list.)

- Introduce a middleware / reverse-proxy that holds an unbounded-in-size no-expire request queue, with one queue per API key, where requests are popped fairly from each queue (or prioritized according to the plan the user is paying for). Ensure backends only select(1) requests out from the middleware’s downstream sockets as quickly as they’re able to handle them. Require API requests to have explicit TTLs — a time after which serving the request would no longer be useful. If a backend pops a request and finds that it’s past its TTL, it discards it, answering it with an immediate 504 error.
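
A tiny sketch of the hash-based per-client offset idea from the third bullet (the function and parameter names are mine, and the period is arbitrary):

    import { createHash } from "node:crypto";

    // Deterministic "jitter": hash the client ID into a stable offset within the
    // polling period, so every client polls at its own fixed slot instead of on
    // the exact 5-minute boundary.
    function pollOffsetMs(clientId: string, periodMs = 5 * 60_000): number {
      const digest = createHash("sha256").update(clientId).digest();
      return digest.readUInt32BE(0) % periodMs;
    }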


Jitter is one way to solve it. Request coalescing is another.

It depends on the request type. Is it cacheable? Do you require a per-client side effect? ...


Request coalescing in a shared cache does not solve thundering herd, it just reduces propagation to backend services. Your cache is still subject to a thundering herd, and may be unable to keep up.

The only way to solve thundering herd - where the load of all the requests arrives within a short timespan - is to distribute the requests over a larger timespan.

Reducing your herd size by having fewer requests does not solve thundering herd, but may make it bearable.


Retries tend to amplify it, but a more common cause is scheduled tasks in clients/end user devices.

E.g. all clients checking for an update at 10:00 UTC every day, all clients polling for new data at fixed times, etc.


Where does your perspective differ from what I said above?


Thundering herd is about mitigating a problem with backpressure scenarios. If you have a backoff and a delayed queue of requests, letting them all proceed at once when the backpressure scenario resolves is likely to recreate it/create a new one. Staggering them so they proceed slightly off in time avoids that.


unrelated to CDNs but IIRC vitess did/does query coalescing too -- if it starts to serve a query for "select * from users where id = 123" and then another 20 connections all want the same query result, vitess doesn't send all 21 select queries to the backend, it sends the first one and then has all the connections wait on the backend response, then serves the same response to them all.


Vitess still does this. It can also do similar with writes on hot rows where someone is incrementing a counter for example.


Wait, how can it do it with writes/increments? Does it keep track of which rows are hot and add a short but stochastically distributed delay to writes to try to coalesce more updates into a single hit to the DB?

I would think you'd need to do it that way, you wouldn't want to reply "done" to the first increment if that operation is going to be batched up with other ops; you'd want to keep that connection hanging until all the increments you're going to aggregate have all been committed by the backend.

In the select coalescing case, except for bookkeeping overhead, none of the queries are slower (it's a big net win all around because not only do clients get their answers on average somewhat sooner, but the DB doesn't have to parse those queries, check for them in the query cache, or marshal N responses).

But in the increment/write case, it seems like in order to spare some DB resources, some clients will perceive increased write delays (or does it still net a win because the DB backend doesn't have to deal with the contention?).


Do you know if varnish's request coalescing allows it to send partial responses to every client? For example, if an origin server sends headers immediately then takes 10 minutes to send the response body at a constant rate, will every client have half of the response body after 5 minutes?

Thanks!


I don’t know about Varnish, but having worked on other implementations, you would usually have a timeout on the initial lock (semaphore) to prevent a slow connection from impacting all clients.

But this is much, much harder to do once you are already streaming the response - if the time to first byte (TTFB) is quick, but the connection is low-throughput, you can’t do much at this point. But nearly all modern implementations stream the bytes to all clients immediately; they don’t try to fill the cache first (they do it simultaneously).

Some implementations might avoid fanning in too much - maintaining a smaller pool of connections rather than trying to get to “1” - but that’s ultimately a trade-off at each layer of the onion, as they can still add up.

(I worked at both Cloudflare and Google, and it was a common topic: request coalescing is a big deal for large customers)


I think the nginx that members of the public can get from their package manager does not have this feature, and will force each client other than the first to either wait for the entire body to be downloaded or wait for a timeout and hit the origin in a non-cacheable request.


I don't know for certain, but my hunch is that it streams the output to multiple waiting clients as it receives it from the origin. Would have to do some testing to confirm that though.


Varnish has defaulted to streaming responses since varnish 4. I think it gets used for a lot of video streaming use cases.


Is this the same idea as `stale-while-revalidate`?


Love the level of detail that Fly's articles usually go into.

We have a distributed CDN-like feature in the hosted version of our open source search engine [1] - we call it our "Search Delivery Network". It works on the same principles, with the added nuance of also needing to replicate data over high-latency networks between data centers as far apart as, for example, Sao Paulo and Mumbai. That brings with it another fun set of challenges to deal with! Hoping to write about it when bandwidth allows.

[1] https://cloud.typesense.org


I'd love to read about it.


This is cool and informative and Kurt's writing is great:

The briny deeps are filled with undersea cables, crying out constantly to nearby ships: "drive through me"! Land isn't much better, as the old networkers shanty goes: "backhoe, backhoe, digging deep — make the backbone go to sleep".


We can't take credit for the backhoe thing; that really is an old networking shanty.


fly.io has a fantastic engineering blog. Has anyone used them as a customer (enterprise or otherwise) and have any thoughts?


Yes, I'm using it. I deploy a TypeScript project that runs in a pretty straightforward node Dockerfile. The build just works - and it's smart too. If I don't have a Docker daemon locally, it creates a remote one and does some WireGuard magic. We don't have customers on this yet, but I'm actively sending demos and rely on it.

Hopefully I'll get to keep working on projects that can make use of it because it feels like a polished 2021 version of Heroku era dev experience to me. Also, full disclosure, Kurt tried to get me to use it in YC W20 - but I didn't listen really until over a year later.


One of my side projects is a DNS hosting service, SlickDNS (https://www.slickdns.com/).

I moved my authoritative DNS name servers over to Fly a few months ago. After some initial teething issues with Fly's UDP support (which were quickly resolved) it's been smooth sailing.

The Fly UX via the flyctl command-line app is excellent, very Heroku-like. Only downside is it makes me mad when I have to fight the horrendous AWS tooling in my day job.


I run my own worldwide anycast network and still end up deploying stuff to Fly because it is so much easier.

The folks who actually run the network for them are super clueful and basically the best in the industry.


just started to use them for an elixir/phoenix project. multi region with distributed nodes just works. feels almost magical after all the aws work I've done the past few years.


What’s magical about it?

I was under the impression that fly.io today (though they are working on it) doesn’t do anything unique to make hosting an elixir/Phoenix app easier.

See this comment by the fly.io team.

https://news.ycombinator.com/item?id=27704852


I still wouldn't say we do any magic Elixir stuff; rather, our platform just happens to have a combination of features (particularly edge delivery for stuff like LiveView and zero-config private networking for clustering) that make Elixir apps sing.

But we've got full-time people working on Elixir now, too; we'll see where that goes. We've still got Elixir limerence here. :)


Hey Thomas, weren’t you running Latacora last time I checked?



I haven't been at Latacora for a while now.


They're not doing anything special to make Elixir specifically better yet, but their private networking is already amazing for it - you can cluster across arbitrary regions completely trivially. It's a really good fit for Elixir clustering as-is even without anything specially built for it. I have no idea how you'd do multi-region clustering in AWS but I'm certain it'd be a lot harder.


I read their blogs and I visit their site every new project I start but it just hasn't clicked with me yet.

Tinkering has been great, but the add-on style pricing scares the jeebs out of me (my wallet), so I just assume I can't afford it for now and spin up a DO droplet. The droplet is probably more expensive for my use case, but call it ADHD tax haha, at least it's capped.


I've used them in the past. All I can say is that the support was (and probably still is) fantastic.


>The term "CDN" ("content delivery network") conjures Google-scale companies managing huge racks of hardware, wrangling hundreds of gigabits per second. But CDNs are just web applications. That's not how we tend to think of them, but that's all they are. You can build a functional CDN on an 8-year-old laptop while you're sitting at a coffee shop.

huh yeah never thought about it

I blame how CDNs are advertised for the visual disconnect


It's misleading.

CDN software might be simple in the basic happy case, but you still need a Network of nodes to Deliver the Content.


Well it's a self serving article! It's easy to turn up a network of nodes on Fly.io. It's a little harder, but not impossible, to do the same elsewhere.


Years ago I was involved with some high performance delivery of a bunch of newspapers, and we used Squid[1] quite well. One nice thing you could do as well (but it's probably a bit hacky and old school these days) was to "open up" only parts of the web page to be dynamic while the rest was cached (or have different cache rules for different page components)[2]. With some legacy apps (like some CMS') this can hugely improve performance while not sacrificing the dynamic and "fresh looking" parts of the website.

[1] http://www.squid-cache.org/ [2] https://en.wikipedia.org/wiki/Edge_Side_Includes



As someone who’s mostly clueless about BGP but has a fair grasp of all the other layers mentioned, I’d love to see posts like this go more in depth on it for folks like myself.


Something the post misses is that Cloudflare uses a customised version of Nginx, and the same goes for Fastly with Varnish (don't know about Netlify and ATS).

Out of the box, nginx doesn't support HTTP/2 prioritisation, so building a CDN with nginx doesn’t mean you're going to be delivering as good a service as Cloudflare.

Another major challenge with CDNs is peering and private backhaul; if you're not pushing major traffic, then your customers aren't going to get the best peering with other carriers / ISPs…


HTTP/2 prioritization is a lot of hype for a theoretical feature that yields little real-world performance benefit. When a client is rendering a page, it knows what it needs, and in what order, to minimize blocking. The server doesn't.


Yes, which is why the browser sends priorities with the requests, but many servers ignore these and just serve responses in whatever order suits them.

If a low-priority response is served before a high-priority one, the page is likely to be slower to render, etc.


> 3. Be like a game server: Ping a bunch of servers and use the best. Downside: gotta own the client. Upside: doesn't matter, because you don't own the client.

"If you can run code on it, you can own it". Your front page could just be a tiny loader js that fires off a fetch() for a zero byte resource to all your mirrors, and then proceeds to load the content from the first responder.


Now you just have the bad latency of the non-cached content, plus the ok latency of your CDN.


> DNS: Run trick DNS servers that return specific server addresses based on IP geolocation. Downside: the Internet is moving away from geolocatable DNS source addresses. Upside: you can deploy it anywhere without help.

Can anyone expand on how/why "the Internet is moving away from geolocatable DNS source addresses"?


Some public/recursive DNS servers like Cloudflare (1.1.1.1) do not tell the authoritative DNS server the IP address or subnet of the requestor, whereas your ISP's DNS server usually does. This makes CDN-via-DNS more difficult, as it is not always entirely clear where the request comes from (Cloudflare itself does not need this; they do everything with anycast).


Sounds like a fun weekend project


It is strange to put a time duration in front of CDN (content delivery network), because given all the recent incidents with Fastly, Akamai and Bunny, I read it as a 5-hour Centralised Downtime Network.


Does Nginx still not support cache invalidation? If you set up a long TTL, is there a way to remove some files from the cache without nuking the entire cache and restarting an instance?


It's supported, but only for NGINX Plus. You can kind of work around it by using proxy_cache_bypass, though.


Or delete the file in question on disk; the full path is derived from the cache key, and I've come across scripts and Lua modules that do it for you.


The hard part of building a CDN is not setting up an HTTP cache, it is setting up an HTTP cache that can serve thousands of different customers.


Making a service multitenant is more complex, yes. But many companies roll their own CDNs. There are lots of good reasons to do that, and it's a problem that can be reduced to something a single developer can understand.


I like to blog from the raw origin and not use CDNs, because if a blog post is changed I have to manually purge the CDN cache, which can happen a lot. CDNs also have the caveat that if they're down, they can make a page load very slowly while it tries to load the assets.


If you’re okay with every request having the latency all the way to your origin, you can have the CDN revalidate its cache on every request. Your origin can just check date_updated (or similar) on the blog post to know if the cache is still valid without needing to do any work to look up and render the whole post.

To further reduce load and latency to your origin, you can use stale-while-revalidate to allow the CDN to serve stale cache entries for some specified amount of time before requiring a trip to your origin to revalidate.
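
Concretely, the two setups look something like this (the values are just examples):

    Cache-Control: no-cache                                (revalidate with the origin on every request)
    Cache-Control: max-age=0, stale-while-revalidate=300   (always stale, but the CDN may serve the stale copy for up to 5 minutes while it revalidates in the background)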


> If you’re okay with every request having the latency all the way to your origin, you can have the CDN revalidate its cache on every request.

It's also worth mentioning that even when revalidating on every request (or not caching at all), routing through a CDN can still improve overall latency, because TLS can be terminated at an edge server close to the user, significantly shortening the TLS handshake.


Ah, the TLS shortening aspect of a CDN is something that seems obvious in hindsight but I'd never really thought about it. Thanks!


Some of the HTTP/2 and HTTP/3 design choices are seen as trying to solve this problem another way.

If a round trip to New York is too long, then twenty of them is way worse. So I can either do 20 round trips to Nevada, which does <20 round trips to Chicago, which does <<20 round trips to New York. Or, I can do some more cleverness with transport and session bootstrapping and end up with 14 round trips to New York.


Not just TLS: TCP will generally also slow-start faster on a lower-RTT connection (and the edge can keep the origin connection always open so it stays “warm”).


Also, CDN providers will hopefully have good peering. My company uses OpenVPN over TCP on port 443 for maximum compatibility. From the other side of the globe the VPN is pretty slow, so I proxy the TCP connection via a cheap VPS, and speed goes from maybe 500 kbit/s to 10 Mbit/s, just because the VPS provider's peering is way better than my company's "business internet". (The VPS is in the same country as the VPN server.)


We've seen people use background revalidation to great effect, particularly in front of S3. You can get pretty close to one stale request per cache entry this way. And if-modified-since requests are really cheap.


I set an s-maxage of at least a minute. Keeps my servers from being hugged to death while not having to invalidate manually.
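
i.e. something along the lines of:

    Cache-Control: public, s-maxage=60   (shared caches like the CDN may reuse it for a minute; s-maxage doesn't affect the browser's own cache)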


You can fix this with proper cache headers


Author has a great sense of humor. I love it!


Fly is great and I love reading their blog posts.

Just hoping they come back around on CockroachDB-- I feel like it's a match made in heaven for what they're providing.


We love CockroachDB. There are people tinkering with it on Fly.io. I think anything formal would involve our companies talking to each other, which we're happy to do, but everybody is busy all the time. :)



PM at CRL here--we love Fly too! Definitely can see our two products working together!


Waiting for IPFS to shake this all up.



