
Can you elaborate on CloudFront's "poor cache hit ratio"? Wouldn't this just be up to the origin's use of cache headers (Cache-Control, Expires, etc.)? I suppose cold content could be LRU'd out, but the CloudFront docs advertise 1 day, last I looked. PS: CloudFront customer access logs include the cache result (hit/miss/error), so you can see for yourself.
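Something like the following quick sketch will tally it; it assumes CloudFront's standard gzipped, tab-separated access log format with a "#Fields:" header line and an x-edge-result-type column (Hit / RefreshHit / Miss / Error), and the script itself is just illustrative:

    # Sketch: compute a cache hit ratio from CloudFront standard access logs.
    import gzip
    import sys
    from collections import Counter

    def hit_ratio(paths):
        results = Counter()
        for path in paths:
            with gzip.open(path, "rt") as f:
                idx = None
                for line in f:
                    if line.startswith("#Fields:"):
                        fields = line.split()[1:]                  # column names
                        idx = fields.index("x-edge-result-type")
                        continue
                    if line.startswith("#") or idx is None:
                        continue
                    results[line.rstrip("\n").split("\t")[idx]] += 1
        hits = results["Hit"] + results["RefreshHit"]
        total = sum(results.values())
        return (hits / total if total else 0.0), results

    if __name__ == "__main__":
        ratio, breakdown = hit_ratio(sys.argv[1:])
        print(f"hit ratio: {ratio:.1%}  breakdown: {dict(breakdown)}")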



One of the main ways that CDNs compare with each other is how well they manage to actually cache content without having to go back to your origin server; I see a lot of developers go the path you are, saying "well, that's fixed, isn't it?", but if you think about the architecture of how you build a CDN, you rapidly see that that doesn't really make sense.

Yes: if you had a single server somewhere out in the cloud that was sitting in front of yours as a cache (which is how most developers seem to conceptualize this: such as running a copy of Varnish), it is easy to say "ok, I'll cache this for a day"; however, you first are going to run into limits on what can be stored on that box: there is only so much RAM, and it doesn't even make sense to cache everything anyway.
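To make that concrete, here is a toy sketch of that single-box case (the class, sizes, and names are made up): one in-memory cache with a TTL and a hard byte budget, where "cache it for a day" stops mattering once the budget is hit, because the least recently used objects get evicted long before they expire.

    import time
    from collections import OrderedDict

    class LRUCache:
        """Toy single-node cache: a TTL plus a hard byte budget with LRU eviction."""

        def __init__(self, max_bytes, ttl_seconds):
            self.max_bytes = max_bytes
            self.ttl = ttl_seconds
            self.used = 0
            self.items = OrderedDict()            # key -> (body, expires_at)

        def get(self, key):
            entry = self.items.get(key)
            if entry is None or entry[1] < time.time():
                return None                       # miss, or present but already expired
            self.items.move_to_end(key)           # mark as most recently used
            return entry[0]

        def put(self, key, body):
            if key in self.items:                 # replacing: forget the old size
                self.used -= len(self.items[key][0])
            self.items[key] = (body, time.time() + self.ttl)
            self.items.move_to_end(key)
            self.used += len(body)
            while self.used > self.max_bytes and self.items:
                _, (old_body, _) = self.items.popitem(last=False)   # evict the LRU entry
                self.used -= len(old_body)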

You can spill the content to disk, but now it might actually be slower than just asking my server for the content, as my server might have it in RAM. This is a major differentiating factor, and some CDNs will charge more for guaranteed access to RAM, whether for certain files, in certain regions, or for certain amounts of data, as opposed to getting spilled off to some slower disk. However, there is also only so much disk.

Even with an infinitely large and very fast disk, though, you don't have one server: in an ideal situation (Akamai), you have tens of thousands of servers at thousands of locations all over the world, and customers are going to hit the location closest to them to get the file. Now: what happens if that server doesn't have the file in question? That is where the architecture of how you build your CDN really starts to become important.

In the most naive case, you simply have each server contact back to the origin on a miss; but that means that, as you add servers to your CDN, more and more places each need their own copy cached, each hitting your origin to get it. This isn't, in essence, a very scalable solution. Instead, you want to figure out a way to pool your cached storage among a bunch of servers... and somehow not make that so slow that you are better off asking my server for the original copy.

You then end up having bigger servers in each region that a front-line server can fall back on to get the cached copy, but that regional server is now going to become more and more of a centralized bottleneck, as more and more traffic has to flow through it (as a ton of stuff always ends up not being currently warm in cache). It also ends up looking more and more like that single centralized server, straining for storage and having to evict items more often.
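Roughly, the lookup path ends up looking like this sketch, where edge_cache, regional_cache, and fetch_origin are hypothetical stand-ins (anything with get/put and a fetch callback): the edge checks its own small cache, then the shared regional parent, and only then the origin.

    def lookup(key, edge_cache, regional_cache, fetch_origin):
        body = edge_cache.get(key)
        if body is not None:
            return body, "edge-hit"
        body = regional_cache.get(key)
        if body is not None:
            edge_cache.put(key, body)             # warm the edge for the next request
            return body, "regional-hit"
        body = fetch_origin(key)                  # every cold miss funnels through the
        regional_cache.put(key, body)             # regional parent, which is why it
        edge_cache.put(key, body)                 # turns into the bottleneck under load
        return body, "miss"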

Some places might spend more time trying to specialize specific front-line servers for specific customers (some kind of hashing scheme), but then the CNAME'd DNS gets more complex and is less likely to be cached by whatever ISP the client is behind; or you can attempt to distribute routing tables among your own servers for where to find content, etc. This simply isn't a simple problem, and it certainly doesn't come down to something as simple as "well, I told them to cache it, so they did: read my Cache-Control headers".
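That "specialize specific front-line servers" idea is often sketched as a consistent hash ring, so each URL (or customer) maps to a stable server and adding a node only reshuffles a small slice of the keys; this is a generic illustration, not any particular CDN's routing scheme.

    import bisect
    import hashlib

    class HashRing:
        def __init__(self, nodes, replicas=100):
            # Each node gets many virtual points on the ring to spread load evenly.
            self.ring = sorted(
                (self._hash(f"{node}:{i}"), node)
                for node in nodes for i in range(replicas)
            )
            self.keys = [h for h, _ in self.ring]

        @staticmethod
        def _hash(value):
            return int(hashlib.md5(value.encode()).hexdigest(), 16)

        def node_for(self, key):
            # Walk clockwise to the first virtual point at or after the key's hash.
            i = bisect.bisect(self.keys, self._hash(key)) % len(self.ring)
            return self.ring[i][1]

    ring = HashRing(["edge-1", "edge-2", "edge-3"])
    print(ring.node_for("customer-a.example.com/logo.png"))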

In some cases, you might even download a file well ahead of time to regions where you think it might be valuable; you might also attempt to optimize for latency and make prophylactic requests for things that clients haven't even asked for yet, so you can get them cached and ready for when they do (CDNetworks, for example, normally re-requests actively used files when they are still 20% away from expiring, to make certain that the file never expires in their cache, which would otherwise cause a latency spike for the poor user who first requests it afterwards).
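A sketch of that refresh-ahead behavior, using the 20%-of-TTL threshold from the CDNetworks example (the cache layout and the fetch_origin / schedule_background callbacks are hypothetical):

    import time

    REFRESH_FRACTION = 0.2                        # refresh once 20% of the TTL remains

    def get_with_refresh_ahead(key, cache, ttl, fetch_origin, schedule_background):
        # cache is a plain dict of key -> (body, fetched_at)
        entry = cache.get(key)
        if entry is None:
            body = fetch_origin(key)              # cold miss: this user pays the latency
            cache[key] = (body, time.time())
            return body
        body, fetched_at = entry
        remaining = ttl - (time.time() - fetched_at)
        if remaining < REFRESH_FRACTION * ttl:
            def refresh():
                cache[key] = (fetch_origin(key), time.time())
            # Serve the cached copy now, but refresh it in the background so the
            # object never actually expires while it is still being requested.
            schedule_background(refresh)
        return body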

http://blogs.gartner.com/lydia_leong/2011/12/23/akamai-buys-...

> ... Cotendo pre-fetches content into all of its POPs and keeps it there regardless of whether or not it’s been accessed recently. Akamai flushes objects out of cache if they haven’t been accessed recently. This means that you may see Akamai cache hit ratios that are only in the 70%-80% range, especially in trial evaluations, which is obviously going to have a big impact on performance. Akamai cache tuning can help some of those customers substantially drive up cache hits (for better performance, lower origin costs, etc.), although not necessarily enough; cache hit ratios have always been a competitive point that other rivals, like Mirror Image, have hammered on. It has always been a trade-off in CDN design — if you have a lot more POPs you get better edge performance, but now you also have a much more distributed cache and therefore lower likelihood of content being fresh in a particular POP.

Given all of this, what I've heard from people using CloudFront who have shopped around and know enough to pay attention to this kind of metric is that its cache-hit ratio is somewhat poor in comparison to other CDNs you might use. I am therefore curious whether that's one of the things causing CloudFlare to come out much better than CloudFront, whether it is a specific feature from CloudFlare that is helping, or whether it is just that CloudFront is so expensive in general (bandwidth pricing from CloudFront is ludicrously bad: even Akamai tends to hit you with initial quotes that are better than what CloudFront offers; but a 95% reduction in price seems "unbelievable").




