CDN vs S3 (jdorfman.posthaven.com)
140 points by mclarke on May 18, 2013 | 82 comments



For everyone saying this is obvious: I'm one of those idiots that didn't quite get it and I appreciate this post a lot.

For the past week, I've been working on writing a little static site generator and putting it on S3 and I thought I was pretty damn clever for finally getting around to doing that... except now it turns out I'm clueless yet again. I'm looking at Cloudfront now, but I'm still not sure if it has all the features that I was expecting out of S3 alone (someone already mentioned Route 53 integration).


I don't think you're an idiot, it's more likely that you were just unaware of cloudfront or perhaps CDNs in general.

S3 is for storing files, Cloudfront is for serving cached versions of them really quickly out of edge locations (i.e. the closest CDN datacentre to the user that requests it).

In your use case, your best bet is to use them in combination. Set up a Cloudfront distribution to point to your S3 bucket, then set up DNS to point to your Cloudfront distribution.

One big point to note is that you need to either:

* Configure Cloudfront to expire objects after a TTL (time-to-live) that is reasonable to you (e.g. 1 hour, 1 day etc). You can do this from the Cloudfront 'new distribution' wizard.

OR

* Let Cloudfront respect HTTP headers and then make S3 (or whatever your custom origin is) set cache-control headers that make sense for how often you update your site. Not sure if/how you can do this with S3; with a custom origin it's your app, so you can set whatever HTTP headers you like.

To be clear: if you don't do this, I'm pretty sure cloudfront caches things forever, or at least a very long time.
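
For the second option, here is a rough sketch of what "make S3 set cache-control headers" can look like. S3 stores a Cache-Control value per object at upload time, so the helper below only builds the upload arguments. The function name, the extension table and the TTL values are all made up for illustration; the resulting dict is meant to be handed to something like boto3's `put_object` (not imported here) together with the file body.

```python
import mimetypes
import posixpath

# Hypothetical TTLs in seconds; pick values that match how often you publish.
TTL_BY_EXTENSION = {
    ".html": 3600,    # pages: 1 hour
    ".css": 86400,    # stylesheets: 1 day
    ".js": 86400,     # scripts: 1 day
}
DEFAULT_TTL = 3600    # anything else: 1 hour


def upload_args(bucket: str, key: str) -> dict:
    """Build keyword arguments for an S3 upload that carries a Cache-Control header."""
    ext = posixpath.splitext(key)[1].lower()
    ttl = TTL_BY_EXTENSION.get(ext, DEFAULT_TTL)
    # Fall back to a generic binary type when the extension is unknown.
    content_type = mimetypes.guess_type(key)[0] or "binary/octet-stream"
    return {
        "Bucket": bucket,
        "Key": key,
        "CacheControl": f"public, max-age={ttl}",
        "ContentType": content_type,
    }


print(upload_args("my-bucket", "css/site.css")["CacheControl"])
```

With headers like these on the objects, the edge should refetch each file once its max-age runs out, which is the "pick a sensible TTL" approach without touching invalidations.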

Personally, I think triggering cache invalidations should only be for emergencies (e.g. someone has uploaded questionable content to serve to other users and it's cached at an edge). Rather than screwing around with that, save yourself some headaches: pick a sensible TTL and wait a little longer to have things up to date at your edges.

Note that by using Cloudfront in this manner, you get the performance benefit of serving static files. If performance at the expense of convenience was your main reason for going with static site generation, you might want to rethink that decision (there are other perfectly good reasons for wanting to use a static site generator, security being my favourite).

Do feel free to ping me over email if you have any questions on the above.


>If performance at the expense of convenience was your main reason for going with static site generation, you might want to rethink that decision (there are other perfectly good reasons for wanting to use a static site generator, security being my favourite).

Great comment. I didn't get this part though. What would be a better alternative?

Performance is one of the main reasons I considered this (uptime is another). Let's just say I've had the same shared hosting for over 5 years and the speed/uptime have been a disappointment for a long time. When I was working on a site and noticed a 500kb background image was taking 2 seconds to load and around the same time I saw that the spotify homepage was streaming a fullscreen video instantly, that was kind of the last straw.

So I thought the idea was that skipping dynamic generation, using distribution (well, I think my assumption was S3 did have edge locations), and just having a better host was a big win.


> I didn't get this part though. What would be a better alternative?

To clarify: I'm not saying that static assets behind an edge cache is not performant, just saying that a dynamically generated site behind an edge cache is effectively the same performance wise. It's probably not worth the sacrifice in convenience if performance was your main reason for going the static generated route.

I can't speak to the argument from uptime as I haven't used shared hosting in a while. Using an edge cache (without S3) might give you a little help there, as it only needs to hit your shared hosting on cache expiry, but that obviously won't be as safe as statically generating the files and making CF read them out of S3.
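
To make "it only needs to hit your shared hosting on cache expiry" concrete, here is a toy model of an edge cache with a TTL. It is only a sketch of the idea, not how CloudFront is implemented; the class name, the `fetch_origin` callback and the injectable clock are all invented for the example.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy edge cache: serve from memory until the TTL runs out."""

    def __init__(self, fetch_origin: Callable[[str], bytes], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_origin = fetch_origin   # hypothetical call to the origin server
        self.ttl = ttl                     # seconds an entry stays fresh
        self.clock = clock                 # injectable so the example is testable
        self.origin_hits = 0
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, path: str) -> bytes:
        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and now < entry[0]:
            return entry[1]                # still fresh: origin is not touched
        body = self.fetch_origin(path)     # missing or expired: go to the origin
        self.origin_hits += 1
        self._store[path] = (now + self.ttl, body)
        return body
```

However many visitors ask for the same page inside one TTL window, the origin sees a single request, which is why a slow or flaky origin hurts less behind an edge cache.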

I think S3 behind CF is a perfectly good approach. I was just saying that if you've currently got a dynamically generated site and are considering moving to static generation because of performance alone, the trade-off probably isn't worth the effort.

I wouldn't advise you personally to go back on that decision at all, especially because of the issues you've seen with uptime.

> I think my assumption was S3 did have edge locations

I think you get this from the rest of my comments but just to be clear: I've only ever experienced bad download times from S3 and would not feel comfortable recommending that you use it to serve traffic directly from the internet. It's not what S3 is for and so you shouldn't expect good performance from it in that use case.


> I was working on a site and noticed a 500kb background image was taking 2 seconds to load and around the same time I saw that the spotify homepage was streaming a fullscreen video instantly

I don't think that's a fair comparison - it seems unlikely to me that the video requires much data to get started, or a sustained 250kB/s (about 2 megabits/s) connection to play.
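
For anyone who wants to check the numbers, the arithmetic is short. This assumes "500kb" means 500 kilobytes and uses decimal units (1 kB = 1000 bytes, 1 Mbit = 1,000,000 bits); the function name is just for illustration.

```python
def throughput(size_kilobytes: float, seconds: float) -> tuple:
    """Return (kB/s, Mbit/s) for a transfer of the given size and duration."""
    kb_per_s = size_kilobytes / seconds
    mbit_per_s = kb_per_s * 1000 * 8 / 1_000_000   # bytes -> bits -> megabits
    return kb_per_s, mbit_per_s


print(throughput(500, 2))  # (250.0, 2.0)
```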


You should simply put CloudFront in front of S3.

In that case, you get the benefit of both worlds.


You're not clueless. S3 is almost certainly good enough for what you're doing.


I wrote a static site generator at the beginning of this year:)

https://github.com/jimktrains/gus


@jimktrains2 +1


You should really be using CloudFront and "NOT" S3 if you are looking for a CDN solution. There is a reason why CloudFront exists after all. S3 is merely a drop-in storage option; consider it a big file system residing within big, fat-piped data centers.


Back Fastly with your S3 static site. Works really well.



I think I'm going to go with CloudFront, but I'm pretty upset about not being able to use a naked domain.


True, S3 is not a CDN, but for a lot of use cases, serving directly from S3 is fine.

Taking Vine for example, I picked a random vine from twitter (https://vine.co/v/bE3YI365gxd) and a popular vine from twitter (https://vine.co/v/bEFFxdwjK9x). The popular video and thumbnails are served from a CDN, where the new one with no traffic is served directly from S3.

YouTube did the same thing in its early days. CDNs are not required for all traffic, and blindly recommending them is a bad precedent. They're really great for content that is frequently accessed, but their value greatly decreases on long-tail content.


When S3 first came out, people used it for this without considering its latency issues, and it was a disaster. S3 is not a CDN and shouldn't be used as one; this is the reason for CloudFront.

While I can't compare CloudFront to other CDNs, I do know it works well for my clients that have been using it (and certainly better than serving directly from S3).


Since this thread will likely turn into people asking about Cloudfront performance, does anyone have any real-world experience with CloudFront vs. Rackspace with Akamai CDN?

On paper the Rackspace one looks like a great performance/price alternative.


I am in the process of abandoning Cloudfront, because they have a serious bug when serving video files. They serve HTTP 206 (Range-Get) as HTTP 1.0 but 206 didn't exist in HTTP 1.0. Chrome and Firefox treat this as "uncacheable", thus media assets bypass the local media cache.

Depending on the nature of the content, Cloudfront is not usable for video, particularly the kind where people seek around a bunch, like instructional videos and tutorials. Also, people with slow connections expect the video to buffer while paused. This doesn't happen if you serve the videos from cloudfront.
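
If you want to check whether a given CDN shows the behaviour described above, the test boils down to looking at the status line of a range response. A minimal sketch, assuming you have already captured the status line somehow; the function name is invented for the example.

```python
def is_http10_partial(status_line: str) -> bool:
    """True if a status line announces 206 Partial Content under HTTP/1.0."""
    parts = status_line.strip().split(None, 2)
    if len(parts) < 2:
        return False                      # not a recognisable status line
    version, code = parts[0], parts[1]
    return version.upper() == "HTTP/1.0" and code == "206"


print(is_http10_partial("HTTP/1.0 206 Partial Content"))  # True
print(is_http10_partial("HTTP/1.1 206 Partial Content"))  # False
```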


> HTTP 1.0

If that is the only problem, flipping one bit in every response seems like a really simple solution. Why hasn't Cloudfront fixed it yet, do they know about this?


Yes they know about it. Below is the response from Amazon. The logic they employ is that since it is broken in an old version of Squid, it is fine for it to be broken on Cloudfront.

https://forums.aws.amazon.com/thread.jspa?messageID=351384&#...

- - -

Hello,

While we are aware of the issue with range request HTTP/1.0 206 responses and Chrome, we cannot provide an ETA for a fix. Since this issue is specific to range requests, an immediate workaround is to disable range requests on your origin server if this is possible for your use case.

It is also worth mentioning that multiple web proxy and cache application vendors have been using HTTP/1.0 as a de facto standard for many years, so you will probably sporadically get similar reports from your end users using Chrome, but not other browsers such as Firefox or Safari. For example, here is a discussion between a Chrome developer on the mailing list for the popular Squid web cache about a similar report: http://www.squid-cache.org/mail-archive/squid-dev/201204/011... I am not saying that always returning HTTP/1.0 will stick around forever, but it is fairly common in real world situations today.


Why would Cloudfront cause the player to stop buffering the video? Does it happen with a HTTP Download distribution, or the RTMP streaming distribution? I've used both CloudFront and Akamai for serving video, and I don't remember running into those issues. I have only tested it with normal HTTP Download distributions, however.


Cloudfront serves an invalid HTTP response, thus Chrome/Firefox refuse to cache the data. The theory is, it is better to not risk caching bad data. Chrome will still play the video served from Cloudfront, but it won't write any of the data to the media cache. Thus when you pause and buffer, it will only buffer a few seconds of video in the current playback window. It never writes to the cache so buffering must stop.

To see this in action, play a video with chrome://media-internals/ open.

A video served from S3 will get saved to the media cache, and the full video can buffer when paused. A video served from cloudfront will only buffer a few seconds of video.

Update:

It is probably just easier to see for yourself:

Here is a video served from S3. Pause the video in chrome, and you will see the whole video gets buffered:

http://s3.amazonaws.com/jctest20080526/test/test.mp4

Here is the same video served from Cloudfront. Start the video, and pause it, and notice how it doesn't buffer more than a few seconds:

http://d3oocspv43fsel.cloudfront.net/test/test.mp4


Oh is that why my Coursera videos suddenly won't buffer for more than a few seconds? I figured it was an optimization, like with YouTube, so they don't waste bandwidth loading videos for people who have the tabs open but won't watch them.

Do you know of any workarounds client-side?


> Do you know of any workarounds client-side?

I don't think there is one. You can't even compile a custom Chromium because it doesn't have h264 codecs. I even spent a day reading the Chrome source code, hoping there was some combination of headers that could trick Chrome into using the media cache, but I didn't find anything. I probably should have just spent the day finding a better CDN.


Ugh, that's too bad. Thank you for the reply.


Thanks for the info, that's very informative. Can you recommend any CDNs that work well with online video? It would be great to find one that is as easy to configure as cloudfront that doesn't have this issue.

I'm not a big fan of Akamai's control panel -- setting up new distributions is way too complicated (albeit more flexible), and configuration changes were taking almost a day to propagate.


I haven't had the time to find another vendor, thus can't recommend one.

Amazon promised a fix, but then backtracked. Their explanation was something along the lines of "an old version of Squid has the same problem, thus it is okay". I cannot comprehend the logic system they employ to think that is a good reason to live with the bug.


Both buffer while paused on Chrome iOS.


Note that Chrome on iOS is basically unrelated to every other version of Chrome. Due to Apple's silly rules, it's essentially just a reskinned Safari.


Chrome on iOS is probably using Apple's AVPlayer, which makes sense because there is probably no other way to access the dedicated decoding hardware. Apple is not afraid to cache an HTTP 1.0 206 response.


Both are buffering fine for me on Chrome for Android, too.


If you haven't found a good alternative yet, we'll be happy to help.

/co-founder at Advection.NET


I'll have a look. Here is what I need:

A CDN that supports Range Requests correctly. I want to use S3 as the origin. I need an edge location in Australia. Also, there can be no 301/302 HTTP redirects, thus I have eliminated Google as a potential offering. My current spend is around $1000 a month, so it is a tiny account.


Don't have an edge location in Australia currently, but our Hong Kong POP could work with reasonable latency - we can test that. Let's discuss offline.

malatortsev at advection dot net


You're doing something horribly wrong. I work for a live streaming company and we make extensive use of Varnish. It can probably solve the problem you're describing.


I'm not doing a damn thing wrong other than using Cloudfront. The problem is on their end, not mine. Thinking Varnish could solve this problem is utterly confused. Do you know what a CDN does? CDNs have servers located around the world so files are loaded quickly and with low latency.

Furthermore, your profile suggests you work for a pump and dump penny stock company (basically a scam). If your employer is paying you in something other than cash, you need to walk away asap.


Sounds like troll bait, recall the recent article about PG's modding algorithms. Life's too short.

There's lots of video CDN "solutions", and it's almost always cheapest (even after labor support) to DIY with bare metal at very large scale. If it were me, I would eval video CDN shops using tsung test cases wired up as nagios checks. Gotta make sure their stuff stays working.

A payment gateway once mistakenly deployed API changes to production without notice. Trust no one.

Anyone evaluated? http://live.bittorrent.com


> and it's almost always cheapest (even after labor support) to DIY with bare metal at very large scale

If you simply need to deliver files or live streams, without needing to provide complex functionality at the edge (various kinds of protection, geo blocking, or pay-per-minute), and your traffic patterns are predictable - it's often cheaper to build your own solution. Once you start thinking about backbone and colo redundancy, deploy in different countries with contract commits - things get expensive very quickly.

The beauty of using a massive third party delivery service isn't performance, it's elasticity. Just like with the web apps (frequently hosted on DIY systems) that go down as soon as the link goes up on HN - being able to absorb traffic spikes without failing (and without forcing you to commit to a higher tier for a year) can be very valuable.


[deleted]


I am sincerely not being spiteful. I am trying to warn you. Your company's unaudited financials state that there is only $7,000 in the bank.

http://www.sec.gov/Archives/edgar/data/1499274/0001096350130...

I sign my own paycheck, so the notion of whether he is overpaying is a bit confusing.


I'm entirely aware of the financial situation. How is calling my employer a pump and dump scam NOT spiteful when I've worked on this project from the beginning?

Edit: Also, I don't appreciate you posting that. It's completely off-topic. Keep it classy.


At this point, the article has left hackernews. I am writing to you as a fellow hacker looking to help you out.

You are involved in a stock fraud. The company you work for is a sham.

If you live in the US, then you have a plausible defense that you have no understanding of the underlying business. In this case, you likely can't afford the lawyer to present this case.

If you don't live in the US, then be careful. Imagine, ten years down the road, you are a successful engineer, and want to take your family to Disney world. Unfortunately, there is an outstanding bench warrant for your arrest, and rather than a nice family vacation that your wife wanted, you end up in a US prison.


Have you (or anyone else) had a chance to A/B test caches? That is, set up network API requests to be duplicated/filtered from production and sent to test environment(s). Credit to Netflix.

Set up enough identical boxes with each of Squid, Nginx, Varnish, trafficserver, etc. and evaluate each with basically the same traffic and however much tweaking.


Let's keep in mind that individual box performance will not directly translate into your cluster performance or global network performance. A box can be fine-tuned to serve a file at lightning speed, but once you connect a bunch of them together, and start delivering lots of different files to millions of people - different factors come into play. Distributing files, replacing files when updated, content churn, etc etc

Simple caching works for images, but doesn't work for large video files, for example (look at latest financials from public CDNs - they are all bleeding cash).

It's really not as simple as testing a box to see which setup works best.


Can also be done with logs - play back same request logs against several different setups, compare performance.
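
A bare-bones version of the log replay idea might look like the following. It assumes access logs in the usual common-log style, where the request sits in double quotes ("GET /path HTTP/1.1"), and it takes each setup as a plain callable so you can plug in whatever HTTP client you use. All names here are hypothetical, and as noted above, single-box timings like these say little about a whole network.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, List


def paths_from_log(lines: Iterable[str]) -> List[str]:
    """Pull the request path out of each line that has a quoted request field."""
    paths = []
    for line in lines:
        start = line.find('"')
        end = line.find('"', start + 1)
        if start == -1 or end == -1:
            continue                       # no quoted request field, skip it
        request = line[start + 1:end].split()
        if len(request) >= 2:
            paths.append(request[1])       # method, path, protocol
    return paths


def replay(paths: List[str],
           setups: Dict[str, Callable[[str], object]]) -> Dict[str, float]:
    """Send the same paths to every setup; return mean seconds per request."""
    results = {}
    for name, fetch in setups.items():
        timings = []
        for path in paths:
            began = time.perf_counter()
            fetch(path)
            timings.append(time.perf_counter() - began)
        results[name] = mean(timings) if timings else 0.0
    return results
```

Each callable would wrap a request against one candidate (say, the Varnish box or the Nginx box), so every setup sees exactly the same sequence of paths.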


I don't know about Cloudfront vs Rackspace+Akamai, however, based on some quick tests I did a year ago, I seem to recall that Rackspace+Akamai didn't deliver the same performance as professional Akamai services. In my experiments, Rackspace+Akamai used different servers, often located farther away, than big Akamai sites. (If anybody has background information about this, I would be very interested to hear.)


Rackspace CDN says this:

"Rackspace uses 213 of Akamai's edge locations, selected especially for our customers' typical usage patterns, and designed to cover all major areas of the globe." http://www.rackspace.com/cloud/files/

Akamai has approximately 5 gazillion edge locations, so yes, it is a cut down version. It is still a lot more POPs than just about every other CDN, though this doesn't necessarily translate into performance.


Ah, thank you. That info wasn't there when I conducted the tests: http://web.archive.org/web/20120915095343/http://www.rackspa...?

"though this doesn't necessarily translate into performance" - Yes, even when the files were already cached, I wasn't always satisfied with the performance.

A detail that bugged me, by the way, was the high number of CNAME requests, although this should at worst affect the first view.


Amazon Web Services does offer a worldwide CDN, CloudFront. http://aws.amazon.com/cloudfront/


Cloudfront doesn't offer some services that I find necessary, such as nested directory indices (e.g. example.com/folder/ instead of example.com/folder/index.html) and it doesn't return a 404 header on missing pages. I just emailed MaxCDN to see if they provide these.


Cloudfront as a CDN supports directory indexes and 404s, it's just S3 that doesn't. If you point a CF distribution at your own server with directory indexes enabled, CF will send those through to the user.


S3 supports both of those, via its "bucket as a website" feature.
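
For anyone unsure what "directory index" means in practice: the server maps a request for a folder onto an index file inside it. A small sketch of that mapping, with a made-up function name and "index.html" assumed as the index document:

```python
def object_key(request_path: str, index_document: str = "index.html") -> str:
    """Map a URL path to the storage key a directory-index rule would serve."""
    key = request_path.lstrip("/")
    if key == "" or key.endswith("/"):
        key += index_document              # folder request: serve its index file
    return key


print(object_key("/folder/"))       # folder/index.html
print(object_key("/css/site.css"))  # css/site.css
```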


I thought this was obvious? Amazon's Cloudfront on the other hand is a CDN and works great :)


If only Route53 allowed you to point the apex domain at Cloudfront (as I understand it, it's currently S3 or ELB).


You may not want to do so as:

  1) POST (as well as PUT, DELETE, OPTIONS and CONNECT) 
     is not yet supported on CloudFront.
  2) HTTPS/SSL for your own domain is not yet supported 
     on CloudFront.


@davidandgoliath you would think so. But a lot of companies ignore it.


You also seem to have ignored it in your post, any reason you didn't test cloudfront?


@davidandgoliath I work for MaxCDN so I would probably get fired. =P


So you post something that basically compares apples and oranges? Honestly, this post turns me off from MaxCDN.


Agreed. Especially if he was well aware that AWS offers a comparable service, but instead chose to compare to a different one.


Why would you get fired for covering an obvious and well-known CDN alternative in your blog post? Honestly interested :-)


Because it's not in his or his company's best interest? I bet that Cloudfront's performance isn't too different from MaxCDN's.


What is next? A blogpost saying "Hammers are terrible screwdrivers. Don't use a hammer with a screw!"?


    $ curl -I http://phaven-prod.posthaven.netdna-cdn.com/uploads%2F2013-05-17%2F20%2F3128%2FErQE0vKlNMIeNvaxbneY75nWy

    HTTP/1.1 403 Forbidden
    Date: Sat, 18 May 2013 20:31:08 GMT
    Content-Type: application/xml
    Connection: keep-alive
    x-amz-request-id: 41706FB9149898AF
    x-amz-id-2: d5F1JMIBLaQzNG5A

Boo. :)


@kmfrk

PostHaven cut off the URI:

    $ curl -I http://phaven-prod.posthaven.netdna-cdn.com/uploads%2F2013-0...

    HTTP/1.1 200 OK
    Date: Sat, 18 May 2013 20:34:13 GMT
    Content-Type: binary/octet-stream
    Content-Length: 52958
    Connection: keep-alive
    x-amz-id-2: NO6o51/19JsQJN9YHc+T/sraZSGNT+f3R+1GWl2QL3aD4SubqazjbMURb4VYaZyS
    x-amz-request-id: E640348D2D6EDA7B
    Last-Modified: Sat, 18 May 2013 00:47:10 GMT
    ETag: "f95534e9752b560f4acdda20228f90ba"
    Server: NetDNA-cache/2.2
    X-Cache: HIT
    Accept-Ranges: bytes
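
The interesting line in that output is `X-Cache: HIT`, which says the edge answered from its cache instead of going back to S3. If you want to check this across many URLs, a sketch like the one below can read saved `curl -I` output. The function names are invented, and the exact header name and value vary between CDNs, so treat `X-Cache` as an assumption based on the response above.

```python
from typing import Dict, Tuple


def parse_head(raw: str) -> Tuple[str, Dict[str, str]]:
    """Split raw response headers into (status line, lower-cased header dict)."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        return "", {}
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")   # split on the first colon only
        if sep:
            headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def served_from_cache(headers: Dict[str, str]) -> bool:
    """True if the X-Cache header starts with HIT (assumed header convention)."""
    return headers.get("x-cache", "").upper().startswith("HIT")
```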


As the sysadmin to a company that does use both S3 and Cloudfront, I'm a little shocked anyone would think to use S3 for distribution. A little testing will reveal just how slow S3 can be.


obvious: Amazon product isn't that great @ a service that's optimally provided by another Amazon product.

http://aws.amazon.com/cloudfront/


@molecule obvious to you and me. I wrote this to inform those who think it is a good idea to use S3 as a CDN that it isn't. If we can educate a few developers then we (this awesome community of hackers) are making the web faster.


> If we can educate a few developers then we (this awesome community of hackers) are making the web faster.

You're not making the web faster, you're shilling for your employer by comparing their apples to a competitor's oranges and proclaiming "our competitor's oranges make bad apple sauce!"

Failure to mention CloudFront is disingenuous.


CloudFront


So why is the title of your post "CDN vs S3"? Shouldn't it be something like "Don't use S3 as a CDN"?

To be honest, your post feels like spam for MaxCDN.


Honest question: Do you actually know anyone that thinks S3 is appropriate as a CDN?


Tumblr and Twitter both used to, and I've met a bunch of developers at meetups who, rather than thinking S3 is a CDN, simply do not understand what a CDN is.


Using S3 for file hosting is still a better solution than using a raw box (or shared virtual hosting) which is what most people who use S3 instead of a CDN would have been doing otherwise.


Who said S3 is a CDN in the first place!?


Certainly not Amazon, or they wouldn't have provided a service that acts as a CDN based on an S3 bucket.


> I think S3 is a great origin server for static assets

Pro-tip: If you're using Rails, just create a distribution with your app as the origin server and in production.rb, set your asset host to your distributions host. You get the asset cache without having to do the precompile step. Tastes great with Heroku.


This is smart, I usually use the precompile with upload to S3 via asset_sync gem. Since Heroku will precompile by default, what do you do to disable it. Just turning on config.serve_static_assets = true is likely not enough.


> This is smart, I usually use the precompile with upload to S3 via asset_sync gem

I used to do the same until cloudfront rolled out custom origins.

> Since Heroku will precompile by default, what do you do to disable it.

I misspoke in my comment, what I meant was you can skip the synch with S3 step. I've never actually bothered to stop heroku precompiling assets (though I may as well). This question on SO looks promising:

http://stackoverflow.com/questions/8953360/preventing-heroku...

If you do that though, as you mentioned, you will definitely need to flip serve_static_assets on.


I was using S3 to deliver secure signed downloads to customers. It worked well enough for a long time but eventually customers started having major connectivity issues and dead slow downloads.

I switched to CloudFront and, of course, downloads improved dramatically. I had to go with CF because we needed signed downloads. Would be nice to have alternatives but I'm happy with CF.



What do you think CloudFront is for?


Just put Cloudflare in front of it, it's free :)


This is what I ended up doing. I've been using Cloudfront for a while now, but Cloudflare is free, and people seem to indicate it is just as fast if not faster.

I have my static assets on a sub-domain, so I just set cloudflare to cache everything on that subdomain (and left it off on everything else).



