Scaling Your Static Site to a Global Market for a Fraction of the Cost on AWS (medium.com/elliot_f)
138 points by emforce on Aug 15, 2018 | 61 comments



I'm surprised no one is mentioning Netlify in the comments.


Netlify is one of my new favorite apps. It does so many simple things really well, and the roadmap of features is pretty dope. Best static site toolchain service I've seen.


Big fan of Netlify and use it for several sites. They have had some issues with DDoS attacks lately (and I've seen outages), but the blips seem to be under control.

The other one to consider is Google Firebase Static Hosting. Really excellent performance.


1000x this. Netlify is one of those tools that I really have trouble remembering life without, especially for static sites. The first time I used it, I distinctly remember saying, “oh wow,” aloud. And I still find myself saying that the more I get to know their offerings.


What is Netlify?



When I go to https://www.netlify.com/pricing in Firefox I get this error:

Corrupted Content Error The site at https://www.netlify.com/pricing has experienced a network protocol violation that cannot be repaired.

Never seen this error before. Seems okay in Chrome and IE, so not sure what's going on in Firefox.


I get this quite a lot in Firefox, especially on O365. It's usually fixed by doing a hard-refresh (Ctrl-Shift-R or something).


Data point: For me it's fine in Firefox Nightly from The Netherlands.


AWS and the like make a killing from the basic fact that most web devs don't know anything about the real costs of bandwidth (a blind spot, if you will).

HE.net is currently advertising 10gig ethernet for $1300/mo. Obviously there are various scaling issues associated with this but the basic premise still holds - AWS and other cloud services need major pressure to push their bandwidth prices down.


This is less about the cost of bandwidth and more about the cost of having edge POPs around the world.

With CloudFront (or any other edge cache system), you get TCP termination closer to end users, so the site loads faster, particularly if you have users around the world. Dropbox case study: https://blogs.dropbox.com/tech/2017/06/evolution-of-dropboxs...
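You can see the effect with curl's timing variables (the hostname here is just a placeholder); the connect and TLS numbers are what shrink when termination happens at a nearby POP:

    # Rough check: TCP connect, TLS handshake, and time-to-first-byte timings.
    curl -so /dev/null https://example.com/ \
      -w 'connect: %{time_connect}s  tls: %{time_appconnect}s  ttfb: %{time_starttransfer}s\n'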

Edge cache bandwidth costs quite a bit more than the not-quite-bottom-of-the-barrel IP transit that HE.net is selling.


I have set up some sites the same way. The only ugly part for me was that I wanted to use the features of the S3 website hosting interface but restrict all traffic to come through CloudFront. This blog post (not mine) describes the problem and the approach: https://abridge2devnull.com/posts/2018/01/restricting-access...


Have you thought about using Cloudflare Workers instead? It's pretty easy to host from a bucket: https://developers.cloudflare.com/workers/recipes/static-sit...


I wanted to stay fully on AWS. These were personal projects, so it was more about convenience, and I didn't even consider other CDNs.


There's actually a way to do this with an OAI; you just have to configure the html5 routing stuff using CloudFront distribution rules instead of configuring the S3 bucket as a static website. As a nice side effect, you can also enforce HTTPS all the way through the S3 <-> CloudFront <-> world chain, which isn't possible to force when the S3 bucket is configured for static site hosting.

I have about 75% of a blog post about how to do this, it's not terribly complicated. This comment just gave me the motivation to finish it, I'll post it on HN when it's ready.
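In the meantime, here's the gist of the access-restriction half: a bucket policy that only lets the OAI read objects. Roughly like this (the bucket name and OAI ID are placeholders):

    # Sketch only: grant the CloudFront OAI read access and nothing else.
    aws s3api put-bucket-policy --bucket my-site-bucket --policy '{
      "Version": "2012-10-17",
      "Statement": [{
        "Sid": "AllowCloudFrontOAIRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E2EXAMPLE"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-site-bucket/*"
      }]
    }'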


Yeah, I tried that (or something similar) first. IIRC the S3 website setup was handling the index document and error document stuff, but fetching through the S3 API via CloudFront wasn't doing that, so that is how I ended up where I ended up.


Just wondering, what's the motivation to restrict access? Is it to keep costs down in case someone decides to run up your bill with a flurry of requests?


Just so the CloudFront logs and monitoring are complete. It won't affect the hosting cost as far as I know, I just use billing alerts on the overall account to keep track of that.


Right now, I am hosting my personal site and blog on S3 + Cloudflare for 6 cents / month. I use Middleman, because I feel that it offers more flexibility than Hugo.


Cost-comparisons aside, the massive speed improvement makes me think he didn't have caching configured correctly with Cloudflare and his PHP backend (thus defeating the entire point of the CDN aspect).

I'm guessing a couple of Cache-Control: headers would have provided similar latency improvements.


Comparisons were done with CloudFlare on the static site; it's been a very long time since the site was based on Laravel and PHP.

I had set Browser Cache Expiration to 1 month within my CloudFlare settings as was luckily screenshotted within a previous article: https://medium.com/@elliot_f/my-journey-into-web-speed-optim...

I'm fairly sure that I had some speed optimizations set within my Nginx server block. Unfortunately, I don't have the Nginx config file to hand anymore as I've (somewhat stupidly) deleted the snapshots without taking a backup.


I'm no Cloudflare or CloudFront expert, but these should be apples-to-apples comparisons, and seeing that huge disparity in latency makes me think it's most likely a configuration issue, as all the comparisons I've seen claim they perform roughly the same: https://blog.latency.at/2017-09-06-cdn-comparison/

Specifically, though, we're talking about how long Cloudflare caches your content on their proxy/edge servers. Browser caching is irrelevant for this discussion, as speed tests of this nature should always be performed on a clean request.


I can't claim I am either, unfortunately. I required assistance ensuring the certs were in place and my configuration was correct when doing the migration.

And just to clarify, based on some of the articles I've read, I actually think CloudFlare may be a better choice of CDN, this post just highlights one way of achieving a global-scale website and doing it in such a way that it's resilient and cheap as chips!


There is a substantial difference in SSL handshake speed and overall site performance between a single central server and multi-edge storage sitting behind Cloudflare. The closer your edges are to Cloudflare's edges when the handshake happens, the faster things go. The handshake delay is particularly bad internationally with Heroku running under CF, since you can't predict which Heroku IP you are fetching from; and if you pin to one point, latencies are still long at far-away locations.


That setting is what Cloudflare passes on down to your visitors, not how long it caches itself.

I don't actually know Cloudflare's behavior firsthand, but assuming they follow the spec, you need to set "Cache-Control: max-age=300" or "Cache-Control: s-maxage=300" to tell Cloudflare to cache a given response for 5 minutes.
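Either way, you can check what the edge actually did from Cloudflare's CF-Cache-Status response header (hostname is a placeholder):

    # CF-Cache-Status: HIT = served from Cloudflare's edge; MISS/DYNAMIC = went to origin.
    curl -sI https://example.com/ | grep -iE 'cf-cache-status|cache-control'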


GAE is far better than AWS for static sites: no cost for medium-scale static websites (50,000 pageviews/day), and the CDN is built in.


Why not GitHub Pages? Has anyone measured performance? A CDN is always an option you can add on top if you don't like GitHub's.


Spaces + BunnyCDN is even cheaper. ;)

$6/month = 250GB of static websites + domain registration fees.

If you want to get really cheap, OVH's cloud storage is basically free, but you lose HTTPS (roughly $0.01/GB for traffic/storage).


Link for Spaces? Couldn't search for it, since it's a fairly generic term.



Sorry. Spaces = Digital Ocean's Spaces product. $5/month. Park a CDN in front of it with Edge rules and you get a similar setup.


Spaces will get a native CDN this year, if DO's roadmap is to be believed.


The problem with Spaces as static hosting is that it doesn't handle index.html/index.htm correctly, so you need a CDN with edge rules.


Does this require invalidating the main page in CloudFront's cache when you add a blog post or article?


If you want the updated site to be available immediately then you must invalidate the cache.

I do it as the last step in my CI/CD process:

    - aws configure set preview.cloudfront true
    - aws cloudfront create-invalidation --distribution-id $CLOUDFRONT_ID --paths '/*'
    - curl https://google.com/ping?sitemap=$SITEMAP_URL
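Worth noting: the '/*' wildcard counts as a single path against CloudFront's 1,000 free invalidation paths per month, so for a small site this step usually costs nothing. If you'd rather avoid invalidations entirely, another option is uploading the HTML with a short max-age so the edge refreshes it on its own; a rough sketch (bucket name is a placeholder):

    # Re-upload only the HTML with a 5-minute TTL; hashed assets can stay long-lived.
    aws s3 cp ./public s3://my-site-bucket --recursive \
        --exclude '*' --include '*.html' --cache-control 'max-age=300'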


Thx. I was hoping there was a way to avoid that extra cost. Also, thx for sharing your code to do this.


Why can't he use the free Cloudflare plan?

A static website cached by cloudflare seems very vanilla.

Also, why S3? This seems very easy to do with Linode or anything else similarly cheap or cheaper ($20 gets you a KVM VPS with 2 GB nowadays).

Unless you have to regenerate constantly, all it needs is nginx to serve the pages to Cloudflare's edges when they drop out of the cache.


Nowadays, if you really don't need a server, don't spin one up. In this case, the author does not need a server. If you think you might need to do some backend processing, see if you can use 'serverless' functions and offload the admin work to someone else. If that doesn't suffice, maybe something like App Engine. Still not enough? OK, so maybe an AWS batch job (or equivalent). Does it have to run all the time? Then maybe a container managed by ECS or similar. None of this works and you need an actual VM? Fine, spin one up. But now you have to care for it and keep it monitored, patched and secured.


You can just use Cloudflare Workers to do whatever serverless things you need.


I think it's customary to declare any product affiliations on HN if your post is promoting a product.


I was previously using Cloudflare in conjunction with a Linode server. The disadvantage of this, however, is that you need to ensure that one server never goes down; otherwise anyone hitting a cold cache would see the site as down.

Also, why do you need a server at all? If you are doing nothing but serving static files, why incur the overhead of maintaining Let's Encrypt certificates and writing Nginx config files? AWS provides a simple interface in which you can view and manage your static files without having to SSH into a server.
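Deploys become a one-liner as well; mine is essentially this (bucket name is a placeholder):

    # Mirror the local build output to the bucket, removing files deleted locally.
    aws s3 sync ./public s3://my-site-bucket --delete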

On the point of price, I'm predicting my total hosting cost for a site serving roughly 40-45k users per month will be around $7/month, all whilst minimizing the administration and monitoring time a more traditional setup would require.

Hope this clarifies things. Would be keen to hear any counter thoughts!


We use Azure, and we have multiple sites getting that visitor count (or more, for PPC campaigns) for way less than $7/mo using Cloudflare/Azure Edge/Web Apps. This might not work for a single site, but with multiple sites on one service plan we can handle way more visits per site (especially if it were static) for $2-$3/site/mo. We could auto-scale if needed; that would raise the number, though it would not exceed yours. Unless I'm missing something, I don't see how $7 isn't viewed as rather expensive for that visitor count.


Sure! Here are my counter thoughts: what if your site changes every 5 minutes and is about 100 MB?

12 × 24 × 100 MB = 28,800 MB, i.e. roughly 29 GB of fresh content per day.

S3 costs (if only for the bandwidth) will be greater than a server with a fair-share unlimited connection.

Also, CloudFront beyond the free tier will kill you - again, because of the bandwidth.


> What if your site changes every 5 minutes and is about 100 MB?

This is probably an exceptional rate of change for the _vast_ majority of static sites.


> This is probably an exceptional rate of change for the _vast_ majority of static sites.

I also doubt that many static sites have much "overhead" related to maintaining nginx config files and Let's Encrypt certificates.


Sure, but then there's the whole overhead of maintaining a server.


To clarify: "The difference between zero servers and one server is much larger than the difference between 100 servers and 101 servers"

The person you're talking to is likely thinking of the "100=>101" case, not the "0=>1" case.


Maintaining a server is not complicated nowadays with Ansible. I have a few dozen; some of them haven't needed any admin for years (like the Linode ones), and the others run Debian testing, with Ansible to deploy matching configurations and keep up with updates.


S3 is going to be more reliable than anything you can reasonably be expected to administrate yourself, and it's cheaper as well. It's also supported by all CI tools, so you just script the deployment based on git commits.


My Linode hasn't been updated for years and still runs nginx just fine. Port knocking means little risk of intrusion, as SSH is on a non-standard port too.

My deployment is shell scripts launched by systemd timers

Instead of downvoting, can someone please explain why this stable setup is not up to whatever standards?


Most likely it's your "hasn't been updated for years". Given all the security vulnerabilities that have been disclosed, including things like Heartbleed, I hope you meant that your "setup hasn't been changed in years".


No, I run nothing like apt update. Why would I fix something that's not broken? And no, I do not run SSL. I like to limit the number of moving parts.

Anyway, if someone can manage to access my servers with only nginx serving static files, they deserve to 0wn it :-)


You're exposing your users to MITM attacks by not deploying SSL.

While your setup stays the same, major security flaws are found in different parts of the stack.

Security is a process, by neglecting it you're paying for resources that are abused by attackers in order to harm other users.


If you want to mitm my static site with no login and mostly PDFs, you are welcome to.

Security is a state of mind, not a bunch of recipes. Some things must be protected; for some others it doesn't make sense.


How do you know your users are seeing "a static site with no login and mostly PDFs"?

Security is a state of mind, indeed.


I know because I make the site and use it too


I know, but now anyone can:

* Inject a cryptocurrency miner JavaScript into your page during the transmission of your static HTML page to your clients, without you or your users knowing it [1]

* Inject explicit, illegal image material that would immediately get your clients in legal trouble for possession of such material, without you knowing

* Inject a JS snippet that, instead of your site's contents, shows a fake antivirus page, telling the clients that your site is malicious and that a threat was eliminated by Fake Antivirus 10.0 and that they should immediately call Microsoft Support (phone number in Bangladesh) for further "assistance". There they're told they need to get a full cleaning of the hard drive for only $99 and are asked for their CC number

The point is: If you don't have end-to-end-encryption, you can never be sure what your users see. They might see your site - or some slightly modified version of your site, with a login box, phishing passwords from your users, abusing their trust in your brand.

A MITM has nothing to do with someone gaining access to your server. It's someone gaining access to infrastructure - a vulnerable public wifi, for instance.

[1] https://www.hacking.reviews/2018/01/coffeeminer-collaborativ...


Fair enough. I will enable https on cloudflare.


You have a single point of failure and a linux box (and CMS?) that needs to be constantly patched - how is that in any way comparable to the simplicity and scalability of dropping files onto an S3 bucket?


No CMS. Nginx serving static files. Not patched or upgraded for years. Still running fine.

Doing anything more than putting Cloudflare in front of my nginx is needless overhead, because Cloudflare scales quite well!


I would say because using S3 for static sites costs pennies instead of $20 :-)



