
Most people just deal with it and accept that their site will go down for 20 minutes every 3-4 years or so, even when hosting on a major cloud, because:

1) the cost of mitigating that risk is much higher than the cost of just eating the outage, and

2) their high traffic production site is routinely down for that long anyway, for unrelated reasons.

If you really, really can't bear the business costs of an entire provider ever going down, even that rarely (e.g. you're doing life support, military systems, big finance), then you just pay a lot of money to rework your entire system into a fully redundant infrastructure that runs on multiple providers simultaneously.

There really aren't any other options besides these two.




This here is right on.

I will add that if you can afford the time and effort, it is worth designing your system from the beginning to run on multiple providers without many issues. That means trying as hard as you can to use as few provider-specific services as possible (RDS, DynamoDB, SQS, BigTable, etc.). In most cases, pjlegato's 1) will still apply.
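To make the "avoid provider-specific things" idea concrete, here is a minimal sketch of one way to do it, assuming you hide each managed service behind a small interface of your own. The names `MessageQueue`, `InMemoryQueue`, and `enqueue_signup` are hypothetical and made up for illustration; a real adapter for SQS or a similar service would implement the same two methods.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class MessageQueue(ABC):
    """Hypothetical provider-neutral queue interface the application codes against."""

    @abstractmethod
    def send(self, body: str) -> None:
        ...

    @abstractmethod
    def receive(self) -> Optional[str]:
        ...


class InMemoryQueue(MessageQueue):
    """Stand-in backend for this sketch; a provider adapter would replace it."""

    def __init__(self) -> None:
        self._items = deque()

    def send(self, body: str) -> None:
        self._items.append(body)

    def receive(self) -> Optional[str]:
        # Return the oldest message, or None when the queue is empty.
        return self._items.popleft() if self._items else None


def enqueue_signup(queue: MessageQueue, email: str) -> None:
    # Application code depends only on the interface, never on a provider SDK.
    queue.send(f"signup:{email}")


queue = InMemoryQueue()
enqueue_signup(queue, "user@example.com")
print(queue.receive())
```

Switching providers then means writing one new adapter class rather than touching every call site, which is roughly where the "much less effort" comes from.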

But you get a massive side-benefit (main benefit, I think) in cost. There are huge bidding wars between providers and if you're a startup and know how to play them off each other, you could even get away with not having to pay hosting costs for years. GC, AWS, Azure, Rackspace, Aliyun, etc, etc are all fighting for your business. If you've done the work to be provider-agnostic, you could switch between them with much less effort and reap the savings.


If you are doing life support, military systems, or HA big finance, then you are quite likely to be running on dedicated equipment, with dedicated circuits, and quite often highly customized/configured non-stop hardware/operating systems.

You are unlikely to be running such systems on AWS or GCE.


And that's why IBM is still in the server business: There's nothing like a mainframe when it comes to uptime.


HP also has some good products in the highly available space (http://h20195.www2.hp.com/v2/getpdf.aspx/4aa4-2988enw.pdf), likely from their acquisition of Tandem.


> likely from their acquisition of Tandem.

Yep. Those were originally Itanium-only, so their success was somewhat… limited, compared to IBM's "we're backwards compatible to punch cards" mainframes.

Only recently did Intel start to port over the mission critical features like CPU hotswap to Xeons, so they can finally let the Itanic die, so we're hopefully going to see more x86 devices with mainframe-like capabilities.


IBM also owns Softlayer which is a great cloud provider for the more traditional VM/dedicated servers architecture.


And have similar failure rates. Human errors are inevitable.


> even when hosting on a major cloud

Hosting on anything/anywhere really. Even if one builds clusters with true 100% reliability running on nuclear power buried 100 feet underground, you still have to talk to the rest of the world through a network which can fall apart for a variety of reasons. If most of your users are on their mobile phones, they might not even notice outages.

At some point adding an extra 9 to the service availability can no longer be justified for the associated cost.


Also, 20 minutes every 3 years is roughly five nines (about 99.9987% uptime) anyway.
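For anyone who wants to check the arithmetic, a quick sketch, assuming an average year of 365.25 days and a single 20-minute outage per period. The function names are my own.

```python
import math

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, averaging in leap years


def availability(downtime_minutes: float, period_years: float) -> float:
    """Fraction of the period during which the service was up."""
    total_minutes = period_years * MINUTES_PER_YEAR
    return 1.0 - downtime_minutes / total_minutes


def nines(avail: float) -> float:
    """Number of nines: 0.999 -> 3.0, 0.99999 -> 5.0."""
    return -math.log10(1.0 - avail)


for years in (3, 4):
    a = availability(20, years)
    print(f"20 min every {years} years: {a:.5%} available, {nines(a):.2f} nines")
```

By this calculation, 20 minutes every 3 years lands just under five nines (about 4.9), and 20 minutes every 4 years lands just over.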


> Most people just deal with it and accept that their site will go down for 20 minutes every 3-4 years or so, even when hosting on a major cloud

If THAT is what I get for the prices of Google Compute Engine, I could just as well use OVH's cloud; uptime isn't worse, and the price is a lot cheaper.



