
Hm, bad week for the Cloud. Can't even get to the status page; hopefully it's not hosted on App Engine.

So going forward, what's the best way to protect against cloud downtime? Have a hot/standby failover with a different provider? Prepare customers' expectations for the possibility of server outages? Do a ton of research, pay $$$ for lots of nines of uptime, and lambaste the host when they don't deliver?



Downtime happens regardless, unless you have a lot of money and talent to spend on your own infrastructure, and even then it's hard to beat cloud providers like Amazon or Google, who have more resources and knowledge than you do.

The greatest thing about cloud hosting is that you can just sit back and let them fix it. It usually takes about half an hour, or a couple of hours if the outage is severe, which is often less than the time it takes for a DNS record change to propagate (unless you've got some proxy in front of your IPs, which would be another point of failure).

And even with these severe outages, the overall monthly uptime is still better than 99.9%, and that's really hard to beat, so just relax and let them fix it.
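For a rough sense of what 99.9% actually buys you, here's a quick downtime-budget calculation. It's a minimal sketch assuming a 30-day month; the helper names are just illustrative. It suggests a half-hour outage fits inside a 99.9% month, while a two-hour one doesn't.

```python
# Assumes a 30-day month: 30 * 24 * 60 = 43,200 minutes.
MINUTES_PER_MONTH = 30 * 24 * 60


def downtime_budget(uptime_pct: float, period_minutes: int = MINUTES_PER_MONTH) -> float:
    """Minutes of downtime allowed per period at the given uptime percentage."""
    return period_minutes * (1 - uptime_pct / 100)


def monthly_uptime(outage_minutes: float, period_minutes: int = MINUTES_PER_MONTH) -> float:
    """Uptime percentage for a period that contains the given total outage time."""
    return 100 * (1 - outage_minutes / period_minutes)


print(f"99.9% allows {downtime_budget(99.9):.1f} min/month")  # 43.2 min/month
print(f"30-minute outage -> {monthly_uptime(30):.3f}% uptime")  # 99.931%
print(f"2-hour outage    -> {monthly_uptime(120):.3f}% uptime")  # 99.722%
```

So whether a given month stays above three nines depends a lot on which end of "half an hour to a couple of hours" the outage lands.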


There's no such thing as "cloud downtime" - it's still servers, data centers, networks, same as everything else.

You need to decide how much uptime you're willing to pay for and how much your service can degrade, for how long, and then methodically address each level of the hierarchy between you and your customers. You might well decide that the ongoing engineering cost of, e.g., wide geographic separation just isn't sustainable at the price your customers are willing to pay, particularly if you have something like a CDN keeping your site partially responsive during less-than-catastrophic failures.


I'd say the answer depends on how fast GAE recovers. If you're building redundancy across multiple clouds and there's a lot of data:

1) It's very complex and expensive.
2) In most cases, you're looking at DNS for hot failover.
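To make the DNS failover point a bit more concrete, here's a minimal sketch of the decision logic: poll a health URL per endpoint, publish the first healthy one, and estimate how long clients might keep hitting the dead one. Everything here is hypothetical (the `Endpoint` type, the documentation-range IPs, the `example.com` health URLs), and actually pushing the chosen address to your DNS provider is provider-specific, so it's left out.

```python
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Endpoint:
    name: str
    address: str     # hypothetical IP you would publish in the DNS A record
    health_url: str  # hypothetical URL polled to decide whether this endpoint is up


def http_healthy(url: str, timeout: float = 2.0) -> bool:
    """Treat a 2xx response within the timeout as healthy; any network error as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


def choose_active(
    endpoints: Sequence[Endpoint],
    is_healthy: Callable[[str], bool] = http_healthy,
) -> Optional[Endpoint]:
    """Return the first healthy endpoint in priority order, or None if all are down."""
    for ep in endpoints:
        if is_healthy(ep.health_url):
            return ep
    return None


def worst_case_failover_seconds(check_interval_s: float, failures_before_switch: int, ttl_s: float) -> float:
    """Detection delay plus the TTL during which resolvers may keep serving the old record."""
    return check_interval_s * failures_before_switch + ttl_s


primary = Endpoint("primary", "192.0.2.10", "https://primary.example.com/healthz")
standby = Endpoint("standby", "198.51.100.20", "https://standby.example.com/healthz")

# Stubbed health results so the example runs without network access.
status = {primary.health_url: False, standby.health_url: True}
active = choose_active([primary, standby], is_healthy=lambda url: status[url])
print(active.address if active else "no healthy endpoint")  # 198.51.100.20
print(worst_case_failover_seconds(10, 3, 60))  # 90
```

The TTL term is why DNS failover is slower than it looks on paper: even with a short TTL, some clients may cache records longer than they should.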

If GAE can recover in less than 30 minutes and sticks to, say, one outage a year, you just can't justify the kind of cost you're looking at with 2 (seriously, it's a lot of cash).


Build redundancy into your software to deal with single provider failure.
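At the application level, one simple shape this can take is "try the primary provider, fall back to the next one on failure". A minimal sketch, assuming each provider is wrapped in a zero-argument callable; `call_with_fallback` and `AllProvidersFailed` are hypothetical names, not any real library's API.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllProvidersFailed(Exception):
    """Raised when every provider in the list has failed."""


def call_with_fallback(providers: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Call each provider in order; return (name, result) from the first that succeeds."""
    errors: List[str] = []
    for name, fn in providers:
        try:
            return name, fn()
        except Exception as exc:  # real code should catch only the provider's transient errors
            errors.append(f"{name}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors))


def primary_read() -> str:
    # Stand-in for a call to the primary cloud provider during an outage.
    raise ConnectionError("primary region down")


def secondary_read() -> str:
    # Stand-in for the same read served by a second provider.
    return "data from secondary"


print(call_with_fallback([("primary", primary_read), ("secondary", secondary_read)]))
# ('secondary', 'data from secondary')
```

The hard part in practice is keeping the second provider's data current, which is where most of the cost mentioned above comes from.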


That is not always a feasible option, especially for young projects with limited capital.



