
At my (non-Google) job, we talk about "what happens if a meteor hits a DC".

We agree that this is rare enough to be an acceptable risk as long as there are buttons we can push to recover within a reasonable timeframe. We don't need a fully automatic way to recover from it.

However, your SRE team needs a way to recover without intervention, which is why there is talk of backups.

BTW, even using different cloud providers isn't necessarily enough to avoid a DC outage. No amount of redundancy fully protects you from it, short of running a ton of services that intentionally slice off access to the DC, and that carries its own risk of the cutoff happening accidentally.




But that’s incredibly stupid. A meteor hitting a DC is an intentionally dumb framing that eclipses the much more likely risks: the thousands of other things that can wipe out a DC.

I’ve been involved in such discussions, and on the surface they seem reasonable, but they turn into an easy excuse to write off a DC being wiped out.

In comparison, there are far more likely reasons for a DC to be wiped out effectively permanently. The data centers at the base of WTC 1 and 2 are good examples. The many targeted attacks on the power grid are also prime examples of attacks that would cripple a data center for weeks if it were in the crosshairs.

The Cascadia subduction zone has a much higher probability of wiping out every data center in the PNW simultaneously than a meteor has of hitting a single one.

Looking at it from a different perspective, there are other teams at Google who understand how sensitive their data centers are, and they act accordingly. The list of data center locations is not public. A motivated group with long-range rifles purchased from Walmart could wipe out a Google data center.

Google is very bad at modeling long tail events.


Meteor isn't real. It is literally just a metaphor, and a more fun one than "we lost power" or "egress died". Anybody claiming it is impossible is being silly.

You also won't lose a DC every year, so you need some strict uptime guarantees to justify automatic failover.

Prepare to avoid data loss, but needing SRE to fail over manually is fine if you can stomach an hour of downtime.
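To put "an hour of downtime" next to "strict uptime guarantees", here is a rough back-of-the-envelope sketch. It assumes a 365.25-day year and a single outage in that year; the function names are illustrative, not from any SRE tool.

```python
# Rough SLO arithmetic; assumes a 365.25-day year (8766 hours).
HOURS_PER_YEAR = 365.25 * 24


def downtime_budget_minutes(slo: float, hours_per_year: float = HOURS_PER_YEAR) -> float:
    """Minutes of downtime per year allowed by an availability SLO such as 0.9999."""
    return (1 - slo) * hours_per_year * 60


def availability_with_outage(outage_hours: float, hours_per_year: float = HOURS_PER_YEAR) -> float:
    """Fraction of the year the service is up, given outages totalling outage_hours."""
    return 1 - outage_hours / hours_per_year


if __name__ == "__main__":
    for slo in (0.999, 0.9999, 0.99999):
        print(f"{slo:.3%} SLO -> {downtime_budget_minutes(slo):.1f} min/year of downtime")
    # One manual, hour-long failover in a year:
    print(f"1 hour of downtime -> {availability_with_outage(1.0):.4%} availability")
```

Under these assumptions, a single hour-long manual failover fits inside a 99.9% budget (about 526 minutes a year) but blows through a 99.99% one (about 53 minutes), which is roughly where automatic failover starts to pay for itself.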


> Meteor isn't real

Yes, just like "bus factor" isn't about buses




