This is very real. The day after a massive GCP outage (Sept 2020), we had a production outage of our own caused by excessive health check failures, during which we quickly hit our already-oversized quota in a traffic spike (quotas are another GCP issue). Since then we've moved all of our sensitive workloads to AWS.

We continue to use GCP for less sensitive workloads and for GKE, but our entire ops team has an unspoken distrust of it. And that's purely an infra-side opinion; it ignores the fact that we've had to rewrite apps entirely after breaking changes in Google products.

GCP has a great UI, the project structure makes much more sense, and billing is way easier, but after having a massive outage during a pretty standard scaling event, we just can't justify the risks.