Hacker News new | past | comments | ask | show | jobs | submit login

"Even if you had the state of the art microservice architecture running on a kubernetes cluster on your own hardware, you still wouldn't be able to source disk/CPU fast enough if your service happens to experience loads beyond what you provisioned." That basically means you never planned. As everyone moves to cloud what makes you think AWS, Azure wont have same issue. If entire region is down do you think other regions can handle the load. If you think so you're kidding yourself. Unless you have business where you dont know your peak number then cloud does not matter.



You can plan all you'd like, failures happen not necessarily due to poor planning but because in real life, shit happens. Pokemon Go for instance experienced like 50x the amount of traffic they planned for.

Secondly, software companies like Microsoft, Google and IBM might know a thing or two about running data centers. Due to economies of scale, these companies are inherently in a better position to supply hardware at scale.

> If entire region is down do you think other regions can handle the load. If you think so you're kidding yourself

Netflix routinely does just this to test the resilience of their systems. They pick a random AWS region, and they evacuate it. All the traffic is proxied to the other regions and eventually via DNS the traffic is routed entirely to the surviving regions. No interruption of service is experienced by the users.

Here's a visualization of Netflix simulating a failure on the US-east-1 region and failing over to US-west-1/US-west-2

https://www.youtube.com/watch?v=KVbTjlZ0sfE

The top right node is the one that fails. As the error rate climbs, traffic starts getting proxied over to the surviving nodes, until a DNS switch redirects all traffic to the surviving nodes. Netflix does this monthly, in production. They also run https://github.com/Netflix/SimianArmy on production.

The cloud enables fault tolerance, resiliency and graceful degradation.


I think you missed the point, Netflix evacuating a region is not the same thing as that region failing. If the whole region goes down, their (AWS's) total capacity just took a major hit and unless they have obscenely over-provisioned (they haven't), shit is going to hit the fan when people start spinning up stuff in the remaining regions to make up for the loss.


>The cloud enables fault tolerance, resiliency and graceful degradation

No, tooling to failover and spin up new instances does that. An enterprise with 3 data centers can do that.

"the cloud" is just doing it on someone else's hardware.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: