> What I suspect is more likely true is that AWS suffers thousands of small failures every month and most are contained as designed, with no (or minuscule) customer impact.

Isn't that the whole point of moving to The Cloud? There's supposed to be some magical system in place such that hardware failures are routed around and don't interrupt service. Of course you can roll this yourself with your own hardware, but in the cloud it's done for you.

It should come as no surprise that a system complicated enough to appear magical has some crazy complexity behind the scenes, and that accidental dependencies within it can result in catastrophic failure.