About 5 or 6 years ago we had an alert in the middle of the night that our RDS i...

ceejayoz · on June 9, 2022

They don't tell you because you're not supposed to care, and there's no human involved in the process to do a post-mortem.

Something on the instance host died, most likely.

WatchDog · on June 9, 2022

I’ve seen a few AWS instance hardware failures; they happen with some regularity. You can handle a single-instance failure without being multi-AZ. Testing an actual AZ failure, as in the whole AZ going offline or getting partitioned from the other AZs, is pretty much impossible.

fulafel · on June 11, 2022

AZs are connected via normal user-visible networks, so you can just break those. They even provide examples: https://github.com/awslabs/aws-well-architected-labs/tree/ma...

Those are basic (they don't cover flapping or glacial-speed slowdown degradation modes, they only cover some services, etc.), but they're at least a starting point that can be extended.

ceejayoz · on June 11, 2022

Huh, didn't know about aws rds reboot-db-instance --force-failover
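
For anyone wanting to try it, here is a minimal sketch of wrapping that command in Python. The helper names and the instance identifier "my-test-db" are made up for illustration, and it assumes the AWS CLI is installed and configured. As far as I know, the flag only applies to instances configured for Multi-AZ. It defaults to a dry run that only returns the command string.

```python
import shlex
import subprocess


def failover_command(db_instance_id):
    """Build the AWS CLI argv for a reboot with forced failover.

    db_instance_id is whatever identifier your RDS instance has;
    the value used below is a placeholder, not a real instance.
    """
    return [
        "aws", "rds", "reboot-db-instance",
        "--db-instance-identifier", db_instance_id,
        "--force-failover",
    ]


def force_failover(db_instance_id, dry_run=True):
    """Return the command as a string; run it only when dry_run is False."""
    cmd = failover_command(db_instance_id)
    if not dry_run:
        # Assumption: the AWS CLI is on PATH and credentials are configured.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Dry run: prints the command without touching any instance.
    print(force_failover("my-test-db"))
```

Calling force_failover("my-test-db", dry_run=False) would actually trigger the reboot, so point it at a test instance first.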