
About 5 or 6 years ago we had an alert in the middle of the night that our RDS instance had died. It failed over in about 15 seconds (SQL Server, so it’s a bit slow compared to PostgreSQL), but Multi-AZ worked as advertised. The downside is that AWS never told us why it occurred.


They don't tell you because you're not supposed to care, and there's no human involved in the process to do a post-mortem.

Something on the instance host died, most likely.


I’ve seen a few AWS instance hardware failures; they happen with some regularity. You can handle a single instance failure without being Multi-AZ. Testing an actual AZ failure, as in the whole AZ going offline or getting partitioned from the other AZs, is pretty much impossible.


AZs are connected via normal user-visible networks, so you can just break those. They even provide examples: https://github.com/awslabs/aws-well-architected-labs/tree/ma...

Those are basic (they don't cover flapping or glacial-speed slowdown degradation modes, they only cover some services, etc.), but they are at least a starting point that can be extended.
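
For anyone wanting to try the "just break the network" idea, here is a rough sketch of one way to do it. It is not taken from the linked labs. It assumes boto3 and swaps the network ACL on every subnet in the target AZ for a freshly created one (a new custom network ACL denies all traffic until rules are added). The function names `isolate_subnets` and `restore_subnets` are made up for illustration, and the client is passed in so the logic can be exercised against a stub before touching a real VPC.

```python
def isolate_subnets(ec2_client, vpc_id, subnet_ids):
    """Cut the given subnets off by swapping their network ACL for a deny-all one.

    Returns {new_association_id: original_acl_id} so the change can be undone.
    """
    subnet_ids = set(subnet_ids)

    # A newly created custom network ACL has no allow rules, so it drops
    # all traffic entering or leaving any subnet associated with it.
    blackhole_acl_id = ec2_client.create_network_acl(VpcId=vpc_id)[
        "NetworkAcl"
    ]["NetworkAclId"]

    # Find the ACLs currently associated with the target subnets.
    acls = ec2_client.describe_network_acls(
        Filters=[{"Name": "association.subnet-id", "Values": sorted(subnet_ids)}]
    )["NetworkAcls"]

    rollback = {}
    for acl in acls:
        for assoc in acl["Associations"]:
            if assoc["SubnetId"] not in subnet_ids:
                continue  # the same ACL may also serve subnets in other AZs
            new_assoc_id = ec2_client.replace_network_acl_association(
                AssociationId=assoc["NetworkAclAssociationId"],
                NetworkAclId=blackhole_acl_id,
            )["NewAssociationId"]
            rollback[new_assoc_id] = acl["NetworkAclId"]
    return rollback


def restore_subnets(ec2_client, rollback):
    """Re-associate each subnet with the network ACL it had before."""
    for association_id, original_acl_id in rollback.items():
        ec2_client.replace_network_acl_association(
            AssociationId=association_id,
            NetworkAclId=original_acl_id,
        )

# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2")
#   rollback = isolate_subnets(ec2, "vpc-...", ["subnet-..."])
#   ... observe how the system behaves ...
#   restore_subnets(ec2, rollback)
```

Same caveats as above: this is a hard cut, not flapping or a slowdown, it does nothing about traffic that stays inside a subnet, and it leaves the extra network ACL behind for you to delete afterwards.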


Huh, didn't know about aws rds reboot-db-instance --force-failover
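
If you would rather script it than use the CLI, the same thing can be done from Python. A minimal sketch, assuming boto3, where `reboot_db_instance` with `ForceFailover=True` is the equivalent of that command; the wrapper name `force_failover` is just illustrative, and the client is passed in so it can be tried against a stub first.

```python
def force_failover(rds_client, db_instance_id):
    """Reboot a Multi-AZ RDS instance and force a failover to the standby.

    Returns the instance status reported in the API response.
    """
    response = rds_client.reboot_db_instance(
        DBInstanceIdentifier=db_instance_id,
        ForceFailover=True,  # only valid for Multi-AZ deployments
    )
    return response["DBInstance"]["DBInstanceStatus"]

# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   print(force_failover(boto3.client("rds"), "my-db-instance"))
```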



