Why would throwing the switch be risky? It's supposed to be HA. If it doesn't work, that's a bug, and you fix it! Just as backups are not backups until they have been restored (we verify this by making our data warehouse depend on the backup), hot standbys aren't standbys until they have been switched in (we do this to our databases regularly).
Netflix apparently has a chaos generator that randomly kills machines as a standard process. If you're supposed to deal with failure, make sure you're dealing with failure regularly!
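To make the idea concrete, here is a rough sketch of what one round of such a "chaos generator" might look like. This is not Netflix's actual tool; the function name `chaos_round`, the `min_survivors` guard, and the instance names are all made up for illustration, and removing a name from a set stands in for whatever real "terminate this machine" call you would use.

```python
import random


def chaos_round(fleet, kill_probability, rng, min_survivors=1):
    """Randomly 'kill' instances in `fleet` (a set of names) for one round.

    Hypothetical sketch: discarding a name from the set stands in for a
    real terminate call. Returns the list of instances that were killed.
    """
    # Iterate in sorted order so a seeded rng gives reproducible picks.
    candidates = [name for name in sorted(fleet) if rng.random() < kill_probability]

    # Assumed safety guard: never take the fleet below `min_survivors`.
    max_kills = max(0, len(fleet) - min_survivors)
    victims = candidates[:max_kills]

    for name in victims:
        fleet.discard(name)  # placeholder for the real termination call
    return victims


if __name__ == "__main__":
    fleet = {"web-1", "web-2", "web-3", "db-standby"}
    killed = chaos_round(fleet, kill_probability=0.25, rng=random.Random())
    print("killed:", killed)
    print("still running:", sorted(fleet))
```

The point is less the code than the schedule: run something like this routinely, and a failover path that silently rotted gets found on a normal workday instead of during an outage.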
> Netflix apparently has a chaos generator that randomly kills machines as a standard process.
This sounds pretty neat, but a quick Google search didn't turn up any information about it besides this post. Do you know of anywhere to get more information on what they're doing? It sounds like a sensible idea, although I imagine implementing it would be ... challenging for most companies/organizations.