
Why would throwing the switch be risky? It's supposed to be HA. If it doesn't work, that's a bug, and you fix it! It's the same principle as backups: they are not backups until they have been restored (we verify this by making our data warehouse depend on the backup), and hot standbys aren't standbys until they have been switched in (we do this to databases regularly). Netflix apparently has a chaos generator that randomly kills machines as a standard process. If you're supposed to deal with failure, make sure you're dealing with failure regularly!
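To make the "kill machines at random and check you still work" idea concrete, here is a toy sketch. It is not Netflix's tool and it touches no real infrastructure: the `Cluster` class, the `min_healthy` threshold, and the `chaos_round` function are all hypothetical names invented for illustration, and the "machines" are just strings in a set.

```python
import random


class Cluster:
    """Toy in-memory stand-in for a pool of redundant machines (hypothetical)."""

    def __init__(self, names, min_healthy=1):
        self.alive = set(names)
        # Assumption: the service counts as "up" while this many machines remain.
        self.min_healthy = min_healthy

    def kill(self, name):
        """Simulate a machine failure by removing it from the live set."""
        self.alive.discard(name)

    def is_serving(self):
        """Return True if enough machines are left to keep serving."""
        return len(self.alive) >= self.min_healthy


def chaos_round(cluster, rng):
    """Kill one randomly chosen live machine; return its name, or None if none are left."""
    if not cluster.alive:
        return None
    # Sort first so the choice is reproducible for a seeded rng.
    victim = rng.choice(sorted(cluster.alive))
    cluster.kill(victim)
    return victim


if __name__ == "__main__":
    rng = random.Random(0)
    cluster = Cluster(["web-1", "web-2", "web-3"], min_healthy=2)
    victim = chaos_round(cluster, rng)
    print(f"killed {victim}; still serving: {cluster.is_serving()}")
```

In a real setup the kill step would terminate an actual instance and the health check would hit the live service; the point is only that the failure path gets exercised on a schedule instead of for the first time during an outage.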



> Netflix apparently has a chaos generator that randomly kills machines as a standard process.

This sounds pretty neat, but a quick Google didn't turn up any information about it besides this post. Do you know of anywhere to get more information on what they're doing? It sounds like a sensible idea, although I can only imagine trying to implement it would be ... challenging, for most companies/organizations.


Check out item no. 3 on this list of AWS lessons learned: http://techblog.netflix.com/2010/12/5-lessons-weve-learned-u...

see also: http://techblog.netflix.com/2011/04/lessons-netflix-learned-...

and: http://techblog.netflix.com/2011/07/netflix-simian-army.html for the other simian-themed services they've developed for the care and feeding of their AWS stuff.



