
The only way to optimize for lowest overall risk is to optimize for speed of change.

All the checklists in the world to prevent something from happening are fine and dandy until something happens anyway (which it will). And then they hamstring your ability to actually fix it.

Instead, if you can move fast consistently, you can minimize the total downtime.




> Instead, if you can move fast consistently, you can minimize the total downtime.

In safety critical software where _a_ failure can result in loss of life, is “total cumulative duration of downtime” really the metric we’re optimizing for?


Yep, this is the exact point I tried to make above and got heavily downvoted.

If you can't move fast when things are working well, you can't move fast when things are broken. Acting like moving slowly is going to prevent things from ever breaking is just wishful thinking.


>All the checklists in the world to prevent something from happening are fine and dandy until something happens anyway (which it will)

Eventually, but not as frequently.


Downtime isn’t the metric their procedures are optimised to minimise; they’re optimised to minimise air traffic accidents. Moving fast might minimise total downtime (though I seriously doubt that), but what effect would it have on accuracy and reliability? Mistakes mean dead people. In this incident zero people died. Are you really sure you can improve on that?


Please don't change jobs from Facebook to the air traffic control industry.



