The only way to optimize for lowest overall risk is to optimize for speed of change.
All the checklists in the world to prevent something from happening are fine and dandy until something happens anyway (which it will). And then those same checklists hamstring you when you try to actually fix it.
Instead, if you can move fast consistently, you can minimize the total downtime.
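A rough sketch of the arithmetic, with deliberately made-up numbers: total downtime is roughly incidents per year times mean time to recover, so a big cut in recovery time can outweigh a few extra incidents.

```python
# Back-of-the-envelope: total downtime ≈ incidents/year × mean time to recover (MTTR).
# All numbers below are invented purely for illustration.

# "Slow" regime: heavy process, fewer incidents, but each one takes ages to fix.
slow_incidents_per_year = 2
slow_mttr_hours = 12.0

# "Fast" regime: ships constantly, more incidents, but recovery is routine.
fast_incidents_per_year = 6
fast_mttr_hours = 0.5

print("slow:", slow_incidents_per_year * slow_mttr_hours, "hours down/year")  # 24.0
print("fast:", fast_incidents_per_year * fast_mttr_hours, "hours down/year")  # 3.0
```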
> Instead, if you can move fast consistently, you can minimize the total downtime.
In safety-critical software, where _a_ failure can result in loss of life, is “total cumulative duration of downtime” really the metric we’re optimizing for?
Yep, this is the exact point I tried to make above and got heavily downvoted.
If you can't move fast when things are working well, you can't move fast when things are broken. Acting like moving slow is going to prevent things from ever breaking is just wishful thinking.
Downtime isn’t the metric their procedures are optimised to minimise. They’re optimised to minimise air traffic accidents. Moving fast might minimise total downtime (though I seriously doubt that), but what effect would it have on accuracy and reliability? Mistakes mean dead people. In this incident, zero people died. You really sure you can improve on that?
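To put the same kind of arithmetic behind this (again with invented numbers): once a single failure can kill someone, the accident term dominates any downtime savings.

```python
# Expected harm = downtime cost + accident probability × accident cost.
# The accident cost dwarfs the downtime cost, so even a tiny rise in
# accident probability wipes out big downtime gains. Numbers are invented.

COST_PER_DOWN_HOUR = 1.0         # arbitrary unit of harm per hour of downtime
COST_PER_ACCIDENT = 1_000_000.0  # one fatal accident dwarfs any outage

def expected_harm(downtime_hours: float, p_accident: float) -> float:
    return downtime_hours * COST_PER_DOWN_HOUR + p_accident * COST_PER_ACCIDENT

print("slow:", expected_harm(24.0, 1e-5))  # 24 + 10  = 34.0
print("fast:", expected_harm(3.0, 1e-4))   # 3 + 100 = 103.0
```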