Your response suggests a good way to mitigate the risk I was trying to highlight in mine.
They want to have a rapid response path (little to no delay using staging envs) to respond to emergencies. The old SOP allowed all releases to use the emergency path. If the SOP no longer uses it, I'd be concerned that it would break silently from some unrelated refactor or change.
Your notion is to maintain the emergency rollout as a relaxation of the new SOP such that the time in staging is reduced to almost nothing. That sounds like a good idea since it avoids maintaining two processes, with the greater risk of breakage that brings. So, same logic with different thresholds, rather than two independent processes.
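
To make "same logic, different thresholds" concrete, here is a rough sketch of how I picture it. Everything in it is made up for illustration (`RolloutPolicy`, `plan_rollout`, the soak times); it is not anyone's actual SOP. The point is only that the emergency path is the normal path called with a smaller number, so every normal release exercises the same code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutPolicy:
    """Hypothetical policy object: only the thresholds differ between paths."""
    name: str
    staging_soak_minutes: int


# Illustrative values only; the real thresholds would come from the SOP.
STANDARD = RolloutPolicy("standard", staging_soak_minutes=24 * 60)
EMERGENCY = RolloutPolicy("emergency", staging_soak_minutes=5)


def plan_rollout(release: str, policy: RolloutPolicy) -> list[str]:
    """Return the ordered rollout steps; the step list is identical for every policy."""
    return [
        f"deploy {release} to staging",
        f"soak in staging for {policy.staging_soak_minutes} min",
        f"deploy {release} to production",
    ]
```

Since there is no separate emergency branch, a refactor that breaks the emergency rollout also breaks the everyday one, and gets noticed right away.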
Right. The emergency path is either something you end up using always, or something you use so rarely that it gets eaten by bit-rot before it ever gets used[0]. So I think we're in full agreement on your original point. This was just an attempt to parse a working policy out of that bullet point.
[0] My favorite example of this had somebody accidentally trigger an ancient emergency config push procedure. It worked, and made a (pre-canned) global configuration change that broke everything. Since the change was made via this non-standard and obsolete method, rolling it back took ages. In theory it should have been trivial. But in practice, in the years since the functionality had been written (and never used), somehow all humans had lost the rights to override the emergency system.
My personal rule is that any code which doesn't get exercised at least weekly is untrustworthy. I once inherited a codebase with a heavy, custom blue-green deploy system (it made sense for the original authors). While we deployed about once a week, we set up CI to test the deployment every day.
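
A minimal sketch of how that weekly rule could be checked mechanically, assuming you record somewhere when each path last ran. The names (`stale_paths`, `MAX_AGE`) and the shape of the input are my own invention here, not something from that codebase.

```python
from datetime import datetime, timedelta

# The "at least weekly" rule of thumb, expressed as a threshold.
MAX_AGE = timedelta(days=7)


def stale_paths(last_exercised: dict[str, datetime], now: datetime) -> list[str]:
    """Return the names of code paths that have not run within MAX_AGE, sorted."""
    return sorted(
        name for name, ran_at in last_exercised.items() if now - ran_at > MAX_AGE
    )
```

A scheduled CI job could fail whenever this list is non-empty, which turns "nobody has touched the emergency path in years" into a visible alert instead of a surprise.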