Believe me, I know this. Entirely too well, in fact.

At my last job, the senior DevOps dude foot-gunned the entire company by running a read-write disk benchmark (using fio(1)) against the block device (instead of a partition, which, while still stupid, would at least not have been actively destructive) on both my master and all of my slave PostgreSQL hosts. At the same time. And, of course, without telling anyone what he was doing, so the first inkling I had of a problem came about 20 minutes later, when I started getting a steadily increasing number of errors suggesting disk corruption.

How does one make such a tool drool-proof enough to prevent that kind of idiocy? Please, help me figure that out. And then give me a time machine, because that was a 16-hour day I'd really rather not have experienced.

And, no, the right move is generally not to fire the jackass who makes that kind of mistake. In the case above, the company spent about three quarters of a million dollars (in lost revenue alone, never mind the time burned in meetings about the incident, my efforts to fix the problem, his and the rest of his team's efforts, and so on) teaching him never to do that again. You don't buy lessons that expensively and then let someone else benefit from them.

(That said, he did get fired several months later for telling the entire engineering lead team to fuck off, in so many words, after they made a perfectly reasonable request that was entirely within his responsibilities, and his skills, to satisfy.)