Offtopic, but: what's an example of a situation where using rm -f is bad compared to rm in practice? That is, an example where rm would save you but rm -f would make your life upsetting?
On topic: idempotency may be a red herring in this context. Unfortunately filesystems are designed with the assumption that every modification is inherently stateful. (It may be possible to design a different type of filesystem without this assumption, but every filesystem currently operates as a sequence of commits that alter state.) So installing a library or a program is necessarily stateful. What do you do if the program fails to install? Trying again probably won't help: the failure is probably due to some other missing or corrupted state. So idempotency won't help you because there's no situation in which a retry loop would be helpful. That is, if something fails, then whatever operation you were trying to accomplish is probably doomed anyway (if it's automated).
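To put that in concrete terms, here's a sketch using apt with a made-up package name: if the package manager was interrupted partway through, blindly retrying the same command keeps failing until someone (or something) repairs the leftover state, which is exactly the kind of intervention a retry loop can't provide.

```sh
# Hypothetical illustration: a blind retry doesn't fix corrupted
# package-manager state; the leftover state has to be repaired first.
apt-get install -y somepackage   # keeps failing if dpkg was interrupted mid-run
dpkg --configure -a              # repair the half-configured packages...
apt-get install -y somepackage   # ...and only then does trying again stand a chance
```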
I think docker is the right answer. It sidesteps the problem by letting you create containers with guaranteed state. If you perform a sequence of steps, and those steps succeeded once, then they'll always succeed (as long as errors like network connectivity issues are taken into account, but you'd have to do that anyway). EDIT: I disagree with myself. Let's say you write a program to set up a docker container and install a web service. If at some future time some component that the web service relies upon releases an update that changes its API in a way that breaks the web service, then your supercool docker autosetup script will no longer function. The only way around this is to install known versions of everything, but that's a horrible idea because it makes security updates impossible to install.
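To illustrate the bind in apt terms (the package name and version here are invented), whether it's inside a Dockerfile RUN line or anywhere else:

```sh
# The pinning dilemma in two lines (names/versions made up for the sketch):
apt-get install -y libfoo=1.2.3-1   # pinned: the build is reproducible, but it never sees security fixes
apt-get install -y libfoo           # unpinned: picks up fixes automatically, but may someday break the service
```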
It's a tough problem in general. Everyone agrees that hiring people to set up and manually configure servers isn't a tenable solution. But we haven't really agreed what should replace an intelligent human when configuring a server.
well, the rm example is overly simple on purpose - the only thing that -f is actually going to do that's remotely dangerous is removing files that have the readonly bit set. I've never actually been bitten by that. In general though, I think this pattern scales poorly - the more complicated your task is, the more likely it is that the "force it" mode is going to be dangerous.
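For the record, the case I mean is easy to reproduce at an interactive prompt (with GNU rm, at least):

```sh
# The one genuinely dangerous difference: a write-protected file.
touch foo.txt && chmod a-w foo.txt
rm foo.txt      # interactive rm asks before removing the write-protected file
rm -f foo.txt   # -f skips the question; the file is simply gone
```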
---
On the subject of what to do when something goes wrong:
Sometimes retrying a package install does fix the problem: if there was a network error, for example, and you downloaded an incomplete set of files, the next time you run it, it will be fine.
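The dumbest possible retry covers that case. A sketch (the package name and retry count are arbitrary):

```sh
# A transient download failure is the one case where retrying blindly helps.
for attempt in 1 2 3; do
    apt-get install -y nginx && break   # succeeds once the mirror is reachable again
    sleep 30                            # wait out the network hiccup
done
```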
If your package manager goes off the rails and gets your system into an inconsistent state, then you have a decision to make. Is this going to happen again? If not, just fix the stupid thing manually: there's no point in automating a one-time task. If it's likely to recur, then you need to write some code to fix it (and file a bug report with your distro!). I do not believe that there is a safe, sane way to pre-engineer your automation to fix problems you haven't seen yet!
In the meantime maybe your automation framework stupidly tries to run the install script every 20 minutes and reports recurring failure. The cost of that is low.
Docker is awesome, for sure, and I'll definitely use it on my next server-side project. It isn't a magic bullet, though - you still have to configure things, they still have dependencies. Just, hopefully, failures are more constrained.
---
and on the point of upgrading for security fixes: the sad reality is that even critical fixes for security holes must be tested in a staging environment. No upgrade is ever really, truly guaranteed to be safe. I guess if the bug is bad enough you just shut down production entirely until you can figure out whether you have a fix that is compatible with everything.
---
well, the rm example is overly simple on purpose - the only thing that -f is actually going to do that's remotely dangerous is removing files that have the readonly bit set.
Since you originally outlined the requirements as:
Take "rm" as a trivial example - when I say `rm foo.txt`, I want the file to be gone.
then the file should be gone even if "the readonly bit" was set.
This is not only a contrived example, but a bad one for system management. rm is an interactive command-line tool, with a user interface that is meant to keep you from shooting yourself in the foot. rm is polite in that it checks whether the file is writable before attempting to remove it and prompts you if it isn't. System management tools I would expect to call unlink(2) directly to remove the file, which has no user interface, rather than run rm.
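You can see the difference without writing any C, since most systems ship an unlink(1) that is little more than a wrapper around the syscall:

```sh
# rm consults the file's write bit and prompts; unlink(1) just calls unlink(2).
touch bar.txt && chmod a-w bar.txt
rm bar.txt       # prompts, because the file is write-protected
unlink bar.txt   # no prompt, no check beyond what the syscall itself enforces
```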
However, the system management tool doesn't start with no knowledge of the current state of the system; it starts from a state that is known (or otherwise discoverable/manageable) and then attempts to transform the system into a target state. It can't be expected to transform any random state into a target state. As such, the result of unlink(2) should be reported, and the operator should have the option of fixing up the corner cases where it is unable to perform as desired. If you've got 100 machines and 99 of them can be transformed into the target state by the system management tool and one of them cannot, this isn't a deficiency of the system management tool; most likely that one system has diverged in some way. Only the operator can decide whether the divergence is something that can/should be handled on a continuous basis, by changing what the tool does (forcing removal of a file that otherwise can't be removed, for example), or by fixing that system after investigation.
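In shell-loop terms, the behaviour I'd want is roughly this (hosts.txt and the file path are invented for the sketch):

```sh
# Attempt the transformation everywhere ("this file should be absent") and
# report the machines that diverged, leaving the force-or-investigate
# decision to the operator.
while read -r host; do
    ssh -n "$host" 'test ! -e /etc/app/stale.conf || rm /etc/app/stale.conf' \
        || echo "needs attention: $host (could not remove /etc/app/stale.conf)"
done < hosts.txt
```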
The other option is to only ever start with a blank slate for each machine and build it from scratch into a known state. If anything diverges, scrap it and start over. This is an acceptable method of attack to keep systems from diverging, but not always the pragmatic one.
---
it's probably safer to just remember to use mv instead, because there's a very high chance that you'll do the wrong thing on a terminal that doesn't have that alias available.
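For context, the kind of alias being talked about is presumably something along these lines (hypothetical, using GNU mv options):

```sh
# Hypothetical "safe rm": shove files into a trash directory instead of
# deleting them. Handy locally, but muscle memory betrays you on any box
# where the alias isn't defined.
mkdir -p ~/.trash
alias rm='mv -t ~/.trash --backup=numbered'
```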