How do you all write dry-run logic for utilities that perform file operations (edits, deletions, creations) and have sequential dependencies between operations, such that later steps depend on previous steps having been executed?
I have found it a frustrating nut to crack, because it seems to require either writing a simulation layer that virtually tracks the state of all the file calls, or else copying all the files to a temp folder (so the dry run isn't dry, it's just run on a separate version of the data). Both of these seem like bad solutions.
Yes, if you call a black box outside your code base that does something, then subsequent operations can't easily be dry-run.
So if you need to install a package which sets up a systemd service, the subsequent dry run of managing that systemd service would fail, because the service doesn't exist yet. Or you have to make the assumption, in the service dry run, that it would have been set up earlier.
You can just assume the service would have been set up and report that you were, say, enabling and starting it. Or you could require some kind of hint to be set up linking the package installation to the specific service configuration (for well-known things this could be done by default, but for arbitrary user-supplied packages it cannot). Or you could list the entire package contents and try to determine whether the manifest looks like it is configuring a service.
And it still may fail, because you don't know the contents of that file without extracting it, and the service may not parse at all.
And that's just a simple package-service interaction example. You could spend a week noodling on how to do that fairly precisely, and then there are a hundred or a thousand other interactions to get correct.
You're being told not to do the thing, so there's fundamentally a black box there, and you need to figure out how much you're going to cheat, how much you're going to try to crack open the black box into some kind of sandbox so you can figure out its internals, and how much you're going to offload onto the user. Not actually an easy problem at all.
Yeah, the ability to indicate that a shell command is pure (or that it must be pure) is something that's really missing from POSIX-like APIs. It's something I've certainly missed. Something like fork that disables writes outside a handful of file descriptors (like one the parent opens with popen) would be pretty awesome. Maybe BSD jails do that.
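Not quite that, but one rough approximation on Linux today is to run the command under bubblewrap with the filesystem bind-mounted read-only. A minimal sketch, assuming the `bwrap` binary is installed and that this particular sandbox shape suits your tool:

```python
import subprocess

# Rough approximation of a "no-writes" child process using bubblewrap
# (assumes bwrap is installed). The whole filesystem stays visible but
# read-only; /tmp is a throwaway tmpfs so the command can still scribble
# somewhere that disappears afterwards.
def run_readonly(cmd):
    sandbox = [
        "bwrap",
        "--ro-bind", "/", "/",   # everything visible, nothing writable
        "--dev", "/dev",         # minimal /dev
        "--proc", "/proc",       # fresh /proc
        "--tmpfs", "/tmp",       # scratch space that is discarded
        "--unshare-net",         # optionally cut off the network as well
    ]
    return subprocess.run(sandbox + cmd, capture_output=True, text=True)

result = run_readonly(["sh", "-c", "touch /etc/should-fail"])
print(result.returncode, result.stderr)  # the write fails: read-only file system
```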
In general, a composable dry-run API with the ability to make promises would be good. Then it's on the lower-level black box to have been tested correctly and to make accurate promises.
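To make that concrete, here is a hypothetical sketch (all names made up) of a promise-carrying dry-run context: each step records what it would do and declares the facts it would establish, so later steps check promises rather than the real system.

```python
from dataclasses import dataclass, field

# Hypothetical composable dry-run context: steps log their intended actions
# and "promise" facts they would establish, so dependent steps can validate
# against the promises instead of the (untouched) real system.
@dataclass
class DryRunContext:
    promises: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def promise(self, fact):
        self.promises.add(fact)

    def assume(self, fact, action):
        if fact in self.promises:
            self.log.append(f"would {action} (relies on promised '{fact}')")
            return True
        self.log.append(f"SKIPPED {action}: '{fact}' not promised by any earlier step")
        return False

def install_package(ctx, name):
    ctx.log.append(f"would install package {name}")
    ctx.promise(f"service:{name}")   # the "hint" from package step to service step

def enable_service(ctx, name):
    ctx.assume(f"service:{name}", f"enable and start {name}.service")

ctx = DryRunContext()
install_package(ctx, "nginx")
enable_service(ctx, "nginx")
print("\n".join(ctx.log))
```

The package step's promise is exactly the kind of hint mentioned earlier; an inaccurate promise then becomes the lower-level component's bug to own.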
In practice, though, what you'll find is that it's easier to treat whole systems as black boxes and test changes in throwaway virtualized systems, then test them in development environments, then roll them out to prod (or go the whole immutable-infrastructure route and throw everything away and replace it, so that in theory you don't get bitten by dev-prod discrepancies).
I don't see anything wrong with the "copy to a temporary directory" approach if your function actually does operate on files within a directory. In that scenario, copying to a different directory is actually probably exactly what you want for a dry-run, since then the code that you're executing is the same exact code that would run were you to execute in non-dry-run mode (as opposed to virtual operations which are prone to bugs if that virtualization layer and your real operations ever fall out of sync).
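A minimal sketch of that approach, assuming the tool really does operate on a self-contained directory (transform() here is a stand-in for whatever edits, deletions, and creations you actually do):

```python
import shutil
import tempfile
from pathlib import Path

# "Copy to a temp dir" dry run: the transformation code is exactly the same
# in both modes, it just gets pointed at a disposable copy of the data.
def transform(root: Path):
    (root / "old.cfg").unlink(missing_ok=True)
    (root / "new.cfg").write_text("generated\n")

def run(root: Path, dry_run: bool = False):
    if dry_run:
        with tempfile.TemporaryDirectory() as tmp:
            work = Path(tmp) / root.name
            shutil.copytree(root, work)   # throwaway copy of the real data
            transform(work)               # same code path as the real run
            print("resulting files:", sorted(p.name for p in work.iterdir()))
    else:
        transform(root)
```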
My kdesrc-build has a dry-run style of logic, not because of destructiveness but just because it was annoying to only realize after 4 hours of compiling software that a module you needed was somehow not in the build, or that your configuration change caused you to re-download all the repositories instead of pulling small updates.
This is problematic in multiple ways:
1) The script uses a single repository of project metadata to derive the rest of the build. This repo needs to be present and updated to figure out what to do.
2) There are sequential dependencies on a great deal of steps that can fail without much predictability (including network traffic, git commands, and the module build itself). But in a mode where you're compiling every day, even a transient module build failure does not necessarily mean you shouldn't press on to also try and build dependent modules.
I've ended up using an amalgamation of logical steps:
* If a pass/fail check can be performed non-destructively, do it (e.g. a dry run build will fail if a needed host system install is missing, or the script will decide on git-pull vs. git-clone correctly based on whether the source dir is present)
* If pass/fail can't be done, just assume success during dry run (e.g. for git update or a build step)
* For the metadata in particular, download it to a temp dir if we're in dry run mode and there's no older metadata version available.
* Have the user pass in whether they want a build failure to stop the whole build or not.
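A rough sketch of how those rules can combine (hypothetical names, not the actual kdesrc-build code): steps that can be checked non-destructively run for real even in dry-run mode, everything else is assumed to succeed, and the caller decides whether a failure stops the whole run.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    name: str
    action: Callable[[], bool]
    safe_in_dry_run: bool   # e.g. "is the source dir present?" vs. an actual git pull

def run_module(module: str, steps: List[Step], dry_run: bool, stop_on_failure: bool) -> bool:
    ok_overall = True
    for step in steps:
        if dry_run and not step.safe_in_dry_run:
            print(f"[dry-run] {module}: assuming '{step.name}' would succeed")
            continue
        if not step.action():
            print(f"{module}: '{step.name}' failed")
            ok_overall = False
            if stop_on_failure:
                break
    return ok_overall
```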
For separate reasons, I've also needed to maintain logic that lets modules whose build systems don't support a separate build directory act as if the build were being executed in the source directory (this was done by symlinking the source directory's contents into the proper build directory, file by file).
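The symlink trick is roughly this (a simplified sketch, not the real implementation):

```python
from pathlib import Path

# Recreate the source tree's directory layout under the build directory and
# symlink every file back to the original, so a build system that insists on
# building "in the source tree" actually writes its outputs into the
# disposable build dir while the real sources stay untouched.
def mirror_with_symlinks(src: Path, build: Path):
    build.mkdir(parents=True, exist_ok=True)
    for path in src.rglob("*"):
        target = build / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif not target.exists():
            target.symlink_to(path.resolve())
```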
Having this separate, disposable directory meant it was safer not to have to track file changes or alterations perfectly. And of course it helped greatly to be able to offload the version-control heavy lifting to CVS (back in the day; then Subversion, and now Git).
Besides having dry-run support, I think this gets to the question of what the level of abstraction should be, and I think that is clearly application-specific.
Maybe in your case the level you are trying to report dry-run information at is too granular.
Or maybe there's too tight a coupling between the code that determines what needs to be changed and the code that actually makes the change, and you might want to refactor so that the change-determining code is separate from the change-making code.
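For example (an illustrative sketch with made-up names), the change-determining code can emit a plan that the dry run simply prints, while the real run feeds that same plan to the change-making code:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# plan() decides what should change without touching disk; apply_changes()
# executes a plan; a dry run just reports the plan.
@dataclass
class Change:
    kind: str                     # "create", "edit", or "delete"
    path: Path
    content: Optional[str] = None

def plan(config_dir: Path):
    changes = []
    if not (config_dir / "app.conf").exists():
        changes.append(Change("create", config_dir / "app.conf", "defaults\n"))
    return changes

def apply_changes(changes):
    for c in changes:
        if c.kind == "create":
            c.path.write_text(c.content)
        elif c.kind == "delete":
            c.path.unlink()

changes = plan(Path("/etc/myapp"))
for c in changes:
    print(f"would {c.kind} {c.path}")   # dry run: report the plan
# apply_changes(changes)                # real run: execute the exact same plan
```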
I don't know of any system that works like this currently, but it seems like you're asking whether you can switch branches, do the work, and then have your "dry run" be a pull request, complete with all file changes, which you can approve manually.
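Something in that spirit can be approximated today when the target files live in a git repo: run the tool against a throwaway worktree and review the diff. A hedged sketch, where `tool_cmd` stands in for whatever utility you would normally run for real:

```python
import os
import subprocess
import tempfile

# "Branch as dry run": do the real work in a disposable git worktree and
# present the resulting diff for manual review. (Untracked files the tool
# creates would additionally need git status / add -N to show up.)
def dry_run_via_worktree(repo: str, tool_cmd) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        worktree = os.path.join(tmp, "wt")
        subprocess.run(["git", "-C", repo, "worktree", "add", "--detach", worktree], check=True)
        try:
            subprocess.run(tool_cmd, cwd=worktree, check=True)      # do the work on the copy
            diff = subprocess.run(["git", "-C", worktree, "diff"],  # the "pull request" to review
                                  capture_output=True, text=True, check=True)
            return diff.stdout
        finally:
            subprocess.run(["git", "-C", repo, "worktree", "remove", "--force", worktree], check=True)
```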