
> but at the time the code seemed completely correct to me

It always does.

> Well, it teaches me to do more diverse tests when doing destructive operations.

Or add some logging and do a dry run and check the results, literally simple print statements:

    print("-----")
    print("Downloading videos ids from url: {url}")
    print(list of ids)
    ...
    ...
    ...
    # delete()  dangerous action commented out until I'm sure it's right
    print("I'm about to delete video {id}")

    print("Deleted {count} videos") # maybe even assert
    ...
Then dump out to a file and spot check it five times before running for real.



I was involved with archiving of data that was legally required to be retained for PSD2 compliance. So it was pretty important that the data was correctly archived, but it was just as important that it was properly removed from other places due to data protection.

This is basically the approach that was taken: log before and after every action exactly what data or files are being acted on and how. Don't actually do it. Then have multiple people inspect the logs. Once ok'd, run again, with manual prompts after each log item asking to continue, for the first few files/bits of data. Only after that was ok'd too did it run the remainder.

In other things I've worked on, I've taken the terraform-style plan first, then apply the plan approach, with manual inspection of the plan in between.
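
A minimal sketch of that plan/apply split for a one-off cleanup script, assuming Python; `plan.json` and `find_videos_to_delete` are just illustrative names:

    import json
    import sys

    def find_videos_to_delete():
        # Placeholder selection logic; imagine this calls the real API.
        return ["vid_1", "vid_2"]

    def plan():
        # Write the intended actions to a file a human can inspect, diff, grep, and count.
        actions = [{"action": "delete", "id": i} for i in find_videos_to_delete()]
        with open("plan.json", "w") as f:
            json.dump(actions, f, indent=2)
        print(f"Planned {len(actions)} deletions; review plan.json")

    def apply():
        # Act only on the reviewed plan file, never on a freshly computed list.
        with open("plan.json") as f:
            actions = json.load(f)
        for a in actions:
            print(f"deleting {a['id']}")  # swap in the real delete call once trusted

    if __name__ == "__main__":
        apply() if sys.argv[1:] == ["apply"] else plan()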


Once we get used to doing the same thing multiple times a day, it doesn't matter if the log shows that we're about to take a destructive action; we'll still do it. The only thing that is foolproof is to not take the destructive action at all, because people make mistakes; it's human nature. I don't know how this can be implemented; maybe encrypt the files, or take a backup in some other location (which may not be allowed).

Multiple reviewers here didn't catch the mistake

https://www.bloombergquint.com/markets/citi-s-900-million-mi...


While this is a huge issue, a solution (well, a partial mitigation) I've seen and used is the "Pointing and Calling" technique. The basic idea is that you incorporate more actions beyond reading and typing or pressing a button—generally by having people point at something and say aloud what it is they're doing and what they expect to happen.

It's used rather extensively in safety-critical public transportation in Japan [1] and to a lesser extent in New York (along with many other countries) [2]. This can easily extend to software without overcomplicating things, just by setting the expectation that engineers, QA, etc. do this even when alone.

[1] https://www.atlasobscura.com/articles/pointing-and-calling-j...

[2] https://en.wikipedia.org/wiki/Pointing_and_calling


Hell, GitHub does that to an extent, with the "type the name of this repository to delete it" prompts. Typing the name of the repository isn't exactly perfect, but it's an interesting direction.


There was a thread recently about a repo that accidentally went private and lost all of its stars because of confusion with GH teams vs GH profile readme repo naming. I think this type of prompt is very useful for explicitly preventing the rare worst case scenarios but the problem is making any type of prompt "routine" so that our brains fail to process it.


The suggestion in that post about how to fix it is good, and mirrors one I read on the Rachel by the Bay blog - type the number of machines to continue:

https://rachelbythebay.com/w/2020/10/26/num/

The takeaway from both is that there is actually something you can do that wakes people up when the stakes are high and they might not be doing what they expect.
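
A rough sketch of that kind of prompt in Python; the function name and wording are made up:

    def confirm_count(items, noun="videos"):
        # Make the operator type the exact count rather than just hitting Enter.
        answer = input(f"About to delete {len(items)} {noun}. Type that number to continue: ")
        if answer.strip() != str(len(items)):
            raise SystemExit("Count mismatch, aborting.")

    confirm_count(["a.mp4", "b.mp4", "c.mp4"])  # proceeds only if the user types "3"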


And most importantly, don't let yourself get into the habit of copy pasting the value


I wonder if you could print some non-visible characters in there to taint the copied value in some detectable way.


Prompt in words, but expect the value in numbers, eg: "Twenty-five" and the box requires you to type "25"? At least in this specific case, it would require you to type it.


yeah, that would possibly stop the copy and paste problem. to make it robust they would need to use a string of a few non-visible characters but that would fail if the browser's clipboard system doesn't copy them over for some kind of privacy initiative. might be another way it fails that I can't think of right now.



I always copy-paste into that box as well, they should probably make at least an attempt at disabling pasting into it


Azure has the same when deleting a database: you verify it's the correct one by typing the db name.


I've heard of this technique, but unfortunately I don't see how it can be easily applied in software engineering/devops.

Also, I now realize that aviation checklists seem to be done similarly, with gestures - at least from what I saw on YouTube; not sure if that's representative or only used during education (?)


Spelling out loudly the command you are about to execute and explaining the reasoning behind it can help a lot too.


Ok, but am I to do it on every single command I do on my terminal? Or on which ones specifically? If the problem we're trying to solve is that I can sometimes overlook the "dangerous commands" among "safe ones", by definition of overlooking it won't work if I tell myself to "spell out the command only in case of the dangerous ones", no?

I'm honestly trying to think of a way I could approach this for myself, but I don't see a clear solution yet that wouldn't require me to spell out everything I type in my terminal window.


“I’m removing that semicolon!” (Pointing)


Parent meant this sort of pointing.

https://t.co/TjfX5K54H7


Because everyone assumes that everyone else is looking at it more closely than they are. “I’ll just do a cursory look since I’m sure everyone else is doing an in-depth look.” Narrator: nobody did an in-depth look.


I'm a fan of doing things temporally so data is very rarely actually deleted from the database. Most of the time, you just update the "valid_to" field to the current time. Sometimes real deletes are required, such as with privacy requests, but I think that sort of thing is pretty rare.

If your application has space concerns, you can modify this approach to be like a recycle bin where you delete records which are no longer valid and have been invalid for over a month (or whatever time frame is appropriate for your application). However, I think this is unnecessary in most cases except for blob/file storage.
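
For illustration, a sketch of that pattern using sqlite3 in Python; the `videos` table and `valid_to` column are made-up names:

    import datetime
    import sqlite3

    conn = sqlite3.connect("app.db")  # database and schema here are illustrative
    conn.execute("CREATE TABLE IF NOT EXISTS videos (id INTEGER PRIMARY KEY, title TEXT, valid_to TEXT)")

    # "Delete" by closing the validity window instead of removing the row.
    now = datetime.datetime.utcnow().isoformat()
    conn.execute("UPDATE videos SET valid_to = ? WHERE id = ? AND valid_to IS NULL", (now, 123))

    # Normal reads only see currently valid rows.
    live_rows = conn.execute("SELECT * FROM videos WHERE valid_to IS NULL").fetchall()

    # Optional recycle-bin purge: hard-delete rows that have been invalid for over a month.
    cutoff = (datetime.datetime.utcnow() - datetime.timedelta(days=30)).isoformat()
    conn.execute("DELETE FROM videos WHERE valid_to IS NOT NULL AND valid_to < ?", (cutoff,))
    conn.commit()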


That form had a couple weird checkboxes with odd wording. It is a famous mistake, but also rather understandable just because the form was cryptic.


> Multiple reviewers here didn't catch the mistake

Sure, but we can only do so much. I find it's good bang for the buck, and alternatives that might prevent that are not always available, so we do the best we can. You gotta make a call on whether it's enough or not.


mv then rm is another idiom. So long as you have the space.

For database entries, flag for deletion, then delete.

In the files case, the move or rename also accomplishes the result of breaking any functionality which still relies on those files ... whilst you can still recover.

Way back in the day I was doing filesystem surgery on a Linux system, shuffling partitions around. I meant to issue the 'rm -rf .' in a specific directory; I happened to be in the root directory.

However ...

- I'd booted a live-Linux version. (This was back when those still ran from floppy).

- I'd mounted all partitions other than the one I was performing surgery on '-ro' (read-only).

So what I bought was a reboot, and an opportunity to see what a Linux system with an active shell, but no executables, looks like.

Plan ahead. Make big changes in stages. Measure twice (or 3, or 10, or 20 times), cut once. Sit on your hands for a minute before running as root. Paste into an editor session (C-x C-e Readline command, as noted elsewhere in this thread).

Have backups.


You mean cp then rm?

And yes, copy, verify, delete. And make sure by the code structure that you either do the three on the same files, or they fail.

Also, do it slowly, with just a bit of data on each iteration. That will make the verification step more reliable.

Anyway, for a huge majority of cases, only having backups is enough already. Just make sure to test them.


No, mv.

Example:

  cd datadir
  mkdir delete
  mv <list of files to be deleted> ./delete
  # test to see if anything looks broken.  
  # This might take a few seconds, or months, though it's usually reasonably brief.
  rm -rf ./delete

The reasons for mv:

- It's atomic (on a single filesystem). There's no risk of ending up with a partial operation or an incomplete operation.

- It doesn't copy the data, it renames the file. (mv and rename are largely synonyms.)

- There's no duplication of space usage. Where you're dealing with large files, this is helpful.

The process is similar to the staged deletion most desktop OS users are familiar with, of "drag to trash, then empty trash". Used in the manner I'm deploying it, it's a bit more like a staged warehouse purge or ordering a dumpster bin --- more structured / controlled staged deletion than a household or small office might use.


I think mv then rm is probably meant as 'windows trash bin' style.


  > ... Then have multiple people inspect the logs. Once ok'd, run again, with manual prompts after each log item asking to continue...

This sort-of reminds me of some "critical" work I had to do a couple of decades ago. I was in a shop that used this horrifically tedious tool for designing masks for special kinds of photonic devices-- basically it was tracing out optical waveguides that would be placed on a crystal that was processed much like a silicon IC.

The process was for TWO of us to sit in front of a computer and review the curves in this crazy old EDA layout tool called "L-edit" before it got sent to have the actual masks made (which were very expensive). It took HOURS to check everything.

The first hour was tolerable but then boredom started to creep in and we got sloppy. The whole reason TWO people got tasked with this was because it was thought that we would keep each other focused-- 2 pairs of eyes are better than one, right? Instead, it just underscored the tedium of it all. One day someone walked in and found us BOTH in DEEP SLEEP in front of the monitor. Having two people didn't decrease the waste caused by mistakes, it just bored the hell out of more people.


How many mistakes did you catch?


ONE real one and some occasional nitpicks to show that we were busy (after being caught asleep).

Was it worth it? No, I don't think so from an opportunity cost perspective-- even though we were the most junior folks there. A mind is a terrible thing to waste!


From his story I can tell he found one big mistake. The tedious work itself.


Another good approach is to do deletions slowly. Put sleeps between each operation, and log everything. That way if you realize something is broken, you have a chance of catching it before it's too late.
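
A throwaway sketch of that in Python; the delete function and pause length are placeholders:

    import time

    def slow_delete(ids, delete_fn, pause_seconds=2.0):
        # Throttle destructive work on purpose so a mistake can be caught mid-run (Ctrl-C).
        for n, item_id in enumerate(ids, start=1):
            print(f"[{n}/{len(ids)}] deleting {item_id}")
            delete_fn(item_id)
            time.sleep(pause_seconds)

    slow_delete(["vid_1", "vid_2"], delete_fn=print)  # stand-in delete function for a dry run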


> Then have multiple people inspect the logs.

I think that this is the most important part of any check. Your parent refers to checking the log five times, but, at least in my experience, I won't catch any more errors on the fifth time than the first—if I once saw what I expected rather than what was there, I'll keep doing so. Of course everyone has their blind spots, but, as in the famous Swiss-cheese approach, we just hope that they don't line up!


Yes, I love the idea of the plan/apply approach.


It never hurts to ask for another set of eyes to review. At the least if something goes awry, the blame isn't solely on you.


Make a plan, check the plan, [fix the plan, check the plan (loop)], do the plan

See PDCA for a more time-critical decision loop: https://en.wikipedia.org/wiki/PDCA


Another technique that I've used with good success is to write a script that dumps out bash commands to delete files individually. I can visually inspect the file, analyze it with other tools, etc and then when I'm happy it's correct just "bash file_full_of_rms.sh" and be confident that it did the right thing.
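
A minimal version of such a generator, assuming Python; `shlex.quote` keeps odd filenames from breaking the generated script:

    import shlex

    paths_to_delete = ["old/video 1.mp4", "old/video 2.mp4"]  # however you actually compute this

    # Emit one rm per file so the output can be read, grepped, counted, and diffed
    # before anyone runs `bash file_full_of_rms.sh`.
    with open("file_full_of_rms.sh", "w") as f:
        f.write("set -euo pipefail\n")
        for p in paths_to_delete:
            f.write(f"rm -- {shlex.quote(p)}\n")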


This was taught to me in my first linux admin job.

I was running commands manually to interact with files and databases, but was quickly shown that even just writing all the commands out, one by one, gives room to personally review them and get a peer review, and also helps with typos. I could ask a colleague "I'm about to run all these commands on the DB, do you see any problem with this?". It also reduces the blame if things go wrong if it managed to pass approval by two engineers.

While I'm thinking back, another little tip I was told was to always put a "#" in front of any command I paste into a terminal. This stops accidentally copying a carriage return and executing the command.


> This stops accidentally copying a carriage return and executing the command.

For a one-liner sure, but a multi line command can still be catastrophic.

Showing the contents of the clipboard in the terminal itself (eg via xclip) or opening an editor and saving the contents to a file are usually better approaches. The latter lets you craft the entire command in the editor and then run it as a script.


From [0]:

[For Bash] Ctrl + x + Ctrl + e : launch editor defined by $EDITOR to input your command. Useful for multi-line commands.

I have tested this on Windows with a MINGW64 bash; it works similarly to how `git commit` works, by creating a new temporary file and detecting* when you close the editor.

[0] https://github.com/onceupon/Bash-Oneliner

* Actually I have no idea how this works; does bash wait for the child process to stop? does it do some posix filesystem magic to detect when the file is "free"? I can't really see other ways


It does create and give a temporary file path to the editor, but then simply waits for the process to exit with a healthy status.

Once that happens, it reads from the temporary file that it created.


The 'enable-bracketed-paste' setting is an easier and more reliable way to deal with that: https://unix.stackexchange.com/a/600641/81005

It will prevent any number of newlines from running the commands if they're pasted instead of typed.

You can enable it either in .inputrc or .bashrc (with `bind 'set enable-bracketed-paste on'`)


That was our SOP for running DELETE SQL commands on production too: a script that generates a .sql that's run manually. It saved our asses a fair number of times.


Yeah, wish I'd learned that the easy way. Fresh into one of my first jobs I was working with a vendor's custom interface to merge/purge duplicate records. It didn't have a good method of record matching on inserts from the customer web interface so a large % of records had duplicates.

Anyway, I selected what I thought was a "merge all duplicates" option without previewing results. What I had actually done was "merge all selected". So, the system proceeded to merge a very large % of the database... Into One. Single. Record.

Luckily the vendor kept very good backups, and so I kept my job. Because I also luckily had a very good boss and I had already demonstrated my value in other ways, he just asked me "Well, are you going to make that mistake again?". I wisely said no, and he just smiled and said "Then I think we're done here."

I have been particularly fortunate throughout my career to have very good managers. As much as managers get a lot of flack here on HN, done well they are empowering, not a hindrance, and I attribute a lot of success in my career to them.


> Yeah, wish I'd learned that the easy way.

I think that, if you've only learned something like that the easy way, then you haven't learned it yet. As long as everything's only ever gone right, it's easy to think, I'm in a rush this one time, and I've never really needed those safety procedures before, ….


At a previous job the DB admin mandated that everyone had to write queries that would create a temporary table containing a copy of all the rows that needed to be deleted. This data would be inspected to make sure that it was truly the correct data. Then the data would be deleted from the actual table by doing a delete that joined against the copied table. If for some reason it needed to be restored, the data could be restored from the copy.
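
Roughly, in sqlite3/Python terms (the table names and WHERE clause are made up):

    import sqlite3

    conn = sqlite3.connect("app.db")  # database, table, and predicate are illustrative
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, status TEXT)")

    # 1. Copy the candidate rows somewhere inspectable (and restorable).
    conn.execute("CREATE TABLE orders_to_delete AS SELECT * FROM orders WHERE status = 'cancelled'")

    # 2. Inspect the copy before touching the real table.
    print(conn.execute("SELECT COUNT(*) FROM orders_to_delete").fetchone()[0], "rows staged for deletion")

    # 3. Delete only what is in the copy, so the copy doubles as the undo source.
    conn.execute("DELETE FROM orders WHERE id IN (SELECT id FROM orders_to_delete)")
    conn.commit()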


I tend to write one script that emits a list of files, and another that takes a list of files as arguments.

It's simple to manually test corner cases, and then when everything is smooth I can just

    script1 | xargs script2

It's also handy if the process gets interrupted in the middle, because running script1 again generates a shorter list the second time, without having to generate the file again.

When I'm trying to get script1 right I can pipe it to a file, and cat the file to work out what the next sed or awk script needs to be.


Ah, I’m glad I’m not the only one who did this. It also means that you can fix things when they break halfway. Say you get an error when the script is processing entry 101 (perhaps it’s running files through ffmpeg). Just fix the error and delete the first 100 lines.


The only issue with that is if subsequent lines implicitly assume that earlier ones executed as expected, e.g. without error.

Over-simplified example:

1. Copy stuff from A to B

2. Delete stuff from A

(Obviously you wouldn't do it like that, but just for illustration purposes.) It's all fine, but (2) assumes that (1) succeeded. If it didn't, maybe no space left, maybe missing permissions on B, whatnot, then (2) should not be executed. In this simple example you could tie them with `&&` or so (or just use an atomic move), but let's say these are many many commands and things are more complex.


At the point you're doing this, you should be using a proper programming language with better defined string handling semantics though. In every place it comes up you'll have access to Python and can call the unlink command directly and much more safely - plus a debugging environment which you can actually step through if you're unsure.


Eh, I think that misses the point a bit. Use whatever you want to generate the output, but make the intermediary structure trivial to inspect and execute. If you're actually taking the destructive actions within your complicated* logic then there's less room to stop, think, and test.

You could always generate an intermediary set, inspect/test/etc, and then apply it with Python. I've done that too, works just as well. The important thing is to separate the planning step from the apply step.

* where "complicated" means more complicated than, for ex, `rm some_path.txt` or `DELETE FROM table WHERE id = 123`.


Yes. Also, maybe not have a delete action in the middle of a script. It's usually better to build a list of items to be deleted. In that case, two lists: items to be deleted, items to be kept. Then compare the lists:

- make sure the sum of their lengths == number of total current items

- make sure items_to_be_kept.length != 0

- make sure no item appears in both lists

- check some items chosen at random to see if they were sorted in the correct list

At this point the only possible mistake left is to confuse the lists and send the "to_be_kept" one to the delete script; a dry run of the delete list can be in order.
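
Those checks are cheap to write down; a sketch in Python, where `should_delete` stands in for whatever your real selection logic is:

    import random

    def split_for_deletion(items, should_delete):
        to_delete = [i for i in items if should_delete(i)]
        to_keep = [i for i in items if not should_delete(i)]

        assert len(to_delete) + len(to_keep) == len(items)
        assert len(to_keep) != 0
        assert not (set(to_delete) & set(to_keep))

        # Print a few random items from each list for a manual spot check.
        print("sample to delete:", random.sample(to_delete, min(3, len(to_delete))))
        print("sample to keep:  ", random.sample(to_keep, min(3, len(to_keep))))
        return to_delete, to_keep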


This. The original approach can fail horribly if there's a problem on the server when you run the script for real. Your code can be perfect but that's no guarantee the server will always return what it ought to.


I've had good success with this approach: have two distinct scripts generate the two lists, then in addition to your items here, also check that every item appears in one of the lists.


What do you recommend, to not get into trouble if there are spaces or newlines in the file names?


Try not to delete stuff with Bash.

This is the most reliable way. Bash has a few niceties for error handling, but if you are using them, you would probably fare better in another language.

If you do insist on Bash, quote everything, and use the "${var}" syntax instead of "$var". Also, make sure you handle every single possible error.


`set -e` will abort on the first failing command (add `set -o pipefail` if you also want failures inside a pipeline to count). It’s a must for any critical script.


Don't use a shell script.


Do you mean, always pass the list directly to the next script via function calls, without writing it to an intermediate file / pipeline?


I'm being flippant, because shell scripts are so inherently error prone they're to be avoided for critical stuff like this.

If you _absolutely_ must use a shell script:

0. Use shellcheck, which will warn you about many of the below issues: https://www.shellcheck.net/

1. understand how quoting and word splitting work: https://mywiki.wooledge.org/Quotes

2. if piping files to other programs, use `-print0` or equivalent (or even better, if using something like find, its built-in execution options): https://mywiki.wooledge.org/UsingFind

3. Beware the pitfalls (especially something like parsing `ls`): https://mywiki.wooledge.org/BashPitfalls

(warning: the community around that wiki can be pretty toxic, just keep that in mind when foraying into it.)


Yes, use the list argument to Python’s subprocess.run for example. It’s much easier to not mess up if your arguments don’t get parsed by a shell before getting passed.
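
A quick sketch of that; the filenames are contrived, but the point is that nothing here goes through a shell:

    import subprocess

    files = ["weird name.mp4", "semi;colon.mp4", "$(not expanded).mp4"]

    # Each filename is its own argv entry; no shell ever parses it, so spaces,
    # semicolons, and $(...) are just ordinary characters. `--` guards against
    # names that start with a dash.
    subprocess.run(["rm", "--", *files], check=True)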


Yes, I find command line tools that have a "--dry-run" flag to be very helpful. If the tool (or script or whatever) is performing some destructive or expensive change, then having the ability to ask "what do you think I want to do?" is great.

It's like the difference between "do what I say" and "do what I mean"...


That's what I like about PowerShell. Every script can include a "SupportsShouldProcess" [1] attribute. What this means is that you can pass two new arguments to your script, which have standardized names across the whole platform:

- -WhatIf to see what would happen if you run the script;

- -Confirm, which asks for confirmation before any potentially destructive action.

Moreover these arguments get passed down to any command you use in your script that supports them. So you can write something like:

    [CmdletBinding(SupportsShouldProcess)]
    param ([Parameter()] [string] $FolderToBeDeleted)
    
    # I'm using bash-like aliases but these are really powershell cmdlets!
    echo "Deleting files in $FolderToBeDeleted"
    $files = @(ls $FolderToBeDeleted -rec -file)
    echo "Found $($files.Length) files"
    rm $files

If I call this script with -WhatIf, it will only display the list of files to be deleted without doing anything. If I call it with -Confirm, it will ask for confirmation before each file, with an option to abort, debug the script, or process the rest without confirming again.

I can also declare that my script is "High" impact with the "ConfirmImpact = High" switch. This will make it so that the user gets asked for confirmation without explicitly passing -Confirm. A user can set their $ConfirmPreference to High, Medium, Low, or None, to make sure they get asked for confirmation for any script that declares an impact at least as high as their preference.

[1]: https://docs.microsoft.com/en-us/powershell/scripting/learn/...


I’m a bit confused (because I didn’t read the docs)… does calling it with "-WhatIf" exercise the same code path as calling without, only the "do destructive stuff" automagically doesn’t do anything? Or is it a separate routine that you have to write?

Cause if it is an entirely separate code path, doesn’t that introduce a case where what you say you’ll do isn’t exactly what actually happens?


Well, just read the...

> because I didnt read the docs

Ouch.

> Or is it a separate routine that you have to write?

If you are writing a function or a module that would do something (eg an API wrapper) then of course you need to write it yourself.

But if you are writing just a script for your mundane one-time/everyday tasks and call cmdlets that support ShouldProcess, then it works automagically. Issuing '-WhatIf' for the script passes '-WhatIf' to any cmdlet that has 'ShouldProcess' in its definition. Of course, if someone made a cmdlet with a declared ShouldProcess but didn't write the logic to process it - you are out of luck.

But if you have a spare couple of minutes, check the docs in the link; it was originally a blog post by kevmarq, not a boring autodoc.


It's the first option. And yes, sometimes you have to be careful if you want to implement SupportsShouldProcess correctly, it's not something you can add willy-nilly. For example, if you create a folder, you can't `cd` there in -WhatIf mode.


The rule we have is that anything that is not idempotent and not run as a matter of daily routine must dry-run by default, and not take action unless you pass --really. This has saved my bacon many times!


Deleting actually is idempotent. Doing it twice won't be different from doing it once.


Deleting * may not be though. Your selection needs to be idempotent.


Idempotency means that f(X) = f(f(X)). Modifying the X in between is not allowed. Is there really an initial environment where rm * ; rm * ; does something different than rm * once?


In the case of any live system, i would say yes. Additional, and different, files could have appeared on the file system in between the times of each rm *.


* is just short hand for a list of files. Calling rm with the same list of files will have the same results if you call it multiple times. That’s idempotent.

Your example is changing the list of files, or arguments to rm between runs. Same as pc85’s example where the timestamp argument changes.


In addition to what einsty said (which is 100% accurate), if you're deleting aged records, on any system of sufficient size objects will become aged beyond your threshold between executions.


Right. You can kind of consider the state of a filesystem on which you occasionally run rm * purges to be a system whose state is made up of ‘stuff in the filesystem’ and ‘timestamp the last purge was run’.

If you run rm * multiple times, the state of the system changes each time because that ‘timestamp’ ends up being different each time.

But if instead you run an rm on files older than a fixed timestamp, multiple times, the resulting filesystem is idempotent with respect to that operation, because the timestamp ends up set to the same value, and the filesystem in every case contains all the files added later than that timestamp.


> Is there really an initial environment where rm * ; rm * ; does something different than rm * once?

if * expands to the rm binary itself, maybe.


How is the system different after the first and after the second call?


If there is an rm executable in the current directory, and also one later in your PATH, the second run might use a different rm that could do whatever it wants to


This is actually a likely scenario, as it is common to alias rm to rm -i. Though your bash config will still run after .bashrc is nuked, some might wrap with a script instead of aliasing (e.g., to send items to Trash).


    # rm rm
    # rm rm
    rm: command not found


Early in my career I used --yes-i-really-mean-it and then a coworker removed it with the commit message "remove whimsy".

T'was a sad day.


Going further, make it dry run by default and have an --execute flag to actually run the commands: this encourages the user to check the dryrun output first.
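
A bare-bones sketch of that in Python with argparse; the --execute name comes from the comment above, and the file list is a placeholder:

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--execute", action="store_true",
                        help="actually delete; without this flag, only print the plan")
    args = parser.parse_args()

    for path in ["a.mp4", "b.mp4"]:  # stand-in for the real selection logic
        if args.execute:
            print(f"deleting {path}")  # the real delete call goes here
        else:
            print(f"[dry run] would delete {path}")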


All my tools that have a possible destructive outcome use either an interactive stdin prompt or a --live option. I like the idea of dry running by default.


This is why I like to always write any sort of user-script batch-job tools (backfills, purges, scrapers) with a "porcelain and plumbing" approach: The first step generates a fully declarative manifest of files/uris/commands (usually just json) and the second step actually executes them. I've used a --dry-run flag to just output the manifest, but I just read some folks use a --live-run flag to enable, with dry-run being the default, and I like that much better so I'll be using that going forward.

This pattern has the added benefit that it makes it really easy to write unit tests, which is something often sorely lacking in these sorts of batch scripts. It also makes full automation down the line a breeze, since you have nice shearing layers between your components.

http://www.laputan.org/mud/mud.html#ShearingLayers


I tend towards a --dry-run flag for creative actions and --confirm for destructive actions. Probably slightly annoying that the commands end up seemingly different, but it sure beats accidentally nuking something important.


This sounds like a "do nothing script."

https://news.ycombinator.com/item?id=29083367

It defaults to not doing anything so you can gradually and selectively have it do something.

Learned about it when I posted my command line checklist tool on HN: https://github.com/givemefoxes/sneklist

(https://news.ycombinator.com/item?id=25811276)

You could use it to summon up a checklist of to-dos like "make sure the collection in the dictionary has the expected number of values" before a "do you want to proceed? Y/n"


I do this, too, but I also take a count of the expected number of items to be deleted as well. If my collection I'm iterating over doesn't have exactly that number of objects I expect, I don't proceed.


Human-in-the-loop is such an important concept in ops, and yet everyone (including me) seems to learn it the hard way.


I just want to say as someone currently working on a script to delete approximately 3.2TB of a ~4TB production database, this subthread is pure gold.


To ensure that the files are actually downloaded (step1) before deleting the original (step2), I would make step1 an input to step2. That is, step2 cannot work without step1. Something like:

    (step1) Download video from URL.  Include the Id in the filename.
    (step2) Grab the list of files that have been downloaded and parse to get the Id.  Using the Id, delete the original file.
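
A rough sketch of that coupling in Python, assuming the id is embedded in the downloaded filename (the directory layout and naming scheme are made up):

    import os
    import re

    download_dir = "downloads"  # assumes step1 names files like "12345_title.mp4"

    # step2 derives its work list only from what step1 actually produced on disk.
    downloaded_ids = set()
    for name in os.listdir(download_dir):
        m = re.match(r"(\d+)_", name)
        if m and os.path.getsize(os.path.join(download_dir, name)) > 0:
            downloaded_ids.add(m.group(1))

    for video_id in sorted(downloaded_ids):
        print(f"safe to delete original for {video_id}")  # the real delete goes here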


Yep, even when writing a simple wildcard at the command line, I will 'echo' before I 'rm'.


On computers I own, I always install "trash-cli" and I even created an alias for rm to trash. It's like rm, but it goes to the good old trash. It will not save your prod but it's pretty useful on your own computer at least.


That's a good tip, thanks!


Agreed, I've also been burned doing stupid things like this and always print out the commands and check them before actually doing the commit.

As they say, measure twice, cut once.

Don't feel bad, I think every professional in IT goes through something similar at one time or another.


This was my first thought too. Another thing I like to do is to limit the loop to, say, one page or 10 entries and check after each run that it was correctly executed. It makes it a half-automated task, but saves time in the long run.


Condensed to aphorism form:

    Decide, then act.  

There's a whole menagerie of failure modes that come from trying to make decisions and actions at the same time. This is but one of them.

Another of my favorites is egregious use of caching, because traversing a DAG can result in the same decision being made four or five times, and the 'obvious' solution is to just add caches and/or promises to fix the problem.

As near as I can tell, this dates back to a time when accumulating two copies of data into memory was considered a faux pas, and so we try to stream the data and work with it at the same time. We don't live there anymore, and because we don't live there anymore we are expected to handle bigger problems, like DAGs instead of lists or trees. These incremental solutions only work with streams and sometimes trees. They don't work with graphs.

Critically, if the reason you're creating duplicate work is because you're subconsciously trying to conserve memory by acting while traversing, then adding caches completely sabotages that goal (and a number of others). If you build the plan first, then executing it is effectively dynamic programming. Or as you've pointed out, you can just not execute it at all.

Plus the testing burden is so drastically reduced that I get super-frustrated having to have this conversation with people over and over again.


It's amazing the number of times I look at some simple code and think "nah, this is so simple it doesn't need a test!", add tests anyway (because I know I should)... and immediately find the test fails because of an issue that would have been difficult to diagnose in production.

Automated tests are awesome :)


A few assertions would have also stopped this.

    # During buildup of the our_ids list:
    assert vimeoId not in our_ids
    # After creating the list:
    assert len(our_ids) > 10000 and len(set(our_ids)) == len(our_ids)
    # Before each final deletion:
    assert id not in hardcoded_list_of_golden_samples
    # Depending on the speed required, you could hit the API again here as an extra check.

But as always everything is obvious in hindsight. Even with the checks above, Plan+Apply is the safest approach.


> literally simple print statements

Yes, that can be a simple but powerful live on screen log. I developed a library to use an API from a SaaS vendor, in much the same way as the author. It was my first such project & I learned the hard way (wasted time, luckily no data loss or corruption) that print() was an excellent way to keep tabs on progress. On more than one occasion it saved me when the results started scrolling by and I did an oh sh*t! as I rushed to kill the job.


Rather than commenting it out, I suggest adding a --live-run flag to scripts and checking the output of --live-run=false (or omitted) before you run it "live."


But then you have double the chances of introducing a bug for the specific scenario we are talking about:

Before: there is a chance there is a bug in my "delete" use case

Now: what we had before, plus the chance that there is a bug in my "--live-run" flag


You can make automated tests for your flag. You can’t make automated tests for your code comments.


Besides doing this, I like to first just move files to another dir (keeping the relative path) instead of deleting them. It's basically like a DIY recycle bin.

If both paths are on the same disk moving files is a fast operation - and if you discover a screw up, you can easily undo it. On the other hand if everything still looks fine after a few days, you just `rm -rf` that folder and purge the files.
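
A sketch of that DIY recycle bin in Python (the paths and the `.trash` name are arbitrary):

    import os
    import shutil

    def soft_delete(path, root=".", trash_root=".trash"):
        # Move the file into a parallel tree instead of deleting it; keeping the
        # relative path means an undo is just a move back. On the same filesystem
        # this is a rename, so it is fast and uses no extra space.
        rel = os.path.relpath(path, root)
        dest = os.path.join(trash_root, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.move(path, dest)

    # Once everything still looks fine after a few days: rm -rf .trash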


Yeah, that is what I recommend too.

Instead of performing the dangerous action outright, just log a message to screen (or elsewhere) and watch what is happening.

Alternatively, or subsequently, chroot and try that stuff on some dummy data to see if it actually works.


Indeed. I would say that framework- or even language-level support for putting things in "dry-run" mode is something sorely missing from many modern frameworks and languages, which old C libraries used to offer.


This is how I do it in compiled code. In shell, I print the destructive command for dry runs - no conditions around whether to print or not, I go back to remove echo and printf to actually run the commands.


I'd make sure those include WARN or ERROR (I'd use logging to do that), that way you can grep for those. Spot checking might be difficult if the logs get long.


The No. 2 philosophy!

Make sure you got everything out and off before you pull up your pants, or else you better be prepared to deal with all the shit that might follow!


   SELECT COUNT(1) FROM table 
   -- UPDATE table SET col='val'
   WHERE 1=1


    BEGIN TRANSACTION 
    UPDATE table SET col='val' WHERE 1=1
    ROLLBACK


Definitely better, when you can afford the overhead!


Exactly!


100% on the logging and dry run.


That is called experience.

Good decisions come from experience. Experience comes from making bad decisions.


Dry run really is key here. Most automated tests wouldn't find this bug.


Experience is the best teacher™



