These days, I've been very implicit in how I run rm. To the extent that I don't ...

kstrauser · on Feb 1, 2017

BTW,

  find ... -delete

avoids any potential shell escaping weirdness and saves you a fork() per file.
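
A minimal sketch of the difference, assuming a find that supports -delete (GNU and BSD find do) and a throwaway directory from mktemp. The file names are made up for illustration.

```shell
# Scratch directory with a few files, one with a space in its name.
dir=$(mktemp -d)
touch "$dir/plain.tmp" "$dir/with space.tmp" "$dir/keep.txt"

# The -exec form starts one rm process per matching file:
#   find "$dir" -type f -name '*.tmp' -exec rm -- {} \;
# With -delete, find unlinks the files itself, so no rm is started
# and no file name ever passes through a shell:
find "$dir" -type f -name '*.tmp' -delete

remaining=$(ls "$dir")
rm -rf "$dir"
echo "$remaining"
```

Note that -delete acts as an action in the expression, so it has to come after the tests that filter the files.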

wodny · on Feb 1, 2017

This seems to be the best option here. As a side note: if you do something more complicated and pipe find output to xargs, there are two very important options that delimit names with a NUL byte: -print0 for find and -0 for xargs.
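
A small sketch of that pairing, assuming a find and xargs that support -print0 and -0 (GNU and BSD do). The directory and file names are invented for the example.

```shell
dir=$(mktemp -d)
# Names with a space and with an embedded newline, plus one file to keep.
touch "$dir/a b.log" "$dir/c.log" "$dir/keep.txt"
touch "$dir/$(printf 'new\nline').log"

# Without -print0/-0, xargs would split "a b.log" at the space and the
# last name at the newline. With them, each name arrives intact.
find "$dir" -type f -name '*.log' -print0 | xargs -0 rm --

left=$(ls "$dir")
rm -rf "$dir"
echo "$left"
```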

Very interesting article: https://www.dwheeler.com/essays/fixing-unix-linux-filenames.....

willemmali · on Feb 1, 2017

I've been writing an `sh`-based tool to check up on my local Git repos, and it uses \0-delimited paths and a lot of `find -print0` + `xargs -0`:

https://gitlab.com/willemmali-sh/chegit/blob/master/chegit#L...

I admit the code can look a little weird, but that's because I had some rather tight constraints: 1 file, all filenames `\0`-separated internally, and just POSIX `sh`. I still wanted to reuse code and properly quote variables inside `xargs` invocations (because `sh` does not support `\0`-separated reads), so I ended up having to basically paste function definitions into strings and use some fairly expansive quotation sequences.

bpchaps · on Feb 2, 2017

Nice plug for gitlab ;).

\0 is an insanely useful separator for this sort of thing and yeah, it definitely gets messy. I'm working on a similar project that uses clojure/chef to read proc files in a way that causes as little overhead as possible. \0 makes life so much easier there. The best example I can think of off the top of my head is something like:

  bash -c "export FOO=1 ; export BAR=2 && cat /proc/self/environ | tr '\0' '\n' | egrep 'FOO|BAR'"
  FOO=1
  BAR=2

willemmali · on Feb 2, 2017

I was so freaked out at the news. I normally have local backups of my projects, but I happened to be in the middle of a migration where my code was only on Gitlab, and then they went down... Luckily it all turned out OK.

\0 is very useful but I really wish for an updated POSIX sh standard with first-class \0 support.

On your code, why do you replace \0's with newlines? GNU egrep has the -z flag, which makes it accept \0-separated input. A potential downside is that -z also makes the output \0-separated (similar to what -Z does for file names in the output).
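
A quick sketch of the -z behaviour, assuming GNU grep. The input is a made-up stand-in for something like /proc/self/environ, and the final tr is only there to make the result readable on a terminal.

```shell
# printf emits three NUL-terminated records; grep -z treats each one as a "line".
out=$(printf 'FOO=1\0BAZ=3\0BAR=2\0' | grep -z -E '^(FOO|BAR)=' | tr '\0' '\n')
echo "$out"
```

Dropping the trailing tr leaves the matches NUL-terminated, which is what you want when the next stage is something like xargs -0.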

I solved the "caller might use messy newline-separated data"-problem by having an off-by-default flag that makes all input and output \0-separated; this is handled with a function called 'arguments_or_stdin' (which does conversion to the internal \0-separated streams) and 'output_list' (which outputs a list either \0- or \n-separated depending on the flag).
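
To illustrate the idea only: a rough guess at what an 'output_list'-style helper could look like in POSIX sh. This is not the code from chegit; the zero_output variable and the function body are assumptions made for this example.

```shell
# Hypothetical flag: 1 = keep \0 separators on output, 0 = newlines (default).
zero_output=0

# Reads a \0-separated list on stdin and writes it in the selected format.
output_list() {
    if [ "$zero_output" -eq 1 ]; then
        cat
    else
        tr '\0' '\n'
    fi
}

listing=$(printf 'repo one\0repo two\0' | output_list)
echo "$listing"
```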

gizmo · on Feb 1, 2017

Good advice.

I would add a step where you dump the output of find (after filtering) into a text file, so you have a record of exactly what you deleted. Especially when deleting files recursively based on a regular expression, that extra step is very worthwhile.
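
One way that could look, as a sketch: it assumes a find with -delete, and the paths and the deleted.txt log name are invented for the example.

```shell
dir=$(mktemp -d)
mkdir "$dir/work"
touch "$dir/work/a.bak" "$dir/work/b.bak" "$dir/work/keep.txt"
log="$dir/deleted.txt"

# 1. Record what the filters match.
find "$dir/work" -type f -name '*.bak' -print | sort > "$log"
# 2. After reviewing the log, delete using the very same filters.
find "$dir/work" -type f -name '*.bak' -delete

logged=$(sed 's|.*/||' "$log")   # base names only, for display
left=$(ls "$dir/work")
rm -rf "$dir"
echo "$logged"
```

If nothing else touches the tree between the two steps, the log matches what was removed. Putting -print before -delete in a single find run would log and delete in one pass instead.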

It's also a good practice to rename instead of delete whenever possible. Rename first, and the next day when you're fresh walk through the list of files you've renamed and only then nuke them for good.
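
A sketch of the rename-first approach, using a hypothetical trash directory as the holding area. All names here are made up.

```shell
dir=$(mktemp -d)
mkdir "$dir/work" "$dir/trash"
touch "$dir/work/a.bak" "$dir/work/b c.bak" "$dir/work/keep.txt"

# Day 1: move the candidates aside instead of deleting them.
find "$dir/work" -type f -name '*.bak' -exec mv -- {} "$dir/trash/" \;

staged=$(ls "$dir/trash")
kept=$(ls "$dir/work")
echo "$staged"

# Next day, after walking through the list: remove the holding area for good.
rm -rf "$dir/trash"
rm -rf "$dir"
```

Moving everything into one flat directory can clobber files that share a base name, so for deep trees it may be safer to rename in place (for example by adding a suffix).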