> A lobste.rs user asked why you would use find | xargs rather than find -exec. The answer is that it can be much faster. If you’re trying to rm 10,000 files, you can start one process instead of 10,000 processes!

Fair enough, but I still favor find -exec. I find it generally less error prone, and it's never been so slow that I wished I had instead used xargs.

Also, if you're specifically using -exec rm with find, you could instead use find with -delete.
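For what it's worth, the three variants side by side (a minimal sketch; the `*.tmp` pattern is just an illustration):

    find . -name '*.tmp' -exec rm {} \;           # one rm process per file
    find . -name '*.tmp' -print0 | xargs -0 rm    # files batched into a few rm processes
    find . -name '*.tmp' -delete                  # no child processes at all (GNU/BSD find)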




A benefit I didn't mention in the post (but probably should) is that the pipe lets you interpose other tools.

That is, find -exec is sort of "hard-coded", while find | xargs allows obvious extensions like:

    find | grep | xargs   # filter tasks
    find | head | xargs   # I use this all the time for faster testing
    find | shuf | xargs

Believe it or not, I actually use find | shuf | xargs mplayer to randomize music and videos :)
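A concrete version of that last pipeline might look like this (a sketch -- the path and the head limit are made up; the -print0/-z/-0 flags need GNU tools and keep filenames with spaces intact through the pipe):

    find ~/Music -name '*.mp3' -print0 | shuf -z | head -z -n 20 | xargs -0 mplayer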

So shell is basically a more compositional language than find (which is its own language, as I explain here: http://www.oilshell.org/blog/2021/04/find-test.html )


You can also use `find -exec` with `'+'` instead of `';'` as the terminator. This batches the found files into as few `rm` calls as possible (like xargs, it may still split a very long list into several invocations).
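For example (the pattern is illustrative):

    find . -name '*.tmp' -exec rm {} +    # files batched into as few rm calls as possible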


I tend to prefer xargs because it works in more contexts. E.g. I've got a tool which automatically generates databases, but sometimes the cleanup doesn't work. `find -exec` does nothing for me there (the leftovers are databases, not files), but `xargs -n1 dropdb` (following an intermediate grep) does the job. From there, it makes sense to… just use xargs everywhere.
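Something like this (a sketch -- the `test_` prefix and the psql query are assumptions about what the generated databases look like):

    psql -At -c 'SELECT datname FROM pg_database' | grep '^test_' | xargs -n1 dropdb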

And I always fail to remember that the -exec terminator must be escaped in zsh, so using -exec always takes me multiple tries. So I only use -exec when I must (for `find` predicates).
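For reference, the escape in question -- without the backslash, the shell eats the `;` as a command separator before find ever sees it:

    find . -name '*.o' -exec rm {} \;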


I agree. `find somewhere -exec some_command {} +` can be dramatically faster, but it does not guarantee a single invocation of `some_command`; it may make multiple invocations if you pass very large numbers of matching files.

After spending a bit of time reading the man page for find, I rarely use xargs any more. find is pretty good.

Tangent:

Another instance I've seen where spawning many processes leads to bad performance is in bash scripts for git pre-receive hooks that scan and validate the commit messages of a range of commits before accepting them. It is pretty easy to cobble together a loop in a bash script that executes multiple processes _per commit_. That's fine for typical small pushes of 1-20 commits -- but if someone needs to do serious graph surgery and push a branch of 1,000-10,000 commits, that can cause very long running times -- and more seriously, timeouts, where the entire push gets rejected because the pre-receive script takes too long. A small program using the libgit2 API can do the same work at the cost of a single process, although then you have the fun of figuring out how to build, install, and maintain binary git pre-receive hooks.
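A sketch of the difference (the fixup!/squash! check is a made-up rule, only one version would appear in a real hook, and both assume updates to existing refs -- a real hook also has to special-case the all-zeros oldrev on new branches):

    #!/bin/bash
    # pre-receive: stdin is "oldrev newrev refname", one line per pushed ref

    # Version A, the slow pattern: two git processes per commit
    while read -r old new ref; do
        for c in $(git rev-list "$old..$new"); do
            subject=$(git log -1 --format=%s "$c")
            case "$subject" in
                fixup!*|squash!*) echo "rejected $c: $subject" >&2; exit 1 ;;
            esac
        done
    done

    # Version B: one git process per ref, however many commits are pushed
    while read -r old new ref; do
        if git log --format='%H %s' "$old..$new" | grep -E '^[0-9a-f]+ (fixup!|squash!)'; then
            echo "rejected: unsquashed fixup/squash commits above" >&2
            exit 1
        fi
    done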



