Find is a beautiful tool (eriwen.com)
95 points by zengr on July 24, 2011 | 43 comments



`find` is beautiful, no discussion. Together with tools like xargs there's hardly anything you can't do with it.

The following is slightly off-topic, but for searching inside files like

    find . -name "*.css" -exec grep -l "#content" {} \;
try `ack` instead

    ack --css '#content'
The above command will recursively search for '#content' in all css files under the current directory. It is faster than grep and automatically excludes your hidden version-control folders. No need to mess with `--exclude`. I've pretty much stopped using grep altogether. ack can be found here: http://betterthangrep.com/


Interesting... would love to use it, but I'm using haml and sass files, and I can't seem to --type-add anything. I get this:

  > ack --type-add haml=.haml
  ack: --type-add: Type "haml" does not exist, creating with ".haml" ...
  ack: No regular expression found.
followed by being unable to use the type, and it not showing up in --help types. Am I using it incorrectly? I got it via `brew install ack`, so that could be the problem as well.

edit: --type-set=haml=.haml in a .ackrc file works. Though I may have to tweak it to look in the current directory as well - not all projects need the same settings, and in `~` it's not git-able.


If you want a dir-specific one, you could specify

    export ACKRC=./.ackrc.local
as an env variable, and then create that file wherever you need to override the default behaviour; otherwise it'll use

~/.ackrc

The downside to doing it this way is that you only get one or the other; you can't (afaik) load ~/.ackrc and then override only certain aspects of it locally. Also, you're constrained to running it from the basedir of your project. (You might be able to overcome this with a hack based on PROMPT_COMMAND for bash, or a precmd/preexec function for zsh, which redefines ACKRC appropriately; see the sketch below.)
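A minimal sketch of that bash hack (untested; the function name is just a placeholder, and ./.ackrc.local is the name from above):

  # Before each prompt, point ACKRC at a project-local file if one
  # exists in the current directory; otherwise fall back to ~/.ackrc.
  _set_ackrc() {
      if [ -f ./.ackrc.local ]; then
          export ACKRC=./.ackrc.local
      else
          export ACKRC="$HOME/.ackrc"
      fi
  }
  PROMPT_COMMAND=_set_ackrc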

Regarding the 'gitability' of it, there's no reason why you can't have a ~/.dotfiles/ackrc with a symlink to ~. That's how I manage my dotfiles anyway, along with a makefile to create any missing symlinks.

Something like http://kitenet.net/~joey/code/etckeeper/ might work out even better, although I've never really found time to play with it.


I've struggled to understand 'find' for a while. It doesn't seem to fit *nix conventions in its syntax. For one, the path argument comes before any options or flags. Also, what is the point of the '-print' option? Pretty much every other utility prints results to stdout by default. This is expected and necessary for piping commands together.

I'm not arguing about the value of 'find' (indeed, this recent article well explains its usefulness: http://news.ycombinator.com/item?id=2698180). I just fumble every time I try to use it because it seems to contravene conventions. I welcome any historical context or clarification if my expectations are in fact flawed.


I've had the same problem. find and dd are two *nix commands which betray the Principle of Least Surprise. I still use them, but I always find myself having to think twice before I do.

(Though you should probably always think twice before using 'destroy disk' anyway!)


> For one, the path argument comes before any options or flags.

Actually, it does conform to unix conventions:

   find [OPTIONS] [PATHS] [EXPRESSION]
Where options are "-PLHDO" in the GNU version. The elements of EXPRESSION may be introduced by a dash, but they're not really options. Together, they specify a little program to evaluate on every path. The program could have easily taken the form 'type f a name foo' instead of '-type f -a -name foo'. The only issue would be recognizing the end of PATHS arguments and the beginning of EXPRESSION arguments...perhaps that's why the dash is used? It does seem like OPTIONS EXPRESSION [--] PATHS would be easier and less ambiguous to parse.
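For instance, a made-up command just to label the three parts:

  #    option  paths      expression (a little predicate program)
  find -L      /etc /srv  -type f -a -name '*.conf'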

> Also, what is the point of the '-print' option? Pretty much every other utility prints results to stdout by default.

find does print to stdout by default (though historically it didn't, and you had to give -print explicitly). These days -print mainly distinguishes the default mode from -print0, which is common when pairing with xargs.

> I welcome any historical context or clarification if my expectations are in fact flawed.

I was kind of hoping the recently rehabilitated UNIX 1972 sources would shed some light on your questions. Unfortunately, find is present but only as a PDP-11 executable [1].

[1] http://code.google.com/p/unix-jun72/source/browse/#svn%2Ftru...


> find does print to stdout by default (though historically it didn't, and you had to give -print explicitly). These days -print mainly distinguishes the default mode from -print0, which is common when pairing with xargs.

  find ... -print0 | xargs -0 $cmd 
is essential if you don't want all sorts of terrible things happening because a filename happens to contain a space or some sort of shell metachar.

See http://en.wikipedia.org/wiki/Xargs#The_separator_problem for the details.

Personally, I use

  find ... -exec $cmd '{}' \; # calls $cmd once for each match

  find ... -exec $cmd '{}' + # calls $cmd with as many matches as possible - minimising the number of invocations of $cmd.
I think this is either reasonably new, or not entirely standard, since I've used versions that don't have it.

  -ok $cmd \;
is also pretty neat - same as exec, but asks for confirmation first.


It's a _very_ old command, predating many of the command conventions. As the OpenBSD man page puts it:

    HISTORY
         A find command appeared in Version 1 AT&T UNIX.


I like to write complex find commands with find-cmd, which is bundled with emacs (disclaimer: I wrote it):

    (find-cmd '(prune (name ".svn" ".git" ".CVS"))
              '(and (or (name "*.pl" "*.pm" "*.t")
                        (mtime "+1"))
                    (fstype "nfs" "ufs")))
becomes:

    "find '/home/phil/' \\( \\( -name '.svn' -or -name '.git' -or
      -name '.CVS' \\) -prune -or -true \\) \\( \\( \\( -name '*.pl'
      -or -name '*.pm' -or -name '*.t' \\) -or -mtime '+1' \\) -and \\(
      -fstype 'nfs' -or -fstype 'ufs' \\) \\)"


Now that you mention -prune, I only recently figured out how to use it to exclude a specific path from a search, but my understanding is that of a trained dog.
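For anyone else in the same boat, the usual idiom looks like this (./.git here is just an example of a directory to skip):

  # -prune stops find from descending into .git; the -o branch
  # handles (and prints) everything else
  find . -path ./.git -prune -o -name '*.css' -print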


If you extended this to run in emacs and put the results in a dired buffer on the fly, it'd be awesome!


I think for any unix user there comes a time when you reach a break-even point: the added productivity of shell commands becomes greater than the head-scratching, googling and man page searches.

I remember finally 'getting it', and asking myself why anyone would want a computer where you don't have such tools.

My main gripe with the shell at this point is inconsistencies in syntax from one command to another, especially when it means different dialects of regex.


> My main gripe with the shell at this point is inconsistencies in syntax from one command to another, especially when it means different dialects of regex.

Check out Perl. Part of its raison d'être is to deal with this very frustration.


Perl is awesome for its one-liner abilities, but you still have to dance around shell quoting issues, which can vary from shell to shell. Remembering exactly what needs to be escaped, and how, can get tedious enough that it's easier to just put the script into a file instead.

That said, there are some nice constructs such as:

  q(some string);      # same as a single-quoted string
  qq(another $string); # double-quoted string.
  qx();                # backticks, or $() in bash.


I wrote a similar post last year (http://rossduggan.ie/blog/codetry/extracting-information-fro...); it's one of those things you can't believe you worked without before. Easily one of the best tools in a developer's arsenal.


I use find all the time and have added a few handy functions to my fish shell so I can:

  f '.css'                # regular recursive find for css files
  f '.css' '#content'     # find and grep
  fgu '*.css' '#content'  # give a listing of unique css files containing '#content'

If anyone wants to take a peek, the f, fg, and fgu functions are on github (https://github.com/bartvandendriessche/fish-nuggets/tree/mas...).


I wrote an article on `find` performance a few years ago, may be useful to someone:

http://scott.wiersdorf.org/blarney/071024a.html

tl;dr version:

find examines each entry in a directory hierarchy; if the entry (be it a regular file, symlink, directory, device, or whatever) matches the given criteria, it will be printed. If no criteria are given, it’s an automatic match (and will be printed).

find evaluates its expressions in the order you specify them; by carefully choosing the order of the expressions, you can shave tons of time off the cost of the search and get the results you want more quickly.

As is true with most programming endeavours, a little investment up front in crafting the expressions will yield better results later. If you're only running a find operation once, maybe you don't want to take too much time fussing over saving a few syscalls, but if you will be running find frequently (e.g., as part of a cron job, or some other regular occurrence), do your disk a favor and let find skip as much as possible.
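To illustrate the ordering point (a made-up example; find evaluates left to right and stops at the first failing predicate, so a cheap test that fails early spares the expensive ones):

  # Cheap test first: -name is a string match on the filename, while
  # -size and -mtime need data from stat(), so failing names skip them.
  find /var/log -name '*.log' -size +1M -mtime +7

  # Same predicates, worse order: the stat-based tests run on every
  # entry before the cheap name check gets a chance to reject it.
  find /var/log -size +1M -mtime +7 -name '*.log'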


It's not a beautiful tool, it's an over-large conglomeration of functionality that already exists in the shell. We don't even have it in Plan 9; here's how you can do some of the examples I'm seeing in comments here:

To just find a file:

  du -a . | grep filename
To find every file containing the string "foo" (the awk is necessary to remove file sizes; you could combine du and awk in a shell function if desired):

  for (i in `{du -a . | awk '{ print $2 }'}) {
    grep 'foo' $i
  }
Really, the for loop above takes care of a few of find's options. Why do you need both "-exec" and "-delete"? Exec is far more versatile, you can just exec "rm" on the files. Except that as I've shown above, it's already quite easy to execute a program for every file that matches some criteria, and you don't have to learn the dozens of options used in find--you just apply the shell programming knowledge you should (hopefully) already have.


> It's not a beautiful tool, it's an over-large conglomeration of functionality that already exists in the shell

Find absolutely is a beautiful tool. It does one and only one thing: apply predicates to the file system.

> du -a . | grep filename

Except that's going to match on the dirname content as well, whereas `find -type f -name filename` will only match the actual file name.

And you're going to get complete junk in the first column of each line, which you'll have to clean up afterwards.

> (the awk is necessary to remove file sizes, you could combine du and awk in a shell function if desired)

Right, whereas if you use `find` it gives exactly what you want (or need)

> Why do you need both "-exec" and "-delete"?

You don't need both, but -delete provides additional clarity and forces depth-first traversal.

> Except that as I've shown above, it's already quite easy to execute a program for every file that matches some criteria

Except your examples are all broken since you're not using `basename`. Turns out it's not that easy, and it soon becomes complete shell-soup.

> and you don't have to learn the dozens of options used in find

They're not actually options, they're predicates.


Your argument kind of lost me at the point where you said "the awk is necessary to". You make a well-reasoned point, but really, isn't it a slippery slope? Lots and lots of Unix tools can be further broken down into smaller extant Unix tools.


Exactly. UI is important, and even if find can be written in terms of other tools, its superior UI justifies its existence. It's sort of like SQL: a declarative interface means that a system administrator doesn't have to think about how to do something, but merely about what needs to be done.

Minimalism at the expense of helping the user get the job done isn't the right tradeoff.


  du -a | cut -f2
is a less sledgehammery approach than requiring awk. I don't recognise the OP's shell syntax (unless it's intended more as pseudo-code), but I think at least in bash, you'd have problems with embedded whitespace in filenames.

find and xargs have options for NUL-separated output records specifically for this reason.


It's Plan 9's rc shell... I didn't use cut because I've got in the habit of using awk for quick shell tasks. I'll have to remember cut, though, since awk is definitely using a sledge to drive in a thumbtack.

Also in Plan 9, we typically don't have spaces in file names, but I think when we do they come out quoted... as far as I know there's no xargs.


I agree that the find command isn't minimalistic and could be simplified, but the use of du in this case seems pretty ugly if you're not interested in the file sizes. A lightweight find would probably be a better solution.


I'm inclined to agree; I have never taken the time to learn the various somewhat arcane switches you can pass to find (not to mention that it doesn't follow the - for short, -- for long switch convention).

I simply do find <path> to give me a list of all the paths under the current directory and then pipe that into grep.


> not to mention that it doesn't follow - for short -- for long switch convention

That's because they're not options, they're predicates. A sequence of predicates, to be precise (they execute strictly in order and are short-circuited: the first predicate failure will stop the whole evaluation unless you're using `-or`). They're closer to `test`'s operators than to `grep`'s switches.
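A quick illustration of the short-circuiting:

  # -print only runs for entries where -name already succeeded
  find . -name '*.tmp' -print

  # with -or, the second test is only tried when the first fails
  find . -name '*.o' -or -name '*.a'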


For that purpose, I often make a shell script called "find" containing the line "du -a | awk '{print $2}'" and stick it in my bin directory... although at this point I'm so used to using du, I even do the "du -a | grep pattern" dance on Unix.


I use find for one thing and one thing only--listing all the files under the current directory. du works, but I prefer find.

    find | while read -r i; do grep 'foo' "$i"; done


That's going to fail if you have a file with a newline in its name.

And what are you going to do when you want to find all files modified more than 32 hours ago?

-delete is necessary because otherwise there is a race condition which can allow a regular user to make root delete any file.

See: http://news.ycombinator.com/item?id=2699050

You should not use -exec; use -execdir instead, as -exec is not safe.
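For example (a sketch; the path and pattern are made up):

  # -execdir runs the command from the directory containing each match
  # and passes just './filename', so an attacker can't swap a symlink
  # into the parent path between the test and the rm.
  find /tmp -name '*.tmp' -execdir rm -- '{}' \;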


The first example is exactly how I search for things, because I can never get the right syntax for find.


I've never quite understood the point of find's -exec option. What's wrong with a pipe?


You can embed the name returned by find into the middle of lengthy commands by using the -exec option - it can be harder to do that with a pipe. It's also more resilient (in some cases) to filenames that have spaces in them.


Because it's a good/easy way to insert the file name into the _middle_ of the statement. For example:

  find . -name "*.gif" -exec convert "{}" "{}.png" \;
will convert every GIF in the current directory (and its descendants) into a PNG, assuming ImageMagick's convert is on your path. Is there a better way? Not that I know of.


> If you’re on Windows, I would recommend installing Cygwin to bring the power of a real shell to your OS.

_yawns_

Can we please get past statements like these? They are _so_ 90's.

See Powershell: http://en.wikipedia.org/wiki/Windows_PowerShell


The next step is releasing a decent terminal program for Windows. The standard console app on Windows pales in comparison to those available on Gnome, KDE, and OS X based desktops. Copy and paste is difficult, resizing never seems to work right, and it has terrible font rendering.

The reason many people don't know much about PowerShell is that cmd.exe is still the default Windows shell, even after 25 years. PowerShell has been around for 4 of those, and only in the last two years did it start shipping by default with a Microsoft OS.


> The next step is releasing a decent terminal program for Windows.

You're absolutely correct. I know that I'd pay $30 for a decent terminal program. I live in ipython / PowerShell. I want things like tabs, syntax coloring, copy / paste, automatic hyperlink generation for urls and file links.


zsh can be a drop-in replacement for bash, and its globbing facilities make 'find' almost obsolete for me. For example:

  find . -name "*.css"
is:

  ls -d **/*.css
in zsh. Other examples, all derived from the article:

  find . -type f -name "*.css" ==> ls **/*.css(.)
  
  find . -name "*.css" -exec grep -l "#content" {} \; 
         ==> grep -l "#content" **/*.css

  find . -ctime -1 -type f ==> ls **/*(.c-1)

  find . \! -path "*CVS*" -type f -name "*.css" 
         ==> ls **/*.css(.e{'echo $REPLY | grep -v "CVS"'})
More can be found in the "Glob Qualifiers" section of the zshexpn(1) man page.


Another great switch is '-maxdepth':

  -maxdepth 1 will only search the current directory
  -maxdepth 2 will search 1 folder deep
etc

I use this all the time
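For example:

  find . -maxdepth 1 -name '*.css'   # current directory only, no recursion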


find -delete -name "*.pyc" is not the same as find -name "*.pyc" -delete

First one nukes everything including the git repo

Second does what I expect


> Second does what I expect

Both do what you expect once you understand that `-name` and `-delete` are predicates (not switches) and that find short-circuits.


The other big gotcha that took a while to sink in was

  find -name "foo*" -or -name "*bar" -print
which should actually be

  find \( -name "foo*" -or -name "*bar" \) -print
to do what you might expect (print things beginning with foo or ending with bar).

Without the parens, it gets interpreted as

  -name "foo*" -or ( -name "*bar" -and -print ), 
and hence only prints things that match the latter predicate.


Yes, well, I know that now that I've lost the contents of a project directory. Live and learn.


I'm suitably conditioned at this point to always give it a trial run with -print before using -delete, and -exec echo '{}' \; before any potentially dangerous commands.
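Concretely, with the *.pyc example from upthread, that looks like:

  find . -name '*.pyc' -print                 # dry run: inspect matches
  find . -name '*.pyc' -exec echo rm '{}' \;  # preview the command
  find . -name '*.pyc' -delete                # only then, the real thing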

Once bitten, etc...



