Stop Piping Cats (ibm.com)
178 points by helwr on Feb 10, 2010 | 75 comments



This is my least-favorite Internet meme. People "pipe cats" because they want the entire pipeline to read left-to-right, like:

   cat file | xargs foo | grep bar | sort | wc -l
It just looks nicer than:

   < file xargs foo | grep bar ...


I would also add that not all tools follow the '-' convention, and the ones that do can break in corner cases. Why invest the effort of remembering how every tool works and which ones don't, for an obfuscating micro-optimization? /proud useless user of cat


Exactly. I tried to figure out how to make xargs read from a file instead of stdin, and was unsuccessful. My "cat ... |" works every time (at the expense of 5 milliseconds of CPU time. oh noes.)


There's a nice consistency in doing it that way--it's very easy to think of cat as turning an "inert" file into a stream of data, each program as a function that applies a transformation to a stream with a particular kind of content, and the pipe as function composition (given compatible stream content). A very comfortable programming-like perspective, at least to me. (Pop quiz: What programming idiom closely resembles this "repackage content and apply sequential transformations" approach?)


Spoiler alert: See also Oleg Kiselyov's "Monadic i/o and UNIX shell programming" (http://okmij.org/ftp/Computation/monadic-shell.html).


Seriously?

You failed at that?

I'm intrigued because it always occurred to me that you were a particularly fastidious person (with all due respect).

I often come across issues that seem for all the world to be pointless artifacts of an accepted convention. If such a convention is an acquired trait, and one learns exclusively from a limited medium (irc frustrates me), I find it very curious that these conventions persist, since they sometimes act as a difficult barrier to a thorough understanding. This xargs thing, though I know almost nothing about it, seems to parallel similar problems I've encountered, and I for one would be interested to hear your take on this (and HN in general).

Or just some food for thought for your next blog post maybe?


Well, my guess is that xargs simply does not accept a filename as the argument. This comes from reading the --help and grepping the manpage for "file".

The point was, when I used "cat file |", I knew it was going to work, and it did. When I tried to eliminate the cat by using the program's built-in ability to read a file, I had to read several manpages before determining that it was not possible. All because some tutorial says "cat is useless", when it clearly saved me much more time than the extra CPU time it used.

And if you are actually asking; xargs is just a utility to read command-line-arguments from stdin. `echo -e "foo\nbar" | xargs rm' === `rm foo; rm bar' (or depending on the xargs implementation; `rm foo bar'). It kind of reminds me of a "functor map" operation, where stdin is a functor (of command-line arguments), and command-line programs are functions. (I will now mention that xargs also does "join" on the "results" of the "function", which is very ... monad-like. But "Monads are teh awesome and everything is one" is my second-least-favorite Internet meme, so I will spare you. :)
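For instance, a quick sketch (whether the arguments get batched into one rm or split across several depends on the implementation and its limits):

  printf 'foo\nbar\n' | xargs rm        # typically runs: rm foo bar
  printf 'foo\nbar\n' | xargs -n 1 rm   # one argument per call: rm foo; rm bar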


The whole 'useless use of cat' meme is basically designed to let Randal Schwartz make fun of people. At one point it may have made a difference, and certainly if you have a shell script that is getting called 10,000 times a day it might make sense to optimise it. But for doing stuff from the command line, weird shell acrobatics are a premature optimisation.


The whole 'useless use of cat' meme is basically designed to let Randal Schwartz make fun of people.

So true.

My requirement for a shell command is that it be reasonably easy to assemble, return something resembling the correct answer, and that it run in a reasonable amount of time.

If your requirements are more strict than those, it's time to write a real program.


  I tried to figure out how to make xargs read from a file 
  instead of stdin, and was unsuccessful.
"xargs -a <filename>" First option described in the man page on Red Hat.


My home machine (Debian GNU/Linux) says "--arg-file", and it looks like a relatively new feature. My RHEL box at work definitely didn't have anything with the word "file" in it.
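So on a new enough GNU xargs, something like this should do it (untested on older boxes):

  xargs --arg-file=file foo    # GNU findutils; -a is the short form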


The whole thread descending from here is an excellent example of what I mean by corner-cases and memorization. :)


could be a GNU vs. BSD thing.


BSD xargs doesn't seem to have the -a option.


Sometimes you can use /dev/fd/0, but I agree, it's a stupid micro optimization, and I don't understand why people get so righteous about it. My co-worker would correct you every time he saw it, but then again he's a pedantic geek.
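For the record, the trick hands stdin to a tool that insists on a filename, assuming your OS exposes /dev/fd as Linux and the BSDs do. Something like this (saved-ps.txt being a hypothetical earlier snapshot):

  ps | diff /dev/fd/0 saved-ps.txt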


It's still good to keep it in mind, in case there are instances where it's being executed many times causing a bottleneck (though one-liners aren't really meant to be used in such situations).


zsh and rc provide primitives for this; this is rc:

    diff <{ps} <{sleep 3; ps}


Equivalent in bash:

    diff <(ps) <(sleep 3; ps)


Speak for yourself; I don't know anyone who puts the input redirection first. I do this:

  (xargs foo | grep bar | sort | wc -l) < file
It makes the pipeline one command, both lexically (the verb comes first) and concretely (the pipeline is kicked off in a subshell).


At the expense of the data flowing from right to left to right, from the outside in. At least with cat, data flows unambiguously from left to right.


But using cat totally fucks up the calling conventions: it puts the operand as the second of many arguments. Which one of these doesn't belong?

  verb file
  cat file | verb1 | verb2 | verb3
  (verb1 | verb2 | verb3) < file
  verb123 file
EDIT: Consider the calling conventions; which one of these handles its arguments differently? Assume that verb123 is an equivalent to the pipeline -- the subshell+stdin construction lends itself to shell aliases:

  alias verb123="(verb1 | verb2 | verb3) < "


I have no idea which one doesn't belong. Is 'verb123' the moral equivalent of '(verb1 | verb2 | verb3)'? In that case, it's the first one. But that's visible from the first word on the line, so I'm still not getting the picture.

Of course, with the usual argument vs stdin conventions, either of the middle two could also be rewritten:

    verb1 file | verb2 | verb3
This is probably how I'd write such a chain.


Spoken like a Haskell programmer, who might write:

   (wc Lines) . sort . (grep "bar") . (xargs "foo") =<< file
But note in this case that all the data "flows in the same direction".


has been a Haskell programmer since 2005, before the dawn of dons

  wc Lines $ sort $ grep "bar" $ xargs "foo" =<< file


That does not parse the way you think it does; the fixity of =<< is greater than $. You would need the shell-style parens if you insist on application instead of composition :)

    Prelude> :info $
    ($) :: (a -> b) -> a -> b 	-- Defined in GHC.Base
    infixr 0 $
    Prelude> :info =<<
    (=<<) :: (Monad m) => (a -> m b) -> m a -> m b
  	-- Defined in Control.Monad
    infixr 1 =<<
    Prelude> :info .
    (.) :: (b -> c) -> (a -> b) -> a -> c 	-- Defined in GHC.Base
    infixr 9 .
Incidentally, the parens in my example are actually unnecessary. Function application is about 10, and (.) is 9.


I almost never use <

It's too easy to make a mistake and type >

..and that can ruin your whole day.


I have fucked myself more than once by typing > instead of >>

I find that to be a far more pernicious design error -- they should have made the longer token the destructive one, or used another character in it.


zsh with 'setopt noclobber' won't overwrite an existing file with '>', instead requiring '>|'. The history recall conveniently fills in the '|' for you if you up-arrow after a failed call. It works pretty nicely - I imagine other shells have similar.
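bash does: set -o noclobber behaves the same way. A quick demonstration (the exact error text varies by version):

  $ set -o noclobber
  $ echo new > existing.txt
  bash: existing.txt: cannot overwrite existing file
  $ echo new >| existing.txt   # >| forces the overwrite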


But does the time it takes to launch the subshell equal the amount of time it takes to launch cat? If so, your improvement is a NOOP.


fork + 3x(fork+exec) is going to be cheaper than 4x(fork+exec), especially if cat isn't resident.

The point is a logical improvement anyway (not burying the input argument near the beginning). I'm kind of surprised that the bash folks haven't turned cat into a builtin like they did with time and some of the other coreutils.

It's too bad it's about 30 years too late to stem the tide of shit like cat -v: http://harmful.cat-v.org/cat-v/


By the reasoning in your link, your suggestion of bash making 'cat' one of the built-in commands is the "cancer that's bloating UNIX."


The shell has always been the kitchen sink full of glue holding the whole thing together. There have always been builtins: language control structures, job control, etc. -- there are some things you can't trust others not to fuck up, and where the coupling would just get ridiculous.

Here's cat as a pure sh builtin:

  shcat() {
    for arg in "$@"; do
      exec 3<"$arg"                  # open the file read-only on fd 3
      while IFS= read -r line <&3; do
        printf '%s\n' "$line"        # echo would mangle backslashes and option-like lines
      done
      exec 3<&-                      # close fd 3
    done
  }
The shell by its very nature can't just do exactly one task well; it's a programmable environment for living in. The cancer that's bloating UNIX was the way that the BSD and especially the GNU crews took simple tools and cross-pollinated them randomly with stupid shit. Try running "/bin/true --help" on a GNU system sometime -- there's a damn good reason why "your shell may have its own version of true".


  % /bin/true --help
  Usage: /bin/true [ignored command line arguments]
    or:  /bin/true OPTION
  Exit with a status code indicating success.

        --help     display this help and exit
        --version  output version information and exit

  NOTE: your shell may have its own version of true, which usually supersedes
  the version described here.  Please refer to your shell's documentation
  for details about the options it supports.

  Report bugs to <bug-coreutils@gnu.org>.
I wouldn't necessarily call that 'bloated' unless you feel that any program that uses one bit more than absolutely necessary should be scrapped as 'bloated beyond belief.'

> there's a damn good reason why "your shell may have its own version of true".

Because why exactly?


If all you want is to group commands, you could use {...} instead of (...) so you don't have to spawn a subshell.
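Note the fussier syntax braces require -- a space after the opening brace and a semicolon before the closing one:

  { xargs foo | grep bar | sort | wc -l; } < file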


For ad-hoc stuff, I tend to do different things depending on where my cursor is and how I built up to a complex pipeline.


You get mostly both with

    xargs foo < file | grep bar | sort | wc -l


Exactly. On a modern computer, the gain from not piping cats is almost always negligible. I don't think most shell scripts today ever get to the point where you have to worry about inner loops and such.

These days, if you hit a performance wall from spawning too many cats, I would think you switch to some scripting language where you have everything in one process. Premature optimization, people...


What's the rationale for piping sort into wc?


That looks like it was a sample line that wasn't particularly useful, but if you do:

  cat file.txt | sort | uniq | wc -l   ## notice uniq
it gives you the count of unique lines. If you omit the sort, uniq only folds adjacent duplicates, so a/b/a is three lines, not two.


Instead of "sort | uniq" you could just use "sort -u"


There's also the argument about debugging: one can insert a tee at any stage of

   cat file | xargs foo | grep bar | sort | wc -l
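For instance, to snapshot an intermediate stage (/tmp/after-foo is just an illustrative path):

   cat file | xargs foo | tee /tmp/after-foo | grep bar | sort | wc -l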


I use cat because often I wind up doing a number of different commands on the same file. It's a lot easier to edit the end of the line, especially if you're just adding another filter, than to go back and modify the beginning.

eg

     cat file | less
     cat file | grep thing
     cat file | grep otherthing
     cat file | grep otherthing | cut stuff
instead of

     less file
     grep thing file
     grep otherthing file
     grep otherthing file | cut stuff


Be aware that "cat file | less" is much more expensive than "less file" because it forces less to buffer everything it reads (it can't seek on a pipe). "cat huge-logfile | tail" is especially bad because it uselessly reads the whole file (and evicts a bunch of more important data from your buffer cache) where "tail huge-logfile" would just seek backwards from the end until it has enough text.


Sounds like you might want to learn about !$ (last token of previous command).
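For example:

  $ less file
  $ grep thing !$    # history expansion; !$ becomes 'file'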


Yeah that's interesting, but the second version is still more complicated to assemble and takes more keystrokes, assuming up arrow gives the entire previous command.


Let's count for your example. In all cases, I'll exclude the actual name of the file. We type in the first command:

     cat file | less
     less file
The second version is 5 fewer keystrokes. On to the next command:

     cat file | grep thing
     grep thing !$
The second example is the same or fewer keystrokes. In both cases, you have to type "grep thing". In the first, you have to press the up arrow and backspace over "less" (at least three keystrokes), and in the second, you have to type an extra " !$".

I'll skip moving to "cat file | grep otherthing" or "grep otherthing !$", and consider the change to get to

     cat file | grep otherthing | cut stuff
     grep otherthing file | cut stuff
In both cases, you have to type " | cut stuff". If you key in the second example as "!! | cut stuff", that's an extra two keystrokes. If you key in the first as up arrow + "| cut stuff", that's only 1 extra keystroke.

In total, my version saves keypresses in the specific example and doesn't seem much worse in general.


The most useful thing I took away from this was not 'piping cats' but instead the interesting syntax for creating multiple directories in one go:

  ~ $ mkdir -p tmp/a/b/c
  ~ $ mkdir -p project/{lib/ext,bin,src,doc/{html,info,pdf},demo/stat/a}


It should be pointed out, since the article does not, that this is merely one example of brace expansion, which can be used anywhere. It is not a mkdir feature.

    $ echo project/{lib/ext,bin,src,doc/{html,info,pdf},demo/stat/a}
    project/lib/ext project/bin project/src project/doc/html 
    project/doc/info project/doc/pdf project/demo/stat/a
(I added a linefeed to prevent wrapping.)


My favourite use is when installing packages using package managers, especially when I need the dev packages.

Old fink example:

   sudo fink install lib{png,jpeg,ssl,whatever}{,-{dev,shlibs}}
which expands to:

   sudo fink install libpng libpng-dev libpng-shlibs libjpeg libjpeg-dev libjpeg-shlibs libssl libssl-dev libssl-shlibs libwhatever libwhatever-dev libwhatever-shlibs


Brilliant.

I am ashamed to admit that I usually say "apt-get install libfoo.*", wait for the downloads to start, hit Control-c, and then cut-n-paste the package names I actually want onto the command-line.

It sounds bad when I type it out, but it's really not the most horrible thing ever. But your way is definitely like 83x better.


If you don't know the exact package names, Ubuntu happily auto-completes apt-get and aptitude on package names. So I would just type "aptitude install libfoo" and hit tab a couple of times to see what libfoos I can install.


Or `M-*` to expand all the completions and then, if necessary, delete the ones you didn't want. Beats typing the extensions.


This is golden, I didn't know that.

A perfect example of the infinite features that can be found in the bash manual page. I've been using bash since the early '90s and have written lots of non-trivial programs in it; I know a lot that many other people don't, and yet I had somehow managed to miss this pearl.


Another fun thing to do is use an empty entry, like

  mv .xinitrc{,.bak}
or, if you have some stuff mounted under another mountpoint, such as when installing gentoo:

  umount /mnt/gentoo/{proc,dev,boot,}
That empty entry expands to the base path, which can be a nice shortcut.


Huh, I'm definitely going to have to play with that!

I only drop into a shell for at most 20 minutes a day at the moment, so a lot of the really neat time-saving tricks simply don't stick in my head due to disuse...


Shells are one of those things that I find you should basically just plan on reading the man page every couple of months. Every time you do, you are virtually sure to discover something useful you'd swear you never saw before, even though you're pretty sure it's the exact same man page as before....


It works in makefiles, too. (100% sure about BSD make, pretty sure about GNU make and others.)


let's remember that the -p option is cool, but could cause pain if you mistype one of the directories. for example:

  $ cd
  $ ls tmp
  - some files listed -
  $ mkdir -p tmm/a/b/c
oops (remember your esc key)


Why would this cause you any more pain than normal? You can just remove that chain with rm -r, surely?


normally you would get an error on the first mkdir, but you could be tired, just reusing your history and tweaking the commands, and realize a little too late that you did a lot of stuff somewhere you shouldn't. I had a bad night once with this (still use the -p anyway, but...).


rmdir -p might be a bit safer, but it can get greedy and remove all the way up to root if you specify an absolute path.
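With a relative path it just unwinds the typo, since rmdir only removes empty directories:

  mkdir -p tmm/a/b/c   # the typo
  rmdir -p tmm/a/b/c   # removes c, then b, a, and finally tmm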


That 0.005 seconds I saved by not piping cat will significantly increase my productivity! No more wasting time!


Oh come on, you can't mention that without mentioning the Useless Use of Cat Award: http://partmaps.org/era/unix/award.html


Great practical advice. I still pipe cats though because it's just more intuitive to me for setting up complex pipes.


especially if you're using a test file instead of the program you might eventually use


"Great advice. I don't follow it." Mayhap it ain't so great?


You also can't follow a file with grep, but you can with tail. Doing so requires the grep option "--line-buffered", which sacrifices some performance, but not much compared to actually viewing the log data.

  tail -f access.log | grep --line-buffered "GET /blog/post "


I do this a lot, and I've never used "--line-buffered", nor apparently needed it. Google shows me a lot of people using that as a solution to a problem (not getting results immediately) I've never seen. Weird.


It's more likely if you have grep and several other operations piped together - the normal buffering leads to grep only printing every time it gets a full block of data, typically several lines.
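A sketch of the difference (stdio blocks are typically 4-8 KB, so figure dozens of log lines per flush):

  tail -f access.log | grep "GET" | awk '{print $1}'                  # arrives in blocks
  tail -f access.log | grep --line-buffered "GET" | awk '{print $1}'  # arrives per line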


I only started using it because results were not appearing in "real-time", so it may depend on your OS, or your definition of "real-time".


Why? The extra 3 letters of "cat" usually take less time to type than the mental overhead of "where do I put the input file for this command?" or the additional pipeline reasoning costs. I actually sometimes do it one way and sometimes the other, whatever comes to my fingers first (which varies according to how I'm visualizing the pipeline of tasks in my head).

In general I far prefer the (it seems to me) more Unixy way of having lots of very simple commands and chaining them together over the use of extra options and arguments.


There is some really great info on that site. I'm a sysadmin of 3 years, and seeing the performance comparison is going to force me to change my scripting habits.


Really? You're changing your scripting habits based on gaining milliseconds?


You have to define usage here. They are giving an example of a single piped command on a file of unknown size, probably not very big -- hence milliseconds. But when it comes to scripting large log processing, backups, and other forms of automation, I think it will yield better performance. Not to mention good coding practices: I am looking to improve in any form, and something like this changes the way you think, which proves very useful when programming in ruby, c, java, etc. Not necessarily the use of piping commands, but just how to be more efficient and less wasteful.




