For the Love of Pipes

mothsonasloth · on Jan 22, 2019

[Quote]

The Unix philosophy is documented by Doug McIlroy as:

    Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.

    Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.

    Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them.

    Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them.

I really like the last two, if you can do them in development then you are then you have a great dev culture

quietbritishjim · on Jan 22, 2019

Reformatted to be readable:

> Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.

> Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.

> Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them.

> Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them.

majewsky · on Jan 22, 2019

If bots were not discouraged on news.yc, I would have implemented a bot for this long ago. Code-block quotes are so atrocious, esp. on mobile devices.

Leace · on Jan 22, 2019

It seems "white-space: pre-wrap" on code block would solve most of the problem. There is also additional "max-width" on the pre that I think is not needed.

masklinn · on Jan 22, 2019

That would break actual code snippet.

What would solve most of the problems is HN actually implementing markdown instead of the current half-assed crap.

abraae · on Jan 22, 2019

I would hate to see the day HN allowed any way to bold sections of text.

It's way more restful purveying a page of uniformly restrained text.

masklinn · on Jan 22, 2019

> I would hate to see the day HN allowed any way to bold sections of text.

HN already has shitty italics (shitty in that it commonly matches and eats things you don't want to be italicised e.g. multiplications, pointers, … in part though not only because HN doesn't have inline code). "bold" can just be styled as italics, or it can be styled as a medium or semibold. It's not an issue, and even less worth it given how absolute garbage the current markup situation is.

yen223 · on Jan 22, 2019

For a site that's meant to target programmers, HN's handling of code blocks is pretty poor.

Just give me the triple-tilde code block syntax please!

masklinn · on Jan 23, 2019

> For a site that's meant to target programmers, HN's handling of code blocks is pretty poor.

Meh. It does literal code blocks, they work fine.

That's pretty much the only markup feature which does, which is impressively bad given HN only has two markup feature: literal code blocks and emphasis.

It's not like they're going to add code coloration or anything.

And while fenced code blocks are slightly more convenient (no need to indent), pasting a snippet in a text editor and indenting it is hardly a difficult task.

dahfizz · on Jan 22, 2019

How is that meaningfully different to italics in that regard?

K2L8M11N2 · on Jan 22, 2019

Bold text stands out when visually scanning the page, italics don't.

vorpalhex · on Jan 22, 2019

What is it with so many products (HN, Discord, Slack) building half assed markdown implementations that aren't actually markdown?

rhacker · on Jan 22, 2019

What's Markdown?

- Slack product development

(that was a joke but is likely the answer to your question)

oldmanhorton · on Jan 22, 2019

To be pedantic, there's debate about what is "actually markdown". No one would say it's the flavor HN implements, but the easiest way to win some games is to simply not play

quietbritishjim · on Jan 22, 2019

That would break any existing comments that happened to be using markdown syntax as punctuation. Although I suppose you could have a flag day for the changeover and format differently based on comment creation time.

But I think the very limited formatting is just fine anyway. For the above comment as an example, I agree the code formatting looks awful, especially on mobile. But the version with >'s is ok, and I don't think proper bullet points or a quote bar would have improved it dramatically.

Leace · on Jan 22, 2019

Conversations.im uses an interesting trick for rendering Markdown [0] - it leaves the syntax as is, so in the worst case you've got text with weird bold/italics, but the characters are 1:1 identical to what was sent.

[0]: Actually not Markdown but a subset but it's not important.

jolmg · on Jan 22, 2019

I agree with you on the max-width. I can't see whatever benefit it's supposed to provide outweigh the annoyance of having to scroll horizontally when there is a lot of empty space to the right that could be used to display more text.

I'm not too convinced on the wrapping of code, though.

kakarot · on Jan 22, 2019

white-space: pre-wrap on a code block could lead to confusion. Any change should be an optional setting in your user profile.

However you're definitely right about dropping the max-width property.

glenneroo · on Jan 22, 2019

Why is this or OP's even necessary? The bullet points are copied directly from the article.

koboll · on Jan 22, 2019

I'm sure I'm not the only one who opens the comments to skim them and quickly vet whether the article is worth reading or not

korla · on Jan 22, 2019

You are not. I made a chrome plugin to find the HN-discussions for an article thinking I'd use it primarily after I'd read an article, but I find that I more often than not use it as a benchmark whether I should spend the time to read it or not.

pdkl95 · on Jan 22, 2019

> The Unix philosophy is documented by Doug McIlroy as

TaoUP has a longer discussion[1] of the Unix philosophy, which includes Rob Pike's and Ken Thompson's comments on the philosophy.

[1] http://www.catb.org/esr/writings/taoup/html/ch01s06.html

"Those who don't understand Unix are condemned to reinvent it, poorly." (Henry Spencer)

anthk · on Jan 23, 2019

Avoid TAOUP, is really bad. Most of the lore have stolen from places as LISP and VAX communities. ESR is to Pike and Ken as alien as X11 itself.

Kodix · on Jan 23, 2019

I'm not sure I understand - is TAOUP bad because it stole information, or because the information it has is wrong?

Because I'd consider the latter to be far worse than the former.

leethargo · on Jan 23, 2019

I enjoyed it when I read it many years ago, but maybe that was because I was inexperienced and naive.

Could you recommend some "original sources" to learn from, instead? Ideally in book form?

anthk · on Jan 23, 2019

Not TAOUP specially, but the Jargon file is what ESR took loads of things as wrong or Unix related. Also, at TAOUP you have Emacs, which is the Anti-UNIX by definition. https://www.dourish.com/goodies/jargon.html

cben · on Jan 27, 2019

TAOUP had a chapter "a tale of 5 editors" discussing emacs, vi, and more, and does point out emacs is an outlier (and outsider) to many unix principles. It does quote Doug McIlroy speaking against it (but also against vi?). It attempts to generalize from discussing "The Right Size for an Editor" question to discussing how to think about "The Right Size of Software".

I don't know if it's possible to have impartially "fair" discussion of editors. Skimming now, I can see how vi lovers would hate some characterizations there. But it does try to learn interesting lessons from them.

It does NOT simply equate "Emacs has UNIX nature" so you can't just prove something like "TAOUP mentions Emacs, Emacs is GNU, Gnu is Not Unix => TAOUP is not UNIX, QED" ;-)

http://www.catb.org/esr/writings/taoup/html/ch13s02.html

bias disclaimers: I learnt most of what I know of unix from within Emacs, which I still use ~20 years later. I learnt more from Info pages than man pages (AIX had pretty bad man pages). I suspect you have a different picture of unix than I. And I now know better than arguing which editor is better ;)

But I found TAOUP articulated ideas I only learnt through osmosis. I'm looking forward to reading a better articulation if you know one.

Waterluvian · on Jan 22, 2019

The last one is especially interesting to me these days. On a macro scale it sure sounds a whole lot like the robot revolution taking unskilled jobs.

But of course that's probably not the author's intended context.

marcosdumay · on Jan 22, 2019

Programmers have been destroying programmer jobs since those jobs exist. Up to now it has meant we have enough productivity for going into more markets, but that will not last forever.

matchagaucho · on Jan 22, 2019

From 1978, and still applicable to microservices today.

Breza · on Jan 29, 2019

That's my reaction too! Microservices with a dash of agile.

nojvek · on Jan 22, 2019

I’m surprised JessFraz who is employed by Microsoft doesn’t talk about powershell pipes at all.

Powershell pipes are an extension over Unix pipes. Rather than just being able to pipe a stream of bytes, powershell can pipe a stream of objects.

It makes working with pipes so much fun. In Unix you have to cut, awk and do all sorts of parsing to get some field out of `ls`. In poweshell, ls outputs stream of file objects and you can get the field you want by piping to `Get-Item` or sum the file sizes, or filter only directories. It’s very expressive once you’re manipulating streams of objects with properties.

npongratz · on Jan 22, 2019

> In Unix you have to cut, awk and do all sorts of parsing to get some field out of `ls`.

I'm guessing you've mentioned using `ls` as a simple, first-thing-that-comes-to-mind example, which is cool. I just wanted to point out that if a person is piping ls's output, there are probably other, far better alternatives, such as (per the link below) `find` and globs:

https://unix.stackexchange.com/a/247973

That has been my experience, at least.

sevensor · on Jan 22, 2019

If you're using cut and awk to get a field out of ls, maybe what you want is actually stat(1)?

nijaru · on Jan 22, 2019

She is actually employed at GitHub now.

https://twitter.com/jessfraz

lordgrenville · on Jan 22, 2019

Still MSFT, in a way!

schwartzworld · on Jan 22, 2019

Which is owned by...

paavoova · on Jan 23, 2019

The downside of objects is performance due to the e.g. increased overhead a file object carries. Plus the exact same can be argued with everything being bytes/text on Unix, in that it make everything simple but versatile.

> get some field out of `ls`

If you're parsing the output of ls, you're doing it wrong. Filenames can contain characters such as newlines, etc, and so you generally want to use globbing/other shell builtins, or utils like find with null as the separator instead of newline.

marpstar · on Jan 22, 2019

I've been following her on twitter for awhile. She went to MSFT only a couple years ago. From what I understand her expertise is in Linux.

flukus · on Jan 23, 2019

> Powershell pipes are an extension over Unix pipes. Rather than just being able to pipe a stream of bytes, powershell can pipe a stream of objects.

Unix pipes can pipe objects, binary data, text, json encoded data, etc.

The problem is it adds a lot of complication and simply doesn't offer much in practice, so text is still widely used and binary data in a few specific cases.

iamwil · on Jan 22, 2019

What happens when there are upstream changes to the objects? Does everything downstream just need to change, and hence, the upstream objects returned by programs need to be changed with care? Or is it using something like protobuf, where fields are only additive, but never deleted, for backwards compatibility?

Or are the resulting chain of pipes so short lived, it doesn't matter?

dan-robertson · on Jan 22, 2019

What happens when you rely on non standard behaviour of unix tools, or just non-posix tools that change from beneath you?

I’m not saying that this makes powershell pipes better/worse, just that this problem isn’t unique. Microsoft tends to be reasonably committed to backwards compatibility but I don’t know the answer to the question

Pawamoy · on Jan 23, 2019

I think they were talking about objects at execution time, not version compatibility.

dan-robertson · on Jan 26, 2019

I don’t really understand what this means. The parent specifically writes “backwards compatibility.” The only other thing I think you might be referring to is if the reading process mutates the objects does that somehow affect the writing end of the pipe but I think that doesn’t make sense from a “sane way of doing inter-process communication” standpoint. Is there something else you are referring to? Could you elaborate please?

teyc · on Jan 23, 2019

Powershell is a shell language, everything is dynamic, and does not error when you access a non-existent property on an object.

hk__2 · on Jan 22, 2019

Don’t you have to rewrite every single program to make it able to read those “objects”?

ghusbands · on Jan 23, 2019

You don't have to rewrite them; just write them. Every new OS/language needs its tooling/libraries to be written.

anthk · on Jan 23, 2019

Never parse ls.

enriquto · on Jan 23, 2019

> Never parse ls.

I have heard this several times, but either I do not understand it or I disagree. Do you mean parsing the output of the ls program? Parsing ls output is not wrong, the program produces a text stream that is easy and useful to parse. There's nothing to be ashamed when doing it, even when you can do it in a different, even shorter way. I do grep over ls output daily, and I find it much more convenient than writing wildcards.

xelxebar · on Jan 24, 2019

One can certainly do fine by grepping ls output in one-off instances, but I'd be really hesitant to put that in a script.

For given paths, stat command essentially lets us directly access their inode structs, and invocations are nicely concise. The find util then lets us select files based on inode fields.

Both tools do take a bit of learning, but considerably less than grep and regexs. Anyway, I've personally found find and stat to be really nice and ergonomic after the initial learning period.

yaakushi · on Jan 22, 2019

I'm probably nitpicking, but if you're using cat to pipe a single file into the sdtin of another program, you most likely don't need the cat in the first place, you can just redirect the file to the process' stdin. Unless, of course, you're actually concatenating multiple files or maybe a file and stdin together.

Disclaimer: I do cat-piping myself quite a bit out of habit, so I'm not trying to look down at the author or anything like that! :)

arendtio · on Jan 22, 2019

In fact, I don't like people optimizing shell scripts for performance. I mean, shell scripts are slow by design and if you need something fast, you choose the wrong technology in the first place.

Instead, shell script should be optimized for readability and portability and I think it is much easier to understand something like 'read | change >write' than 'change <read >write'. So I like to write pipelines like this:

  cat foo.txt \
    | grep '^x' \
    | sed 's/a/b/g' \
    | awk '{print $2}' \
    | wc -l >bar.txt

It might be not the most efficient processing method, but I think it is quite readable.

For those who disagree with me: You might find the pure-bash-bible [1] valuable. While I admire their passion for shell scripts, I think they are optimizing to the wrong end. I would be more a fan of something along the lines of 'readable-POSIX-shell-bible' ;-)

[1]: https://github.com/dylanaraps/pure-bash-bible

GuB-42 · on Jan 22, 2019

IMHO, shell scripts are a minefield and if you want something readable and portable, this is also the wrong technology. They are convenient though. They are like the Excel macros of the UNIX world.

Now back to the topic of "cat", which is a great example of why shell scripts are minefields.

Replace "foo.txt" with a user supplied variable, let's call it "$F". It becomes cat $F | blah_blah... I mean cat "$F" | blah_blah, first trap, but everyone knows that.

Now, if F='-n', second trap. What you think is a file will be considered an option and cat will wait for user input, like when no file is given. Ok, so you need to do cat -- "$F" | blah_blah.

That should be OK in every case now, but remember that "cat" is just another executable, or maybe a builtin. For some reason, on your system "cat --" may not work, or some asshat may have added "." in your PATH and you may be in a directory with a file named "cat". Or maybe some alias that decides to add color.

There are other things to consider, like your locale that may mess up you output with comas instead of decimal points and unicode characters. For that reason, you need to be very careful every time you call a command and even more so if you pipe the output.

For that reason, I avoid using "cat" in scripts. It is an extra command call and all the associated headaches I can do without.

kemitche · on Jan 22, 2019

> Now, if F='-n', second trap

You're not wrong, but I think it's worth pointing out that's a trap that comes up any time you exec another program, whether it's from shell or python. I can't reasonably expect `subprocess.run(["cat", somevar])` to work if `somevar = "-n"`.

(Now, obviously, I'm not going to "cat" from python, but I might "kubectl" or something else that requires care around the arguments)

mirimir · on Jan 23, 2019

> Replace "foo.txt" with a user supplied variable, let's call it "$F". It becomes cat $F | blah_blah... I mean cat "$F" | blah_blah, first trap, but everyone knows that.

I think that you forgot to edit the "I mean" to "echo $F" :)

madmax96 · on Jan 22, 2019

I agree with the sentiment, but my critique applies so generally that it must be noted: if a command accepts a filename as a parameter, you should absolutely pass it as a parameter rather than `cat` it over stdin.

For example, you can write this pipeline as:

    grep '^x' foo.txt \
        | sed 's/a/b/g' \
        | awk '{print $2}' \
        | wc -l > bar.txt

This is by no means scientific, but I've got a LaTeX document open right now. A quick `time` says:

    $ time grep 'what' AoC.tex
    real    0m0.045s
    user    0m0.000s
    sys     0m0.000s

    $ time cat AoC.tex | grep what
    real    0m0.092s
    user    0m0.000s
    sys     0m0.047s

Anecdotally, I've witnessed small pipelines that absolutely make sense totally thrash a system because of inappropriate uses of `cat`. When you `cat` a file, the OS must (1) `fork` and `exec`, (2) copy the file to `cat`'s memory, (3) copy the contents of `cat`'s memory to the pipe, and (4) copy the contents of the pipe to `grep`'s memory. That's a whole lot of copying for large files -- especially when the first command grep in the sequence usually performs some major kind of reduction on the input data!

Poiesis · on Jan 22, 2019

In my opinion, it's perfectly fine either way unless you're worried about performance. I personally tend to try to use the more performant option when there's a choice, but a lot of times it just doesn't matter.

That said, I suspect the example would be much faster if you didn't use the pipeline, because a single tool could do it all (I'm leaving in the substitution and column print that are actually unused in the result):

    awk '/^x/{gsub("a","b");print $2; count++}END{print NR}' foo.txt

fooblat · on Jan 22, 2019

That syntax is very unusual from anything I've seen. I am also a fan of splitting pipelines with line breaks for readability, however I put the pipe on the end of each line and omit the backslash. In Bash, a line that ends with a pipe always continues on the next line.

In any case, it's probably just a matter of personal taste.

AdmiralAsshat · on Jan 22, 2019

That's actually very readable. I'm now regretting that I hadn't seen this about 3 months ago--I recently left a project that had a large number of shell scripts I had written or maintained for my team. This probably would've made it much easier for the rest of the team to figure out what the command was doing.

icedchai · on Jan 22, 2019

If the order is your concern, you can also put the <read at the beginning of the line. <file grep x works the same as: cat file | grep x

rabidrat · on Jan 23, 2019

I've been using unix for 25 years and I did not know that.

tejtm · on Jan 22, 2019

I dunno, You are bringing 5 cores to bear and there is no global interpreter lock which is not a bad start

michaelfeathers · on Jan 22, 2019

I like 'collection pipeline' code written in this style regardless of language. If we took away the pipe symbols (or the dots) and just used indentation we'd have something that looked like asm but with flow between steps rather than common global state.

I periodically think it would be a good idea to organize a language around.

anthk · on Jan 23, 2019

awk can do all of that except sed. And I am not sure about the last. No need to wc ($NF in AWK, if I can recall), no need for grep, you have the /match/ statement, with regex too.

Crestwave · on Jan 23, 2019

> except sed

Doesn't gsub(/a/, "b") do the same thing as s/a/b/g?

anthk · on Jan 23, 2019

Yes, I recall it hours ago.

msla · on Jan 22, 2019

I find something like this:

   grep '^x' < input | sed 's/foo/bar/g'

to be very readable, as the flow is still visually apparent based on punctuation.

OskarS · on Jan 22, 2019

I don't like this style at all. If you're following the pipeline, it starts in the middle with "input", goes to the left for the grep, then to the right (skipping over the middle part) to sed.

     cat input | grep '^x' | sed 's/foo/bar/g'

Is far more readable, in my opinion. In addition, it makes it trivial to change the input from a file to any kind of process.

I'm STRONGLY in favor of using "cat" for input. That "useless use of cat" article is pretty dumb, IMHO.

dearrifling · on Jan 22, 2019

Note that '<input grep | foo' is also valid.

kps · on Jan 22, 2019

In this particular example, ‘unnecessary use of cat’ is accompanied by ‘unnecessary use of grep’.

    cat input | grep '^x' | sed 's/foo/bar/g'

→

    sed '/^x/s/foo/bar/g' <input

irishsultan · on Jan 23, 2019

That's not the same thing. The sed output will still keep lines not starting with x (just not replacing foo with bar in those) where grep will filter those out.

kps · on Jan 23, 2019

Yeah, Muphry's law at work. Corrected version:

   sed -n '/^x/s/foo/bar/gp' <input

This may be an inadvertent argument for the ‘connect simpler tools’ philosophy.

sigjuice · on Jan 22, 2019

You can just remove the <

msla · on Jan 22, 2019

You can if input is a file. It might be a program with no arguments or something else.

sigjuice · on Jan 22, 2019

In your original command, how can 'input' be a program with no arguments?

msla · on Jan 22, 2019

Oh, damn. You're exactly right.

OK, to save some of my face, this will work:

    grep 'foo' <(input) | sed 's/baz/bar/g'

... at least in zsh and probably bash.

laumars · on Jan 22, 2019

I don’t like that at all. That creates a subshell and is also less readable than

    input | grep foo | sed ...

msla · on Jan 22, 2019

That specific example is less readable, but I do like being able to do this:

    diff <(prog1) <(prog2)

and get a sensible result.

And sometimes programs just refuse to read from stdin but do just fine with an unseekable file on the command line. True, you do have this:

    input | recalcitrant_program /dev/stdin

... but it's a bit of a tossup as to which one's more readable at this point. They're both relying on advanced shell functionality.

laumars · on Jan 22, 2019

> That specific example is less readable, but I do like being able to do this:

> diff <(prog1) <(prog2)

> and get a sensible result.

That is called process substitution and is exactly the kind of use case that it's designed for. So yes, process substitution does make sense there.

> input | recalcitrant_program /dev/stdin

> ... but it's a bit of a tossup as to which one's more readable at this point. They're both relying on advanced shell functionality.

There's no tossup at all. Process substitution is easily more readable than your second example because you're honouring the normal syntax of that particular command's parameters rather than kludging around it's lack of STDIN support.

Also I wouldn't say either example is using advanced shell functionalities either. Process substitution (your first example) is a pretty easy thing to learn and your second example is just using regular anonymous pipes (/dev/stdin isn't a shell function, it's a proper pseudo-device like /dev/random and /dev/null) thus the only thing the shell is doing is the same pipe described in this threads article (with UNIX / Linux then doing the clever stuff outside of the shell).

Hello71 · on Jan 22, 2019

This is a very silly way of writing it though. grep|sed can almost always be replaced with a simple awk: awk '/^x/ { sub("a", "b"); print $2; }' foo.txt. This way, the whole command fits on one line. If it doesn't, put your awk script in a separate file and simply call it with "awk -f myawkscript foo.txt".

ncallaway · on Jan 22, 2019

I would disagree that their way of writing it is silly.

It is instantly plainly obvious to me what each step of their shell script is doing.

While I can absolutely understand what your shell script does after parsing it, it's meaning doesn't leap out at me in the same way.

I would describe the prior shell script as more quickly readable than the one that you've listed.

So, perhaps it's not a question of one being more silly than the other—perhaps the author just has different priorities from you?

geofft · on Jan 22, 2019

I use awk in exactly this way personally, but, awk is not as commonly readable as grep and sed (in fact, that use of grep and sed should be pretty comprehensible to someone who just knows regular expressions from some programming languages and very briefly glances at the manpages, whereas it would be difficult to learn what that awk syntax means just from e.g. the GNU awk manpage). So, just as you could write a Perl one-liner but you shouldn't if you want other people to read the code, I'd probably advise against the awk one-liner too.

yesenadam · on Jan 23, 2019

Not sure why you say grep and sed are more readable than awk! (not sure what 'commonly readable' means). Or that even that particular line in awk is harder to understand than the grep and sed man pages. The awk manpage even has examples, including print $2. The sed manpages must be the most impenetrable manpages known to 'man', if you don't already understand sed. (People might already know s///g because 99% of the time, that's all sed is used for.)

yesenadam · on Jan 22, 2019

>sub("a", "b");

That should be gsub, shouldn't it? (sub only replaces the first occurrence)

Hello71 · on Jan 22, 2019

tyingq · on Jan 22, 2019

The "useless use of cat" was a repeated gripe on Usenet back in the day: http://porkmail.org/era/unix/award.html

Leace · on Jan 22, 2019

I actually think that cat makes it more obvious what's happening in some cases.

I had recently built a set of tools used primarily via pipes: (tool-a | tool-b | tool-c) and it looks clearer when I mock (for testing) one command (cat results | tool-b | tool-c) instead of re-flowing it just to avoid cat and use direct files.

empath75 · on Jan 22, 2019

People use cat to look at the file first, then hit up arrow, add a pipe, etc.

longwave · on Jan 22, 2019

Yes, this. Quite often I start writing out complex pipelines using head/tail to test with a small dataset and then switch it out for cat when I am done to run it on the full thing. And it's often not worth refactoring these things later unless you are really trying to squeeze performance out of them.

toxik · on Jan 22, 2019

I think it's also a grammatical wart of shell syntax. Things going into a command are usually on the left, but piping in a file goes on the right.

richardwhiuk · on Jan 22, 2019

   <file command | command | command

is perfectly fine.

nine_k · on Jan 22, 2019

The arrow now points backwards.

chrisfinazzo · on Jan 22, 2019

Of course, if any of your commands prompt for input, you'll be disappointed that's not always as easy as it appears on the surface.

Does anyone have a better way to do this kind of thing?

milkey_mouse · on Jan 22, 2019

The standard is expect [1]. There are also libraries for many programming languages which perform a similar task, such as pexpect [2].

[1] https://core.tcl.tk/expect/index [2] https://pexpect.readthedocs.io/en/stable/

richardwhiuk · on Jan 22, 2019

The better solution is to change the command so it expects programatic arguments / pass command line parameters.

i.e.

prefer `apt-get install -y` over `yes | apt-get install foo`

maximilianburke · on Jan 22, 2019

I can see how it's redundant. But I use cat-pipes because I once mistyped the redirection and nuked my carefully created input file :)

(Similarly, the first thing I used to do on Windows was set my prompt to [$p] because many years ago I also accidentally nuked a part of Visual Studio when I copied and pasted a command line that was prefixed with "C:\...>". Whoops.)

invsblduck · on Jan 22, 2019

Not nitpicking. Useless Use of Cat is an old thing: http://catb.org/jargon/html/U/UUOC.html

dan-robertson · on Jan 22, 2019

For interactive use, I would like to point out that even better than this use of cat is less. If you pipe less into something then it forgets it’s interactive behaviour and works like cat on a single file. So:

  $ less foo | bar

Is similar too:

  $ bar < foo

Except that less is typically more clever than that and might be more like:

  $ zcat foo | bar

Depending on the file type of foo.

chrisfinazzo · on Jan 22, 2019

I would be remiss if I did not point out that calling said program cat is a misnomer. Instead of 'string together in a series' (the actual dictionary definition, which coincidentally, pipes actually do) it quickly became 'print whatever I type to the screen.'

Of course, the example @arendtio uses is correct, because they obviously care about such things.

Sharlin · on Jan 22, 2019

Having separate commands for outputting the content of a single file and several files would, however, be an orthogonality violation. YMMV whether having a more descriptive name for the most common use of cat would be worth the drawback.

chrisfinazzo · on Jan 22, 2019

It would fit in the broader methodology of 'single purpose tools that do their job well' or 'small pieces, loosely joined', but yes, probably too annoying to bother with.

assafmo · on Jan 22, 2019

I usually replace cat with pv and gets a nice progress bar and ETA :-)

alkonaut · on Jan 22, 2019

I love the idea of simple things that can be connected in any way. I'm not so much a fan of "everything is a soup of bytes with unspecified encoding and unknown formatting".

It's an abstraction that held up quite well, but its starting to show its age.

jerf · on Jan 22, 2019

I fully agree... and yet... everyone who has tried to "fix" this has failed, at least in the sense of "attaining anything like shell's size and reach". Many have succeeded in the sense of producing working code that fixes this in some sense.

Powershell's probably the closest to success, because it could be pushed out unilaterally. Without that I'm not sure it would have gotten very far, not because it's bad, but again because nobody else seems to have gotten very far....

Jyaif · on Jan 22, 2019

100% agree. Having to extract information with regular expressions is a waste of time. If the structure of the data was available, you would have type safety / auto-completion. You could even have GUIs to compose programs.

bdamm · on Jan 22, 2019

Structured data flows in pipes too. Json can even be line oriented. GUI programming fails when you get past a few hundred “lines” of complexity. What I’d love to see is a revolution of shells and terminals to more easily work and pull from piped data.

jenscow · on Jan 22, 2019

I hear what you're saying.

However, how can you ensure the output type of one program matches the input type of another?

testvox · on Jan 22, 2019

Allow programs to specify the type of data they can consume and the type of the data they emit. This is how powershell does it (using the dotnet type system).

jenscow · on Jan 22, 2019

And the problem is how can you ensure the output type of one program matches the input type of another.

A program emits one type, and the other program accepts another.

Something will be needed to transform one type into another. Imagine doing that on the command line.

NikolaeVarius · on Jan 22, 2019

Cat file1 | convert | dest

hnick · on Jan 23, 2019

Having GUIs compose programs seems antithetical to the idea of shell scripts which are often thrown together quickly to get things done. Personally, I view shell scripting as a "good enough" and if you need more structure then you change your tools.

icebraining · on Jan 22, 2019

For an alternative view, don't forget to read the section on Pipes of The Unix-Haters Handbook: http://web.mit.edu/~simsong/www/ugh.pdf (page 198)

bagrow · on Jan 22, 2019

> When was the last time your Unix workstation was as useful as a Macintosh?

Some of that discussion has not aged well :)

pytester · on Jan 22, 2019

The core critique - that everything is stringly typed - still holds pretty well though.

>The receiving and sending processes must use a stream of bytes. Any object more complex than a byte cannot be sent until the object is first transmuted into a string of bytes that the receiving end knows how to reassemble. This means that you can’t send an object and the code for the class definition necessary to implement the object. You can’t send pointers into another process’s address space. You can’t send file handles or tcp connections or permissions to access particular files or resources.

majewsky · on Jan 22, 2019

> You can’t send pointers into another process’s address space.

Thank goodness.

jimktrains2 · on Jan 22, 2019

To be fair, the same critisim could be used for a socket? I think the issue is that some people want pipes to be something magical that connects their software, not a dumb connection between them.

__MatrixMan__ · on Jan 22, 2019

I don't want all my pipes to be magical all the time, but occasionally I do want to write a utility that is "pipeline aware" in some sense. For example, I'd like to pipe mysql to jq and have one utility or the other realize that a conversion to json is needed in the middle for it work.

Im working on a library for this kind of intra-pipeline negitiation. It's all drawing-board stuff right now but I coobbled together a proof of concept:

https://unix.stackexchange.com/a/495338/146169

Do you think this is a reasonable way to achieve the magic that some users want in their pipelines? Or are ancient Unix gods going to smite me for tampering with the functional consistency of tools by making their behavior different in different contexts?

cryptonector · on Jan 22, 2019

This is interesting, yes. If the shell could infer the content type of data demanded or output by each command in a pipeline, then it could automatically insert type coercion commands or alter the options of commands to produce the desired content types.

You're right that it is in fact possible for a command to find the preceding and following commands using /proc, and figure out what content types they produce / want, and do something sensible. But there won't always be just one way to convert between content types...

Me? I don't care for this kind of magic, except as a challenge! But others might like it. You might need to make a library out of this because when you have something like curl(1) as a data source, you need to know what Content-Type it is producing, and when you can know explicitly rather than having to taste the data, that's a plus. Dealing with curl(1) as a sink and somehow telling it what the content type is would be nice as well.

__MatrixMan__ · on Jan 23, 2019

My ultimate use case is a contrived environment where I have the luxury of ignoring otherwise blatant feature-gaps--such as compatibility with other tools (like curl). I've come to the same conclusions about why that might be tricky, so I'm calling it a version-two problem.

I notice that function composition notation; that is, the latter half of:

> f(g(x)) = (f o g)(x)

resembles bash pipeline syntax to a certain degree. The 'o' symbol can be taken to mean "following". If we introduce new notation where '|' means "followed by" then we can flip the whole thing around and get:

> f(g(x)) = (f o g)(x) = echo 'x' | g | f

I want to write some set of mathematically interesting functions so that they're incredibly friendly (like, they'll find and fix type mismatch errors where possible, and fail in very friendly ways when not). And then use the resulting environment to teach a course that would be a simultaneous intro into both category theory and UNIX.

All that to say--I agree about finding the magic a little distasteful, but if I play my cards right my students will only realize there was magic in play after they've taken the bait. At first it will all seem so easy...

cryptonector · on Jan 23, 2019

The magic /proc thing is a very interesting challenge. Trust me, since I read your comments I've thought about how to implement, though again, it's not the sort of thing I'd build for a production system, just a toy -- a damned interesting one. And as a tool for teaching how to find your way around an OS and get the information you need, it's very nice. There's three parts to this: a) finding who's before and after the adapter in the pipe, b) figuring out how to use that information to derive content types, c) match impedances. (b) feels mundane: you'll have a table-driven approach to that. Maybe you'll "taste" the data when you don't find a match in the table? (c) is not always obvious -- often the data is not structured. You might resort to using extended file attributes to store file content-type metadata (I've done this), and maybe you can find the stdin or other open files of the left-most command in a pipeline, then you might be able to guesstimate the content type in more cases. But obviously, a sed, awk, or cut, is going to ruin everything. Even something like jq will: you can't assume the output and input will be JSON.

At some point you just want a Haskell shell (there is one). Or a jq shell (there is something like it too).

As to the pipe symbol as function composition: yes, that's quite right.

jshen · on Jan 22, 2019

I wonder if something like HTTP’s content negotiation is a good model for this.

__MatrixMan__ · on Jan 22, 2019

That sounds reasonable, I'll look into it--thanks.

I was imagining an algorithm where each pipeline-aware utility can derive port numbers to use to talk/listen to its neighbors. I may be able to use http content negotiation wholesale in that context.

laumars · on Jan 23, 2019

I've been trying to solve the exact same problem with my shell too. It's pipes are typed and all the builtin commands can than automatically decode those data types via shared libraries. So commands don't need to worry about how to decode and re-encode the data. This means that JSON, YAML, TOML, CSV, Apache log files, S-Expressions and even tabulated data from `ps` (for example) can all be transparently handled the same way and converted from one to another without the tools ever needing to know how to marshal nor unmarshal that data. For example: you could take a JSON array that's not been formatted with cartridge returns and still grep through it item by item as if it was a multi-line string.

However the problem I face is how do you pass that data type information over a pipeline from tools that exist outside of my shell? It's all well and good having builtins that all follow that convention but what if someone else wants to write a tool?

My first thought was to use network sockets, but then you break piping over SSH, eg:

    local-command | ssh user@host "| remote-command"

My next thought was maybe this data should be in-lined - a bit like how ANSI escape sequences are in-lined and the terminals don't render them as printable characters. Maybe something like the following as a prefix to STDIN?

    <null>$SHELL<null>

But then you have the problem of tainting your data if any tools are sent that prefix in error.

I also wondered if setting environmental variables might work but that also wouldn't be reliable for SSH connections.

So as you can see, I'm yet to think up a robust way of achieving this goal. However in the case of builtin tools and shell scripts, I've got it working for the most part. A few bugs here and there but it's not a small project I've taken on.

If you fancy comparing notes on this further, I'm happy to oblige. I'm still hopeful we can find a suitable workaround to the problems described above.

__MatrixMan__ · on Jan 23, 2019

> ...with my shell too...

I was hoping to stick with bash or zsh, and just write processes that somehow communicate out of band, but I think we're still up against the same problem.

One idea I had was that there's a service running elsewhere which maintains this directed graph (nodes = types, edges = programs which take the type of their "from" node and return the type of their "two" node). When a pipeline is executed, each stage pauses until type matches are confirmed--and if there is a mismatch then some path finding algorithm is used to find the missing hops.

So the user can leave out otherwise necessary steps, and as long as there is only one path through the type graph which connects them, then the missing step can be "inserted". In the case of multiple paths, the error message can be quite friendly.

This means keeping your context small enough, and your types diverse enough, that the type graph isn't too heavily connected. (Maybe you'd have to swap out contexts to keep the noise down.) But if you have a layer that's modifying things before execution anyway, then perhaps you can have it notice the ssh call and modify it to set up a listener. Something like:

User Types:

    local-command | ssh user@host "remote-command"

Shell runs:

    local-command | ssh user@host "pull_metadata_from -r <caller's ip> | remote-command"

Where pull_metadata_from phones home to get the metadata, then passes along the data stream untouched.

Also, If you're writing the shell anyway then you can have the pipeline run each process in a subshell where vars like TYPE_REGISTRY_IP and METADATA_INBOUND_PORT are defined. If they're using the network to type-negotiate locally, then why not also use the network to type-negotiate through an ssh tunnel?

This idea is, of course, over-engineered as hell. But then again this whole pursuit is.

laumars · on Jan 24, 2019

> I was hoping to stick with bash or zsh, and just write processes that somehow communicate out of band, but I think we're still up against the same problem.

Yeah we have different starting points but very much similar problems.

tbh idea behind my shell wasn't originally to address typed pipelines, that was just something that evolved from it quite by accident.

Anyhow, your suggestion of overwriting / aliasing `ssh` is genius. Though I'm thinking rather than tunnelling a TCP connection, I could just spawn an instance of my shell on the remote server and then do everything through normal pipelines as I now control both ends of the pipe. It's arguably got less proverbial moving parts compared to a TCP listener (which might then require a central data type daemon et al) and I'd need my software running on the remote server for the data types to work anyway.

There is obviously a fair security concern some people might have about that but if we're open and honest about that and offer an "opt in/out" where opting out would disable support for piped types over SSH then I can't see people having an issue with it.

Coincidentally I used to do something similar in a previous job where I had a pretty feature rich .bashrc and no Puppet. So `ssh` was overwritten with a bash function to copy my .bashrc onto the remote box before starting the remote shell.

> This idea is, of course, over-engineered as hell. But then again this whole pursuit is.

Haha so true!

Thanks for your help. You may have just solved a problem I've been grappling with for over a year.

cryptonector · on Jan 22, 2019

I was thinking something similar, buried in a library that everyone could link. It seems... awfully awkward to build, much less portably.

This reminds me of how busted Linux is for not having a SO_PEERCRED. You can actually get that information by walking /proc/net/tcp or using AF_NETLINK sockets and inet_diag, but there is a race condition such that this isn't 100% reliable. SO_PEERCRED would [have to] be.

cryptonector · on Jan 22, 2019

The problem with that is that each command in the pipeline would have to somehow be modified to convey content type metadata. Perhaps we could have a way to send ancillary metadata (a la Unix domain sockets SCM_*).

AnIdiotOnTheNet · on Jan 22, 2019

Yes. The compromise of just using an untyped byte stream in a single linear pipeline was a fair tradeoff in the 70s, but it is nearly 2020 and we can do better.

laumars · on Jan 22, 2019

We have done better. The shell I'm writing is typed and I know I'm not the only person to do this (eg Powershell). The issue here is really more with POSIX compatibility but if you're willing to step away from that then you might find an alternative that better suits your needs.

Thankfully switching shells is as painless as switching text editors.

da_chicken · on Jan 22, 2019

> Thankfully switching shells is as painless as switching text editors.

So, somewhere between, "That wasn't as bad as I feared," and, "Sweet Jesus, what fresh new hell have I found myself in"?

laumars · on Jan 22, 2019

haha yes. I was thinking more about launching the shell but you're absolutely right that learning the syntax of a new shell is often non-trivial.

laumars · on Jan 22, 2019

I'm not going to argue that UNIX got everything right because I don't believe that to be the case either but I don't agree with those specific points:

> This means that you can’t send an object and the code for the class definition necessary to implement the object.

To some degree you can and I do just this with my own shell I've written. You just have to ensure that both ends of the pipe understands what is being sent (eg is it JSON, text, binary data, etc)? Even with typed terminals (such as Powershell), you still need both ends of the pipe to understand what to expect to some extent.

Having this whole thing happen automatically with a class definition is a little optimistic though. Not least of all because not every tool would be suited for every data format (eg a text processor wouldn't be able to do much with a GIF even if it has a class definition).

> You can’t send pointers into another process’s address space.

Good job too. That seems a very easy path for exploit. Thankfully these days it's less of an issue though because copying memory is comparatively quick and cheap compared to when that handbook was written.

> You can’t send file handles

Actually that's exactly how piping works as technically the standard streams are just files. So you could launch a program with STDIN being a different file from the previous processes STDOUT.

> or tcp connections

You can if you pass it as a UNIX socket (where you define a network connection as a file).

> or permissions to access particular files or resources.

This is a little ambiguous. For example you can pass strings that are credentials. However you cannot alter the running state of another program via it's pipeline (aside what files it has access to). To be honest I prefer the `sudo` type approach but I don't know how much of that is because it's better and how much of that is because it's what I am used to.

zwp · on Jan 22, 2019

>> You can’t send file handles

> Actually that's exactly how piping works

Also SCM_RIGHTS, which exists exactly for this purpose (see cmsg(3), unix(7) or https://blog.cloudflare.com/know-your-scm_rights/ for a gentler introduction and application).

That's been around since BSD 4.3, which predates the Hater's Handbook 1ed by 4 years or so.

gpderetta · on Jan 22, 2019

And that's how Unix is secretly a capability system

laumars · on Jan 22, 2019

Yeah I had mentioned UNIX domain sockets. However your post does add a lot of good detail on them which I had left off.

rumcajz · on Jan 22, 2019

Look at the alternatives though. Would you really want to use something like Spring in shell scripting?

pytester · on Jan 22, 2019

No. I typically use python as a drop in replacement for shell scripts > ~10 lines of code.

OliverJones · on Jan 22, 2019

MacOs is layered on a UNIX-like OS. You can use pipes in your command windows.

DavidWoof · on Jan 22, 2019

This comment makes me feel really old.

MacOs wasn't always layered on unix, and the unix-haters' handbook predates the switch to the unix-based MacOs X.

chrisfinazzo · on Jan 22, 2019

Of course not, but the switch to BSD fixed a bunch of the underpinnings in the OS and was a sane base to work off of.

Not to put too fine a point on it, but they found religion. Unlike Classic (and early versions of Windows for that matter), there was more to be gained by ceding some control to the broader community. Microsoft has gotten better (PowerShell - adapting UNIX tools to Windows, and later WSL, where they went all in)

Still, for Apple it meant they had to serve two masters for a while - old school Classic enthusiasts and UNIX nerds. Reading the back catalog of John Siracusa's (one of my personal nerd heroes) old macOS reviews gives you some sense of just how weird this transition was.

amdavidson · on Jan 22, 2019

The Unix Haters Handbook was published in 1994, when System 7 was decidedly not unix-like.

JdeBP · on Jan 22, 2019

You can also drop the "-like". (-:

* https://unix.stackexchange.com/questions/1489/

jcelerier · on Jan 22, 2019

... has it ? most people using macs never ever open a terminal.

Asooka · on Jan 22, 2019

The section on find after pipes has also not aged well. I can see why GNU and later GNU/Linux replaced most of the old Unices (I mean imagine having a find that doesn't follow symlinks!). If I may, a bit of code golf on the problem of "print all .el files without a matching .elc"

  find . -name '*.el' | while read el; do [ -f "${el}c" ] || echo $el; done

Of course this uses the dreaded pipes and doesn't support the extremely common filenames with a newline in them, so let's do it without them

  find . -name '*.el' -exec bash -c 'el=$0; elc="${el}c"; [ -f "$elc" ] || echo "$el"' '{}' ';'

wglb · on Jan 22, 2019

So the dreaded space-in-filenames is a problem when you pass the '{}' to a script.

The following works very nicely for me:

  * find . -name '*.el' -exec file {}c ';' 2>&1 | grep cannot

wglb · on Jan 23, 2019

I should have said "works very nicely for me, including on file names with spaces"

randallsquared · on Jan 22, 2019

Or p160 by internal numbering.

darrenf · on Jan 22, 2019

I twitched horribly at the final sentence, screaming inwardly "you don't pipe to /dev/null, you redirect to it". And now I feel like an arsehole.

benj111 · on Jan 22, 2019

Hmm well, the Unix shell seems to follow a plumbing metaphor.

You could direct, or redirect the flow to /dev/null. Or pipe to /dev/null. Or redirect the pipe to /dev/null?

So from a metaphor point of view either would fit.

Although of course you don't use the pipe construct to direct to a file. Which would suggest piping is wrong?

And then on the third hand, we all know what it means so what's the problem.

So I would say theres war, famine and injustice in the world. Don't worry about posix shell semantics. :)

analpaper · on Jan 22, 2019

redirect your feelings to /dev/null, because a pipe will just give us a Permission denied

benj111 · on Jan 22, 2019

Chmod +x /dev/null

(havent tried above, not sure I recommend that you do)

jgtrosh · on Jan 22, 2019

Well you can't read either from /dev/null, and I don't think that's just a question of permissions. I'm pretty sure it's impossible to get /dev/null to behave like an executable.

arghwhat · on Jan 22, 2019

You can read from /dev/null—it just behaves as a zero-length file, immediately returning EOF.

This makes /dev/null a valid source file in many languages, C included.

tripa · on Jan 22, 2019

/dev/null behaves like an empty file, which is (or used to be?) a valid executable.

Cf http://trillian.mit.edu/~jc/humor/ATT_Copyright_true.html or https://twitter.com/rob_pike/status/966896123548872705

jwilk · on Jan 22, 2019

In most contexts empty file is indeed a valid executable. Debian folks learned this the hard way recently:

https://bugs.debian.org/919341

However, executing /dev/null doesn't seem to work on Linux:

  $ sudo chmod 777 /dev/null 
  $ /dev/null
  bash: /dev/null: Permission denied

anthk · on Jan 23, 2019

http://peetm.com/blog/?p=55

benj111 · on Jan 22, 2019

Interesting question.

You could write a executable that accepts piped input and throws it away.

When would it exit though? Would it exit successfully at the end of the input stream? That sounds sensible.

That would be behaving like an executable wouldn't it?

majewsky · on Jan 22, 2019

> When would it exit though? Would it exit successfully at the end of the input stream?

A process that attempts to read from a closed pipe receives SIGPIPE. The default disposition for SIGPIPE is to terminate the program (similar to SIGTERM or SIGINT). So yeah, assuming that the previous program in the pipeline closes its stdout at some point (either explicitly, or implicitly by just exiting), then our program would die of SIGPIPE the when it tries to read() from stdin and the pipe's buffer has been depleted.

However, our program could also set SIGPIPE to ignored and ignore the EPIPE errors that read() would return in that case. In that case, it could run indefinitely. But at this point, you're way past normal behavior.

jacquesm · on Jan 22, 2019

As long as it does not exit you can still catch it through /proc/

crazygringo · on Jan 22, 2019

Pipes are awesome and infuriating.

Sometimes they work great -- being able to dump from MySQL into gzip sending across the wire via ssh into gunzip and into my local MySQL without ever touching a file feels nothing short of magic... although the command/incantation to do so took quite a while to finally get right.

But far too often they inexplicably fail. For example, I had an issue last year where piping curl to bunzip would just inexplicably stop after about 1GB, but it was at a different exact spot every time (between 1GB and 1.5GB). No error message, no exit, my network connection is fine, just an infinite timeout. (While curl by itself worked flawlessly every time.)

And I've got another 10 stories like this (I do a lot of data processing). Any given combination of pipe tools, there's a kind of random chance they'll actually work in the end or not. And even more frustrating, they'll often work on your local machine but not on your server, or vice-versa. And I'm just running basic commodity macOS locally and out-of-the-box Ubuntu on my servers.

I don't know why, but many times I've had to rewrite a piped command as streams in a Python script to get it to work reliably.

invsblduck · on Jan 22, 2019

> Any given combination of pipe tools, there's a kind of random chance they'll actually work in the end or not.

While this may be your experience, the mechanism of FIFO pipes in Unix (which is filehandles and buffers, basically), is an old one that is both elegant and robust; it doesn't "randomly" fail due to unreliability of the core algorithm or components. In 20 years, I never had an init script or bash command fail due to the pipe(3) call itself being unreliable.

If you misunderstand detailed behavior of the commands you are stitching together--or details of how you're transiting the network in case of an ssh remote command--then yes, things may go wrong. Especially if you are creating Hail Mary one-liners, which become unwieldy.

laumars · on Jan 22, 2019

If got to agree. I can’t recall a pipe ever failing due to unreliability.

One issue I did used to have (before I discovered ‘-o pipefail’[1]) was the annoyance that if an earlier command in a pipeline failed, all the other commands in the pipeline still ran albeit with no data or garbage data being piped to them.

[1] https://stackoverflow.com/questions/1550933/catching-error-c...

EE84M3i · on Jan 22, 2019

Perhaps your example was contrived, but why would you pipe into gzip instead of using transparent ssh compression?

crazygringo · on Jan 22, 2019

Because it simply never occurred to me to check if ssh would have compression built-in.

Because why would it? If the UNIX philosophy is to separate out tools and pipe them, then the UNIX philosophy should be to pipe through gzip and gunzip, not for ssh to provide its own redundant compression option, right?

acdha · on Jan 22, 2019

This is a good example of where that simple rule breaks down: piping it would only work when you are running a command and feeding its output to a different location whereas having it in SSH helps with everything so e.g. typing `ls` in a big directory, `cat`-ing a file, using `scp` on text, etc. benefits.

oweiler · on Jan 22, 2019

I've built a pipe which in very rare cases ran into a segfault. Never found out why.

jarpineh · on Jan 22, 2019

I recently came across Ramda CLI's interactive mode [1]

It essentially hijacks pipe's input and output into browser where you can play with the Ramda command. Then you just close browser tab and Ramda CLI applies your changed code in the pipe, resuming its operation.

Now I'm thinking all kinds ways I use pipe that I could "tee" through a browser app. I can use browser for interactive JSON manipulation, visualization and all around playing. I'm now looking for ways to generalize Ramda CLI's approach. Pipes, Unix files and HTTP don't seem directly compatible, but the promise is there. Unix tee command doesn't "pause" the pipe, but probably one could just introduce pause/resume output passthrough command into the pipe after it. Then web server tool can send the tee'd file to browser and catch output from there.

[1] https://github.com/raine/ramda-cli#interactive-mode

avodonosov · on Jan 22, 2019

You can just store the first pipeline results in a file, edit it, then use the file as an input for the second pipeline.

jarpineh · on Jan 22, 2019

Well, yes, but this kind defeats transient nature of data moving through pipe. Testing and debugging and operating on pipe based processing benefits from close feedback cycle. I’d rather keep that as much as possible.

mpweiher · on Jan 22, 2019

Yes, pipes are awesome, and the concepts actually translate well to in-process usage with structured data.

https://github.com/mpw/MPWFoundation/blob/master/Documentati...

One aspect is that the coordinating entity hooks up the pipeline and then gets out of the way, the pieces communicate amongst themselves, unlike FP simulations, which tend to have to come back to the coordinator.

This is very useful in "scripted-components" settings where you use a flexible/dynamic/slow scripting language to orchestrate fixed/fast components, without the slowness of the scripting language getting in the way. See sh :-)

Another aspect is error handling. Since results are actively passed on to the next filter, the error case is simply to not pass anything. Therefore the "happy path" simply doesn't have to deal with error cases at all, and you can deal with errors separately.

In call/return architectures (so: mostly everything), you have to return something, even in the error case. So we have nil, Maybe, Either, tuples or exceptions to get us out of Dodge. None of these is particularly good.

And of course | is such a perfect combinator because it is so sparse. It is obvious what each end does, all the components are forced to be uniform and at least syntactically composable/compatible.

Yay pipes.

cmsj · on Jan 22, 2019

pipe junkies might like to know about the following tools:

* vipe (part of https://joeyh.name/code/moreutils/ - lets you edit text part way through a complex series of piped commands)

* pv (http://www.ivarch.com/programs/pv.shtml - lets you visualise the flow of data through a pipe)

foreigner · on Jan 22, 2019

Yes! I love pv. Besides that and tee, can anyone else suggest some more general pipe tools?

olejorgenb · on Jan 22, 2019

http://joeyh.name/code/moreutils/ have a couple more:

* pee: tee standard input to pipes (`pee "some-command" "another-command"`)

* sponge: soak up standard input and write to a file

Though in zsh and bash you can create pee using tee: `tee >(some-command) >(another-command) >/dev/(null`

_whiteCaps_ · on Jan 22, 2019

pee

https://linux.die.net/man/1/pee

sudhirj · on Jan 22, 2019

Sanjay Ghemawat (the other less visible half of Jeff Dean) wrote a pipe library in Go, learnt quite a bit from it.

https://github.com/ghemawat/stream

Edit: Jeff Dean, not James Dean

timvisee · on Jan 22, 2019

Cool, the pipe command must be one of the most essential things in Unix/Linux based systems.

I would have loved to see some awesome pipe examples though.

YesThatTom2 · on Jan 22, 2019

Ok, here are some example pipelines:

A simple virus scanner in one line of pipe:

https://everythingsysadmin.com/2004/10/whos-infected.html

And a bunch of pipe tricks that are oh so wrong but oh so useful:

https://everythingsysadmin.com/2012/09/unorthodoxunix.html

bigger_cheese · on Jan 23, 2019

Back when I first started using Linux you could pipe random data to /dev/dsp and the speakers would emit various beeps. Used to be a pretty cool trick I think when ASLA came out it stopped working.

hoorayimhelping · on Jan 22, 2019

destroyallsoftware's screencasts have pretty good pipe usage / unix-fu.

benj111 · on Jan 22, 2019

Why isn't the pipe a construct that has caught on in 'proper' languages?

diggan · on Jan 22, 2019

Clojure has something that you could call a pipe almost. `->` passes the output from one form to the next one.

This example has a nested hash map where we try to get the "You got me!" string.

We can either use `:a` (keyword) as a function to get the value. Then we have to nest the function calls a bit unnaturally.

Or we can use the thread-first macro `->`, which is basically a unix pipe.

   user=> (def res {:a {:b {:c "You got me!"}}})
   #'user/res
   
   user=> res
   {:a {:b {:c "You got me!"}}}
   
   user=> (:c (:b (:a res)))
   "You got me!"
   
   user=> (-> res :a :b :c)
   "You got me!"

Thinking about it, Clojure advocates having small functions (similar to unix's "small programs / do one thing well") that you compose together to build bigger things.

lincpa · on Jan 22, 2019

Clojure: The Pure Function Pipeline Data Flow

https://github.com/linpengcheng/PurefunctionPipelineDataflow

jacquesm · on Jan 22, 2019

Why are you linking to the same github project over and over again?

lincpa · on Jan 22, 2019

Sorry, but the system does not support deletion now.

VladimirGolovin · on Jan 22, 2019

It has, in the form of function composition, as other replies show. However, the Unix pipe demonstrates a more interesting idea: composable programs on the level of the OS.

Nowadays, most of the user-facing desktop programs have GUIs, so the 'pipe' operator that composes programs is the user himself. Users compose programs by saving files from one program and opening them in another. The data being 'piped' through such program composition is sort-of typed, with the file types (PNG, TXT, etc) being the types and the loading modules of the programs being 'runtime typecheckers' that reject files with invalid format.

On the first sight, GUIs prevent program composition by requiring the user to serve as the 'pipe'. However, if GUIs were reflections / manifestations of some rich typed data (expressible in some really powerful type system, such as that of Idris), one could imagine the possibility of directly composing the programs together, bypassing the GUI or file-saving stages.

sideeffffect · on Jan 22, 2019

maybe I'm being overly pedantic, but people seem to be confused about this:

the pipes in your typical functional language (`|>`) is not a form of function composition, like

```

f >> g === x -> g(f(x))

```

but function application, like

```

f x |> g === g(f(x))

x |> f |> g // also works, has the same meaning

f |> g // just doesn't work, sorry :(

```

chriswarbo · on Jan 25, 2019

> the pipes in your typical functional language (`|>`) is not a form of function composition

What is a "typical functional language" in this case? I don't think I've come across this `|>` notation, or anything explicitly referred to as a "pipe", in the functional languages I tend to use (Haskell, Scheme, StandardML, Idris, Coq, Agda, ...); other than the Haskell "pipes" library, which I think is more elaborate than what you're talking about.

arianvanp · on Jan 22, 2019

It is! It's the main way of programming in lazy functional programming languages like Haskell

And many programming languages have libraries for something similar:. iterators in rust / c++, streams in java/c#, thinks like reactive

benj111 · on Jan 22, 2019

Haskell was the only possible I found.

Iterators don't really fully capture what a pipe is though? Theres no parallelism.

And streams don't have the conceptual simplicity of a pipe?

masklinn · on Jan 22, 2019

> Iterators don't really fully capture what a pipe is though? Theres no parallelism.

Pipes are concurrent, not necessarily parallel. Iterators are concurrent, and can be parallel (https://docs.rs/rayon/0.6.0/rayon/par_iter/index.html).

duckerude · on Jan 22, 2019

Many functional languages have |> for piping, but chained method calls are also a lot like pipelines. Data goes from left to right. This javascript expression:

  [1, 2, 3].map(n => n + 1).join(',').length

Is basically like this shell command:

  seq 3 | awk '{ print $1 + 1 }' | tr '\n' , | wc -c

(the shell version gives 6 instead of 5 because of a trailing newline, but close enough)

OJFord · on Jan 22, 2019

But each successive 'command' is a method on what's constructed so far; not an entirely different command to which we delegate processing of what we have so far.

The Python:

   length(','.join(map(lambda n: n+1, range(1, 4)))

is a bit closer, but the order's now reversed, and then jumbled by the map/lambda. (Though I suppose arguably awk does that too.)

duckerude · on Jan 22, 2019

That's true. It's far from being generally applicable. But it might be the most "mainstream" pipe-like processing notation around.

Nim has an interesting synthesis where a.f(b) is only another way to spell f(a, b), which (I think) matches the usual behavior of |> while still allowing familiar-looking method-style syntax. These are equivalent:

  [1, 2, 3].map(proc (n: int): int = n + 1).map(proc (n: int): string = $n).join(",").len
  
  len(join(map(map([1, 2, 3], proc (n: int): int = n + 1), proc (n: int): string = $n), ","))

The difference is purely cosmetic, but readability matters. It's easier to read from left to right than to have to jump around.

ticklemyelmo · on Jan 22, 2019

C# extension methods provide the same syntax, and it is used for all of its LINQ pipeline methods. It's amazing how effective syntactic sugar can be for readability.

msravi · on Jan 22, 2019

In Julia, this would be:

1:3 |> x->x+1 |> x->join(x,",") |> length

duckerude · on Jan 22, 2019

Small correction (or it won't run on my system):

  1:3 |> x -> map(y -> y+1, x) |> x -> join(x, ",") |> length

All those anonymous functions seem a bit distracting, though https://github.com/JuliaLang/julia/pull/24990 could help with that.

msravi · on Jan 22, 2019

Ok, in Julia 1.0+ you just have to use the dot operator:

1:3 |> x->x.+1 |> x->join(x,",") |> length

Note the dot in x.+1, that tells + to operate on each element of the array (x), and not the array itself.

msravi · on Jan 22, 2019

Ok... not sure which version of Julia you're using, but I'm on 0.5 and it works there... Maybe it changed in a later version

shele · on Jan 22, 2019

Maybe a bit nicer

    (1:3 .|> x->x+1) |> x->join(x,",") |> length

Ndymium · on Jan 22, 2019

Elixir also has a pipe construct:

    f |> g(1)

would be equivalent to

    g(f, 1)