The Unix philosophy is documented by Doug McIlroy as:
Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.
Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.
Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them.
Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them.
I really like the last two; if you can do them in development then you have a great dev culture.
> Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.
> Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.
> Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them.
> Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them.
It seems "white-space: pre-wrap" on the code block would solve most of the problem. There is also an additional "max-width" on the pre that I think is not needed.
> I would hate to see the day HN allowed any way to bold sections of text.
HN already has shitty italics (shitty in that it commonly matches and eats things you don't want italicised, e.g. multiplications, pointers, … in part though not only because HN doesn't have inline code). "Bold" can just be styled as italics, or as a medium or semibold weight. It's not an issue, and even less worth it given what absolute garbage the current markup situation is.
> For a site that's meant to target programmers, HN's handling of code blocks is pretty poor.
Meh. It does literal code blocks, they work fine.
That's pretty much the only markup feature which does work, which is impressively bad given HN only has two markup features: literal code blocks and emphasis.
It's not like they're going to add code coloration or anything.
And while fenced code blocks are slightly more convenient (no need to indent), pasting a snippet in a text editor and indenting it is hardly a difficult task.
To be pedantic, there's debate about what is "actually markdown". No one would say it's the flavor HN implements, but the easiest way to win some games is to simply not play
That would break any existing comments that happened to be using markdown syntax as punctuation. Although I suppose you could have a flag day for the changeover and format differently based on comment creation time.
But I think the very limited formatting is just fine anyway. For the above comment as an example, I agree the code formatting looks awful, especially on mobile. But the version with >'s is ok, and I don't think proper bullet points or a quote bar would have improved it dramatically.
Conversations.im uses an interesting trick for rendering Markdown [0] - it leaves the syntax as is, so in the worst case you've got text with weird bold/italics, but the characters are 1:1 identical to what was sent.
[0]: Actually not Markdown but a subset, but that's not important.
I agree with you on the max-width. I can't see whatever benefit it's supposed to provide outweigh the annoyance of having to scroll horizontally when there is a lot of empty space to the right that could be used to display more text.
I'm not too convinced on the wrapping of code, though.
You are not. I made a Chrome plugin to find the HN discussions for an article, thinking I'd use it primarily after I'd read an article, but I find that I more often than not use it as a benchmark for whether I should spend the time to read it or not.
Not TAOUP specifically, but the Jargon File is where ESR got loads of things wrong or treated as Unix-related when they weren't. Also, in TAOUP you have Emacs, which is the anti-UNIX by definition.
https://www.dourish.com/goodies/jargon.html
TAOUP had a chapter "a tale of 5 editors" discussing emacs, vi, and more, and does point out emacs is an outlier (and outsider) to many unix principles. It does quote Doug McIlroy speaking against it (but also against vi?).
It attempts to generalize from discussing "The Right Size for an Editor" question to discussing how to think about "The Right Size of Software".
I don't know if it's possible to have impartially "fair" discussion of editors. Skimming now, I can see how vi lovers would hate some characterizations there. But it does try to learn interesting lessons from them.
It does NOT simply equate "Emacs has UNIX nature" so you can't just prove something like "TAOUP mentions Emacs, Emacs is GNU, Gnu is Not Unix => TAOUP is not UNIX, QED" ;-)
bias disclaimers: I learnt most of what I know of unix from within Emacs, which I still use ~20 years later. I learnt more from Info pages than man pages (AIX had pretty bad man pages). I suspect you have a different picture of unix than I. And I now know better than arguing which editor is better ;)
But I found TAOUP articulated ideas I only learnt through osmosis. I'm looking forward to reading a better articulation if you know one.
Programmers have been destroying programmer jobs for as long as those jobs have existed. Up to now that has meant enough extra productivity to move into more markets, but that will not last forever.
I'm surprised JessFraz, who is employed by Microsoft, doesn't talk about PowerShell pipes at all.
Powershell pipes are an extension over Unix pipes. Rather than just being able to pipe a stream of bytes, powershell can pipe a stream of objects.
It makes working with pipes so much fun. In Unix you have to cut, awk and do all sorts of parsing to get some field out of `ls`. In PowerShell, ls outputs a stream of file objects, and you can get the field you want by piping to `Get-Item`, or sum the file sizes, or filter only directories. It's very expressive once you're manipulating streams of objects with properties.
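The Unix-side parsing being referred to looks something like this (a rough sketch; as replies below point out, parsing ls this way is fragile):

  # grab the size column from ls -l and sum it
  ls -l | awk 'NR > 1 { total += $5 } END { print total }'
  # "filter only directories" by matching the mode string (breaks on names with spaces)
  ls -l | awk '/^d/ { print $NF }'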
> In Unix you have to cut, awk and do all sorts of parsing to get some field out of `ls`.
I'm guessing you've mentioned using `ls` as a simple, first-thing-that-comes-to-mind example, which is cool. I just wanted to point out that if a person is piping ls's output, there are probably other, far better alternatives, such as (per the link below) `find` and globs:
The downside of objects is performance, due to e.g. the increased overhead a file object carries. Plus the exact same argument can be made for everything being bytes/text on Unix: it makes everything simple but versatile.
> get some field out of `ls`
If you're parsing the output of ls, you're doing it wrong. Filenames can contain characters such as newlines, etc, and so you generally want to use globbing/other shell builtins, or utils like find with null as the separator instead of newline.
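A sketch of the alternatives being described ('*.log' is just a placeholder pattern):

  # a glob handles odd filenames without any parsing
  for f in *.log; do printf '%s\n' "$f"; done
  # find with NUL separators survives newlines and other odd characters in names
  find . -name '*.log' -print0 | xargs -0 wc -l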
> Powershell pipes are an extension over Unix pipes. Rather than just being able to pipe a stream of bytes, powershell can pipe a stream of objects.
Unix pipes can pipe objects, binary data, text, json encoded data, etc.
The problem is it adds a lot of complication and simply doesn't offer much in practice, so text is still widely used and binary data in a few specific cases.
What happens when there are upstream changes to the objects? Does everything downstream just need to change, and hence, the upstream objects returned by programs need to be changed with care? Or is it using something like protobuf, where fields are only additive, but never deleted, for backwards compatibility?
Or are the resulting chain of pipes so short lived, it doesn't matter?
What happens when you rely on non standard behaviour of unix tools, or just non-posix tools that change from beneath you?
I’m not saying that this makes powershell pipes better/worse, just that this problem isn’t unique. Microsoft tends to be reasonably committed to backwards compatibility but I don’t know the answer to the question
I don't really understand what this means. The parent specifically writes "backwards compatibility." The only other thing I think you might be referring to is whether, if the reading process mutates the objects, that somehow affects the writing end of the pipe, but I think that doesn't make sense from a "sane way of doing inter-process communication" standpoint. Is there something else you are referring to? Could you elaborate please?
I have heard this several times, but either I do not understand it or I disagree. Do you mean parsing the output of the ls program? Parsing ls output is not wrong; the program produces a text stream that is easy and useful to parse. There's nothing to be ashamed of when doing it, even when you could do it in a different, even shorter way. I grep over ls output daily, and I find it much more convenient than writing wildcards.
One can certainly do fine by grepping ls output in one-off instances, but I'd be really hesitant to put that in a script.
For given paths, stat command essentially lets us directly access their inode structs, and invocations are nicely concise. The find util then lets us select files based on inode fields.
Both tools do take a bit of learning, but considerably less than grep and regexs. Anyway, I've personally found find and stat to be really nice and ergonomic after the initial learning period.
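For example (assuming GNU stat; BSD/macOS stat spells the format flag -f, and report.txt is just a placeholder):

  # pull size and mtime straight from the inode
  stat -c '%s bytes  %y  %n' report.txt
  # select on inode fields: regular files over 1 MiB, modified in the last day
  find . -type f -size +1M -mtime -1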
I'm probably nitpicking, but if you're using cat to pipe a single file into the sdtin of another program, you most likely don't need the cat in the first place, you can just redirect the file to the process' stdin. Unless, of course, you're actually concatenating multiple files or maybe a file and stdin together.
Disclaimer: I do cat-piping myself quite a bit out of habit, so I'm not trying to look down at the author or anything like that! :)
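Concretely (tr here is just a stand-in for any filter):

  cat foo.txt | tr a-z A-Z           # works, but spawns an extra process
  tr a-z A-Z < foo.txt               # same result via redirection
  cat foo.txt bar.txt | tr a-z A-Z   # actual concatenation, where cat earns its name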
In fact, I don't like people optimizing shell scripts for performance. I mean, shell scripts are slow by design and if you need something fast, you choose the wrong technology in the first place.
Instead, shell scripts should be optimized for readability and portability, and I think it is much easier to understand something like 'read | change >write' than 'change <read >write'. So I like to write pipelines like this:
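Something like this, presumably (the original snippet didn't survive, so the exact commands and layout here are a guess pieced together from the replies):

  cat input \
      | grep '^x' \
      | sed 's/a/b/' \
      | awk '{ print $2 }' \
      | wc -l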
It might not be the most efficient processing method, but I think it is quite readable.
For those who disagree with me: You might find the pure-bash-bible [1] valuable. While I admire their passion for shell scripts, I think they are optimizing to the wrong end. I would be more a fan of something along the lines of 'readable-POSIX-shell-bible' ;-)
IMHO, shell scripts are a minefield and if you want something readable and portable, this is also the wrong technology. They are convenient though. They are like the Excel macros of the UNIX world.
Now back to the topic of "cat", which is a great example of why shell scripts are minefields.
Replace "foo.txt" with a user supplied variable, let's call it "$F". It becomes cat $F | blah_blah... I mean cat "$F" | blah_blah, first trap, but everyone knows that.
Now, if F='-n', second trap. What you think is a file will be considered an option and cat will wait for user input, like when no file is given. Ok, so you need to do cat -- "$F" | blah_blah.
That should be OK in every case now, but remember that "cat" is just another executable, or maybe a builtin. For some reason, on your system "cat --" may not work, or some asshat may have added "." in your PATH and you may be in a directory with a file named "cat". Or maybe some alias that decides to add color.
There are other things to consider, like your locale, which may mess up your output with commas instead of decimal points and unicode characters. For that reason, you need to be very careful every time you call a command and even more so if you pipe the output.
For that reason, I avoid using "cat" in scripts. It is an extra command call and all the associated headaches I can do without.
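Concretely, the alternatives look like this (using the blah_blah placeholder from above):

  # redirection: no option parsing, no extra process, no PATH/alias surprises
  blah_blah < "$F"
  # if cat really is needed, -- stops it from treating "$F" as an option
  cat -- "$F" | blah_blah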
You're not wrong, but I think it's worth pointing out that's a trap that comes up any time you exec another program, whether it's from shell or python. I can't reasonably expect `subprocess.run(["cat", somevar])` to work if `somevar = "-n"`.
(Now, obviously, I'm not going to "cat" from python, but I might "kubectl" or something else that requires care around the arguments)
> Replace "foo.txt" with a user supplied variable, let's call it "$F". It becomes cat $F | blah_blah... I mean cat "$F" | blah_blah, first trap, but everyone knows that.
I think that you forgot to edit the "I mean" to "echo $F" :)
I agree with the sentiment, but my critique applies so generally that it must be noted: if a command accepts a filename as a parameter, you should absolutely pass it as a parameter rather than `cat` it over stdin.
This is by no means scientific, but I've got a LaTeX document open right now. A quick `time` says:
$ time grep 'what' AoC.tex
real 0m0.045s
user 0m0.000s
sys 0m0.000s
$ time cat AoC.tex | grep what
real 0m0.092s
user 0m0.000s
sys 0m0.047s
Anecdotally, I've witnessed small pipelines that absolutely make sense totally thrash a system because of inappropriate uses of `cat`. When you `cat` a file, the OS must (1) `fork` and `exec`, (2) copy the file to `cat`'s memory, (3) copy the contents of `cat`'s memory to the pipe, and (4) copy the contents of the pipe to `grep`'s memory. That's a whole lot of copying for large files -- especially when the first command grep in the sequence usually performs some major kind of reduction on the input data!
In my opinion, it's perfectly fine either way unless you're worried about performance. I personally tend to try to use the more performant option when there's a choice, but a lot of times it just doesn't matter.
That said, I suspect the example would be much faster if you didn't use the pipeline, because a single tool could do it all (I'm leaving in the substitution and column print that are actually unused in the result):
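Presumably something along these lines (a guess at the single-awk version, keeping the substitution and the column grab even though only the final count matters):

  awk '/^x/ { sub(/a/, "b"); col = $2; n++ } END { print n }' input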
That syntax is very unusual from anything I've seen. I am also a fan of splitting pipelines with line breaks for readability, however I put the pipe on the end of each line and omit the backslash. In Bash, a line that ends with a pipe always continues on the next line.
In any case, it's probably just a matter of personal taste.
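i.e. something like this, reusing the (assumed) commands from the sketch above -- bash keeps reading the next line whenever a line ends with a pipe:

  cat input |
      grep '^x' |
      sed 's/a/b/' |
      awk '{ print $2 }' |
      wc -l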
That's actually very readable. I'm now regretting that I hadn't seen this about 3 months ago--I recently left a project that had a large number of shell scripts I had written or maintained for my team. This probably would've made it much easier for the rest of the team to figure out what the command was doing.
I like 'collection pipeline' code written in this style regardless of language. If we took away the pipe symbols (or the dots) and just used indentation we'd have something that looked like asm but with flow between steps rather than common global state.
I periodically think it would be a good idea to organize a language around.
awk can do all of that except sed. And I am not sure about the last. No need to wc ($NF in AWK, if I can recall), no need for grep, you have the /match/ statement, with regex too.
I don't like this style at all. If you're following the pipeline, it starts in the middle with "input", goes to the left for the grep, then to the right (skipping over the middle part) to sed.
cat input | grep '^x' | sed 's/foo/bar/g'
Is far more readable, in my opinion. In addition, it makes it trivial to change the input from a file to any kind of process.
I'm STRONGLY in favor of using "cat" for input. That "useless use of cat" article is pretty dumb, IMHO.
That's not the same thing. The sed output will still keep lines not starting with x (just not replacing foo with bar in those) where grep will filter those out.
> That specific example is less readable, but I do like being able to do this:
> diff <(prog1) <(prog2)
> and get a sensible result.
That is called process substitution and is exactly the kind of use case that it's designed for. So yes, process substitution does make sense there.
> input | recalcitrant_program /dev/stdin
> ... but it's a bit of a tossup as to which one's more readable at this point. They're both relying on advanced shell functionality.
There's no tossup at all. Process substitution is easily more readable than your second example because you're honouring the normal syntax of that particular command's parameters rather than kludging around its lack of STDIN support.
I wouldn't say either example is using advanced shell functionality, either. Process substitution (your first example) is a pretty easy thing to learn, and your second example is just using regular anonymous pipes (/dev/stdin isn't a shell function, it's a proper pseudo-device like /dev/random and /dev/null), so the only thing the shell is doing is the same pipe described in this thread's article (with UNIX / Linux then doing the clever stuff outside of the shell).
This is a very silly way of writing it though. grep|sed can almost always be replaced with a simple awk: awk '/^x/ { sub("a", "b"); print $2; }' foo.txt. This way, the whole command fits on one line. If it doesn't, put your awk script in a separate file and simply call it with "awk -f myawkscript foo.txt".
I use awk in exactly this way personally, but, awk is not as commonly readable as grep and sed (in fact, that use of grep and sed should be pretty comprehensible to someone who just knows regular expressions from some programming languages and very briefly glances at the manpages, whereas it would be difficult to learn what that awk syntax means just from e.g. the GNU awk manpage). So, just as you could write a Perl one-liner but you shouldn't if you want other people to read the code, I'd probably advise against the awk one-liner too.
Not sure why you say grep and sed are more readable than awk! (not sure what 'commonly readable' means). Or that even that particular line in awk is harder to understand than the grep and sed man pages. The awk manpage even has examples, including print $2. The sed manpages must be the most impenetrable manpages known to 'man', if you don't already understand sed. (People might already know s///g because 99% of the time, that's all sed is used for.)
I actually think that cat makes it more obvious what's happening in some cases.
I had recently built a set of tools used primarily via pipes: (tool-a | tool-b | tool-c) and it looks clearer when I mock (for testing) one command (cat results | tool-b | tool-c) instead of re-flowing it just to avoid cat and use direct files.
Yes, this. Quite often I start writing out complex pipelines using head/tail to test with a small dataset and then switch it out for cat when I am done to run it on the full thing. And it's often not worth refactoring these things later unless you are really trying to squeeze performance out of them.
I can see how it's redundant. But I use cat-pipes because I once mistyped the redirection and nuked my carefully created input file :)
(Similarly, the first thing I used to do on Windows was set my prompt to [$p] because many years ago I also accidentally nuked a part of Visual Studio when I copied and pasted a command line that was prefixed with "C:\...>". Whoops.)
For interactive use, I would like to point out that even better than this use of cat is less. If you pipe less into something then it forgets its interactive behaviour and works like cat on a single file. So:
$ less foo | bar
Is similar to:
$ bar < foo
Except that less is typically more clever than that and might be more like:
I would be remiss if I did not point out that calling said program cat is a misnomer. Instead of 'string together in a series' (the actual dictionary definition, which coincidentally, pipes actually do) it quickly became 'print whatever I type to the screen.'
Of course, the example @arendtio uses is correct, because they obviously care about such things.
Having separate commands for outputting the content of a single file and several files would, however, be an orthogonality violation. YMMV whether having a more descriptive name for the most common use of cat would be worth the drawback.
It would fit in the broader methodology of 'single purpose tools that do their job well' or 'small pieces, loosely joined', but yes, probably too annoying to bother with.
I love the idea of simple things that can be connected in any way. I'm not so much a fan of "everything is a soup of bytes with unspecified encoding and unknown formatting".
It's an abstraction that held up quite well, but its starting to show its age.
I fully agree... and yet... everyone who has tried to "fix" this has failed, at least in the sense of "attaining anything like shell's size and reach". Many have succeeded in the sense of producing working code that fixes this in some sense.
Powershell's probably the closest to success, because it could be pushed out unilaterally. Without that I'm not sure it would have gotten very far, not because it's bad, but again because nobody else seems to have gotten very far....
100% agree.
Having to extract information with regular expressions is a waste of time.
If the structure of the data was available, you would have type safety / auto-completion. You could even have GUIs to compose programs.
Structured data flows in pipes too. JSON can even be line-oriented. GUI programming fails when you get past a few hundred "lines" of complexity. What I'd love to see is a revolution of shells and terminals that makes it easier to work with and pull apart piped data.
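For example, newline-delimited JSON flows through ordinary pipes just fine (a small sketch using jq):

  printf '{"name":"a","size":3}\n{"name":"b","size":7}\n' |
      jq -c 'select(.size > 5)'
  # -> {"name":"b","size":7}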
Allow programs to specify the type of data they can consume and the type of the data they emit. This is how powershell does it (using the dotnet type system).
Having GUIs compose programs seems antithetical to the idea of shell scripts which are often thrown together quickly to get things done. Personally, I view shell scripting as a "good enough" and if you need more structure then you change your tools.
The core critique - that everything is stringly typed - still holds pretty well though.
> The receiving and sending processes must use a stream of bytes. Any object more complex than a byte cannot be sent until the object is first transmuted into a string of bytes that the receiving end knows how to reassemble. This means that you can’t send an object and the code for the class definition necessary to implement the object. You can’t send pointers into another process’s address space. You can’t send file handles or tcp connections or permissions to access particular files or resources.
To be fair, the same criticism could be made of a socket? I think the issue is that some people want pipes to be something magical that connects their software, not a dumb connection between them.
I don't want all my pipes to be magical all the time, but occasionally I do want to write a utility that is "pipeline aware" in some sense. For example, I'd like to pipe mysql to jq and have one utility or the other realize that a conversion to JSON is needed in the middle for it to work.
I'm working on a library for this kind of intra-pipeline negotiation. It's all drawing-board stuff right now, but I cobbled together a proof of concept:
Do you think this is a reasonable way to achieve the magic that some users want in their pipelines? Or are ancient Unix gods going to smite me for tampering with the functional consistency of tools by making their behavior different in different contexts?
This is interesting, yes. If the shell could infer the content type of data demanded or output by each command in a pipeline, then it could automatically insert type coercion commands or alter the options of commands to produce the desired content types.
You're right that it is in fact possible for a command to find the preceding and following commands using /proc, and figure out what content types they produce / want, and do something sensible. But there won't always be just one way to convert between content types...
Me? I don't care for this kind of magic, except as a challenge! But others might like it. You might need to make a library out of this because when you have something like curl(1) as a data source, you need to know what Content-Type it is producing, and when you can know explicitly rather than having to taste the data, that's a plus. Dealing with curl(1) as a sink and somehow telling it what the content type is would be nice as well.
My ultimate use case is a contrived environment where I have the luxury of ignoring otherwise blatant feature-gaps--such as compatibility with other tools (like curl). I've come to the same conclusions about why that might be tricky, so I'm calling it a version-two problem.
I notice that function composition notation; that is, the latter half of:
> f(g(x)) = (f o g)(x)
resembles bash pipeline syntax to a certain degree. The 'o' symbol can be taken to mean "following". If we introduce new notation where '|' means "followed by" then we can flip the whole thing around and get:
> f(g(x)) = (f o g)(x) = echo 'x' | g | f
I want to write some set of mathematically interesting functions so that they're incredibly friendly (like, they'll find and fix type mismatch errors where possible, and fail in very friendly ways when not). And then use the resulting environment to teach a course that would be a simultaneous intro into both category theory and UNIX.
All that to say--I agree about finding the magic a little distasteful, but if I play my cards right my students will only realize there was magic in play after they've taken the bait. At first it will all seem so easy...
The magic /proc thing is a very interesting challenge. Trust me, since I read your comments I've thought about how to implement, though again, it's not the sort of thing I'd build for a production system, just a toy -- a damned interesting one. And as a tool for teaching how to find your way around an OS and get the information you need, it's very nice. There's three parts to this: a) finding who's before and after the adapter in the pipe, b) figuring out how to use that information to derive content types, c) match impedances. (b) feels mundane: you'll have a table-driven approach to that. Maybe you'll "taste" the data when you don't find a match in the table? (c) is not always obvious -- often the data is not structured. You might resort to using extended file attributes to store file content-type metadata (I've done this), and maybe you can find the stdin or other open files of the left-most command in a pipeline, then you might be able to guesstimate the content type in more cases. But obviously, a sed, awk, or cut, is going to ruin everything. Even something like jq will: you can't assume the output and input will be JSON.
At some point you just want a Haskell shell (there is one). Or a jq shell (there is something like it too).
As to the pipe symbol as function composition: yes, that's quite right.
That sounds reasonable, I'll look into it--thanks.
I was imagining an algorithm where each pipeline-aware utility can derive port numbers to use to talk/listen to its neighbors. I may be able to use http content negotiation wholesale in that context.
I've been trying to solve the exact same problem with my shell too. Its pipes are typed, and all the builtin commands can then automatically decode those data types via shared libraries, so commands don't need to worry about how to decode and re-encode the data. This means that JSON, YAML, TOML, CSV, Apache log files, S-expressions and even tabulated data from `ps` (for example) can all be transparently handled the same way and converted from one to another without the tools ever needing to know how to marshal or unmarshal that data. For example: you could take a JSON array that hasn't been formatted with carriage returns and still grep through it item by item as if it were a multi-line string.
However the problem I face is how do you pass that data type information over a pipeline from tools that exist outside of my shell? It's all well and good having builtins that all follow that convention but what if someone else wants to write a tool?
My first thought was to use network sockets, but then you break piping over SSH, eg:
local-command | ssh user@host "| remote-command"
My next thought was maybe this data should be in-lined - a bit like how ANSI escape sequences are in-lined and the terminals don't render them as printable characters. Maybe something like the following as a prefix to STDIN?
<null>$SHELL<null>
But then you have the problem of tainting your data if any tools are sent that prefix in error.
I also wondered if setting environmental variables might work but that also wouldn't be reliable for SSH connections.
So as you can see, I'm yet to think up a robust way of achieving this goal. However in the case of builtin tools and shell scripts, I've got it working for the most part. A few bugs here and there but it's not a small project I've taken on.
If you fancy comparing notes on this further, I'm happy to oblige. I'm still hopeful we can find a suitable workaround to the problems described above.
I was hoping to stick with bash or zsh, and just write processes that somehow communicate out of band, but I think we're still up against the same problem.
One idea I had was that there's a service running elsewhere which maintains this directed graph (nodes = types, edges = programs which take the type of their "from" node and return the type of their "to" node). When a pipeline is executed, each stage pauses until type matches are confirmed--and if there is a mismatch then some path-finding algorithm is used to find the missing hops.
So the user can leave out otherwise necessary steps, and as long as there is only one path through the type graph which connects them, then the missing step can be "inserted". In the case of multiple paths, the error message can be quite friendly.
This means keeping your context small enough, and your types diverse enough, that the type graph isn't too heavily connected. (Maybe you'd have to swap out contexts to keep the noise down.) But if you have a layer that's modifying things before execution anyway, then perhaps you can have it notice the ssh call and modify it to set up a listener. Something like:
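Roughly like this, presumably (a guess at the rewritten call; pull_metadata_from is the adapter described next):

  local-command | ssh user@host "pull_metadata_from | remote-command"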
Where pull_metadata_from phones home to get the metadata, then passes along the data stream untouched.
Also, If you're writing the shell anyway then you can have the pipeline run each process in a subshell where vars like TYPE_REGISTRY_IP and METADATA_INBOUND_PORT are defined. If they're using the network to type-negotiate locally, then why not also use the network to type-negotiate through an ssh tunnel?
This idea is, of course, over-engineered as hell. But then again this whole pursuit is.
> I was hoping to stick with bash or zsh, and just write processes that somehow communicate out of band, but I think we're still up against the same problem.
Yeah we have different starting points but very much similar problems.
tbh idea behind my shell wasn't originally to address typed pipelines, that was just something that evolved from it quite by accident.
Anyhow, your suggestion of overwriting / aliasing `ssh` is genius. Though I'm thinking rather than tunnelling a TCP connection, I could just spawn an instance of my shell on the remote server and then do everything through normal pipelines as I now control both ends of the pipe. It's arguably got less proverbial moving parts compared to a TCP listener (which might then require a central data type daemon et al) and I'd need my software running on the remote server for the data types to work anyway.
There is obviously a fair security concern some people might have about that but if we're open and honest about that and offer an "opt in/out" where opting out would disable support for piped types over SSH then I can't see people having an issue with it.
Coincidentally I used to do something similar in a previous job where I had a pretty feature rich .bashrc and no Puppet. So `ssh` was overwritten with a bash function to copy my .bashrc onto the remote box before starting the remote shell.
> This idea is, of course, over-engineered as hell. But then again this whole pursuit is.
Haha so true!
Thanks for your help. You may have just solved a problem I've been grappling with for over a year.
I was thinking something similar, buried in a library that everyone could link. It seems... awfully awkward to build, much less portably.
This reminds me of how busted Linux is for not having an SO_PEERCRED equivalent for TCP sockets. You can actually get that information by walking /proc/net/tcp or using AF_NETLINK sockets and inet_diag, but there is a race condition such that this isn't 100% reliable. SO_PEERCRED would [have to] be.
The problem with that is that each command in the pipeline would have to somehow be modified to convey content type metadata. Perhaps we could have a way to send ancillary metadata (a la Unix domain sockets SCM_*).
Yes. The compromise of just using an untyped byte stream in a single linear pipeline was a fair tradeoff in the 70s, but it is nearly 2020 and we can do better.
We have done better. The shell I'm writing is typed and I know I'm not the only person to do this (eg Powershell). The issue here is really more with POSIX compatibility but if you're willing to step away from that then you might find an alternative that better suits your needs.
Thankfully switching shells is as painless as switching text editors.
I'm not going to argue that UNIX got everything right because I don't believe that to be the case either but I don't agree with those specific points:
> This means that you can’t send an object and the code for the class definition necessary to implement the object.
To some degree you can and I do just this with my own shell I've written. You just have to ensure that both ends of the pipe understands what is being sent (eg is it JSON, text, binary data, etc)? Even with typed terminals (such as Powershell), you still need both ends of the pipe to understand what to expect to some extent.
Having this whole thing happen automatically with a class definition is a little optimistic though. Not least of all because not every tool would be suited for every data format (eg a text processor wouldn't be able to do much with a GIF even if it has a class definition).
> You can’t send pointers into another process’s address space.
Good job too. That seems a very easy path for exploit. Thankfully these days it's less of an issue though because copying memory is comparatively quick and cheap compared to when that handbook was written.
> You can’t send file handles
Actually that's exactly how piping works, as technically the standard streams are just files. So you could launch a program with STDIN being a different file from the previous process's STDOUT.
> or tcp connections
You can if you pass it as a UNIX socket (where you define a network connection as a file).
> or permissions to access particular files or resources.
This is a little ambiguous. For example you can pass strings that are credentials. However you cannot alter the running state of another program via its pipeline (aside from what files it has access to). To be honest I prefer the `sudo` type approach, but I don't know how much of that is because it's better and how much is because it's what I am used to.
Of course not, but the switch to BSD fixed a bunch of the underpinnings in the OS and was a sane base to work off of.
Not to put too fine a point on it, but they found religion. Unlike Classic (and early versions of Windows for that matter), there was more to be gained by ceding some control to the broader community. Microsoft has gotten better (PowerShell - adapting UNIX tools to Windows, and later WSL, where they went all in)
Still, for Apple it meant they had to serve two masters for a while - old school Classic enthusiasts and UNIX nerds. Reading the back catalog of John Siracusa's (one of my personal nerd heroes) old macOS reviews gives you some sense of just how weird this transition was.
The section on find after pipes has also not aged well. I can see why GNU and later GNU/Linux replaced most of the old Unices (I mean imagine having a find that doesn't follow symlinks!). If I may, a bit of code golf on the problem of "print all .el files without a matching .elc"
find . -name '*.el' | while read -r el; do [ -f "${el}c" ] || echo "$el"; done
Of course this uses the dreaded pipes and doesn't support the extremely common filenames with a newline in them, so let's do it without them
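For example, with -exec instead of a pipe (one possible version; not necessarily the original golf):

  find . -name '*.el' -exec sh -c 'test -f "$1"c || printf "%s\n" "$1"' sh {} \;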
Well you can't read either from /dev/null, and I don't think that's just a question of permissions. I'm pretty sure it's impossible to get /dev/null to behave like an executable.
> When would it exit though? Would it exit successfully at the end of the input stream?
A process reading from a pipe sees end-of-file once the writer closes its end and the buffer is drained (read() returns 0); SIGPIPE/EPIPE is the mirror-image behaviour on the write side, where the default disposition of SIGPIPE is to terminate the program (similar to SIGTERM or SIGINT). So yeah, assuming that the previous program in the pipeline closes its stdout at some point (either explicitly, or implicitly by just exiting), our program would hit EOF when it tries to read() from stdin and the pipe's buffer has been depleted, and could exit successfully there.
However, our program could also ignore the EOF and keep calling read() (it will just keep getting 0 back). In that case, it could run indefinitely. But at this point, you're way past normal behavior.
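Both directions are easy to see from the shell (a small illustration):

  # writer finishes, reader sees EOF and exits normally
  printf 'a\nb\n' | wc -l      # prints 2
  # the mirror image: reader exits first, writer gets SIGPIPE
  yes | head -n 1              # yes is killed by SIGPIPE once head exits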
Sometimes they work great -- being able to dump from MySQL into gzip sending across the wire via ssh into gunzip and into my local MySQL without ever touching a file feels nothing short of magic... although the command/incantation to do so took quite a while to finally get right.
But far too often they inexplicably fail. For example, I had an issue last year where piping curl to bunzip would just inexplicably stop after about 1GB, but it was at a different exact spot every time (between 1GB and 1.5GB). No error message, no exit, my network connection is fine, just an infinite timeout. (While curl by itself worked flawlessly every time.)
And I've got another 10 stories like this (I do a lot of data processing). Any given combination of pipe tools, there's a kind of random chance they'll actually work in the end or not. And even more frustrating, they'll often work on your local machine but not on your server, or vice-versa. And I'm just running basic commodity macOS locally and out-of-the-box Ubuntu on my servers.
I don't know why, but many times I've had to rewrite a piped command as streams in a Python script to get it to work reliably.
> Any given combination of pipe tools, there's a kind of random chance they'll actually work in the end or not.
While this may be your experience, the mechanism of FIFO pipes in Unix (which is filehandles and buffers, basically), is an old one that is both elegant and robust; it doesn't "randomly" fail due to unreliability of the core algorithm or components. In 20 years, I never had an init script or bash command fail due to the pipe(3) call itself being unreliable.
If you misunderstand detailed behavior of the commands you are stitching together--or details of how you're transiting the network in case of an ssh remote command--then yes, things may go wrong. Especially if you are creating Hail Mary one-liners, which become unwieldy.
I've got to agree. I can't recall a pipe ever failing due to unreliability.
One issue I did used to have (before I discovered ‘-o pipefail’[1]) was the annoyance that if an earlier command in a pipeline failed, all the other commands in the pipeline still ran albeit with no data or garbage data being piped to them.
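For reference, the behaviour it changes (a minimal illustration, in bash; missing-file.txt is just a placeholder):

  set -o pipefail
  grep foo missing-file.txt | sort | head
  echo $?   # now non-zero (grep's failure), instead of head's 0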
Because it simply never occurred to me to check if ssh would have compression built-in.
Because why would it? If the UNIX philosophy is to separate out tools and pipe them, then the UNIX philosophy should be to pipe through gzip and gunzip, not for ssh to provide its own redundant compression option, right?
This is a good example of where that simple rule breaks down: piping it would only work when you are running a command and feeding its output to a different location whereas having it in SSH helps with everything so e.g. typing `ls` in a big directory, `cat`-ing a file, using `scp` on text, etc. benefits.
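For comparison, the two spellings (a sketch of a MySQL-style transfer like the one mentioned upthread; host and database names are placeholders):

  # compression as just another tool in the pipeline
  mysqldump mydb | gzip | ssh user@host 'gunzip | mysql mydb'
  # compression built into the transport itself (-C), so plain ls/cat/scp benefit too
  mysqldump mydb | ssh -C user@host 'mysql mydb'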
I recently came across Ramda CLI's interactive mode [1]
It essentially hijacks the pipe's input and output into the browser, where you can play with the Ramda command. Then you just close the browser tab and Ramda CLI applies your changed code in the pipe, resuming its operation.
Now I'm thinking of all kinds of ways I use pipes that I could "tee" through a browser app. I can use the browser for interactive JSON manipulation, visualization and all-around playing. I'm now looking for ways to generalize Ramda CLI's approach. Pipes, Unix files and HTTP don't seem directly compatible, but the promise is there. The Unix tee command doesn't "pause" the pipe, but probably one could just introduce a pause/resume passthrough command into the pipe after it. Then a web server tool can send the tee'd file to the browser and catch output from there.
Well, yes, but this kind of defeats the transient nature of data moving through a pipe. Testing, debugging and operating on pipe-based processing benefits from a close feedback cycle. I'd rather keep that as much as possible.
One aspect is that the coordinating entity hooks up the pipeline and then gets out of the way, the pieces communicate amongst themselves, unlike FP simulations, which tend to have to come back to the coordinator.
This is very useful in "scripted-components" settings where you use a flexible/dynamic/slow scripting language to orchestrate fixed/fast components, without the slowness of the scripting language getting in the way. See sh :-)
Another aspect is error handling. Since results are actively passed on to the next filter, the error case is simply to not pass anything. Therefore the "happy path" simply doesn't have to deal with error cases at all, and you can deal with errors separately.
In call/return architectures (so: mostly everything), you have to return something, even in the error case. So we have nil, Maybe, Either, tuples or exceptions to get us out of Dodge. None of these is particularly good.
And of course | is such a perfect combinator because it is so sparse. It is obvious what each end does, all the components are forced to be uniform and at least syntactically composable/compatible.
Back when I first started using Linux you could pipe random data to /dev/dsp and the speakers would emit various beeps. Used to be a pretty cool trick; I think when ALSA came out it stopped working.
Thinking about it, Clojure advocates having small functions (similar to unix's "small programs / do one thing well") that you compose together to build bigger things.
It has, in the form of function composition, as other replies show. However, the Unix pipe demonstrates a more interesting idea: composable programs on the level of the OS.
Nowadays, most of the user-facing desktop programs have GUIs, so the 'pipe' operator that composes programs is the user himself. Users compose programs by saving files from one program and opening them in another. The data being 'piped' through such program composition is sort-of typed, with the file types (PNG, TXT, etc) being the types and the loading modules of the programs being 'runtime typecheckers' that reject files with invalid format.
On the first sight, GUIs prevent program composition by requiring the user to serve as the 'pipe'. However, if GUIs were reflections / manifestations of some rich typed data (expressible in some really powerful type system, such as that of Idris), one could imagine the possibility of directly composing the programs together, bypassing the GUI or file-saving stages.
> the pipes in your typical functional language (`|>`) is not a form of function composition
What is a "typical functional language" in this case? I don't think I've come across this `|>` notation, or anything explicitly referred to as a "pipe", in the functional languages I tend to use (Haskell, Scheme, StandardML, Idris, Coq, Agda, ...); other than the Haskell "pipes" library, which I think is more elaborate than what you're talking about.
Many functional languages have |> for piping, but chained method calls are also a lot like pipelines. Data goes from left to right. This javascript expression:
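Presumably an expression along these lines (a reconstruction, chosen to match the Python and Nim versions in the replies):

  [1, 2, 3].map(n => n + 1).join(',').length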
But each successive 'command' is a method on what's constructed so far; not an entirely different command to which we delegate processing of what we have so far.
The Python:
len(','.join(map(lambda n: str(n + 1), range(1, 4))))
is a bit closer, but the order's now reversed, and then jumbled by the map/lambda. (Though I suppose arguably awk does that too.)
That's true. It's far from being generally applicable. But it might be the most "mainstream" pipe-like processing notation around.
Nim has an interesting synthesis where a.f(b) is only another way to spell f(a, b), which (I think) matches the usual behavior of |> while still allowing familiar-looking method-style syntax. These are equivalent:
[1, 2, 3].map(proc (n: int): int = n + 1).map(proc (n: int): string = $n).join(",").len
len(join(map(map([1, 2, 3], proc (n: int): int = n + 1), proc (n: int): string = $n), ","))
The difference is purely cosmetic, but readability matters. It's easier to read from left to right than to have to jump around.
C# extension methods provide the same syntax, and it is used for all of its LINQ pipeline methods. It's amazing how effective syntactic sugar can be for readability.
It actually somewhat changes the way you write code, because it enables chaining of calls.
It's worth noting there's nothing preventing this being done before the pipe operator using function calls.
x |> f |> g is by definition the same as (g (f x)).
In non-performance-sensitive code, I've found that what would be quite a complicated monolithic function in an imperative language often ends up as a composition of more modular functions piped together. As others have mentioned, there are similarities with the method chaining style in OO languages.
Also, I believe Clojure has piping in the form of the -> thread-first macro.
IMO the really useful part of pipes is less the operator and more the lazy, streaming, concurrent processing model.
So lazy collections / iterators, and HoFs working on those.
The pipe operator itself is mostly a way to denote the composition in reading order (left to right instead of right to left / inside to outside), which is convenient for readability but not exactly world-breaking.
To take any significant advantage of it you need a data-driven, transformational approach to solving the problem. But the funny thing is, once you have that, it's not really a big deal even if you don't have a pipe operator.
Monads are effectively pipes; the monad controls how data flows through the functions you put into the monad, but the functions individually are like individual programs in a pipe.
I'm not sure I agree with this. Function composition is more directly comparable to pipes, whereas I tend to think of monads as collapsing structure (i.e. `join :: m (m a) -> m a`)
I wouldn't make the same argument about the IO monad, which I think more in terms of a functional program which evaluates to an imperative program. But most monads are not like the IO monad, in my experience at least.
Forgive me if I'm misreading this syntax, but to me this looks like plain old function composition: a call to `select` (I assume that's like Haskell's `filter`?) composed with a call to `map`. No monad in sight.
As I mentioned, monads are more about collapsing structure. In the case of lists this could be done with `concat` (which is the list implementation of monad's `join`) or `concatMap` (which is the list implementation of monad's `bind` AKA `>>=`).
Nope, it's not. It's Ruby, and the list could be an eager iterator, an actual list, a lazy iterator, a Maybe (though it would be clumsy in Ruby), etc.
And monads are not "more about collapsing structure". They are just a design pattern that follows a handful of laws. It seems like you're mistaking their usefulness in Haskell for what they are. A lot of other languages have monads either baked in or an element of the design of libraries. Expand your mind out of the Haskell box :)
So we have a value called "list", we're calling its "select" method/function and then calling the "map" method/function of that result. That's just function composition; no monads in sight!
To clarify, we can rewrite your example in the following way:
list.select { |x| x.foo > 10 }.map { |x| x.bar }
# Define the anonymous functions/blocks elsewhere, for clarity
list.select(checkFoo).map(getBar)
# Turn methods into standalone functions
map(select(list, checkFoo), getBar)
# Swap argument positions
map(getBar, select(checkFoo, list))
# Curry "map" and "select"
map(getBar)(select(checkFoo)(list))
# Pull out definitions, for clarity
mapper = map(getBar)
selector = select(checkFoo)
mapper(selector(list))
This is function composition, which we could write:
go = compose(mapper, selector)
go(list)
The above argument is based solely on the structure of the code: it's function composition, regardless of whether we're using "map" and "select", or "plus" and "multiply", or any other functions.
To understand why "map" and "select" don't need monads, see below.
> the list could be an eager iterator, an actual list, a lazy iterator, a Maybe (though it would be clumsy in Ruby), etc.
Yes, that's because all of those things are functors (so we can "map" them) and collections (so we can "select" AKA filter them).
The interface for monad requires a "wrap" method (AKA "return"), which takes a single value and 'wraps it up' (e.g. for lists we return a single-element list). It also requires either a "bind" method ("concatMap" for lists) or, my preference, a "join" method ("concat" for lists).
I can show that your example doesn't involve any monads by defining another type which is not a monad, yet will still work with your example.
I'll call this type a "TaggedList", and it's a pair containing a single value of one type and a list of values of another type. We can implement "map" and "select" by applying them to the list; the single value just gets passed along unchanged. This obeys the functor laws (I encourage you to check this!), and whilst I don't know of any "select laws" I think we can say it behaves in a reasonable way.
In Haskell we'd write something like this (although Haskell uses different names, like "fmap" and "filter"):
data TaggedList t1 t2 = T t1 [t2]
instance Functor (TaggedList t1) where
map f (T x ys) = T x (map f ys)
instance Collection (TaggedList t1) where
select f (T x ys) = T x (select f ys)
In Ruby we'd write something like:
class TaggedList
def initialize(x, ys)
@x = x
@ys = ys
end
def map(&f)
TaggedList.new(@x, @ys.map(&f))
end
def select(&f)
TaggedList.new(@x, @ys.select(&f))
end
end
This type will work for your example, e.g. (in pseudo-Ruby, since I'm not so familiar with it):
myTaggedList = TaggedList.new("hello", [{foo: 1, bar: true}, {foo: 20, bar: false}])
result = myTaggedList.select { |x| x.foo > 10 }.map { |x| x.bar }
# This check will return true
result == TaggedList.new("hello", [false])
Yet "TaggedList" cannot be a monad! The reason is simple: there's no way for the "wrap" function (AKA "return") to know which value to pick for "@x"!
We could write a function which took two arguments, used one for "@x" and wrapped the other in a list for "@ys", but that's not what the monad interface requires.
Since Ruby's dynamically typed (AKA "unityped") we could write a function which picked a default value for "@x", like "nil"; yet that would break the monad laws. Specifically:
bind(m, wrap) == m
If "wrap" used a default value like "nil", then "bind(m, wrap)" would replace the "@x" value in "m" with "nil", and this would break the equation in almost all cases (i.e. except when "m" already contained "nil").
It doesn't look the same, but Go's io.Reader and io.Writer are the interfaces you implement if you want the equivalent of "reading from stdin"/"writing to stdout". Once implemented, io.Copy is the actual piping operation.
It has: D's uniform function call syntax (UFCS) makes this as easy as auto foo = some_array.filter!(pred).array.sort.uniq; for the unique elements of the sorted array that satisfy the predicate pred.
awk, grep, sort, and pipe. I'm always amazed at how well thought out, simple, functional, and fast the unix tools are. I still prefer to sift through and validate data using these tools rather than use excel or any full-fledged language.
Edit: Also "column" to format your output into a table.
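For example, to line up whitespace-separated output into a table:

  mount | column -t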
Although I probably use it multiple times everyday, I hate column. At least the implementation I use has issues with empty fields and a fixed maximum line length.
The biggest issue is that pipes are unidirectional, while not all data flow is unidirectional.
Some functional programming styles are pipe-like in the sense that data-flow is unidirectional:
Foo(Bar(Baz(Bif(x))))
is analagous to:
cat x | Bif | Baz | Bar | Foo
Obviously the order of evaluation will depend on the semantics of the language used; most eager languages will fully evaluate each step before the next. (Actually this is one issue with Unix pipes; the flow-control semantics are tied to the concept of blocking I/O using a fixed-size buffer)
The idea of dataflow programming[1] is closely related to pipes and has existed for a long time, but it has mostly remained a niche, at least outside of hardware-design languages
I built an entire prototype ML system/pipeline using shell scripts that glued together two python scripts that did some heavy lifting not easily reproduced.
I got the whole thing working from training to prediction in about 3 weeks. What I love about Unix shell commands is that you simply can't abstract beyond the input/output paradigm. You aren't going to create classes, types classes, tests, etc. It's not possible or not worth it.
I'd like to see more devs use this approach, because it's a really nice way to get a project going in order to poke holes in it or see a general structure. I consider it a sketchpad of sorts.
My backup system at work is mostly bash scripts and some pipes.
If you write them cleanly they don't suck and, crucially for me, bash today works basically the same way as it did 10 years ago and likely will in 10 years; that static nature is a big damn win.
I sometimes wish language vendors would just say ‘this language is complete, all future work will be bug fixes and libraries’ a static target for anything would be nice.
Elixir did say that recently except for one last major change which moved it straight up my list of things to look at in future.
- My IRC notification system is a shell script with entr, notify-send and dunst.
- My mail setup uses NMH, everything is automated. I can mime-encode a directory and send the resulting mail in a breeze.
- GF's photos from IG are being backed up with a python script and crontab. Non IG ones are geotagged too with a script. I just fire up some cli GPS tools if we hike some mountain route, and gpscorrelate runs on the GPX file.
- Music is almost all chiptunes; I haven't been interested in any mainstream music since 2003-4. I mirror a site with wget and it's done. If only they offered rsync...
- Hell, even my podcasts are fetched via cron(8).
- My setup is CWM/cli based, except for mpv, emulators, links+ and vimb for sites that need JS. Noice is my file manager, or just the plain shell. find(1) and mpg123/xmp generate my music playlist. Street View is maybe the only service I use in vimb...
The more you automate, the fewer tasks you need to do. I am starting to avoid even taskwarrior/timew, because I am almost task free: I don't have to track a trivial <5m script, and spt https://github.com/pickfire/spt is everything I need.
Also, now I can't stand any classical desktop, I find bloat on everything.
Which are in a way comparable, but I'd say that pipes are more like the arrow ->.
Transducers are composable algorithmic transformations that, in a way, generalize map, filter and friends, but they can be used anywhere you transform data. Transducers have to be invoked and handled in a way that pipes do not.
Anyone interested should check out Hickey's talks about them. They are generally a lot more efficient than chaining higher-order list processing functions, and since they don't build intermediate results they have much better GC performance.
Fully agree, pipes are awesome, only downside is the potential duplicate serialization/deserialization overhead.
Streams in most decent languages closely adhere to this idea.
I especially like how node does it; in my opinion it's one of the best things in node: you can simply create cli programs that have backpressure, the same as when working with binary/file streams, while also supporting object streams.
Node streams are excellent, but unfortunately don't get as much fanfare as Promises/async+await. A number of times I have gotten asked "how come my node script runs out of memory" -- due to the dev using await and storing the entirety of what is essentially streaming data in memory in between processing steps.
Pipes have been a game-changer for me in R with the tidyverse suite of packages. Base R doesn't have pipes, requiring a bit more saving of objects or a compromise on code readability.
One criticism would be that ggplot2 uses the "+" to add more graph features, whereas the rest of tidyverse uses "%>%" as its pipe, when ideally ggplot2 would also use it. One of my most common errors with ggplot2 is not utilizing the + or the %>% in the right places.
Unix's philosophy of “do one thing well” and “expect the output of every program to become the input to another” lives on in "microservices" nowadays.
If you want to see what the endgame of this is when taking the reasoning to the maximum, look at visual dataflow languages such as Max/MSP, PureData, Reaktor, LabVIEW...
No silver bullet, guys, sorry. If you take the complexity out of the actual blocks by splitting them into many small blocks, you just push that complexity to another layer of the system. Same for microservices, same for actor programming, same for JS callback hell...
That is not actually simple because the data is flowing across two completely different message passing paradigms. Many users of Max/MSP and Pd don't understand the rules for such dataflow, even though it is deterministic and laid out in the manual IIRC.
The "silver bullet" in Max/MSP would be to only use the DSP message passing paradigm. There, all objects are guaranteed to receive their input before they compute their output.
However, that would make a special case out of GUI building/looping/branching. For a visual language designed to accommodate non-programmers, the ease of handling larger amounts of complexity with impunity would not be worth the cost of a learning curve that excludes 99% of the userbase.
Instead, Pd and Max/MSP have the objects with thin-line connections. They are essentially little Rube Goldberg machines that end up being about as readable as one. But they can be used to do branching/looping/recursion/GUI building. So users typically end up writing as little DSP as they can get away with, then use thin-line spaghetti to fill in the rest. That turns out to be much cheaper than paying a professional programmer to re-implement their prototype at scale.
But that's a design decision in the language, not some natural law that visual programming languages are doomed to generate spaghetti.
> The "silver bullet" in Max/MSP would be to only use the DSP message passing paradigm. There, all objects are guaranteed to receive their input before they compute their output.
On one hand, this simplifies the semantics (and it's the approach I've been using in my visual language (https://ossia.io)), but on the other it tanks performance if you have large numbers of nodes... I've worked on Max patches with thousands and thousands of objects; if they were all called synchronously, as is the case for the DSP objects, you couldn't have as many. The message-oriented objects are very useful when you want to react to user input, for instance, because they don't have to execute nearly as often as the DSP objects, especially if you want low latency.
That is certainly true. My point is that this is a drawback to the implementation of one set of visual programming languages, not necessarily a drawback of visual programming languages.
I can't remember the name of it, but there's a Pd-based compiler that can take patches and compile them down to a binary that performs perhaps an order of magnitude faster. I can't remember if it was JIT or not. Regardless, there's no conceptual blocker to such a JIT-compiled design. In fact there's a version of [expr] that has such a JIT-compiler backing it-- the user takes a small latency hit at instantiation time, but after that there's a big performance increase.
The main blocker as you probably know is time and money. :)
I made a simulation and video which shows why the pipe is so powerful: https://www.youtube.com/watch?v=3Ea3pkTCYx4
I showed it to Doug McIlroy, and while he thought there was more to the story, he didn't disagree with it.
I love writing little tools and scripts that use pipes, I've accumulated a lot of them over the years and some are daily drivers.
It's a great tool for learning a new programming language as well, since the interfaces at the boundaries of the program are very simple.
For example I wrote this recently https://github.com/djhworld/zipit - I'm fully aware you could probably whip up some awk script to do the same, or chain some existing commands together, or someone else has written the same thing, but I've enjoyed the process of writing it and it's something to throw in the tool box - even if it's just for me!
This is cool and useful, but not all unix programs follow this convention:
* find
* cal
* vi
* emacs
* ls
These don't use at least one of standard input/standard output, and are not fully pipeable.
I don't recall the original description of tools including any list that distinguishes pipeable from non-pipeable programs.
Also, none of the corrective cat/grep code in these threads points out that grep in fact takes file names, so the cat in "cat foo | grep stuff" is just a silly extra step.
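In other words, the two invocations below produce the same output; the second just skips the extra cat process:

    cat foo | grep stuff    # cat exists only to feed grep
    grep stuff foo          # grep reads the file itself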
Well find, cal, and ls don't take input at all. And vi (well vim) in fact does. If you invoke it with a - for the file name argument, it will read standard input into the buffer. I can't comment on emacs as I don't use it much.
I use vim as a visual pipe or pipe debugger with undo when I need to perform a series of transformations (grep/sort/cut/lookup/map/run other data tool). Obviously geared towards text files because it is vim.
The ! command sends the selection through external command(s) as STDIN and then replaces the selection with STDOUT from the command(s). For example, grep or sort, but it can be any command that works with pipes. The buffer is replaced with the output (a sorted file, for example). Undo with u to go back to the original data, redo with Ctrl-R to go forward to the transformed data. Command-line history is available to add more commands or make corrections when you type ! again.
Edit a file. Select block (Visual mode shift-V) and type ! or use the whole file with gg!G command. Type in the commands you need to run.
Vim also reads stdin if you give - as the filename, like “ls -l | vim -“, so we can use it at the end of a pipe instead of redirecting to a file.
Like I said, I use it as an interactive debugger to assemble pipelines and see the results.
With some redirection magic it can be made to, but I find that magic is always too much stuffing around to make it useful. Inside scripts though I like to pipe to vim and have it output to a file:
pipeline | vim - +"file $mytempfile"
Then if you quit vim without saving, $mytempfile won't exist; if you save and quit, it will, and further processing can be done. I've got a few scripts like this to do things like viewing csv files after they've been piped through column; after saving they're converted back to csv.
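A rough sketch of that kind of wrapper, with made-up names and a deliberately naive conversion back to CSV:

    #!/bin/sh
    # View/edit a CSV in vim via column(1); convert back only if the user saved.
    set -eu
    in=$1
    tmp=${TMPDIR:-/tmp}/csvview.$$    # deliberately not created up front

    # Pretty-print for editing; ':file' points the buffer at $tmp,
    # so ':wq' creates the file and ':q!' leaves it missing.
    column -s, -t < "$in" | vim - +"file $tmp"

    if [ -f "$tmp" ]; then
        # Naive: collapse runs of spaces back to commas, which breaks
        # on fields that themselves contain spaces.
        tr -s ' ' ',' < "$tmp" > "$in"
        rm -f "$tmp"
    fi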
- Cal can be parsed (it's shown in The Unix Programming Environment, from 1983)
- Vi is a visual editor; it can be used as a front-end for ed/ex commands for I/O anyway. Kinda like the acme(1) of its day. You have both :w and :r. And, hint: it can read text from external pipes.
- Emacs is not Unix
- ls(1) output is not meant to be parsed for file contents; that's what shell globbing is for.
FWIW, POSIX specifies the ls output format for -l and several other flags. For example,
If the -l option is specified, the following information
shall be written for files other than character special and
block special files:
"%s %u %s %s %u %s %s\n", <file mode>, <number of links>,
<owner name>, <group name>, <size>, <date and time>,
<pathname>
There's no stat command in POSIX. More practically, the BSD and GNU versions of stat are completely incompatible. If you want to query a file's size or other metadata from a portable shell script, you need to parse the output of ls.
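For example, a sketch of getting a file's size portably by leaning on that specified format (field 5 of ls -l is the size, assuming owner and group names contain no spaces):

    # Size in bytes of "$file", using only POSIX ls and awk.
    size=$(ls -ld -- "$file" | awk '{print $5}')
    echo "$file is $size bytes"

It sidesteps the stat portability problem at the cost of trusting the ls -l layout.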
It's true that `ls` and `find` don't take stdin, but their stdout is often piped: consider `ls | wc -l` or even something like (more complicated than) `find . -name \*.txt | xargs vi`.
I think the Unix pipeline works because of this moldable and expressive text substance that is exchanged. GStreamer also has pipelines, but the result in my opinion is quite awkward because events of many types are exchanged. Windows PowerShell also has a pipeline, where objects are exchanged, but it somehow failed to become a huge success either.
I think the Unix pipeline concept doesn't quite scale to other domains that try to exchange a different unit of information between the pipeline elements.
ffmpeg took a different approach (sort of like a single program/system with many command-line options). Some say it's easier to use (at least easier than gst-launch).
I find the ffmpeg options dizzying whereas the single syntax for gst-launch and parse_bin_from_description is pretty neat. But then I guess you've still got a selection of randomly named properties to discover and correctly set.
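For anyone who hasn't seen them side by side, a rough from-memory sketch of the two styles; the file names are made up and the exact elements/flags may need tweaking:

    # GStreamer: elements chained with '!' much like a shell pipe.
    gst-launch-1.0 filesrc location=in.mp4 ! decodebin ! audioconvert ! autoaudiosink

    # ffmpeg: a single program, everything expressed as options.
    ffmpeg -i in.mp4 -vn out.wav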
In Unix, everything is a file. /dev/null is a file but not an executable so it can't even become a process. So you can't pipe anything into it.
If you consider that last sentence a "figure of speech", then you should probably have avoided the "pipe" vs "redirect" distinction and just written "send them to /dev/null".
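A small illustration of the distinction, with noisy_command standing in for anything chatty: both lines silence the output, but only the second involves a pipe, and that pipe still ends at a process (cat) rather than at /dev/null itself.

    noisy_command > /dev/null 2>&1     # redirect: no extra process involved
    noisy_command | cat > /dev/null    # pipe: the data flows into cat, which then redirects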
Smalltalk's "method cascading" (the `;` operator) did roughly that. I don't think you're forking an entire sequence of methods though, it just allows performing operations on the same root object e.g.
a foo
; bar
; baz
would sequentially send "foo", "bar" then "baz" to "a", then would return the result of the last call.
The ability to fork iterators (which may be the semantics you're actually interested in) also exists in some languages e.g. Python (itertools.tee) or Rust (some iterators are clonable)
I don't think the commandline is well suited to constructing or reading anything more complicated than a linear pipe. What you'd want for that kind of workflow is a 2d layout.
If only there was some standard format to interchange structured data over pipes, other than plain text delimited with various (incompatible) combinations of whitespaces, using various (incompatible) escaping schemes.
That's one of the design goals of PowerShell: you don't pass streams of text between cmdlets, you pass objects, which have a type, properties and methods you can use in standard ways (they may also be strings).
A while back I made a PoC that used pipes to create bidirectional json-speaking connections between applications and then distribute the communication between distributed nodes. The idea being, don't just distribute TCP streams between distributed processes, but give 'em objects to make data exchange more expressive. I don't know where my PoC code went, but it wasn't very stable anyway. Just figured it was something we should have adopted by now (but that Plan9 probably natively supports)
Because pure functions have well-defined inputs and outputs, they can be used as pipeline stages: a dataflow is formed by chaining a series of pure functions. A dataflow code block, treated as a function, is equivalent to an integrated-circuit element (or board). A complete integrated system is formed by composing dataflows in series or in parallel.
Dataflow is the current, a function is a chip, a threading macro (->>, ->, etc.) is a wire, and the entire system is an energized integrated circuit.
Yes, there is a strong connection between pipes and functional programming: both transform what passes through them and, when implemented properly, retain no state.
They also both rely on very generic data types as input and output, which is why I was surprised to see the warning about avoiding columnar data. Tabular data is a basic way of expressing data and its relationships.
Anyone else disappointed that this wasn't some in-depth, Docker related pipe usage post?
For those of us that know who Jess is, the post is a little lacking. I went there to read about some cool things she's doing with pipes, but was disappointed by a post that contained nothing.
Fair enough. I'm a little surprised at the response. I expected most people here to already be intimately familiar with pipes, and for those that aren't *nix users, I'm not sure the content is of interest.
I guess being honest around here is the wrong thing.
For what it's worth, I had no issue with your opinion and also didn't downvote either of your comments.
And I also didn't upvote the article, because it is indeed quite basic. Though apparently enough people found it interesting enough for it to make the frontpage ^^
I think you're being downvoted because of: "Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
Having said that, the author's posts do seem to get upvoted no matter what by a contingent of readers. I also found TFA to be quite basic, and it couldn't have taken very much effort to write (I'm assuming the author is already well familiar with pipes based on her background). The article would have been more interesting with some more technical details or experimental results, or perhaps some novel information that most people aren't aware of.
It's unfortunate. I generally tend not to comment on things I want to criticise. I love Jess's work but saw zero value come from the time spent reading that.
I think my point was more valid in this case, given that this isn't some random person trying to contribute something (I'd have been less critical and not commented in that case): this is Jess Frazelle. Those of us who know the name know she's an extremely skilled engineer, and _that_ is why I was disappointed by the post.