Where the Unix philosophy breaks down (johndcook.com)
83 points by tianyicui on March 16, 2011 | 55 comments



I'm loath to hand credit to Microsoft, but I think they're onto something with PowerShell. Rather than being limited to passing text streams around, PowerShell commands pass full-featured objects back and forth (with callbacks and everything!).

This flexibility really seems to lower the transaction costs the original article talks about, compared with traditional Unix pipes.

I'm still trying to come up with a Linux-y way to do this. Microsoft can mandate that all PowerShell commands run under the CLR (and therefore can talk to arbitrary implementations of some CLR class), but any FOSS equivalent is going to have to deal with people wanting to interface programs written for a dozen different runtimes (and probably even some C code running on the bare metal).


I'm loath to hand credit to Microsoft, but I think they're onto something with PowerShell. Rather than being limited to passing text streams around, PowerShell commands pass full-featured objects back and forth (with callbacks and everything!).

Not to state the obvious, but this step was taken by major scripting languages decades ago, with the minor restriction that your program had to run within one executable. Furthermore, the idea of encapsulating objects in a way that could be passed between programs is not much younger - in the early 90s you had (still-used) protocols for that like CORBA and PDO. And Microsoft was not far behind with its ever-evolving OLE/COM/DCOM protocols.

I have no idea how good Microsoft's current implementation is. (Given how many times they have tried to tackle it, it wouldn't surprise me if it was pretty good by now.) But it isn't as original as it may seem.


Much of the simplicity which renders multi-command, single-line pipe chains facile to the POSIX shell user would be lost if the interchange format were any more complex than plain text.

If you need fields, then separate them with a text delimiter; if you need something more complicated, then maybe you should be using a programming language, or piping between scripts in a language with good text serialization.


Putting aside the fancy words, I agree.

Given the power of any computer (even those 30 years old), it's enormously powerful to be able to drop one segment of a pipe-chain and look at the output. Funny that I'm facing a similar issue now that some of the server world is going JavaScript: "console.log(X);" doesn't necessarily tell you the whole story about X. Exchanging objects is 14% too complicated, and anything more than 0% too complicated is too complicated.


> it's enormously powerful to be able to drop one segment of a pipe-chain and look at the output.

But having more complex objects instead of text doesn't stop us from having that. The objects just need to have some kind of to-string method that the shell uses.


It does hinder you, since string serializations will usually be an incomplete (or worse, inconsistent) view of an object. With Unix pipes, the string serialization is all there is.


Besides the toString, if we could get a decent amount of the objects' internals to be public, we might have simpler formatting rules.


What simplicity? http://news.ycombinator.com/item?id=2229833. Structured data will give much more simplicity, as everything (yes, everything) requires some form of structure to parse. As a snarky example: for f in /usr/bin/*; do $f --help; done. Now tell me that these don't need structure. Also, how is structured data any less parsable than unstructured data? It's all text; one has a known format, the other doesn't.


I think this depends a lot on how you define simplicity.

If we define simplicity as a plain-text interchange format, then clearly, by definition, a text-based shell is simpler.

Piping objects instead of text allows a single command to do more, as it can operate on any of the object's properties that it understands. From a user's perspective, I believe this is simpler; however, I will acknowledge that this is somewhat subjective. For example: The command ls | sort LastWriteTime would print a list of files sorted by their LastWriteTime. The command ls | sort Length would print a list of files sorted by their size. Doing this in bash is more difficult, as it requires additional commands to parse the output of ls.
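For comparison, rough bash equivalents (a sketch assuming GNU ls and sort and the usual ls -l column layout) show where the parsing work comes in:

  ls -t                  # ls can sort by modification time itself...
  ls -S                  # ...and by size, but only by keys ls's author anticipated
  ls -l | sort -k5,5n    # anything else means parsing ls -l columns, which breaks on unusual filenames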


> Much of the simplicity

It's not simple; it's primitive with the corners rounded off.

If all you have is raw blobs of data in pipes, you can't grow when you suddenly realize you need something more complex.

See the story about adding UTF-8 BOM support before the #! in the Linux kernel.


There's an experiment in Squeak to let plain Unix pipes intermingle with piping between Smalltalk/Squeak objects: http://wiki.squeak.org/squeak/1914. Not sure if it could be generalized, or how usable it ends up being.

To quote from a random mailing list post (http://www.mail-archive.com/pharo-project@lists.gforge.inria...):

The syntax is a mashup of Smalltalk and unix shell conventions. The pipes are OS pipes where necessary to interact with external programs, or an object that mimics OS pipe behavior if the "commands" being connected are Smalltalk expressions or command objects.


I would suggest that the Linux-y way to do this would be to pipe JSON between applications. The even-more-Linux-y way would be to have an option to use YAML instead.
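A minimal sketch of the idea, using python one-liners as stand-ins for hypothetical JSON-emitting and JSON-consuming commands (no such standard tools exist; this just lists the current directory sorted by size):

  # hypothetical: a JSON-aware "ls" piped into a JSON-aware "sort by size"
  python -c 'import json,os; print(json.dumps([{"name": f, "size": os.path.getsize(f)} for f in os.listdir(".")]))' |
    python -c 'import json,sys; [sys.stdout.write(o["name"] + "\n") for o in sorted(json.load(sys.stdin), key=lambda o: o["size"])]'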


This article doesn't really cover new ground. Software engineers have known about this for many years and Microsoft's OLE and Apple's ill-fated OpenDoc[1] were developed to address some of these issues.

[1] https://secure.wikimedia.org/wikipedia/en/wiki/Opendoc


"I'm still trying to come up with a Linux-y way to do this."

How about the interactive python shell?

True that it locks out all other runtimes, but given the massive set of libraries, that may be a small price to pay? Same applies to <your favorite runtime with lots of libraries>.


Of course you're not going to rewrite Word by piping data around, for opening and saving and doing spell check.

Scripting languages are the next extension of unix - you take all those little programs that do something well, put them together with some control structure and you can do big things, often in an automated manner, which is difficult to do on a GUI.

That said, showing the GUI interface to someone and having them make selections is much more suited to other programming methods (MVC, etc.), which can often just run scripts on the back end.

The point of unix isn't having a bunch of programs that do different things - it's combining them like legos to do complex tasks via scripting.


You're limiting your view of operating systems to the stone age, and that's exactly what Linux, Windows, etc. are: stone-age operating systems (Midori is an exception, if what I've read about it is ever released). If the operating system is simply a VM, then the interface is the VM; what I mean by this is that there's no separation between low level and high level, and thus no barriers.


Out of curiosity, what is a currently functional non-stone-age operating system?


I haven't used any exotic OSes (like the Lisp Machines?), but I believe that Unix's "do one thing and do it well" philosophy was never serious, just an (ignored) guideline to programmers. Ignored partly because Unix isn't that conducive to it.

The Unix Way is to operate on lines of text with tools like sed/awk. These lines of text are poorly-serialized object representations.
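For instance, a process is an "object" of sorts, but by the time it reaches the pipe it is just whitespace-separated columns parsed by position (a sketch; the 50.0 threshold is arbitrary):

  # PID and command name of CPU-hungry processes; breaks as soon as a field contains a space
  ps aux | awk '$3 > 50.0 { print $2, $11 }'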

Programs are big monolithic procedures (which take a lot of optional params as input — switches), which you're supposed to glue together under extremely weak programming environments (like bash).

I abuse Unix to get out of this paradigm as much as appropriate. There is a good idea hidden inside communicating with serialized objects (which I'm sure Erlang users understand better than I do), but it's a very primitive version.

(Maybe I'm wrong and someone will finally enlighten me.)


"Of course you're not going to rewrite Word by piping data around, for opening and saving and doing spell check."

No, but Apple had a project called OpenDoc that did something similar. The idea was that you'd have a UI centered around the document, and you could pull in spell check and editing and all kinds of other components instead of having a program like Word. It didn't work out, for various reasons, many of which were unrelated to the merit of the idea itself.

http://en.wikipedia.org/wiki/OpenDoc


In the free software world, the functional boundaries between programs are, in fact, usually respected, even with interactive programs, where his "transaction costs" would apply. Emacs edits, awk does its thing, etc.

I think the reason programs creep in functionality is NOT these technical-sounding "transaction costs", but a desire to sustain vendor lock-in. In a world where nobody worries about lock-in, there is no reason to bundle an editor with a calculator with an email program.

In the Adobe world, the boundaries of the programs are pretty tight, with a little bit of bleed over. Counter example? Maybe.

That was not a great article, if you ask me -- I think the analysis was trite, and MS Office software probably isn't a good example of anything.


You're citing Emacs as an example of a program that does just one thing and respects "functional boundaries" unlike (say) the way that Word does some spreadsheet-y things and Excel some databasical things?

A standard Emacs installation contains, among other things: a Lisp interpreter, an adventure game, a mail program, a Usenet newsreader, a calendar, a calculator with symbolic-algebra features, and an implementation of Conway's Life.


Emacs is really more of a programming platform than a standard single-function application. It doesn't make sense to compare it to Unix apps like wc or ls. It's more similar to Bash, which allows you to combine built in functions and a whole host of separate programs (awk, sed etc).

Put another way, Emacs makes it easy to process text files, combining the built-in features and modes of Emacs with just about any Unix command-line tool (via M-x shell-command-on-region, or simply by processing the file from a terminal and passing the result back to Emacs). Word, on the other hand, makes it easy to do fancy word processing and associated formatting tasks inside Word, but makes it very difficult to share the resulting file with other programs (even different versions of Word).


Don't true emacs users live in emacs?


Like Theseus's ship, their bodies & minds are gradually replaced by Lisp functions.


Honest question: does the Unix philosophy apply to all kinds of software?

I am curious to know if it's fair to compare typical Unix software (CLI-based, text-based --- usually system software is what comes to mind) with application software (Word processing, enterprise software, etc).


It does to the examples you gave.

You get better output quicker -- a neatly formatted PDF -- with CLI tools like LaTeX or troff (they work as a pipe of filters), than with a WYSIWYG like MS Word.
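For instance, the classic troff toolchain really is a pipe of filters (a sketch; report.ms is a placeholder document and the output driver varies by system):

  # table and equation preprocessors feed the typesetter, which feeds a PostScript-to-PDF step
  tbl report.ms | eqn | groff -ms -Tps | ps2pdf - report.pdf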

You get more relevant data quicker with SQL querying a Data Warehouse than via some GUI front-end that only lets you use some opaque, pre-defined queries.
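For example (a sketch with a made-up sales table; psql's -A/-t/-F flags produce plain delimited text that keeps piping):

  psql -At -F $'\t' -c 'SELECT region, sum(revenue) FROM sales GROUP BY region' \
    | sort -t $'\t' -k2,2nr | head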

GUI is meant to flatten the learning curve; CLI is meant to let you unleash power of tools you know well.

GUI may shine when you do a task once in a blue moon, but CLI is the choice for frequent use.

On the other hand, I've never seen any good CLI approach to browsing the web ;-)


External libraries and function calls that encapsulate functionality are the equivalent concept in GUI land.

For example, it would totally suck to have to write the entire GUI for file selection, and nobody does it if they're smart - they just call a function, and get back a file handle or path.

The problem is programs that try to reinvent the wheel. How many file selectors have you seen that are "custom" in some way and thus break convention?


I think composition turns into extensibility when you start working on larger programs. For example, you cannot pipe the output of Vim or Emacs into another program, but you can write 100 lines of Python/Lisp/Lua/Perl/etc. that will let one application talk to the other.

Other applications see themselves as a walled garden, and may allow extension within the program, but make it difficult for communication to occur between programs, often by not exposing enough state, not documenting the interface, or not providing any scripting at all.


While not a literal "pipe", the $EDITOR and $VISUAL shell conventions are pretty much the same concept applied to larger programs.

And of course, which I think you were saying with your comment on extensibility, vim and emacs are well-designed for sending content out to a shell for processing (spell-check is a common example) and retrieving the results. ("!!" in vim)
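For example, crontab and git both honor the convention by handing a temporary file to whatever $EDITOR names:

  EDITOR=vim crontab -e
  EDITOR='emacs -nw' git commit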


It's about writing modules/extensions for an existing system.

Traditional Unix programs make your shell environment work better.

Adblock Plus, Perspectives, Tree Style Tab, etc, make my Iceweasel work better.

mod_rewrite, mod_php, etc, make Apache work better.

The "text streams" part is specific to Unix, because files and pipes are the defining feature of that environment. The other parts are probably universal across any extensible system.


I couldn't help but read this article as a demonstration that the Unix philosophy still holds strong.

Even the MS Office example goes that way: people who want a nice graph and a computing sheet in a Word document will embed an Excel component, which is virtually Excel running and piping its output into the Word document. The same goes for reaching Access datasets and queries to produce graphs in Excel: Access pipes its data to Excel, which in turn produces a graph.

"programs gain[ing] overlapping features over time" is a result of feature creep and certainly a lack of architecture foresight. The author refers to spreadsheet features in Word and database features in Excel. I can only take a guess at what he refers to but, assuming he's not talking about object embedding (which I took care of above), being able to draw tables graphically in Word is not equal to having a sheet able to compute formulas and render graphs, and being able to sort and filter columns in Excel is worlds apart to queries on a structured, relational table.

The core of the article seems to boil down to the fact that software inevitably brings duplicate functionality, and that this is good because it alleviates the effort of bridging applications. Yet duplicating functionality puts a burden on the developers that might be better spent elsewhere (including providing easy access to the "real deal"). What's more, it suddenly becomes begrudgingly effortful for the user to achieve a task escaping the subset of duplicated functionality. In short, it might save you a little work, but it doesn't scale. That's why properly designed technology allows for trivial connection of software (Unix pipes) and description of contracts (sane text output), or abstracts the contract part entirely (e.g. including an Excel spreadsheet in a Word document).


Lately, I've been thinking a lot about interfaces, operating systems and procedure composition, and these are some of my "ruminations":

- None of the modern operating systems is really orthogonal: neither Linux nor the BSDs embrace the real "Unix philosophy"

- CLIs are monodimensional, and suffer from the same problems that dataflow programming has: it's easy to split the flow of data into two pipes, but is really hard (semantically) to rejoin them.

- GUIs are imperative in nature, and that hurts composability.

- Shells and REPLs have too many overlapping features (like OSs and ProgrammingLanguages+Libraries): the de facto standard shell, bash, has a syntax that, to a programmer, feels really unclear. Every non-trivial bash script I've written has quickly turned into escaping hell.


> - CLIs are monodimensional, and suffer from the same problems that dataflow programming has: it's easy to split the flow of data into two pipes, but is really hard (semantically) to rejoin them.

That's not hard if the text streams are line-oriented:

* append one to another, if order is not important

* append one to another and then sort again, if order is important,

* JOIN the streams using the `join' command (it's a standalone text utility, does pretty much what SQL JOIN does)

To compute the union (the sum in the set-theory sense), use ` | sort | uniq'; to take the intersection, use ` | sort | uniq -d'. Or use the `comm' utility.

If using a Bourne shell derivative, you may end up using temporary files. With the Rc shell, you may use the <{COMMAND} syntax, like:

  cat <{foo | bar | baz} <{frob | knob | bob} <{some | more}  | subsequent | processing
to catenate output of three pipes and do subsequent processing.
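In bash or zsh the same thing can be written with process substitution, again without temporary files (foo, bar, etc. remain placeholder commands, as in the Rc example):

  cat <(foo | bar | baz) <(frob | knob | bob) <(some | more) | subsequent | processing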


On the contrary, I think that whenever you get into a situation where there is an overlap in functionality, it means things need to be further broken down into smaller independent units. I see tremendous value in adhering to this philosophy, and the usefulness of Unix is a testament to it. That does not mean there are no shortcomings in the overall Unix approach, but as far as this particular philosophy is concerned, I think it is a very smart strategy to adhere to. Look at nature: our different body parts are specialized for specific functions (for example, eyes, ears, etc.), they do those functions remarkably well, and when you combine them, we can recognize the physical world.


For quite a few years now it seems like a lot of Unix-style software has gone beyond text streams, to include a back channel connection to external data and processes (databases, http services, etc).

I generally enjoy sharing context as a data structure with a file-system-like interface (e.g. REST) between processes, and only more so when the command/control interface is a simple text stream. Of course, this reminds me of Plan 9, but there's also 1060 Research, and other examples I'm sure.

There is quite a lot of head room left in the Plan 9 paradigm, and you do yourself no disservice by taking some time to figure it out. It really is more Unix than Unix.


Uhhmmm... am I the only one that noticed that the sole example in the article of "programs gain overlapping features over time" is of Microsoft software ... which, like, has nothing to do with Unix? My unix/linux box doesn't have overlapping software. Every piece of software is a specialist and does what it does well. It is easily replaceable and does not interfere with other software installed on the box. The unix/linux philosophy works extremely well. That's why it's popular.


Just as a for example:

  cat -n
overlaps with:

  awk '{ print NR "\t" $0 }'
http://harmful.cat-v.org/cat-v/unix_prog_design.pdf


Reminds me of http://www.feep.net/~roth/geek-humor/unix/unix-crash-recover... , which talks a bit about the pains used to work around the lack of ls(1).


I see the overlapping features bit pop up almost everywhere that people build GUIs.

Your word processor features a WYSIWYG text editor (if you're on Linux this is probably AbiWord, OO.o Writer, or KWord). Meanwhile, if you have a modern web browser, it probably has a WYSIWYG formatted text editor as well. (Anything based on a recent version of Webkit or Gecko certainly will; on your Linux box that might be Firefox, Chrome, Epiphany, or Konqueror.)

If truly none of your software features overlapping features, I would bet that either you don't have X11 installed, or you have Solaris or HP-UX installed with a bunch of ancient Motif applications. :-)


A WYSIWYG HTML editor (as in Writely/Gmail) isn't quite the same as a general WYSIWYG document editor, though.

Unix programs, even GUI ones, share a ton of code, as you can easily see if you try to install an interesting GUI application like Shotwell or Digikam.

Heck, I've never written an Instant Messenger application, but I know that "libpurple" is the name for the IM component that most Linux chat programs use.

And hey, that's exactly how it is advertised:

http://developer.pidgin.im/wiki/WhatIsLibpurple

What is libpurple?

libpurple is intended to be the core of an IM program. When using libpurple, you'll basically be writing a UI for this core chunk of code. Pidgin is a GTK+ frontend to libpurple, Finch is an ncurses frontend, and Adium is a Cocoa frontend.


> My unix/linux box doesn't have overlapping software

Not really true at all. find is especially heinous. "find ... -delete" is equivalent to "find ... -print0 | xargs -0 rm --"


To me that does not count as a duplicate, but as a usability and scalability feature.

* "find ... -delete" probably just calls unlink, just as rm does, which is a libc feature. No subshell spawned thus lean and fast.

* "find ... -exec 'rm {}'" allows to run a command without xargs, but does not allows you to filter and transform the stream. Propably will spawn a subshell for each call.

* "find ... -print0 | sed 'bar' | grep -v foo | frobznicate qux | xargs -0 rm --" is where the power lies. A few more subshells created.

This, to me, shows that 'find' actually does what it should do, and allows you to choose the best way you can do it WRT the task at hand.
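A small sketch of the process-spawning trade-offs (GNU and BSD find both support -delete, and `-exec ... {} +' is POSIX):

  find . -name '*.tmp' -exec rm {} \;   # one rm process per matching file
  find . -name '*.tmp' -exec rm {} +    # arguments batched, xargs-style
  find . -name '*.tmp' -delete          # handled inside find itself, no rm at all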


And massive GUI programs are?


I would have thought a much simpler and more direct example of functional overlap would have been 'more' and 'less'...


'less' is actually an improved reimplementation of 'more'.

On my Mac, for example, 'less' and 'more' are hardlinked:

       $ find /usr/bin -samefile /usr/bin/less
       /usr/bin/less
       /usr/bin/more
And invoking 'less' as 'more' most probably makes less drop into a compatibility mode.

Also, 'man more' brings you to the less(1) page, which contains the following snippet:

       Less is a program similar to more (1), but which allows backward  movement
       in the file as well as forward movement.  Also, less does not have to read
       the entire input file before starting, so with large input files it starts
       up  faster  than text editors like vi (1).  Less uses termcap (or terminfo
       on some systems), so it can run on a variety of terminals.  There is  even
       limited  support  for  hardcopy terminals.  (On a hardcopy terminal, lines
       which should be printed at the top of  the  screen  are  prefixed  with  a
       caret.)


> For example, think of the Microsoft Office suite.

Beneath the covers, don't the Office applications share a lot of common code?


Yes, but I would not think about it as code sharing. MSI (the installation technology used by Office and most Windows programs) supports the notion of "Shared Components," which allows applications to share common resources. A component is a discrete installation unit, and in the case of large products such as the Office suite, many components make up any one product (Word, Excel, etc.). Rather than sharing actual source code or bundling the same DLLs with each Office product, components are shared between the products. You can envision that rather than each product containing the same code used to generate tables, they all use the same piece of code when generating tables. There are a variety of advantages, of which perfect interop between pieces of software is by far the greatest.

More info on shared components: http://blogs.msdn.com/b/heaths/archive/2009/12/21/about-shar...


This (ha! Always wanted to say that; always hated those who say it...).

The Unix philosophy breaks down where it should: where you aren't trying to get something done in a few keystrokes at the command line. Even when you aren't, the Unix philosophy works pretty well servicing low-level portions of a program. For example, any spell checker is Unix-y and many programs have a spell checker.


I've heard conflicting stories about this, that the teams share code and that they're fiercely independent. I imagine there are examples of both.


Employee transaction costs are not lower than contractors'. Employers typically pay vacation, sick pay, workers' comp, health care, half of Social Security plus other payroll taxes, equipment, and office space, and they incur certain liabilities for the employee.


All those things still have to be paid for by someone when you hire contractors. That's probably why I always hear about contractors getting seemingly absurd hourly rates; if it didn't all come out about even, either we'd all be contractors or nobody would choose to be one.

The transaction costs here are things like all the negotiation that has to go into getting anything done with contractors, every time you want something done.


It can break down, but it's a damn good place to start, especially if you don't have a good philosophy to replace it.

Does it need stating how useful and powerful text based output can be?


50-minute talk on simplicity (simple meaning not compound) as a place to start (and a little bit on Clojure): http://clojure.blip.tv/file/4824610/


> Piping the output of a simple shell command to another shell command is easy. But as tasks become more complex, more and more work goes into preparing the output of one program to be the input of the next program. Users want to be able to do more in each program to avoid having to switch to another program to get their work done.

I see no logic here. Can anybody explain that?


We can handle objects in CSV/TSV format with commands sort, cut, join, paste, awk, perl, python, etc.

We can handle XML objects with commands xml_grep, xmlpath, xmlpatterns, perl, python, js (JavaScript has builtin support for XML/XPath), etc.

The JSON format is new, but we can handle it with js (native format), perl, python. CouchDB already uses JSON over a pipe to communicate with tools.
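For example (a sketch assuming GNU tools; data.csv, a.tsv and b.tsv are placeholder files, and the CSV handling is naive, with no quoted commas):

  sort -t, -k3,3n data.csv                     # sort a CSV by its third column, numerically
  cut -d, -f1,3 data.csv                       # keep only columns 1 and 3
  join -t $'\t' <(sort a.tsv) <(sort b.tsv)    # SQL-style join of two TSV files on field 1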



