More shell, less egg (2011) (leancrew.com)
78 points by michaelsbradley on Aug 18, 2015 | hide | past | favorite | 28 comments



"Literate programming" is to some extent an artifact of Stanford's intellectual property rules for faculty at the time. If a faculty member wrote a program, that was the property of the university, a work for hire. But Stanford did not insist on owning the rights to published books and papers authored by faculty employees.

"Literate programming" was thus a way to monetize software if you were on the Stanford faculty. (This was before professors started doing startups.)


It's hard to figure out McIlroy's motivation for responding this way. It's like Bentley asked Knuth to demonstrate long division, and McIlroy then wrote an article criticizing Knuth for not using a calculator.

Yes, Knuth could have used a calculator, but the calculator still needs a division algorithm, and `sort` probably needs a complex data structure or algorithm. (At least if it's to be fast and work on large files.)

The only thing I can imagine is that he wanted to advertise UNIX pipes (I would too!) and saw an opportunity to get attention by criticizing someone famous.


This is not exactly a fair comparison. The shell version works by using a rich library. The Pascal version presumably worked ab initio. If the source code for all the programs invoked by the shell scripts were included, Knuth's solution might look considerably better by comparison.


I think that's the full and whole point of the article: The unix command line tools can be combined by a mediocre systems engineer to do the work of Knuth in less time and with easily assessable correctness.

This is a form of wonderful magic, available only because of the discipline and vision of those who wrote the original tools.


But Bentley chose the program. He chose a task that's suited to a few lines of shell and asked Knuth to apply Literate Programming to it. Complaining that the result is overworked seems disingenuous - like asking someone to apply the MVC pattern to fizzbuzz and then asking why they used such a verbose approach to such a simple problem.


OK, now tell whose technique is more likely to lead to TeX or Google Brain: Knuth's Literate Programming, or McIlroy's "everything worth doing is already a Unix built-in"?


McIlroy's, because he'll finish; that's kind of the point: good engineers ship, and reuse as much as possible. Look at all the work of DJB[1]; nearly everything he does is a bunch of well-integrated shell scripts.

[1] https://en.wikipedia.org/wiki/Daniel_J._Bernstein


It's possible that Pascal libraries, if they existed, could be combined in a similar fashion. It's possible that Knuth's code was in fact written that way, so that if you really compared apples to apples, the Unix solution would seem like a horrible hack because it had to use two different programming languages.


Maybe, but Pascal didn't actually come with a batteries-included piping concept. You'd need to work one up. Well, you could abuse the GNU pipes library and get your piping that way, maybe by running it through the OS, but that would definitely not be how Knuth rolls.

As a side note, I think the shell script version would have been multicore friendly, at least up to six cores. I am sure Knuth's was single threaded.

Finally, to your point on horrible hacks, maybe Knuth's version dealt with a whole bunch of edge cases, for instance DOS/Unix/Mac line endings. But probably not -- it was a toy program written as evangelism and out of courtesy. sort and uniq et al. embed many programmer-lifetimes of experience and bug fixing at this point. Getting all of that nearly for free is really, really great.


> pascal didn't actually come with a batteries included piping concept

No, it has function calls instead.

> I think the shell script version would have been multicore friendly

Not unless 'sort' is implemented very cleverly.

> maybe Knuth's version

We'll never know unless someone manages to dredge up the source code.

But the specifics of Knuth's version are not really the point. Someone else might have been able to do better. And someone using a less brain-damaged language than Pascal might have been able to do even better. In fact, here it is in four lines of Common Lisp (using Ergolib - https://github.com/rongarret/ergolib):

    (defun histogram (path)
      (bb l (split (file-contents path) t :test (fn (a b) (whitespacep b)))
          l (sort 'string< (mapcar 'string-upcase (remove "" l)))
          (for item in (remove-duplicates l) collect (list item (count item l)))))
Writing a pure CL version is left as an exercise. My guess is it would be 10-20 LOC.


> As a side note, I think the shell script version would have been multicore friendly, at least up to six cores.

That's not how the sort command, in lines 3 and 5, works.


It is how pipelines work, though. 6 commands, 6 processes all started at once. They're each allowed a core to themselves, if the kernel deems it so.


That's true, but it doesn't really count if only one of the threads is eligible to run at any one time and all the others are blocked.


But `sort` has to read all the input before writing any output (I think...). So you get some parallelism but not as much as you might hope.


That argument only works if you don't see Pascal as an already heavily developed language. Pascal itself also evolves. Perhaps you could run Knuth's code on modern Delphi; perhaps it would break. The proof is in the pudding, as they say, and the shell version is much more likely to be used by anyone solving that particular problem.


The shell version completely fails on Unicode, or even on accented Latin characters outside ASCII.


So does Knuth's, which embeds a case conversion table directly in the code. http://onesixtythree.com/literate/literate2.pdf

The shell script is easy enough to approximately fix:

    tr -cs '[:alpha:]' '\n' |
    tr '[:upper:]' '[:lower:]' |
    sort |
    uniq -c |
    sort -rn |
    sed ${1}q
(In my tests, it doesn't turn ẞ into ß, but it does turn É into é. "straße" is recognized as all one word, but not as the same word as "strasse" - but it's not clear whether it should be or not. I'm not sure how it'll handle normalization, if the same character is represented in two different ways.)
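For what it's worth, the fixed pipeline runs as-is on sample input; here it is wrapped in a function for convenience (the function name and sample sentence are mine, purely for illustration — the argument is the number of top words to print, as in McIlroy's original):

```shell
#!/bin/sh
# Sketch: the fixed pipeline above, wrapped in a function.
wordfreq() {
  tr -cs '[:alpha:]' '\n' |
  tr '[:upper:]' '[:lower:]' |
  sort |
  uniq -c |
  sort -rn |
  sed "${1}q"
}

# Prints the two most common words with their counts
# ("the" appears 3 times, "fox" twice in the sample).
printf 'the quick fox saw the other fox chase the dog\n' | wordfreq 2
```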


How to eviscerate a programming technique:

1. Think of a task that can be done in six lines of shell

2. Ask person pushing a new programming technique to apply their idea to said task

3. Reply that it would have been better to just use six lines of shell

Call me cynical but it strikes me that Bentley's (edit: McIlroy's, see replies) reply could easily have been written without seeing Knuth's code. I don't know anything whatsoever about Literate Programming, but did Knuth mean it to be applied to such simple tasks?


I think you're misunderstanding what happened: there are three people in the story. Bentley asked Knuth to write a literate program to solve the problem. He then asked McIlroy to critique the solution. McIlroy was the one who provided the shell script code.

It's impossible to know if Bentley intentionally picked the problem that would be easy to solve in shell script, but I don't think it's likely.

A lot of text processing problems are solved really well by the standard set of unix tools, I write similar scripts probably about once a month to extract counts out of log files.
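As an illustration of that kind of monthly count-extraction script (the sample log lines and the field position are made up — a real log would swap in a different `awk` field):

```shell
#!/bin/sh
# Hypothetical sketch: tally the last field of each log line
# (here standing in for an HTTP status code), using the same
# sort | uniq -c | sort -rn idiom as the article's pipeline.
printf 'GET /a 200\nGET /b 404\nGET /c 200\nGET /d 200\n' |
  awk '{print $NF}' |   # extract the field to count
  sort | uniq -c | sort -rn
```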


Oops, you're right - I joined them into one person. With that said, I didn't really mean to suggest the exercise was set up to snipe at Knuth, just that the criticism (McIlroy's) seems off the mark.

I mean, one presumes that when Knuth wants to count the words in a text file he uses shell scripts, right? It would follow that he's only applying Literate Programming to a toy problem for purposes of illustration. As such, saying he's wrong to use LP for this task seems to miss the point (and doesn't say much about LP).


I agree with you as long as we are talking about this blog post and not about Knuth's article or McIlroy's critique, which I don't have access to.

Knuth republished both, so they must have some things to say about Literate Programming.


4. Improve the programming language until it is competitive with the shell.


The key bit of why we do things like this: “useful for testing the value of the answers and for smoking out follow-on questions”.

That statement is not about Knuth’s presumably beautiful work. It’s that the least deterministic (and most important) part of programming is discerning the value of doing the thing at all.


It's pretty rare in my day-to-day work to deal with flat, string-delimited text files.

This advice would be more actionable for me if there were more shell tools that operated on data structures (e.g. objects represented as JSON).


Something like this? https://stedolan.github.io/jq/


Yeah, I see JQ as the grep/sed of JSON. If there were a curl-analogue that constructed HTTP requests from its output, it would take me a lot of the way there.

But more generally, I'm dealing all the time with a hodgepodge of json, csv, yaml, POM files, git logs, whatever. We need more tools that understand the semantics of data.
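A quick sketch of that grep/sed-of-JSON analogy, assuming jq is installed (the status-code data is invented for the example): grouping an array of objects by a field and counting each group is roughly the JSON analogue of sort | uniq -c.

```shell
#!/bin/sh
# Group an array of objects by .status and count each group;
# -c gives compact one-line output.
printf '[{"status":200},{"status":404},{"status":200}]' |
  jq -c 'group_by(.status) | map({status: .[0].status, count: length})'
```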


Or this: https://github.com/trentm/json (npm install -g json).


Knuth’s code, and the criticism of it: http://onesixtythree.com/literate/literate2.pdf



