Show HN: Lit – a modern literate programming tool (github.com/cdosborn)
95 points by cdosborn on Aug 16, 2014 | 61 comments



I don't know if this counts as literate programming, but one of my favourite outputs for learning about code is Docco: http://jashkenas.github.io/docco/

Sample output: http://backbonejs.org/docs/backbone.html


Literate programming requires that there is freedom in ordering content in the literate file. Otherwise, you are restricted to the execution order of the computer. The intent of literate programming is to circumvent that requisite. Otherwise the tool is just a pretty printer of sorts.
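
To make the ordering point concrete, here is a minimal sketch in Python of the "tangle" step. It is not Lit's implementation: the `chunks` dictionary, the `tangle` function, and the choice of `*` as the root name are assumptions for illustration. The chunks are written in whatever order suits the explanation, and expansion from the root restores the order the machine needs.

```python
import re

# A whole-line macro reference such as "    << report the total >>".
REF = re.compile(r"^(\s*)<<\s*(.+?)\s*>>\s*$")


def tangle(chunks, root="*", _seen=()):
    """Expand references recursively, starting at the root chunk."""
    if root in _seen:
        raise ValueError(f"cyclic reference to {root!r}")
    out = []
    for line in chunks[root]:
        m = REF.match(line)
        if m:
            indent, name = m.groups()
            # Keep the indentation of the reference for every expanded line.
            for expanded in tangle(chunks, name, _seen + (root,)):
                out.append(indent + expanded)
        else:
            out.append(line)
    return out


# Hypothetical literate source: the imports are explained last,
# yet they end up first in the tangled program.
chunks = {
    "*": ["<< imports >>", "", "def main():", "    << report the total >>", "", "main()"],
    "report the total": ["total = math.fsum([0.1, 0.2, 0.3])", "print(round(total, 2))"],
    "imports": ["import math"],
}

print("\n".join(tangle(chunks)))
```

A pretty printer such as Docco has no equivalent of the `tangle` call: the code is emitted in the order it was written.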


Yep, he pretty much applied literate programming and first made it practical


I think the greatest motivation for literate programming is providing context. After five minutes of reading, you can have a very good high level understanding of a complex program. This just isn't achievable in the same amount of time otherwise.


In that vein, you should consider the ability to coordinate a project using literate programming. It is hard enough to understand a code-base in a single file, but when a project has several different directories with a variety of files in each, having a starting document that describes all this structure is extremely important. And one can write the compiler to use that starting document itself to generate that complex structure.


I love literate programming, however my main issue with most tools like this one is that they do not properly support compiler/debugger error messages. This is a huge, often unstated caveat. It is very important to be able to read error messages and quickly locate the offending line in the literate source.

For languages that support file/line pragmas like C, literate programming works very well. Alternately, if the language supports goto and can sensibly unravel these statements, one can cobble a weave/tangle script to still have a 1-1 mapping between program and source line numbers.

Otherwise, I find that I need to program in a very functional style in order to force my tangled program to have the exact same line numbers as the literate source. This is possible, but it also negates the advantages of having natural language macro names, since they are essentially equivalent to the function names. In this case, tangle becomes an identity mapping.
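
For the C case, a rough sketch of how a tangle step could emit `#line` directives so that compiler errors point back into the literate source. The function name `with_line_directives` and the `(line_number, text)` input format are hypothetical; only the `#line N "file"` directive itself comes from C.

```python
def with_line_directives(lines, source_name):
    """lines: (literate_source_line_number, text) pairs in tangled order."""
    out = []
    expected = None
    for number, text in lines:
        # Emit a directive only when the source line numbers stop being consecutive.
        if number != expected:
            out.append(f'#line {number} "{source_name}"')
        out.append(text)
        expected = number + 1
    return out


# Hypothetical tangled program whose body was defined far from main().
tangled = [
    (10, "int main(void) {"),
    (42, "    return 0;"),
    (11, "}"),
]

print("\n".join(with_line_directives(tangled, "prog.lit")))
```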


This is a feature I'm looking to add. It was designed to be included. When I parse the lit file, I record the line no. which will be represented in comments in the generated source file. Hopefully, this will mitigate some of the extra workflow.

The granularity will be for every macro defn, so there will be some ambiguity.
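
A sketch of what per-definition granularity could look like, in Python. This is an illustration of the idea only, not Lit's code: `tangle_with_origins`, the `(start_line, body)` chunk format, and the comment wording are all assumed.

```python
def tangle_with_origins(chunks, root="*", comment="#"):
    """chunks maps a macro name to (line of its definition, body lines)."""
    start_line, body = chunks[root]
    # One marker per macro definition, written as a comment in the target language.
    out = [f"{comment} lit source line {start_line}: << {root} >>"]
    for line in body:
        stripped = line.strip()
        if stripped.startswith("<<") and stripped.endswith(">>"):
            out.extend(tangle_with_origins(chunks, stripped[2:-2].strip(), comment))
        else:
            out.append(line)
    return out


# Hypothetical chunks with the line numbers recorded while parsing.
chunks = {
    "*": (3, ["<< setup >>", "print(total)"]),
    "setup": (12, ["total = 1 + 2"]),
}

print("\n".join(tangle_with_origins(chunks)))
```

The ambiguity shows up in the last line of the output: `print(total)` belongs to the root definition but sits below the marker for `setup`, so a reader has to work out which definition a line came from.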


The main issue I had with LP in the past is that sometimes you want to modify the code directly, not the literate document and synchronize them. I made such a tool: https://github.com/aryx/syncweb


Neat. Do you find that you use that a lot? I have thought about implementing it in the literate-programming tool I wrote, but so far I have found a split screen editor (vim for me) with search to be fairly effective.

On the opposite side of things, I rather enjoy mucking about with the compiled code to diagnose it, and then recompiling to get rid of the diagnostic debris.


I use syncweb all the time. I can't go back to just using noweb. Many of the subprojects in pfff have a literate document that is synchronized with the code using syncweb (e.g. https://github.com/facebook/pfff/blob/master/h_visualization... ). When I have errors in my code, or when I debug my code, I do it on the generated code, not the tex document, so of course if there is a fix it's easier to fix the code directly and synchronize much later.


My mind was almost literally blown when I found that org-mode in emacs has this. I believe you have to be putting in the noweb comments, but there is a detangle function.


Do you have a link explaining this org-mode feature?


Apologies. I don't right off. If you can't find it, let me know. I should be able to look for it tomorrow AM.


Use the inbuilt info reader.

    (info "Org")
And then look for babel


Wow, nice. Found http://comments.gmane.org/gmane.emacs.orgmode/32814 but I'm not sure the workflow is as convenient as syncweb. Syncweb has a unison interface to synchronize, which lets you synchronize many chunks very quickly and automatically. Syncweb can automatically determine in which direction things need to be synchronized. If one modifies the code, then the sync will update the org; if one modifies the org, then the code will be regenerated. If one modifies the org and the code at the same place, then a conflict will be detected.
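
The direction rule described above can be sketched as a small three-way comparison. This is not how syncweb is implemented; it assumes, purely for illustration, that a digest of each chunk is stored at the last synchronization, and `digest` and `sync_action` are made-up names.

```python
import hashlib


def digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_action(last_synced_digest, doc_chunk, code_chunk):
    """Decide which side to update for one chunk."""
    if doc_chunk == code_chunk:
        return "in sync"
    doc_changed = digest(doc_chunk) != last_synced_digest
    code_changed = digest(code_chunk) != last_synced_digest
    if doc_changed and code_changed:
        return "conflict"
    if code_changed:
        return "update document from code"
    return "regenerate code from document"


base = "x = 1\n"
print(sync_action(digest(base), base, "x = 2\n"))
print(sync_action(digest(base), "x = 3\n", base))
print(sync_action(digest(base), "x = 3\n", "x = 2\n"))
```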


Telling from the number of recent implementations, Literate Programming enjoys a bigger following than I had expected. Here is my implementation of a similar tool in OCaml: https://github.com/lindig/lipsum. I have some projects using it on GitHub like https://github.com/lindig/ocaml-hyphenate, which implements Knuth's algorithm for hyphenation. The README is basically the literate program of the project.

Question for the author: Are the spaces around `<<`, `>>`, and `>>=` mandatory?


No, they are not.


From the description in the README I feared that the entire source-code of the program would appear twice in the resulting document; first under the definition of the "*" macro and again wherever each code-fragment was defined. Looking at the contents of the "examples" directory, however, I can see that the "*" macro works more like a table-of-contents.

That's reasonable, in a minimalist kind of way, but it's a bit unfortunate that Lit syntax winds up unmodified in the output document; I'd wind up having to put a paragraph at the top of each document explaining what Lit was and why all the "<< >>" tokens throughout the code weren't actually part of the code.

Also, the resulting HTML doesn't actually validate: http://validator.nu/?doc=https%3A%2F%2Fraw.githubusercontent...


The project is still early, and the html generation is rudimentary. Will be working soon to validate.

Not sure what you mean by appearing twice. I debated removing the "<< >>" syntax in the generated html, but I don't think it's bad. It's essentially the syntax for macros, which aid in providing context. It's useful in html, because you can quickly refer to the defn of a macro by following the anchor.


The readme says:

lit only has two valid constructs: A macro definition: << ... >>= and a macro reference: << ... >>

...from my experience with other macro systems, I assumed that a "macro reference" would be replaced with the content of the macro definition, leading the code-block to appear at the top (under the star macro) and also in the macro definition. I'm pleased to see that's not the case.

I'd rather not have the "<< >>" syntax in the resulting HTML, because it's the syntax for Lit macros... if I'm writing a document in human language A to explain concept B in programming language C, that's already a lot of context, and requiring the reader to also be familiar with literate-programming-tool D is a drawback.

Linking to macro definitions is definitely a useful feature, but I'd rather those links were distinguished with a CSS class so that I could define their location and appearance in the stylesheet, rather than giving them specific text and markup.


There should be an HTML class. In the resulting docs I still need a way to separate them from the underlying language (which could be anything). Initially I thought `<<>>` looked archaic, but it's rather unique and probably the syntax which clashes with the fewest programming langs.


validator passes, thanks.


I really really want to like this. Org mode for emacs has really expanded what I could want from a literate programming tool, though. Specifically, being able to execute small segments and have the output immediately usable is rather nice. That you can also feed tables of data directly into the language is also nice.

Granted, I do feel I have a fair bit to learn in how to structure code for others to read. Myself included. I am learning the rather obvious point that keeping a narrative to the code is not easy. And, of course, in the couple of attempts I've done recently, I end with a fairly large dump of "and here is the boring stuff" at the end.[1]

Also, I can't recommend reading straight from Knuth's site heavily enough. His programs are rather interesting by modern aesthetics, but they are all still runnable.

[1] http://taeric.github.io/DancingLinks.html


My literate-programming tool, https://github.com/jostylr/literate-programming, has this feature. It really is nice to be able to pipe bits of code into various other functions for compiling (or evaling).

The notion I am working towards is actually more of a literate-project. Something that can do all the grunt tasks, such as linting and testing code, importing data, etc. and weave together bits from multiple literate documents.


Do check out org-mode, then. It is surprising how far that tool has been extended.

Many of these features are actually under the label "reproducible research" nowadays. There was a really good talk given on this at a pypy convention a couple of years ago.[1]

[1] https://www.youtube.com/watch?v=1-dUkyn_fZA


I've been playing with literate programming for a while, and I don't think having boring stuff is a bad thing. The strategy I've settled on is to try to show a cleaned up history of the evolution of a codebase -- and a key part of that is including tests in the literate program. http://akkartik.name/post/wart-layers


Sorry, my point was that I didn't even attempt to explain the boring stuff other than a "this was all incidental stuff that I needed."

Even without literate programming, this is the harder part of the program to document, to me. To really give it credit here, I would split these out into different files. But, I have pretty low interest in really delving into some of the stuff there. Hence, "and this is the support code for the figures I did."


Yeah, I think I understood that. My point -- at the risk of repeating myself -- is that it becomes less important to document low-level details in great gory narrative detail if calls to such functions are close to their implementations, and you can see tests that exercise them along with their callers. If you can do that, pervasive narration even seems to hit diminishing returns. I can understand what a simple function is and why it's needed just by seeing why its caller is needed, often-times.


Makes sense. I think your last point is why these are the harder functions to document, for me. Often they really are self evident either from their name or with their use. Trying to document them all "appropriately" often leads to more documentation than code. With neither really providing value to the other.

And, ultimately, this is why I do a "this is the boring stuff" sections. Bugs in that area should be easy and obvious to both diagnose and to fix. Bugs in the other areas are often neither.


Nice! What do people use literate programming for? I've yet to, but I'm sure I'm missing some cool benefits.


It really is a great way to introduce a narrative into the code. Too much of reading programs feels like taking a look at either just a plain convoluted mess, or an index that someone put together for another larger work. (This is especially true for pieces where someone makes a lot of single use functions. Sure, the pieces may be "self documenting," but it is almost akin to just seeing a bag of screws. You know what they do individually, but you don't know why they are there.)

If you have the time, take a look at any of Knuth's programs[1]. Obviously, they are not all immediately approachable, but render them to pdf and give them a try. You'll hopefully be surprised just how much you do pick up.

The only downside, to me, is that I become less concerned with modern trends of long variable names and abstractions that "self document." These are still great, of course, but they don't go nearly as far as knowing the narrative of why a piece of code was written.

[1] http://www-cs-faculty.stanford.edu/~uno/programs.html


I personally have been using it to document the more complex code that I write -- that is, the code that I have the most trouble explaining to others. I'm quite happy with the results for my collision detection library. [1] The library itself serves as a basic tutorial on collision detection, and I've used it myself for reference when I step away from the code for a while.

[1] https://noonat.github.io/intersect/


As I said elsewhere in this thread, I've been exploring using it as a replacement for my hack of reading git logs to better understand codebases. Git logs are immutable, whereas a literate format can provide a cleaned-up history of the evolution of a codebase. http://akkartik.name/post/wart-layers


Wouldn't literate programming encourage a very linear style of programming with long methods? It seems like it would encourage the opposite of single-responsibility-principle, since it's really hard to read a narrative that jumps all over the place.


I think the catch is that the majority of programs are more linear than the index of methods would give credit to. Sure, some folks are making utility data structures that don't really have much of a narrative to them, I suppose.

I would think for most programs there is a very direct sense of linearity in getting a task done. Sure, you will often have a lot of support methods. But seeing them just directly listed can be as much of a distraction as anything else.

Note, also, for nontrivial applications, you will likely have a collection of literate programs. Not just one giant literate program.


The narrative doesn't have to jump all over the place. If anything, defining macros allows you greater flexibility. Also, using macros allows for defining different levels of languages that encapsulate a "single-responsibility". Could you be more specific?


Literate CoffeeScript comes out of the box with CoffeeScript. I kind of like it, especially for complicated algorithms where the amount of comments exceeds the lines of code. Here's an example [1] of a kd-tree implementation in literate coffee. It reads beautifully on GitHub as well, since it's Markdown!

[1] https://github.com/axiomzen/Look-Alike/blob/master/coffee/kd...


Literate CoffeeScript falls under one of the pseudo-literate tools, i.e. just nice comments. Not to discredit that approach, it's just a compromise for most of the benefit. Many of those tools avoid the extra step of having to expand macros and such before compilation or interpretation. I ended up borrowing the watch feature, which aids the workflow.


I have found that writing code in arbitrary order is a key feature that is missing in comment-flipping. The usual defense is that functions already free us from the order of compilation, but I often find that the little stuff, such as error tests, is best moved out of the way, and replacing it with function calls is problematic if you're checking for sane input.


Erlang uses <<stuff>> to express binary strings. In other literate programming environments, e.g., org mode and noweb, I have to break up the string like

    <<"word"
    >>
...which is ugly. I've tried other workarounds like defining a macro

    -define(bs(X), <<X
    >>).
and then use

    bs("word")
...which is still ugly. Any better suggestions?


My program has only the two constructs `<<>>` and `<<>>=`, which you could rather easily alter. See src/Processing.hs and src/Parse.hs.
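
To illustrate why changing the delimiters sidesteps the Erlang clash, here is a small Python sketch of a line classifier with configurable delimiters. It is not taken from Lit's Haskell source; `make_classifier` and the `@{ ... @}` alternative are hypothetical, and only whole-line references are recognized.

```python
import re


def make_classifier(open_="<<", close=">>"):
    """Build a function that classifies one line of a literate file."""
    o, c = re.escape(open_), re.escape(close)
    # Spaces around the delimiters are optional in both patterns.
    definition = re.compile(rf"^\s*{o}\s*(.+?)\s*{c}=\s*$")
    reference = re.compile(rf"^\s*{o}\s*(.+?)\s*{c}\s*$")

    def classify(line):
        m = definition.match(line)
        if m:
            return ("definition", m.group(1))
        m = reference.match(line)
        if m:
            return ("reference", m.group(1))
        return ("text", line)

    return classify


default = make_classifier()
custom = make_classifier("@{", "@}")

erlang_line = '<<"word">>'
print(default(erlang_line))  # the Erlang binary is mistaken for a macro reference
print(custom(erlang_line))   # with other delimiters it is left alone as text
print(custom("@{ helpers @}="))
```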


My gut reaction is that I don't want to be creating a table of contents list of sections by hand.

I feel it's a design error to require the sections be listed at the top and named below. If I want to change a name below, then I have to remember to make the identical change above.

Unless there's something obvious I'm missing about how literate programming works, it seems like a "generate table of contents" macro would be vastly superior to having to maintain it by hand.


It's no different than defining a function and calling it later (and then having to change the function name in several places). There are tools that generate a TOC based on markdown headers. I'm not sure I understand. There is also no mandated order for where macros are defined, except in the `<< * >>=` definition, which specifies the root from which macros are eventually expanded into a code file.
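
The rename worry could also be handled by a consistency check rather than a generated table of contents. A minimal sketch in Python, assuming whole-line `<< name >>=` definitions and `<< name >>` references with `*` as the root; `check_names` is a hypothetical helper, not a Lit feature.

```python
import re

DEFINITION = re.compile(r"^\s*<<\s*(.+?)\s*>>=\s*$")
REFERENCE = re.compile(r"^\s*<<\s*(.+?)\s*>>\s*$")


def check_names(lines):
    """Return (referenced but never defined, defined but never referenced)."""
    defined, referenced = set(), set()
    for line in lines:
        m = DEFINITION.match(line)
        if m:
            defined.add(m.group(1))
            continue
        m = REFERENCE.match(line)
        if m:
            referenced.add(m.group(1))
    undefined = sorted(referenced - defined)
    unused = sorted(defined - referenced - {"*"})  # the root is never referenced
    return undefined, unused


# Hypothetical file where one macro was renamed below but not in the root.
source = [
    "<< * >>=",
    "<< parse input >>",
    "<< print result >>",
    "",
    "<< parse the input >>=",
    "data = input()",
    "",
    "<< print result >>=",
    "print(data)",
]

print(check_names(source))
```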


NB: I've only read the README.md.

My first thought was to use emacs org-mode to manage these .lit files.

My second thought was to consider: do I want name.lang.org.lit or name.lang.lit.org? ... I am leaning towards the latter.


If you are using org-mode you should consider org-babel, which does all you need for literate programming and more.


I've used org-babel, but I'm still interested in Lit. The 'watch directory' feature interests me, as does the 'proper and complete' aura of Haskell.

I'd previously played around with getting org to do stuff "on commit", via git hooks, but I didn't make much progress (and given the environment I was committing from, I was sure that I was going about it backwards anyway).

Org still doesn't have a universal character escape mechanism!


Here is an updated example, because a few people were asking for a more complex one. This is the literate version of the root of the program. Note that the markdown doesn't include links (in the macro names) like the HTML does; it's just the most convenient format to share.

https://github.com/cdosborn/lit/blob/master/examples/lit.hs....


What does Lit offer beyond the standard tools?

http://www.haskell.org/haskellwiki/Literate_programming http://www.haskell.org/haskellwiki/Haddock

In the README, the example shows a "literate" comment above a Haskell comment -- what's the point of having two kinds of comment?


I really need a more mature example to show, which will be coming soon. However, many tools exist which are essentially just auto documentation (java docs, literate Haskell, docco). Mine is the simplest iteration of a true literate programming tool which allows for code to be restructured in the way best fit for explanation. If you look at cweb, funnel-web, noweb, mine attempts to be a modern version that still implements macro functionality in a modern context (markdown, HTML, easy to build). Literate programming is more than just comments.


Great to see more tools like this out there. I wrote https://github.com/jostylr/literate-programming which is similarly based on markdown. I use _"section name" to insert code from the named section and use markdown headings to delineate the sections.


I'm interested - but literate programming like this, as far as I understand from README is not a lot more than having good docstrings in each function.

So maybe I am missing something but sphinx / python seems as good a replacement as can be - although the easy mixing of real code and docs is nice


Let's say you broke down your code in good small functions with comments on its purpose, params etc. You also commented blocks of the code with good comments.

I know people often say LP is not just "good commenting". But, really, is the above so different from LP?


This is my opinion https://news.ycombinator.com/item?id=8185142

In some cases LP won't provide a benefit over commenting. But the idea is that you step outside of the rules of the language (or more appropriately the interpreter/compiler) and write a story where the order is defined by the ease of explanation, not the order mandated by the machine.


Though in Haskell's case, the compiler leaves enough freedom in most cases anyway.


Would like to see some more complex examples. I like the philosophy behind it though.


I need to read more of Knuth on literate programming. The plan is to finish the source in literate style.


You definitely should. It is fantastic stuff. His primes example program is good to do.

You should also consider writing up some easy programming example such as FizzBuzz.

Bootstrapping your code into your own literate programming environment is a must to sit at the lit table :)


Not bad. Not sure it'd come close to replacing org-habits for me.


Litescript is literate from the start. https://github.com/luciotato/LiteScript


I like it but if you can give some more detailed examples on complex issues, it would be much better. :)


If you follow the project, I'm currently writing up the src in literate style :).



