“Typesetting Markdown” blog post series by a technical writer

thangalin · on April 14, 2020

Author here.

Typesetting Markdown is a multi-part series that describes an ecosystem for typesetting Markdown documents using pandoc and ConTeXt. Here's a brief walk-through:

Parts 1, 2, & 3 are about the ecosystem: writing reusable and user-friendly bash scripts that autogenerate PDF files when content in the source documents change. They introduce the reader to ConTeXt, pandoc, and scripting.

Part 4 enhances the ecosystem a bit more before describing one way to apply colours and other presentation in the output document, independently from the Markdown source.

Part 5 demonstrates what I feel is missing from almost all editors, Markdown or otherwise: interpolated variables. I've posted an architecture diagram at https://i.imgur.com/8IMpAkN.png for the curious. This part describes how to externalize variables (e.g., in YAML) and reference them within Markdown documents.

Part 6 adds R into the mix, including how to generate then embed graphical output into R Markdown documents.

Part 7 uses R to generate a high-level return-on-investment document outlining how long it will take to pay off an electric vehicle versus maintaining a gas guzzler. This part also describes how to have plain TeX (not LaTeX) typeset in the final document.

Part 8 (soon!) revisits themes to demonstrate how different styles can be isolated and applied to novels. It will show ways of setting up directory structures, how to use ConTeXt "modes", and, by the end, will show a way to typeset famous (public domain) novels by applying a couple different styles.

vehemenz · on April 14, 2020

This is very cool. Thanks for posting.

How does this set of tools compare to something like Prince XML? I have put together several user manuals with it, and I've personally found it very easy to use.

sandGorgon · on April 14, 2020

did you explore alternatives like asciidoc ? just curious if that ecosystem does not solve a lot of these problems.

thangalin · on April 14, 2020

AsciiDoc is amazing. Take another look at the architecture diagram. AsciiDoc is listed under the Proposed Text Edit: https://i.imgur.com/8IMpAkN.png

Pandoc can convert between Markdown and AsciiDoc, so it really isn't an either-or decision. For much of the blog you can tell pandoc to use AsciiDoc as the source document instead of Markdown. The conversion to ConTeXt will (should?) generate the same output.

One feature of AsciiDoc that's quite appealing is cross-references. The main reason I gravitated towards Markdown was because of its simple and terse syntax.

sandGorgon · on April 15, 2020

Thanks for your reply.

Well I'm asking about retrospective. Today you have to do a whole lot to markdown submit to your toolchain. Would asciidoc not have been better ?

Is the syntax of asciidoc that much worse ? It's still human readable

thangalin · on April 15, 2020

> Today you have to do a whole lot to markdown submit to your toolchain

The brunt of transcoding from Markdown to AsciiDoc can be accomplished in a single line, something like:

    find . -type f -name '*.md' -exec pandoc -f markdown -t asciidoc {} \;

AsciiDoc is a superset of Markdown, so pandoc should be able to do a nearly flawless job at converting:

https://asciidoctor.org/docs/asciidoc-vs-markdown/

At that point it's a matter of updating a few build scripts to pass AsciiDoc files to pandoc instead of Markdown. I wouldn't say starting with AsciiDoc would have been better or worse, just different. Starting with Microsoft Word would have been worse.

j2kun · on April 14, 2020

R + pandoc + yaml + markdown + quite a few custom build scripts... it seems like LaTeX is simpler in the end.

Could you summarize "why not just use LaTeX?"

thangalin · on April 15, 2020

If you prefer to use LaTeX, use LaTeX. My preference is to write content in Markdown---which is almost as technology-agnostic and ubiquitous as plain text itself---then figure out the typesetting independently.

For what it's worth, even though the blog discusses Markdown and ConTeXt, much of it applies to Markdown and LaTeX.

By the time you reach the R section, there are only three scripts: build-template, which is reused by the other scripts, the build script, and intarsia. You could also use R Studio, as well, but then you wouldn't have the ease of interweaving interpolated variables from YAML files.

If you find LaTeX's packages and syntax to be simpler than writing (or downloading) a few bash scripts, then go for it. I certainly cannot tell anyone what they'll find simpler to use. For me, Markdown is simpler than LaTeX; I find I can focus on content far easier without having to write or look at macros. À chacun son goût!

phillc73 · on April 15, 2020

I'm really pleased to see you're using Pander[1] in your tool chain. I think this is one of the most useful literate programming tools in the R ecosystem. Although, having read Part 6 (briefly), it looks like you might only be using it for tables and plots. I think Pander's literate programming functionality, in conjunction with Brew, is worth further exploration[2].

Also, if you're using R-base plots, there's an excellent tutorial[3] and theme library[4] which can help make those plots look really excellent.

[1] http://rapporter.github.io/pander/

[2] https://github.com/Rapporter/pander/tree/master/inst/example...

[3] http://karolis.koncevicius.lt/posts/r_base_plotting_without_...

[4] https://github.com/KKPMW/basetheme

quickthrower2 · on April 15, 2020

I didn't find LaTeX simpler for creating a book with code comments. It's a whole load of stuff to learn, akin to a "Just Learn Kubernetes" level of brain load, which seems OTT when you just want to experiment and create a book.

danso · on April 14, 2020

This is an impressive and detailed writeup, so thank you for sharing not only your process but the thinking and reasoning behind it. I especially appreciate the list of tools [0] and book design resources [1]. I hope my comments/observations don't seem overly harsh:

First, a question: is there a link to example source code, e.g. a folder of Markdown+.text files, the bash scripts, and the resulting PDF?

Speaking as someone who has put a decent amount of time in looking at plain-text-publishing systems (such as Leanpub's Markdown system, Hadley Wickham's RMarkdown books [2] and u/munificent's custom build script for "Crafting Interpreters" [3]), I have an idea of my current pain points, and I want to see via-example of whether your process deals with them, before I get into your technical details.

Without such an example (and if I'm being honest, even with great examples), I can't bring myself to do more than skim your material, because the steps seem so complicated, and I can't easily discern how they improve how I currently use Pandoc to do the Markdown->PDF process. For example, I'm currently trying to use Python's Sphinx [4] – which, besides being great at creating an online reference site, can also output s compiled file that I can pass into pandoc with a pre-defined Makefile. So all the steps (again, just based on my skimming) you detail in your "Part 1: Build Script", which include writing the argument parser in Bash, seem orthogonal to the plaintext-to-PDF process?

[0] https://dave.autonoma.ca/blog/2019/05/29/typesetting-markdow...

[1] https://dave.autonoma.ca/blog/2020/04/11/interior-book-desig...

[2] https://github.com/hadley/r4ds

[3] https://news.ycombinator.com/item?id=22788738

[4] https://pedrokroger.net/using-sphinx-write-technical-books/

thangalin · on April 14, 2020

> which include writing the argument parser in Bash, seem orthogonal to the plaintext-to-PDF process?

You are on-point. For people unfamiliar with bash, the explanatory steps provide an idea of what the scripts do, the rationale for crafting them that way, and how to modify them.

Feel free to skip over the bash-bits and download all the code. A few of the posts have a download section that you can jump to (#download) to grab a copy of the completed scripts in a zip file. Once you've extracted the scripts, you can start using them and follow along with the remainder of the steps that interest you.

The scripts form part of a continuous development process. The scripts regenerate the PDF whenever a source file changes.

It sounds like from the Style section of Part 4 and onward (https://dave.autonoma.ca/blog/2019/06/23/typesetting-markdow...) you may find more helpful, possibly even starting at Part 5.

Thank you for the feedback! Drop me an email if you'd care to share your pain points at greater lengths.

mcguire · on April 14, 2020

"This series describes a way to typeset Markdown content using the powerful typesetting engine ConTeXt."

Ok, I'm hooked.

grawlinson · on April 14, 2020

This is amazingly well written and goes into enough depth without being too verbose. I like it!

rodgerd · on April 14, 2020

I find it hilarious that the author throws shade at Word for having complex interpolation at https://dave.autonoma.ca/blog/2019/07/06/typesetting-markdow... but then has to veer into which of the multiple different standards and runtimes can be used with Markdown to achieve the same outcome.

Like, it's a good overview of how to achieve these things with Markdown, but maybe let's not hang our hats on "simpler" here, huh?

thangalin · on April 14, 2020

I wrote an editor (based on the aforementioned architecture diagram) that makes the integration of interpolated variables much simpler than what the blog describes:

https://github.com/DaveJarvis/scrivenvar/

Here's the money shot:

https://raw.githubusercontent.com/DaveJarvis/scrivenvar/mast...

Variables are visible on the leftmost panel. You can insert a variable by typing in any partial value followed by pressing Ctrl+Space (two steps instead of over a dozen). The editor inserts the variable name and the preview panel updates accordingly.

That definitely isn't clear from the blog post, though.

Aside, can Word perform string interpolation on variables?

captn3m0 · on April 14, 2020

I've used a mishmash of Liquid/Jekyll and Pandoc to achieve this in the past - but this approach looks nice, thanks for sharing.

emj · on April 14, 2020

I feel this is not only about markdown vs Word, but a way to show that the problem exists and that you can solve it by using the Unix approach. So it's simple in the sense used in Toki Pona: "good or fixable".

_wjtv · on April 14, 2020

Give me asciidoctor any day. Markdown just isn't expressive enough to work without relying on a janky network of external tools.

sivakon · on April 15, 2020

Does asciidoctor support tikz diagrams?

Symbiote · on April 15, 2020

Yes, when using Asciidoctor at least.

https://github.com/asciidoctor/asciidoctor-diagram/issues/18...

gwern · on April 14, 2020

Are all of the images in https://dave.autonoma.ca/blog/2020/04/30/typesetting-markdow... supposed to be broken?

thangalin · on April 14, 2020

They are fixed, but part 8 isn't supposed to be online yet. Shh.

gwern · on April 15, 2020

Well, it was linked in the index, so I assumed it was supposed to be there. :)

rozab · on April 14, 2020

Looks like a really great resource. How snappy is the build process? I like markdown because it's lightweight (of course this isn't always possible for professional typesetting)

thangalin · on April 14, 2020

The build process depends on the complexity of the document. If you have an image-heavy document, or a lot of computed results, then going from Markdown to PDF could take several seconds or even minutes. Of course, the advantage of writing the text in Markdown is that you focus on the content and can worry the presentation later.

My Impacts Project (https://impacts.to/) takes up to 10 minutes to build because of all the high-resolution images. I can use a different theme, though, that puts in placeholders for the images, and that builds within a few seconds.

gotorazor · on April 15, 2020

@thangalin, do you think you can use the comments from HN so far and make it into an FAQ?