Laying Out a Print Book with CSS

cubefox · on March 21, 2023

The author is actually underselling CSS for rendering PDF layouts. Of course, if you just do a static book, you are probably better off with software like InDesign. But if you need PDFs with dynamic and complex content (e.g. info documents with graphs, tables, dynamic text) then HTML/CSS is actually pretty great.

As he mentions, unfortunately the CSS3 Paged Media spec is not very well-supported by major browsers, though other tools do implement it.

I actually started to love CSS much more when I had to use another code-based solution: LaTeX. LaTeX is great for equations and automatic bibliography, but not for much else. If you ever tried to merely style a minimally complex table in LaTeX, you know what I mean. You actually see in major research papers that people sometimes just give up and include a screenshot of their table. LaTeX has no inherent distinction between structure and design, like HTML+CSS, which generates incredibly inelegant bloat code, like writing and styling a website in pure HTML without any CSS. Moreover, there often is no consistent syntax in LaTeX, and most everything can only be achieved with some sort of hackish package someone put together years ago without proper documentation. HTML and CSS are a breeze in comparison. CSS features like Flexbox or Grid let you do complex dynamic layouts in a few minutes, which are so difficult in LaTeX that most users would just throw the towel.

Upvoter33 · on March 21, 2023

The problem with this: LaTeX (or actually TeX) typesetting is way better. It is hard to get all the spacing, kerning, hyphenation, etc., right so that the text looks good. There is no way a CSS-based solution comes close.

WillAdams · on March 21, 2023

Moreover, one needs controls for:

- widows/orphans

- running paragraphs one or more line longer or shorter

- setting a page spread to be one (or possibly two) lines deeper or shallower

- adjusting the size/cropping of an image/figure/table so as to influence pagination

- adjusting the placement of an image/figure/table so as to influence pagination

In my decades of doing typesetting, I've had exactly _one_ chapter come out perfectly on import w/ no need to adjust _anything_ --- fastest 40 minutes of my life.

luckystarr · on March 21, 2023

TeX not only handles hyphenation, but also decides on where in a certain word it should hyphenate to optimize the current paragraph as a whole, balancing the hyphenation position in the word (some positions are better than others) against the aesthetics of the paragraph.

smulloni · on March 21, 2023

I've formatted ~100 non-scientific books with LaTeX, including poetry and experimental/avant-garde fiction that makes considerable layout demands. I have yet to run into a problem that could not be solved. It isn't always the best tool for the job — for layouts with lots of images that need to be precisely placed with text, something like Affinity Publisher or InDesign would be a shorter path to the right result — but it's an amazing and remarkably painless system. If you are producing "incredibly inelegant bloat code" with it, I'd hazard that something other than LaTeX itself is responsible.

starkparker · on March 21, 2023

I mean, good for you and congrats, so you must be right. Something other than LaTeX must be responsible for so many non-academics and non-technical people bouncing so hard off of it for almost any other tool. It can't possibly be anything related to LaTeX because if you're so good at it, anyone can be.

ericwood · on March 21, 2023

The table thing is so true. I was insistent on using LaTeX in school and on lab reports I actually ended up building a web-based tool that would convert xlsx files into LaTeX tables I could easily embed with my data. It was pretty janky but worked well enough to save me some time. I haven't much if anything to it since about 2011 and it still sees a decent amount of traffic which tells me there's still not great solutions to the problem!

thesuitonym · on March 21, 2023

CSS definitely has some (a lot of?) benefits over LaTeX in terms of styling, but what I like about LaTeX is that, when you don't care about the specifics, it's insanely easy to get a good looking document. About half of this article is trying to finagle CSS into producing styling that LaTeX would produce automatically, by default.

cubefox · on March 21, 2023

That's only the case if you have very simple documents though, like the novel in the blog post. And those aren't usually dynamic anyway. Any more complex layouts, charts, flexible boxes are a major problem for LaTeX. Even seemingly simple tables can be a pain.

ognarb · on March 21, 2023

For those interested in this topic, you might want to checked paged.js https://pagedjs.org/documentation/

It's an polyfil for the unimplemented CSS paged specification, which is exactly that is needed for laying out pages. https://www.w3.org/TR/css-page-3/

bayesian_horse · on March 21, 2023

At the moment, the combo of pagedjs with a headless browser is my first recommendation when you need something to output pdf automatically.

Weasyprint looks nice, I have yet to use it. It probably has the advantage of not using an actual browser document (with all the relayouting problems), and may use fewer resources in terms of time and memory (I'd have to check though). The disadvantage is also it's not an actual browser document, meaning you need to render the html and css in some form before passing it to weasyprint.

Using a headless browser makes it easier in general to do some fancy stuff like music sheets or embed js chart libraries and so on.

yojo · on March 21, 2023

> meaning you need to render the html and css in some form before passing it to weasyprint

I'm not sure I understand this part. WeasyPrint is a rendering engine; it takes an HTML file as input and outputs a PDF.

I can't speak to resource usage, since this was very ad-hoc. For my needs, it was very simple to set up, and worked on the first try. I was testing with about five pages of content, and on my M1 MBP it rendered out the PDF pretty close to instantaneously, which was nice for the edit/refresh cycle. Preview on MacOS actually live reloaded when the PDF changed, which was a pleasant surprise.

bayesian_horse · on March 21, 2023

With "render" I meant you need to first create the html and css before handing it over to weasyprint. You can't use browser-based chart libraries unless you convert their output to svg or other image formats first.

With a headless browser you can use for example react to generate the document, you can just insert any chart library and so on.

And regarding resource usage I had scenarios of "pdf as a service" in mind, where you need to generate pdfs dynamically. Reporting, invoices, what have you.

yojo · on March 21, 2023

Ah, gotcha. I was focused on my use case of static text.

If your charting library runs server-side and outputs HTML/CSS, then it'd work fine, but most of that stuff requires javascript, and yeah, WeasyPrint doesn't have an interpreter. Once you're already paying the overhead of a browser, you're probably better off using a JS polyfill than piping through another tool, unless paged.js is an insane resource hog (which seems unlikely).

Semaphor · on March 21, 2023

And print-css.rocks [0] for a comparison of tools, with support matrices, and tutorials. We use Weasyprint [1] to create PDF ebooks with HTML and CSS.

[0]: https://www.print-css.rocks/

[1]: https://doc.courtbouillon.org/weasyprint/stable/

yojo · on March 21, 2023

Thanks for pointing this out. I came across paged.js but for some reason thought it wasn't implementing the margin boxes – I think I just searched their documentation page for @top-center and didn't find it. Looking again, I see that they definitely do.

Something to check out next time I do this.

batmaniam · on March 21, 2023

If I wanted to write a book, how does this translate to the publisher's standards when it comes time to hand in the design? Do publishers accept css files? How does that whole publishing pipeline work with this css framework?

ics · on March 21, 2023

That would depend very much on your publisher... that could be anywhere from "use this page size and send a PDF" to "manuscripts only and our designer/typesetter/etc. will work on the rest". As someone who has used CSS in the way described here, I think it's beneficial even if you end up having to redo it in InDesign. It may not be a perfect 1:1 but simply coming up with the rules will make setting up styles and master pages in publishing software much more straightforward.

yojo · on March 21, 2023

If you're self publishing (like I was), your publisher is probably Amazon, and they take a PDF, which is the output of weasyprint.

I haven't looked at wider distribution, but I believe most of the print on demand publishers accept a PDF file. I think some (all?) also take InDesign, which is Adobe's thing.

vivegi · on March 21, 2023

Publishers (traditional ones like Random House, HarperCollins) would take a Microsoft Word document manuscript. Each publisher has guidelines for their manuscript.

They will handle the typesetting themselves.

You need to do it all only if you are self-publishing.

brettermeier · on March 21, 2023

I thought the publisher gets a PDF.

yawnxyz · on March 21, 2023

I used PagedJS to print many abstract books for conferences — it's worked brilliantly!

DeinFreund · on March 21, 2023

This article completely misses the point. Proper typesetting is about achieving a consistent appearance starting from ligatures and kerning, then word spacing and hyphenation, up to entire paragraphs and pages. Each character has to be typeset considering what character came before, how many words are in the same line and other lines in the same paragraph. The choice of font is also questionable, but not the main issue except for capitalized sections.

This example from the article says it all: https://iangmcdowell.com/blog/book_breaks.png

I understand that LaTeX is convoluted and other software is expensive, but when offering an alternative, there should at least be a comparison based on the quality of typesetting.

mprovost · on March 21, 2023

I chose to use InDesign to typeset my self-published book, and I have to say the quality it produces from just placing some imported text with no other changes looks 1000 times better than this. It's easy to go down a rabbit hole and start really tweaking the layout but if all you're doing is applying some CSS anyway then you can produce better output with InDesign's defaults in about 5 minutes. I've been writing the book in Google Docs and printing chapters for review as I go, and it's nice to see a physical copy of your work. But I was blown away when I made my first PDF copy with InDesign how much better the typesetting is. It really does feel like a proper book and not an essay that I printed out for a school assignment.

rhn_mk1 · on March 21, 2023

I think the article stated the point clearly: the author wanted to hold some paper carrying the HTML contents. From what I can see, they hit the point completely.

Perfect is the enemy of the good, they say.

thaumasiotes · on March 21, 2023

> Proper typesetting is about achieving a consistent appearance starting from ligatures and kerning, then word spacing and hyphenation, up to entire paragraphs and pages.

I looked into CSS hyphenation because I was curious about controlling hyphenation in ebooks. CSS hyphenation appears to be based around the principle that you manually insert HTML entities representing potential hyphen locations into every word, which is obviously insane.

    hy&shy;phen&shy;ation
    hy&shy;phen&shy;ation
    hy&shy;phen&shy;ation
    hy&shy;phen&shy;ation

The TeX way is to provide a set of hyphenation points once, such that, if a word needs to be hyphenated, that list can be consulted and the word can be hyphenated appropriately without each instance of it needing to be entered with several invisible HTML entities.

    \hyphenation{hy-phen-ation}
    hyphenation
    hyphenation
    hyphenation
    hyphenation

I can't think of any use case where I'd prefer what is apparently the only way to do it in CSS. Do epubs have a bespoke non-CSS way to handle this?

(I looked into this in the first place because the Kindle mobile app is godawful at hyphenating words, and I wanted to know what it would take to write an ebook that didn't have obvious howlers. The errors occur in foreign words, but they violate the rules of hyphenating English -- I've seen the app place a hyphen before the first vowel in a word, and I've seen it render "Qis-han", which is exactly equivalent to a hypothetical English "fis-hing" -- so it seems like they must be using a lookup table for all English words and falling back on something completely ridiculous for words that aren't in the table.)

winety · on March 21, 2023

It’s crazy, and that’s why hyphenation doesn’t really work that way. Both TeX and web browsers use Liang’s algorithm to split words. [1] It uses so-called patterns, which are short substrings of words in which numbers indicate how to divide the word. For example, the pattern “s1h” indicates that in the word “fishing”, a divider can be inserted between “s” and “h”. Patterns compete and can override each other, and the whole thing is quite complicated. As for your example with Qishan — the “s-h” probably overrides the “i-s” pattern. (There have been a number of articles in TeX journals that explain the algorithm, such as [2].)

In CSS, automatic hyphenation must be explicitly turned on, see [3].

In TeX and in CSS, hyphenation points can be marked explicitly: in TeX with the \- macro and in CSS with the  or U+00AD character. In TeX you can also override the automatic division with \hyphenation{}.

The splitting algorithm in CSS is worse than the one in TeX, because it has to work in real time and because (good) splitting patterns are often missing.

[1]: https://www.tug.org/docs/liang/

[2]: https://www.fi.muni.cz/usr/sojka/papers/euro01.pdf

[3]: https://developer.mozilla.org/en-US/docs/Web/CSS/hyphens

thaumasiotes · on March 22, 2023

It seems very clear that Amazon's default approach is to insert hyphens based on a whitelist of correct hyphenation points.

And that is what the algorithm you refer to does! Your links [1] and [2] speak specifically in terms of the patterns being a form of data compression that is applied to lighten the storage requirements of a big list of correct hyphenation points. The hyphenation algorithm is just that you check the word you want to hyphenate against the Master List Of All Words and learn where hyphenation is allowed. The patterns are a form of data preprocessing that makes that algorithm more efficient (here, in terms of space requirements) without changing the output.

So what we need is a way to extend the set of precomputed rules whenever we want to use a word that wasn't in the original dictionary. As noted, TeX provides this with the \hyphenation{} command. Why is this not available in CSS?

Suppose I want to write an ebook that doesn't make mistakes on the level of "fis-hing" and "f-orest". [Another example I'm not making up; the Kindle app is convinced that "Ts-inghua" is correct hyphenation.] How do I include the hyphenation information in my document?

Karellen · on March 21, 2023

> The splitting algorithm in CSS is worse than the one in TeX, because it has to work in real time and because (good) splitting patterns are often missing.

Surely that's only the case for real-time renderers like web browsers.

If you're creating a layout engine for printed media that uses CSS as the way for authors/setters to specify style, couldn't it implement a better, slower splitting algorithm? Using an internal (or pluggable?) dictionary of hyphenations?

winety · on March 21, 2023

You could and that's basically what TeX does, just without the CSS. There are even typesetting systems similar to (La)TeX, that can take XML as input, see Context [1] or Sile [2]. They’re just a step away from using HTML + CSS. Why isn’t there such system? I do not know.

[1]: https://wiki.contextgarden.net/XML

[2]: https://sile-typesetter.org/

eat_veggies · on March 21, 2023

You could probably write some JS to reimplement the better algorithm and insert the  hyphenation hints

edent · on March 21, 2023

> This example from the article says it all: https://iangmcdowell.com/blog/book_breaks.png

What does it say? I read lots of book and... it looks like every other book. What am I missing?

mikeryan · on March 21, 2023

So properly justifying text for a print layout is hard. For example on the right page, the sentence that starts with “Marcos wondered”.

You start to see how in the first line it’s fairly widely spaced but the second line starts to look like the line was crammed in to fit the line.

It’s subtle but those are the kinds of things that proper print layout tools help with. It’s weird looking at one page in a vacuum you won’t notice as much but after a few pages it starts getting “harder” to read. I’ve noticed this with some E-books and started to pick up on it.

Izkata · on March 21, 2023

Those are just like (almost) all books. "Proper" typesetting like LaTeX does seems to only be a thing in a very small subset.

lelanthran · on March 21, 2023

See the RHS of the page, see the column of space after "detected gunfire" and "counterinsurgents".

In fact, the spaces are all over the place.

Brajeshwar · on March 21, 2023

I love reading printed books and blog posts. For those interested, a starting point for printable CSS is Gutenberg[1]. This has been my go-to for the print media in CSS. I believe I stopped looking for another one with my lack of deeper involvement in development of styles.

I'm also guilty of picking Baskerville as the default classic look to printable font-family and even had resorted to it for screen.

Besides the tools mentioned in the article (Vellum and Atticus), Ulysses also has a good option for print-ready output.

1. https://github.com/BafS/Gutenberg

cardamomo · on March 21, 2023

As a former freelance book compositor, I can say there are typically many more details that go into laying out a book. To take this layout up a notch, I would want to create a baseline grid (set the line heights and paragraph margins to align text of different sizes). I would also want to control widows and orphans, which is built into CSS.

yojo · on March 21, 2023

Good point. My goal was definitely just getting to an output that satisfied me as a reader, but there are certainly people out there with more refined aesthetic sensibilities that are going to spot the places I fell down.

The part that seemed hardest to get just right with CSS was text justification. I can specify inter-word justification, but there are cases where getting the spacing just right is impossible without breaking a long word using a hyphen. As far as I can tell, there's no programmatic way to do this with just CSS.

I addressed widows and orphans in my ePub layout, but for some reason forgot it in the print layout (d'oh). The nice thing about print on demand is it's easy to make updates, which I will. So thanks for pointing that one out! I'll also read up on using a baseline grid, I suspect CSS will work fine for that, since my content is all text with little variation between sizes.

cardamomo · on March 22, 2023

I also suspect you'll find a simple CSS solution to establishing a baseline grid! Good luck!

snoopen · on March 21, 2023

Does CSS support baseline grids? Since pagination isn't really a thing in CSS in guessing not?

stasmo · on March 21, 2023

Pagination really is a thing in CSS, it’s just a part of the spec that’s not well supported by browsers right now. Use pagedjs to polyfill the missing spec features and you can control how content is split up between pages, add content to page margins, add blank pages and solve all sorts of print-related problems with CSS only.

cardamomo · on March 21, 2023

Not to my knowledge. Here's a write-up from 2012 that still seems to be accurate today. https://www.smashingmagazine.com/2012/12/css-baseline-the-go...

ritzaco · on March 21, 2023

I printed a book recently (https://makejsgames.com/) as a test and was surprised how straight-forward it was using LeanPub for the layout and Lulu for the printing.

Leanpub accepts standard Markdown (though does have its own markdown format now) and gives you an "export for printing" button that you can upload to Lulu. The cover was kind of tricky and I got some professional design help both to design it and to get the margins and bleeds exactly as Lulu wanted them, but the result was way better than I expected.

I haven't tried the other professional tools mentioned in this article so I can't really compare quality or ease, but might be worth a look if anyone is looking for something in between "do it all yourself" and "pay for fancy tools". (I have a lifetime LeanPub license from back when they still offered them but the monthly subscription fees are a lot lower than some of the tools mentioned here and you can easily get a printing done in a single month).

thunderbong · on March 21, 2023

Awesome writeup! I enjoyed the tongue-in-cheek style along with the technical stuff! And CSS is like he mentions - "One he'll of a hammer"

Also, towards the end, this gave me a chuckle!

> It’s like Linux: free so long as your time has no value.

noufalibrahim · on March 21, 2023

Yeah. I was going to echo this sentiment but it looks like you pre emptively followed me.

The article has the air of a hack done just for the fun of it because the author can. This was popular once upon a time but these days, it's mostly stuff written to promote certain things or as marketing copy.

TylerE · on March 21, 2023

This is marketing, don’t kid yourself.

yojo · on March 21, 2023

Author here: I'd like to think it's a little bit of both, but mostly "here's a silly hack." I tried to separate the marketing from the content as cleanly and honestly as possible.

I'll give you a peek inside my head, and you can decide if you trust me to be telling the truth.

I did this silly hack because I wanted to hold a copy of my book. I knew it was crazy when I was doing it, and that there were simpler/better ways of accomplishing the same goal. But I was surprised that (1) it worked at all, and (2) it came out as well as it did – though you can certainly find people in the comments here who can identify the flaws.

I did not do this silly hack because I wanted to use it to write a marketing post. That would be an extraordinarily convoluted way of going about things, even for me.

Anyhow, I like writing, and I like sharing my writing, so I put together this post about something I did that I found interesting. I don't expect everyone to enjoy it, but it seems like some people found it entertaining and/or learned a little bit about CSS.

And yes, I included two links to my book. Given that the existence of the book is central to my motivation to pursue the project, it would be silly not to at least mention it. So I did, one link at the top where I think it's relevant context, and an actual "pitch" at the bottom, separated from the other content/called out explicitly so anyone allergic to marketing could bail. I'm not being snarky; I personally dislike marketing and hate most of the ways it's done.

And even still, I do feel conflicted. I grew up in the 90s, and I miss that old, pre-commerce internet populated by hobbyists. Maybe I should have left the pitch out.

TylerE · on March 21, 2023

I'm not condemning your actions, just pointing out naivety. Some of those most influenced by marketing are those who think they are above it all.

noufalibrahim · on March 22, 2023

Yeah. I've been on the net long enough to develop a sixth sense for when a post or article is just a lead in to a book. Either the OP did a good job hiding that intention if it was there or if no such intention was there, all the better.

My evaluation was based on my own subjective feeling so it could be wrong but I still don't feel any different even after reading the counters to my initial post.

bryanrasmussen · on March 21, 2023

yeah, it seems a bit weird to be marketing your techno thriller book with a post about css. I mean I guess there is some overlap between the two subjects, as there is likely be overlap between anything in a world with complicated people.

TylerE · on March 21, 2023

It. Is. Marketing. The very first sentence is just a link to Amazon to buy the thing.

noufalibrahim · on March 21, 2023

Yup. Well written enough for me to forget that part. Just enjoyed the post .

codetrotter · on March 21, 2023

The question then is, is it effective marketing? How many sales resulted from this blog post? Zero? A handful? Tens? Hundreds?

I would guess a handful at most.

And I am not saying that to criticise the author. Marketing is genuinely difficult. Hell I even did a marketing campaign with paid ads on Reddit to get more people to use one of my pieces of open source, free of charge, pieces of software and even with a bunch of ad impressions and clicks I can still count the amount of new stars on just a few hands

yojo · on March 21, 2023

I'll answer the question: a handful at most

7 ebooks ordered as of 10:15 PST, and 10 subscribers for the free novella.

I have no idea how many people read the post, I assume lots? I'm hosting on Github pages and haven't added any tracking. It seems like anecdotally a front-page post gets ~20k views, so some tiny fraction of a percent of readers went on to purchase.

Honestly, this small number of sales is still far more than I was expecting. I wrote a longer response on my motivations higher in the thread, but my primary impulse was sharing, not marketing. I tacked a marketing pitch at the end, figuring I might as well, but I did not expect the post to spend any time at all on the front page of HN.

It is also probably true that this "marketing" was worse than ineffective, it was likely detrimental. I'm guessing that substantially more people clicked through to Amazon to see what the book was, despite being a poorly targeted audience that is unlikely to convert. From Amazon's point of view, the book just got a ton of traffic and very few sales, which is probably treated as the signal "this book is not a good seller, don't show it to people."

Similarly, the members of this poorly targeted audience that did buy the book are unlikely to be typical readers in the genre. This will degrade the "also bought" signal for the book, and to the extent that Amazon does organically show my book to other readers, it will likely show it to the wrong readers, further hurting organic performance.

bryanrasmussen · on March 21, 2023

>Similarly, the members of this poorly targeted audience that did buy the book are unlikely to be typical readers in the genre.

I would assume they were typical readers in the genre, which obviously some percentage of readers from HN would have to be.

A post of mine on criticism https://news.ycombinator.com/item?id=32392070 went to front page for a bit and got 4.7K hits according to medium.

I would think if Amazon has any sort of decent traffic analysis ranking it would have the understanding of unknown/random spikes and discount negative data from that as noise (unless it led to a bunch of negative reviews etc. in that spike) but maybe I'm just too hopeful about stuff.

Wish I could sell some books though but as I don't ever it seems unlikely I know what I'm talking about anyway. :)

yojo · on March 21, 2023

It's possible the only people who bought were technothriller enthusiasts, but I think it's likely that at least some of them are not.

Traffic spikes are actually a common occurrence in book sales. My understanding is that some of the most effective marketing is through paid newsletter inclusion – sites like BookBub/Freebooksy/Fussy Librarian. These newsletters absolutely drive big spikes in traffic, and the conversion rate is going to correlate well with how well the book will sell overall. It's possible Amazon does something with the referrer to try to segment these kinds of traffic, but impossible to know from the outside.

If you just want to move copies, you should look at the Facebook group 20BooksTo50k. It has lots of informative posts by self-published authors doing six figures in annual sales. I can distill it down for you though. The people finding "quit your job" levels of success generally:

- write to market

- in a consistent genre

- for several years

- and publish five or more books per year, mostly in a series.

Some people reach that level of success faster, though they tend to be in the largest genres (mainly romance), or publishing at truly breakneck speed (a book or more per month).

Personally, I write things that don't slot quite cleanly into a genre, and I have a tendency to genre hop. I know it's sub-optimal, but I'm pretty sure I'd just burn out trying to do it the other way.

I'm also not yet at the point where I can write work I'm proud of at that velocity; the last book I wrote took me two months to get a first draft, and it's probably going to take another two months to get it to a "finished" state. So I'm on a "three books a year" pace, and it already feels exhausting/I may need to slow down.

dkjaudyeqooe · on March 21, 2023

Marketing that entertains (and informs) is perfectly ok.

TylerE · on March 21, 2023

I didn’t say it wasn’t. Just naive to pretend it is intended as purely informational.

SigmundurM · on March 21, 2023

Fun fact: EPUBs, the global standard for digital books, is based on just HTML, CSS, and JS

https://www.w3.org/publishing/epub3/epub-contentdocs.html

blackoil · on March 21, 2023

That is meant for e-books and not printable books.

SigmundurM · on March 21, 2023

Yup. It would be interesting to explore using Fixed Layout EPUBs to print physical books.

WillAdams · on March 22, 2023

Yes, and they get a bye because they are constantly re-flowed --- no one would gladly accept that in print.

nayuki · on March 21, 2023

Excellent work. This is good time to remind web front-end developers that you don't need to click a link to bring up a "printable" version of a page. You can make every HTML page print-compatible through stylesheet rules wrapped in @media print { ... } or <link rel="stylesheet" href="my.css" media="print">.

ThePhysicist · on March 21, 2023

Weasyprint has gotten really good, I use it to generate PDFs for invoices but you can easily use it to write multi-page content as well. PrinceXML was still a bit better last I checked, but it's a closed-source, commercial solution.

nforgerit · on March 21, 2023

Another method I haven't seen mentioned yet and that I grew fond of is to use Apache FOP ("Formatting Object Processor") which is still under active development and comes with this beautifully retro Apache open source homepages[0]:

(1) It has a pretty good idea about Layouts, Pages and other elements one would need for printing

(2) It supported all the graphics formats that I ever needed in the past and automatically fetches images by URL when it stumbles upon them.

(3) It is easily embeddable into some custom Java app.

(4) It is based on W3C open standard (which though doesn't seem to be developed anymore)

(5) It does PDF/A.

(6) XML/XSLT is a bit annoying to write for your template, so I used a common templating engine (e.g. Thymeleaf) to render the XML/XSLT that I passed on to FOP. Worked like a breeze and gives you the possibility to create your own little DSL.

(7) I'm still happily generating my invoices through FOP put behind an API that takes input data and uploads the generated documents into a bucket.

(8) I don't have the exact numbers but last time I checked and compared output sizes of browser printed pdfs vs "generated by Fop", I saw smaller PDFs with Fop. It doesn't mangle the input images though

All in all, I trust FOP better to generate printable documents than any CSS/Headless Browser thing. It comes with an arcane template format which, as said, can be abstracted away but therefore has much smaller infrastructure footprint than spawning headless browsers.

[0] https://xmlgraphics.apache.org/

pbhjpbhj · on March 21, 2023

Why not Scribus?

It's FOSS, mature, and actively used for book layouts by others. It comes up in searches for "free InDesign alternative" and "foss book layout" (it's mentioned in every search result that I viewed [not many]).

I mean, perhaps it didn't fit the story that showed the blog post to be used to trail the book, but I'd have thought it would be mentioned.

Moru · on March 21, 2023

He also states the Hammer and Nail situation. If your only tool is a hammer, every problem becomes a nail. I get the impression he wanted to use something he knew instead of spending time learning something new.

colonwqbang · on March 21, 2023

Do you have experience with it yourself? I haven't tried it for several years, but it was not at all usable last time I did.

For a project like this, I think it's a good choice to go with something more declarative over a WYSIWYG tool. Latex would have been a very good choice, but only if you already happen to know it or have an interest in learning.

pbhjpbhj · on March 21, 2023

Yes, I've used it -- along with Inkscape and The GIMP -- for advertising [layout of flyers, for print magazine adverts].

Probably about 3 years since I used it last for this type of work.

I follow one of their support groups and people mention they publish newsletters, pamphlets, and books with it.

benoliver999 · on March 21, 2023

I use it for graphics heavy stuff like product packaging, instruction manuals etc.

For a book I'd just throw it in latex for sure.

eynsham · on March 23, 2023

1.5 onwards I think has been entirely usable; before that I found it bizarrely unstable.

pauljonas · on March 21, 2023

A few years back, I wrote a web based publishing tool that used `wkhtmltopdf` to generate PDF from database content along with web images (& even more complicated with API access & Google Maps API). It took a lot of tinkering but it worked. At times, extra fiddling needed to generate accurately the Table of Contents (basic, uses `H1-6` elements in document) with proper page references and formatting. But it worked, even with allotting for stuff like "bleed" (it was being printed a massive color printer we leased that could bind or staple.

Hackbraten · on March 21, 2023

Thank you for wkhtmltopdf. If I recall correctly, Pandoc uses it under the hood for generating PDF?

jopsen · on March 21, 2023

If the objective was not spend on design software, why not use latex?

TylerE · on March 21, 2023

Because conTeXt exists and is really much better at that sort of thing, having full native support for things like color, figures that GO WHERE YOU PUT (glares at latex floats). It can all kinda of fancy sidebars, infoboxes, all that InDesigny sort of stuff.

You can tell it was built from the ground up for books, and not scholarly publications. In that respect it’s actually closer to plain TeX.

It doesn’t try to be overly semantic because books are complex beasts, and there really is no useful-but-wrong simple model that doesn’t handicap you.

Like, not all books have chapters. Some have hundred of very short chapters, with multiple on a single page.

It’s also much more “batteries included”.

Basically; trying to do a book in LaTeX you end up fighting the defaults, and for some there are NO good reliable overrides.

Take a look at the manual (http://www.pragma-ade.nl/general/manuals/ma-cb-en.pdf) page 83 (numbered) 87 in the pdf. Imagine trying To the that in LaTeX.

cbolton · on March 21, 2023

You probably know that but for clarity's sake: OP probably meant why not LaTeX instead of the CSS+Chrome solution used by the author of the blog post.

I agree that ConTeXt would be a better fit than LaTeX.

yojo · on March 21, 2023

The objective was to get a print book while minimizing time and money spent. I happen to have a lot of experience with CSS, which is why I reached for the tool. I've never used latex, but the impression I have of it is that there's a big learning curve and it would take me many hours to produce output I was satisfied with. OTOH, it would add another useful skill in my toolbox.

sondr3 · on March 21, 2023

Wouldn't surprise me if it was easier for the author to hack this together in CSS since he already knows it quite well compared to learning LaTeX. I love LaTeX, but the learning curve is more like a cliff.

lelanthran · on March 22, 2023

> Wouldn't surprise me if it was easier for the author to hack this together in CSS since he already knows it quite well compared to learning LaTeX. I love LaTeX, but the learning curve is more like a cliff.

Yeah, agree, but a work of fiction doesn't usually have tables, figures, references, equations, etc which can be painful in LaTeX.

It does have kerning, ligatures, orphan/widow management, runs/ladder management, page numbering, chapter numbering, margins (larger closer to the spine), rules, struts, etc that all come for free without the author having to know any of that stuff.

IOW, for a work of fiction, the learning curve for LaTeX is likely limited to setting the font, setting the size, setting the output type (book, article, etc) and using bold, emphasis, underline and possible verbatim.

The learning curve for LaTeX is a cliff if you're writing a thesis, it's simply markup if you're not.

WolfOliver · on March 21, 2023

In my opinion, markdown, LaTeX, HTML, ... is all a cognitive overhead distracting you form writing your content. That is why I started to work on MonsterWriter [1]

[1] https://www.monsterwriter.app/

lelanthran · on March 22, 2023

Weird, I had the same thought two weeks ago - I just wanna tell a story, and everything else is simply a distraction. I thought "Why not a simple webapp that presents nothing but a textarea/contenteditable" :-)

Anyway, I am happier creating native (desktop) applications than webapps, except that I don't think that there is a market anymore for them. How well is your native desktop app doing? Is there a large enough market for applications that need to be installed locally only? How do you market these?

pixelfarmer · on March 21, 2023

I just scrolled through the page. I think https://iangmcdowell.com/blog/book_breaks.png says it all, quite openly. For anyone who delved a bit deeper into the topic, this is horrible to look at.

pixelfarmer · on March 21, 2023

Seems like I have to add some explaining words here ...

Book layout is a very complicated topic. If you look at that image and step away a bit it leaves a chaotic visual impression. Reason behind that is that if you want a visually pleasing look, all the paragraphs have to have a similar "density". That is defined by how long the words are, the spacing in the words (kerning, ligatures, font), and the spacing between the words. Latter is all across the board, even changes within the same paragraph! LaTeX has all that typesetting knowledge included, and it will actively hyphenate words to accomplish its goal. I see exactly one word being hyphenated there.

Another thing that stuck out immediately was the font selection for the all uppercase paragraph. Some fonts are just not made to be used that way and this is one of them. These curvy uppercase letters are nice to start words, but used together it is ... ugh.

Last is that leftover sentence at the top of the right side. It "validates" the separator image, so this is somewhat of a pro/contra mixed bag situation.

Professionals could probably say even more about all this, I just used LaTeX for a couple years more often and these are things I learned (to see) on the way.

junon · on March 21, 2023

> Affinity Publisher

Didn't know Serif had made a third product. Be wary of them - Designer and Photo are both buggy minefields and their support forum is full of snobby jerks. Crashes have been reported and present for years, with little or no response from Serif. Their developers all seem either disgruntled or burnt out. Some of the employer review sites confirm this.

They're cheap but you certainly get what you pay for. Save often, back up often. I say this as an avid Designer user.

dagorenouf · on March 21, 2023

Never tried them but is it worse than adobe products?

junon · on March 21, 2023

Bug-wise yes, both Photo and Designer are very very buggy.

Functionality, also yes, but not by much. Only thing I miss from Photoshop in Affinity Photo is content aware tooling. I don't remember anything I miss from Illustrator in Affinity Design and I actually think they have better features than Illustrator (admittedly haven't used Illustrator in almost a decade so maybe they've caught up).

You really can't beat Affinity's price though unless you use GIMP/Inkscape of course, but I like neither of those programs.

akpa1 · on March 21, 2023

To chip in to on this, I've been using Affinity Photo for at least 5 years and I don't remember having many problems at all. That is to say, YMMV.

junon · on March 21, 2023

Yeah photo is leagues less buggy than designer. That said, I've had random crashes with Photo in the past too.

rcarmo · on March 21, 2023

I would also recommend https://weasyprint.org if you are generating PDFs. Generating invoices, resumes, reports, flyers, etc. from plain Markdown is a two-line affair:

https://taoofmac.com/space/notes/2023/03/19/1849#tuesday-202...

jcul · on March 21, 2023

Thanks for the link, weasyprint for converting markdown to some presentation format looks cool, similar to pandoc but maybe more customizable?

Just to point out, in the original post, the author does end up using weasyprint, as the tools he began with did not support the required CSS features.

rob74 · on March 21, 2023

True story: I once had to update an OpenOffice document containing mostly photos and insert newer photos. How hard can it be? Finally, I got so frustrated with OpenOffice constantly breaking the layout and simply refusing to fit the images I wanted to fit on a page on a page (needless to say that they did fit on a page in the old version of the document) that I gave up, built an HTML page with the images and printed it to PDF. Done!

sn41 · on March 21, 2023

I once had to generate about 170K+ pdf files from latex. Took a while (~5 hours) on a desktop, so had to get a server to do it and ran pdflatex with gnu parallel. The layout was a pain, since some strings in the databases were quite long.

Then I realized that I could just have generated HTML and printed it via some headless chrome script. The rendering was more consistent, and the printing was much faster (IIRC it took about 1/3 the time.)

yencabulator · on March 28, 2023

Typst is new but at this point I'd explore that, myself. Much more pleasant than LaTeX, and quite capable:

https://github.com/typst/typst

https://news.ycombinator.com/item?id=35250210

lloydatkinson · on March 21, 2023

I can hear the “make everything even your to do lists and shopping list in Latex” cultists seething from over here!

grecy · on March 21, 2023

For my print books I've used LaTeX for the print pdf, then I run that through Pandoc to create an epub from that same source files with a script. I'm extremely happy with the final results. Here's how I did it all [1]

Using macros and defining your own functions in LaTeX is a game changer, and when it came time to make my second book, I was up and running typing content in about five minutes, and it has the precise same look and feel.

[1] http://theroadchoseme.com/how-i-self-published-a-professiona...

martopix · on March 21, 2023

I keep thinking that what I'd really like is a substitute of latex that is based on markdown and CSS. This is a hint of what would be possible, although of course it would take decades to get to the power of latex.

maegul · on March 21, 2023

I’ve wondered about this, and can’t help but feel that what’s missing are good solid foundations from which webdev and general programming skills could build out everything we’d need to migrate from latex. I would bet the community would move pretty fast if the foundations were there, but I’m not sure there’s anyone with the requisite expertise and will and resources to build it.

Shame really, because IMO a web-aligned document writing tool chain would be awesome.

giraffe_lady · on March 21, 2023

Pollen is a good intermediate for this if you like lisp.

2b3a51 · on March 21, 2023

I've learned about a few pieces of software through the article. I'm just wondering if the FocusWriter odt file could not be processed through pandoc in some way with a style sheet to make a pdf.

z3t4 · on March 21, 2023

> It’s like Linux: free so long as your time has no value.

A better phrase would be: Free software is "free" as long as you value your free time, more then you value your time at work.

madduci · on March 21, 2023

Interesting read, I find asciidoctor pdf and its theming guide extremely good to obtain nice results for books and articles, customizing it with CSS, fonts and more

pixelfarmer · on March 21, 2023

Asciidoctor PDF works as long as you are ok with what you could call the general "template" it uses. You can modify details within that, but if you want to change it, other approaches are required. Last time I had to deal with it I went with DocBook5 and then used a conversion to LaTeX to "break out". I agree with the asciidoctor guys about not offering more flexibility with that "template"; they have other things to focus on and it works well for cases where no specific layout is required.

KodingKitty · on March 21, 2023

I have written several books. I used Markdown for the first one.

Then someone recommended AsciiDoc and AsciiDoctor. It was a much better experience compared to Markdown. Sure, not for pixel perfect tweaking. Still enough parameters to play with. Easy to create PDFs and EPUBs.

senthilnayagam · on March 21, 2023

I like asciidoctor because it is a text file and easy to version control, I have written couple of books with it and made into pdf, but I am still looking for actively maintained themes, so for self publishing and KDP, I am using pandoc and bunch of tools to convert it into various formats(docbook, md, docx etc). My current workflow is convert into docx which is imported in kindle create and optimise it with kindle previewer.

nye2k · on March 21, 2023

I remember my excitement when CSS3 became reality in modern browsers. I was following this project closely, which attempted to use recreate Robert Bringhurst’s The Elements of Typographic Style in CSS. That project made it clear just how far we were from a legible, printable web.

http://webtypography.net/

topcat31 · on March 21, 2023

What about something like bindery.js?

https://bindery.info/

beej71 · on March 21, 2023

I'm pleased to see it come this far. I still need footnotes, table of contents, and indexing, but maybe someday.

stakhanov · on March 21, 2023

Weasyprint does support all of these things actually, including footnotes. A table of contents would use references to CSS counters, and if you generate the input-HTML for weasyprint from pandoc, then pandoc can actually generate those for you. The only nontrivial thing on your list would be an index, assuming you would want to autogenerate it the way latex does, but even that seems kind of doable with a little bit of custom hacking. Pandoc can also do bibliographies through citeproc/csl. So pandoc+weasyprint is absolutely a practicable tech stack these days for science publishing, assuming you don't need latex-math and are not targeting any of those publishing outlets that give you a word template and latex style and force you to use it.

danpalmer · on March 21, 2023

I had a fun time implementing shipping labels a few years ago. They were just a printable web page, rendered by Django. I was also able to fairly directly translate the sizes/padding in the spec to CSS using physical units. Surprisingly straightforward!

foxbee · on March 21, 2023

This is awesome! Jost is a really nice font. Some css i like to add to my fonts to improve readability:

-webkit-font-smoothing: antialiased; text-rendering: optimizeLegibility; font-variant-ligatures: contextual common-ligatures;

Give them a try and let me know if they help :-)

blue1 · on March 21, 2023

One serious problem is that CSS do not support CMYK colors.

chrismorgan · on March 21, 2023

CSS Color Module Level 5 <https://drafts.csswg.org/css-color-5/> adds @color-profile for ICC profiles (which may cover CMYK), and device-cmyk() for uncalibrated CMYK colours.

Sure, browser support isn’t there yet, and this is still draft work, but it’s coming.

somat · on March 21, 2023

Might be a joke, humor can be difficult to convey in forum post form, And I am that stereotypical sort who often does not get humor. But I will bite.

It is a printed book... there is no color in the process.

danpalmer · on March 21, 2023

I don't think it's a joke, for the general case of print this is indeed a problem as the whole print industry runs on CMYK. You could go further and say that because CSS/Weasyprint doesn't have a Pantone plugin, it'll be difficult to control colour accuracy throughout the production process. You're right this isn't necessary for most novels, but it's a very real concern for print media.

blue1 · on March 21, 2023

It is not a joke. Color print exists, you know. For example, try obtaining a pure cyan from RGB color...

RandomWorker · on March 21, 2023

One thing it also needs is variable margins with margins based on page side. That way the text doesn’t get too close to the spine.

anotheryou · on March 21, 2023

now try making your page numbers start after the index pages

that's where I gave up.

chadlavi · on March 21, 2023

This person hasn't heard of pandoc I guess

mproud · on March 21, 2023

LaTeX is also free.