Question for the author: Why to markdown? It seems to me the hard part of this t...

Finnucane · on Dec 1, 2023

You would want some kind of markup that preserves as much structural information as possible. I manage ebooks for a university press, and we have a deep backlist waiting for conversion, a lot of which only exists as page scans of old print volumes. I want to be able to offer them as epubs, which means I need to know where there are chapter breaks, heads, tables, charts, math, blockquotes, and so on. I have vendors that can do this for me, but it costs more than we'd get for some of these books in sales. I'd love to be able to do some of this myself.

carschno · on Dec 1, 2023

I agree, the intermediate format should be plain text that can optionally be converted to any other format. I suppose, however, that Markdown is used as the intermediate format here. It is close to plain text while still preserving simple layout information.

In practice, I would use the Markdown output and plug it into any tool that converts that into the desired final output format.
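
As a rough sketch of that last step, assuming pandoc is the converter and is installed on the PATH: the helper names `pandoc_epub_command` and `convert_to_epub` are made up for illustration, and the choice of `--mathml` for rendering equations in the EPUB is an assumption, not something this thread settles.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def pandoc_epub_command(markdown_path: str, title: str) -> list[str]:
    """Build a pandoc command line that turns a Markdown file into an EPUB.

    Hypothetical helper: the output path is the input path with its
    extension swapped for .epub.
    """
    epub_path = str(Path(markdown_path).with_suffix(".epub"))
    return [
        "pandoc", markdown_path,
        "--from", "markdown",            # pandoc's Markdown reader handles $...$ math
        "--mathml",                      # assumption: render equations as MathML
        "--metadata", f"title={title}",  # EPUBs need at least a title
        "--output", epub_path,
    ]


def convert_to_epub(markdown_path: str, title: str) -> None:
    """Run the conversion, failing early if pandoc is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(pandoc_epub_command(markdown_path, title), check=True)
```

Calling `convert_to_epub("book.md", "My Book")` would write `book.epub` next to the input file. Keeping the command builder separate from the runner makes it easy to inspect or adjust the flags per book.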

sertbdfgbnfgsd · on Dec 1, 2023

That sounds reasonable. I might explore pdf -> markdown -> epub.

I wonder if this could somehow be used directly by calibre. I think calibre's pdf->epub conversion isn't amazing. In particular, tables often end up broken.

vikp · on Dec 1, 2023

I chose markdown because I wanted to preserve equations (fenced by $/$$), tables, bold/italic information, and headers. I haven't looked into epub output, but this ruled out plain text.
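
To illustrate why the `$`/`$$` fencing is useful downstream, here is a minimal sketch of pulling the equations back out of such Markdown. It is not the tool's actual code; the function name `extract_math` is hypothetical, and the regex deliberately ignores edge cases such as escaped `\$` or currency amounts.

```python
from __future__ import annotations

import re

# Try display math ($$...$$) before inline math ($...$) so that "$$" is not
# misread as the start of an inline span. Inline math may not cross a line
# break; display math may.
MATH_PATTERN = re.compile(r"\$\$(.+?)\$\$|\$([^$\n]+?)\$", re.DOTALL)


def extract_math(markdown: str) -> list[tuple[str, str]]:
    """Return (kind, latex) pairs, where kind is 'display' or 'inline'."""
    spans = []
    for match in MATH_PATTERN.finditer(markdown):
        display, inline = match.group(1), match.group(2)
        if display is not None:
            spans.append(("display", display.strip()))
        else:
            spans.append(("inline", inline.strip()))
    return spans
```

A later stage could hand each extracted span to whatever math renderer the target format needs, which would not be possible if the equations had been flattened into plain text.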

da39a3ee · on Dec 2, 2023

Why not choose an unambiguously parseable output format such as JSON, and then convert the JSON to markdown, HTML, etc. when needed?
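
A minimal sketch of what that could look like, assuming an invented schema: a top-level `"blocks"` list whose entries carry a `"type"` of `heading`, `paragraph`, `equation`, or `table`. Neither the schema nor the function names come from the tool under discussion.

```python
import json


def block_to_markdown(block: dict) -> str:
    """Render one block from the (hypothetical) JSON schema as Markdown."""
    kind = block["type"]
    if kind == "heading":
        return "#" * block["level"] + " " + block["text"]
    if kind == "paragraph":
        return block["text"]
    if kind == "equation":
        return "$$\n" + block["latex"] + "\n$$"
    if kind == "table":
        # The first row is treated as the header row.
        header, *rows = block["rows"]
        lines = [
            "| " + " | ".join(header) + " |",
            "| " + " | ".join("---" for _ in header) + " |",
        ]
        lines += ["| " + " | ".join(row) + " |" for row in rows]
        return "\n".join(lines)
    raise ValueError(f"unknown block type: {kind!r}")


def json_to_markdown(document_json: str) -> str:
    """Convert a whole JSON document to Markdown, one blank line between blocks."""
    blocks = json.loads(document_json)["blocks"]
    return "\n\n".join(block_to_markdown(b) for b in blocks) + "\n"
```

With this split, the JSON stays the unambiguous source of truth, and an HTML or EPUB renderer could be written against the same block types without re-parsing Markdown.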