Hacker News
What's new in TeX (lwn.net)
95 points by leephillips on Oct 29, 2015 | 59 comments



Good HTML output for TeX does exist in various shapes and forms, and in fact Pandoc is one of the weaker converters out there in terms of TeX coverage.

You would get better mileage with tools such as LaTeXML[1] or TeX4HT[2], which cover very substantial subsets of TeX, if not yet 100% feature-complete.

My own take on "Should LaTeX survive on the web?" is "yes, if it evolves to meet the paradigm shift". I wrote a detailed version of that position at [3].

- [1] https://en.wikipedia.org/wiki/LaTeXML

- [2] https://en.wikipedia.org/wiki/TeX4ht

- [3] http://prodg.org/blog/latex_today/2015-03-16/LaTeX%20is%20De...


I cannot thank you enough for the work that you, Bruce Miller, and others are doing on LaTeXML.

However, LaTeXML is not 100% feature-complete at all (sorry if this is not what you were saying). For anybody who doesn't know how to improve LaTeXML itself (e.g. by adding support for missing LaTeX packages), the only way to get it working is to use it from the beginning, avoiding anything that would not work out of the box in LaTeXML.


TeX4HT is mentioned in the article.


Part of what bothers me so much about TeX is how obscure and opaque the language feels. I am familiar with ten programming languages, from C and assembly to Haskell, Perl, and Lisp. But just the way that people write TeX macros makes everything so unapproachable. The only insight I have gleaned is that it's like one big fat state machine, which makes it extremely hard to write principled code.

I understand Knuth wrote a book, but reading through it has only made me wonder more about the bizarre choices.


The reason TeX can be so unapproachable is the fact that it's a very different language from pretty much any other programming language you've ever used. It's not a Turing machine, and it's not lambda calculus. It's not even Horn clauses (which is the model behind Prolog). It's a string rewriting system: basically a giant cascade of rules like "rewrite this string into this other string", applied over and over until the whole input is consumed.

As it happens, this is yet another Turing-equivalent form of computation. The closest analogy I have is programming in Lisp, but with the constraint that you can only use special forms and macros, no functions are allowed.
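To make the rewriting model concrete, here is a toy plain-TeX fragment (my own example, not from the article; it runs with the plain `tex` engine):

```tex
% Each \def installs a rewrite rule; TeX keeps expanding tokens
% until no expandable tokens remain.
\def\name{World}
\def\greet#1{Hello, #1!}   % #1 is a parameter slot, not a function argument
% The next line is rewritten step by step:
%   \greet{\name}  ->  Hello, \name!  ->  Hello, World!
\greet{\name}
\bye
```

There is no call stack or return value anywhere in that process, which is exactly why intuitions from conventional languages fail.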


I was about to say: isn't that just another expression language?

I've sometimes wondered if Lisp-like languages wouldn't be a much better way of writing documents than our current markup languages of choice. If I ever seriously try to pick up Racket, it'll probably be to give Pollen a spin and see how it is in practice.

http://pollenpub.com/


If you want to compare to abstract models of computation, the closest one might be a multi-stack push-down automaton (which is Turing complete).



The LuaTeX project (http://www.luatex.org/) is working on fixing the language part. Hans Hagen, of the company Pragma ADE, is on the team. I know about him and his work because I used the ConTeXt macro package for TeX for my bachelor's thesis. That was about 15 years ago, and even then I wished that ConTeXt were the standard instead of LaTeX.

Whereas in LaTeX you have a bunch of (sometimes conflicting) packages to solve various problems, ConTeXt is a coherent unit. I didn't need an external package for anything, not even for complex stuff like wrapping text around figures.

The downside, and the reason why I couldn't keep using it, is that the markup commands are incompatible with LaTeX. Submissions to journals accept LaTeX only :(


Unfortunately the LuaTeX website is completely opaque, and I can't see any examples of the language, or even a top-level description of how to use it or how the workflow differs from that of LaTeX.


LuaTeX is still under development (note, though, that this is not abandonware or anything like it, it is just a deliberate and careful project that has done amazing work over an extended time). A person who wants a document that will still work in five years should likely go elsewhere.

That said, however, if you are interested, a good place to start is by reading the user support list http://www.luatex.org/support.html .


What's the point of a project that never ships?


It comes with TeX Live. You can get it today, or five years ago, for that matter.

But the final version, for which there is a plan, has not shipped or been written. Writing it is a very ambitious task and takes time, and a little money (through support from, e.g., the TeX Users Group). Things take time.


Please see the links in the "Further Reading" and "External link" sections of this article:

https://en.wikipedia.org/wiki/LuaTeX


Keep in mind that TeX is really old software, from a time when computers weren't as powerful. So it makes a bit of sense that it was architected to make multiple passes over the input instead of storing the whole document in memory.


Surely. I'm just saying modern times call for modern languages, and that makes me uneasy about TeX.


I've seen projects announced on and off to make a next-gen TeX or TeX-like from a clean start, but they never seem to go anywhere. I think writing a good rendering engine is just a ton of work.

Two approaches I see as plausible:

1. Use TeX as a "document assembly language" of sorts, a pure back-end rendering target not exposed to the user, and do your document markup/scripting in some other front-end language. Pandoc [1] is a start at building infrastructure for this. You can then build a new renderer too if you want, to remove the dependency on TeX, but at least in the meantime you have a pathway to producing quality output right now.

2. Build something on top of HTML+CSS (maybe also +JS). The advantages of this would be that it'd make it relatively easy to target the web with the same document source, and open-source web browsers have already sunk a ton of development time into rendering. CSS even includes a bunch of print-oriented features that in principle provide markup for most of what you might need here, though browsers for obvious reasons haven't tended to prioritize those parts. wkhtmltopdf [2] is a project aiming to build a to-print or to-PDF document workflow on top of WebKit. However, IMO the results are still not near TeX-replacement level. I believe the gold standard currently, if you want high-quality print out of HTML+CSS, is the proprietary PrinceXML [3].

[1] http://pandoc.org/

[2] http://wkhtmltopdf.org/

[3] http://www.princexml.com/


I seriously doubt that implementations of any "modern languages" - or any other languages for that matter - have had anywhere near the astonishingly low defect rate and efficiency of TeX.


The TeX ecosystem (what you actually use) is riddled with tons of bugs. Images that end up somewhere totally wrong, text/graphics overflowing onto the page margins, are just some of the bugs that you regularly have to fight with, but there are also some higher-level systemic bugs, like the absence of a formal grammar. I don't know what is supposed to be efficient concerning TeX.


> Images that end up somewhere totally wrong, text/graphics overflowing onto the page margins, are just some of the bugs that you regularly have to fight with

I write a lot of TeX, and I have never found that those things are the fault of the standard software; rather, they are my fault.


> I don't know what is supposed to be efficient concerning TeX

The absence of a better alternative in the domain, even though attempts to serve the domain by other tools (including ones using more "modern languages") have come and gone over the years.

TeX isn't theoretically ideal; it's just practically very good, and the expected cost-to-benefit ratio of a ground-up replacement is very high.


What about something à la Neovim? I've only ever looked at TeX from the perspective of a user (I don't program my own macros too often), so I don't know how hard it'd be, but why not a language overhaul?


The issue is that so many man-centuries (man-millennia?) are invested into TeX & LaTeX that sinking time into something else is very, very expensive (much like trying to build a better CPU than amd64 or arm64).

TeX is really, really amazingly powerful. It can do almost anything a typesetter could want to do, fairly easily, and it can do just about everything, one way or another. And its output is heart-achingly beautiful. Sadly, the code necessary to achieve that output ranges from…heart-achingly beautiful to heart-breakingly ugly.

There are other projects out there, of course. I do think that TeX & LaTeX are close to a local maximum, if not all the way there.

XML, in comparison, is a booger joke.


> I do think that TeX & LaTeX are close to a local maximum, if not all the way there.

Funny you say that since TeX's version scheme (in part) is that it approaches pi. And IIRC will be pi upon Knuth's death.


XML isn't even really in the same field as TeX. TeX is for typesetting, that's all it does. XML is for encoding general data. It's like saying PDFs are way better than protocol buffers.

TeX is incredibly powerful, but it's also incredibly idiosyncratic. I also personally think Computer Modern is an ugly font.


TeX long ago gained the ability to use any OpenType/TrueType font on your system, so dislike of CM is really not a reason to avoid using it. (I happen to like CM, but I know it's disliked by multitudes.)
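For instance, with XeLaTeX or LuaLaTeX, switching away from Computer Modern is a couple of lines via the fontspec package (the font name below is just an example; any installed OpenType/TrueType family works):

```latex
\documentclass{article}
\usepackage{fontspec}          % requires XeLaTeX or LuaLaTeX
\setmainfont{TeX Gyre Pagella} % any system-installed OpenType family
\begin{document}
No Computer Modern in sight.
\end{document}
```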


> XML is for encoding general data.

No, it's really not. XML is a markup language, not a data-encoding language. JSON, S-expressions, ASN.1 &c. are all data encodings; TeX, LaTeX, HTML and XML are markup languages.

> TeX is incredibly powerful, but it's also incredibly idiosyncratic.

Agreed.

> I also personally think Computer Modern is an ugly font.

Eh, it's not great on-screen, but it looks pretty good on paper. But TeX & LaTeX have supported multiple fonts since the beginning.


> XML, in comparison, is a booger joke.

A dried up booger on the floor.


The trouble is the ecosystem of packages that people would have to redo.

If you replaced TeX, LaTeX itself would need replacing, and every single LaTeX package and class that you ever want to use, ones like microtype and such, would have to be rewritten.

To be honest, the multi-pass deal isn't that bad, but the macro expansion system is crazy complicated. Every once in a while after working in LaTeX I'll get the feeling I understand it, but that feeling inevitably dissipates after ten minutes or so.


Very much this. As far as I can gather, TeX is actually rather simple, considering that it basically does what PostScript does. Now, while coding raw PostScript might be fun as an exercise, most would prefer not to. LaTeX lands somewhere between PostScript and something higher up. To meaningfully replace TeX/LaTeX/Metafont and even just a selection of "the best" LaTeX packages would be a herculean task.

Making a "new" TeX probably wouldn't be that hard - but it's also something that wouldn't be that useful. I would very much like something that's both simpler and also keeps some of the lessons learned/implemented (word spacing/splitting, page layout, page breaks etc).

As for other "tools in the same space", I do like pandoc a lot. I want to like Python's ReST (reStructuredText), but that's a package I feel is in need of a rewrite/redesign. There are many good ideas there, but figuring out how to take a simple document and produce simple, modern (preferably somewhat semantic) HTML, or to produce a decent-looking PDF without needing all of LaTeX/TeX Live on hand, isn't easy.

Rewriting ReST tools would be a lot of work, but I think if one didn't try for 100% backwards (output, plugin) compatibility it might be worthwhile.

The astute reader will notice that ReST/Pandoc deals with structured documents, and not really layout for paper/screen (both use TeX/LaTeX as an output target/pipeline). I don't know of anything that comes close to TeX/LaTeX for "rasterized" output.

On the other hand, I also don't know of any package/combination that'll make TeX/LaTeX produce anything but messy, 90s-style html -- that generally looks awful. Even if you were to try and force a modern set of CSS down over the resulting mess. If anyone knows of a modern hypertext package for TeX/LaTeX or some similar tool, I'd be happy to be proven wrong.


> I don't know of anything that comes close to TeX/LaTeX for "rasterized" output.

XSL-FO at one point seemed to have aspirations in that direction...


XeTeX and LuaTeX are projects you might be interested in.


People are trying to build alternatives, see http://www.patoline.org/ for example


Yet TeX is slow as hell.


It doesn't seem slow to me.

I have a 450 page book that as part of the compile generates a 200 page answer manual. They are full of math, hyperlinked cross references, figures, etc. They use tons of packages including amsmath, hyperref, and many more. Compiling takes perhaps 15 seconds. This means compiling twice, to resolve references.
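The second pass is needed because cross-references go through the .aux file: the first run records each \label's number, and the next run reads it back. A minimal illustration:

```latex
\documentclass{article}
\begin{document}
\section{Results}\label{sec:results}
% On the first run \ref prints ?? (the .aux file has no entry yet);
% the second run picks up the number recorded by the first.
As shown in Section~\ref{sec:results}, the effect is large.
\end{document}
```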


TeX itself is quite fast, it's loading all the extra packages that slows things down.


All languages you mentioned are general programming languages. TeX is a word processing language, so it makes complete sense that it is different. If you take the time to properly learn TeX you will see that many of the decisions are obvious for what it intends to do, which is to create full-featured document formats.


It probably doesn't fit exactly what you want, but I found Lout (http://savannah.nongnu.org/projects/lout) to be a suitable enough replacement for a lot of LaTeX-y tasks. I'm not a heavy LaTeX user, so it's likely there's a whole lot missing, but I found the document structure and syntax to be a good deal simpler. In addition, the installer is (IIRC) only a few MB. I'm sure you can get tiny LaTeX/TeX installers or packages, but the ones I've seen tend to be very "batteries included, plus a whole lot more", which is maybe a bit excessive.


I always had HIGH hopes for LyX (http://www.lyx.org/) for people with issues with LaTeX, but have just decided that people really need to learn to write code for typesetting, and wish that LaTeX had a better syntax. I have high hopes for LuaTeX.


As an instructor, I want to provide printed handouts, assignments, and problem sets both printed and online, and having reflowing HTML is a great advantage for students when working through a problem in a split screen.

I was looking for a workflow that would produce PDFs and HTML, and I reached the same conclusions as the article.

I was hoping they had some sort of solution.



I had settled on AsciiDoc, but I'll reconsider pandoc. It seems to be very popular.


Have you tried tex4ht? Maybe it doesn't produce nice output by default, but it is extremely configurable, and it should support all LaTeX out of the box, unlike other solutions such as Pandoc.

I've written a basic tutorial about configuring tex4ht [1]; a lot of information can be found when you search the tex4ht tag on TeX.sx [2], for example how to use MathJax for math rendering [3] or how to include JavaScript libraries and some responsive CSS [4]. You can ask tex4ht-related questions on TeX.sx or on its mailing list [5], and there is also an issue tracker [6].
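As a taste of the configurability: tex4ht reads a custom configuration file wrapped in \Preamble ... \EndPreamble. A minimal sketch (the file and site names are my own examples; with the make4ht wrapper you would pass it as `make4ht -c mysite.cfg document.tex`):

```latex
% mysite.cfg -- a minimal tex4ht configuration file
\Preamble{xhtml}
% \Css injects rules into the generated stylesheet:
\Css{body { max-width: 40em; margin: auto; }}
\begin{document}
\EndPreamble
```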

[1] https://github.com/michal-h21/helpers4ht/wiki/tex4ht-tutoria...

[2] http://tex.stackexchange.com/questions/tagged/tex4ht

[3] http://tex.stackexchange.com/q/265815/2891

[4] http://tex.stackexchange.com/a/239944/2891

[5] http://tug.org/mailman/listinfo/tex4ht

[6] https://puszcza.gnu.org.ua/bugs/?group=tex4ht


I second the recommendation to look at Markdown/pandoc. As far as I know, good html from TeX/LaTeX doesn't exist.

I've had some success with IPython/nbconvert too (which in a roundabout way works like Markdown+pandoc, but doesn't use pandoc); it outputs both reasonable(ish) HTML and decent PDFs (via LaTeX).


nbconvert in fact does use pandoc for most of its internal mappings between markup languages. See the first 'Note' box:

https://ipython.org/ipython-doc/1/interactive/nbconvert.html


If you know emacs, I recommend taking a look at org-mode. It's an emacs mode that lets you write in "markdown" (but including TeX syntax for math) and then lets you export to HTML or PDF (via TeX/LaTeX).


org-mode sounds nice, but I've tried it and it's complicated and annoying. I hate how it folds everything by default, so that when you open a file it's folded and the content isn't visible. I know I can change this somewhere, but not having sane defaults is irritating.


Reflowing math, in a sensible way, is an unsolved problem.


Tried pandoc with markdown?


TeX's macro style of programming is too difficult. Nevertheless, people have done amazing things with it.

TeX has somewhere around 325 primitives, and one of the most important is the \def primitive used to define macros. These primitives are used to define additional macros, hundreds of them, available in different so-called formats. A basic format known as Plain TeX includes about 600 macros in addition to the 325 primitives. LaTeX is another format, the most widely used, but there are others, like ConTeXt, that are also very capable. Each of these extends TeX's primitives with its own macros, resulting in different kinds of markup language.

TeX's primitives are focused on the low-level aspects of typesetting (font sizes, text positions, alignment, etc.). LaTeX provides a markup language that is focused on the logical description of the document's components: headings, chapters, itemized lists, and so forth. The result is a system that does simple things easily while allowing very complex typesetting to be performed when needed.
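The difference shows up in even a trivial heading; a rough sketch of the two styles side by side (the glue values are arbitrary):

```latex
% Low-level, plain-TeX style: push vertical glue and switch fonts by hand.
\vskip 12pt plus 2pt
{\bf Introduction}\par

% LaTeX style: name the logical component; the document class
% decides the spacing, font, and numbering.
\section{Introduction}
```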

In addition to the TeX core primitives and the hundreds of commands (implemented as macros) in a format like LaTeX, there are additional packages, classes, and styles that provide support for any conceivable document. LaTeX has a rich ecosystem of packages. Typesetting chess? There's a LaTeX package for that. Complex diagrams and graphics? There's a LaTeX package for that. Writing a paper in the style of Tufte? Writing a book, or a musical score, or building a barcode? There are packages for that. The documentation for the TikZ & PGF graphics package is over 1100 pages long! The documentation for the Memoir package is 570 pages.

The amazing thing is that all of this is built out of macros. Diving into this (and once one needs to customize the look of a document, it's inevitable), you find yourself in a maze of twisty little passages.

Once upon a time, while writing assembly language for large computers, I enjoyed writing fancy assembler macros. I was fascinated with Calvin Mooers's TRAC programming language, based on macros, and Christopher Strachey's General Purpose Macrogenerator. These were early (mid-1960s) explorations into the viability of macro processors as a means for expressing arbitrary computations. Readers interested in trying out macros for programming can try the m4 programming language (by Kernighan and Ritchie) found on Unix and Linux systems; m4 is used in autoconf and sendmail config files. Yet TeX macros are in a whole other dimension.

All of these powerful macro systems have one thing in common: parameterized macros can be expanded into text that is then rescanned, looking for newly formed macro calls (or new macro definitions) to expand, as many times as one wants. This isn't just an occasional leaky abstraction; it is programming by way of leaky abstractions. Working with TeX packages is some of the most difficult programming that I've done. It's unbelievably impressive what people have come up with (e.g. floating point implemented via macro expansion in about 600 lines of TeX), but it's also unbelievably frustrating to program in such an environment.
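That expand-then-rescan behavior can be seen in a few lines of plain TeX (a toy example of my own): a macro's expansion can itself contain a \def, which the rescan then executes, so macros write macros.

```tex
% \makegetter's expansion contains a \def; the rescanned output
% installs a brand-new macro at expansion time.
\def\makegetter#1#2{\def#1{#2}}
\makegetter\answer{42}   % rewrites to: \def\answer{42}
The answer is \answer.   % rewrites to: The answer is 42.
\bye
```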

The LaTeX3 project is an attempt to rewrite LaTeX (still running on top of the TeX core). Started in the early 1990s, it is still not done. I think it's just that they are mired in a swamp of macros. They do have a relatively stable set of macros written, with the catchy name expl3, that are intended for use when writing LaTeX3. Here's a sample:

    \cs_gset_eq:cc
      { \cf@encoding \token_to_str:N #1 } { ? \token_to_str:N #1 }
This is described in the documentation as being a big improvement over the old macros and "far more readable and more likely to be correct first time". I can't wait.

I think LaTeX is absolutely without peer, but I wish improving its programming model weren't so daunting. I keep toying with starting a project to do just that, but so many others have tried and failed. It's disheartening.

Links:

[TRAC] https://en.wikipedia.org/wiki/TRAC_(programming_language)

[GPM] http://comjnl.oxfordjournals.org/content/8/3/225.full.pdf

[m4] info pages available on Unix and Linux

[Tikz & PGF] https://www.ctan.org/pkg/pgf?lang=en

[Memoir] https://www.ctan.org/pkg/memoir?lang=en

[expl3] https://www.tug.org/TUGboat/tb30-1/tb94wright-latex3.pdf


Wow, so much tasty information. Thanks!


I'm currently working on a project [1] that aims to tie together a number of existing tools to make "TeX on the Web" as painless as possible. It's still in the (very) early stages, so there are still a number of features yet to be implemented [2]. We're also working with the CorTeX project to bring this to existing collections of scientific publications like arXiv.

[1] https://davidar.io/TeX.js/

[2] https://github.com/davidar/TeX.js/issues


Some of the best-looking HTML output from LaTeX can in fact be generated with https://www.softcover.io/. Although it uses a limited subset of LaTeX called PolyTeX (see https://github.com/softcover/polytexnic), I was able to convert my LaTeX documents quite easily to PolyTeX and generate good-looking HTML output.

Some example output can be seen in the manual http://manual.softcover.io/book/softcover_markdown#sec-embed...

The softcover tools (https://github.com/softcover/softcover) are based on code developed for creating the HTML for http://tauday.com/tau-manifesto and http://www.feynmanlectures.caltech.edu/. Internally Softcover uses https://www-sop.inria.fr/marelle/tralics/ to convert from LaTeX to XML/HTML.

P.S.: I'm not affiliated with softcover.io in any way.


Does anyone have much experience using roff rather than TeX? I feel like I have to install heaps of stuff to get anything done in TeX, LaTeX, or any of the *TeX family, and it feels like a wobbly tower that collapses with minimal provocation. roff seems dead, but I was encouraged to recently stumble upon this package of roff macros, which makes me think roff deserves another look, especially for my simple needs.

http://www.schaffter.ca/mom/mom-01.html


Groff is definitely easier to use. For really simple stuff, just the standard macro packages (me, mm, and ms) are absolutely fine. And they compile faster, much faster than pdflatex. The mom package is very good, but a bit more complex. I gave up on it a few years ago after having trouble building it on NetBSD, and it looked like it was abandoned. But that link shows Peter Schaffter is still maintaining it. I might have to look up my old groff_mom files and have another go.

And it looks like it comes with the standard Ubuntu installs. See man groff_mom.


The article contains a mistake regarding the KaTeX library: it states that KaTeX only supports server-side rendering, but KaTeX supports both server-side rendering via Node and rendering in-browser.


I was really into TeX and LaTeX back in the day at university; however, nowadays I just use Word and save to PDF for distribution.

For the multiple output formats, at work the documentation teams use DocBook and DITA.

Of course they don't write XML by hand in such cases, rather use tools like oXygen, XMLmind, CORENA Studio among others.


It would be cool if there were a JavaScript WYSIWYG editor that generated TeX under the hood.


The author should have tried other browsers as well, e.g. Firefox, which supports both hyphenation and ligatures... (they need the relevant config lines; disabled by default, AFAIK)



