I looked into this project when it was first announced. The “in Rust” part seems more aspiration than reality. For those who may not know, Knuth originally wrote TeX in a language called WEB, which is basically Pascal with some preprocessors for making it usable and documentable. Later extensions to TeX, including eTeX, pdfTeX and XeTeX, have also been written in WEB. The existing TeX distributions (TeX Live, MiKTeX, etc.), at their core, first translate this WEB/Pascal into (autogenerated and basically unreadable) C, then run it through a C compiler, etc.
What this project has done is take the auto-generated translation into C of xetex.web and wrap this (unreadable) C code in Rust, which is an odd choice to say the least. It seems (from Reddit comments by the author) that the reason is that the author of this project was at the time unaware of LuaTeX, which (starting with a manual and readable translation into C years ago) is actually written in C.
All these odd choices aside, and barring the somewhat misleading “in Rust” description (misleading for another reason: the TeX/LaTeX ecosystem is mostly TeX macros rather than the core “engine” anyway), there are some good user-experience decisions made by this project. With a regular TeX distribution these would be achieved with something like latexmk/rubber/arara, which too are wrappers around TeX much like this project.
There is still room for someone to do a “real” rewrite of TeX (in Rust or whatever), but as someone on TeX.SE said, it is very easy to start a rewrite of TeX; the challenge is to finish it.
Thanks. Is there any code to look at? (Couldn't easily find anything relevant on that repo…) I didn't mention it earlier, but I'm collecting a list of alternative TeX implementations[1] so if there's even (say) a dozen lines of WEB that have been converted to Rust, I'd be very eager to take a look and compare — it would be illuminating.
You can draw your own conclusions from comparing the code samples, but to point out a few obvious differences:
• What has the symbolic name “fi_or_else” in the WEB code has become the magic number “108” in the Rust code. (This is because the author of this project decided to have their Rust code start from the autogenerated C code which has already lost this symbolic information.)
• What is simply “if tracing_ifs>0” in the WEB code is 29 lines of Rust code, involving a magic offset into eqtb.
• The comments from the original are gone.
• Something like “cur_if:=subtype(p);” becomes “cur_if = (*mem.offset(p as isize)).b16.s0 as small_number;”.
I wonder how maintainable such Rust code will be. These problems are not insurmountable and the code can always be cleaned up later I guess… my point is simply that at the moment it is not idiomatic Rust code, for instance.
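For comparison, here is a rough sketch of what a hand-written port could keep from the WEB source: the symbolic name and a readable accessor. The value 108 and the `.b16.s0` field are taken from the comparison above; `MemoryWord`, its `kind` field, and both helper functions are hypothetical names made up for illustration, not code from either repository.

```rust
/// Command code for \fi, \else and \or. The value 108 is the one quoted
/// above from the generated code; the name comes from the WEB source.
pub const FI_OR_ELSE: u16 = 108;

/// Hypothetical, simplified stand-in for one word of TeX's `mem` array.
#[derive(Clone, Copy, Debug, Default)]
pub struct MemoryWord {
    pub subtype: u16, // corresponds to `.b16.s0` in the generated code
    pub kind: u16,    // hypothetical name for a neighbouring 16-bit field
}

/// Safe, bounds-checked equivalent of WEB's `subtype(p)`.
/// Panics on an out-of-range index instead of reading arbitrary memory.
pub fn subtype(mem: &[MemoryWord], p: usize) -> u16 {
    mem[p].subtype
}

/// A check that reads like the WEB original instead of comparing
/// against a bare number.
pub fn is_fi_or_else(cmd: u16) -> bool {
    cmd == FI_OR_ELSE
}
```

With helpers like these, the WEB statement “cur_if:=subtype(p);” would become something like `cur_if = subtype(&mem, p);`, with no raw pointers or casts.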
It's the result of automated c2rust[1] conversion. Of course it will be cleaned up. c2rust itself provides handy refactoring tools scriptable in Lua. A lot of the code will be removed; for example, image handling can be done through the image-rs crate, etc.
All that's fine, and good luck to you; I wish you well.
But I guess I wasn't clear. This project is described as a TeX engine “in Rust”. Has any code been actually written in Rust—as opposed to either Rust code that wraps C code (as in the main repo), or autogenerated Rust code (as in this one you linked)—for any of the core parts (as opposed to system-dependencies like I/O or whatever) of the TeX engine?
I'm genuinely curious, and if there is any such code I'd like to read it (and compare it to the equivalent WEB code).
I completely understand that this could be done at some future date. The question is about the description of the current state of the project. The impression from a statement like “Efforts to ditch any C remnants are being made” is that the C code is only a few “remnants”, when in fact it appears to be the entire TeX engine itself.
(Also, I do not foresee any refactoring tool being able to convert, for instance, “108” into “fi_or_else”, recovering information that's already lost. The point here is that it would have been better to start with the original WEB code, not the lossy and autogenerated C code.)
Weird way to go about it. Using some form of literate programming system (like WEB/noweb) with Rust should be eminently doable, and it would possibly be easier to translate section by section?
Not sure how easy it would be to call Pascal from Rust or vice versa, but on x86_64 I believe the calling convention is similar for C and Pascal (strings are different, though).
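To make the string difference concrete, here is a small sketch. It assumes the classic Pascal "short string" layout (one length byte followed by the characters); actual Pascal compilers differ, so treat the layout as an assumption. The function names are made up for this example.

```rust
use std::ffi::CStr;

/// Decode a Pascal-style short string: one length byte, then that many bytes.
/// Returns None if the buffer is shorter than the length byte claims
/// or the bytes are not valid UTF-8.
pub fn from_pascal(bytes: &[u8]) -> Option<String> {
    let (&len, rest) = bytes.split_first()?;
    let body = rest.get(..len as usize)?;
    String::from_utf8(body.to_vec()).ok()
}

/// Decode a C-style string: the bytes up to (not including) the first NUL.
/// Returns None if there is no NUL terminator or the bytes are not UTF-8.
pub fn from_c(bytes: &[u8]) -> Option<String> {
    let c = CStr::from_bytes_until_nul(bytes).ok()?;
    c.to_str().ok().map(str::to_owned)
}
```

The point is only that the two sides disagree about where a string ends, so any Rust/Pascal boundary would need a conversion like this even if the calling convention itself matched.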
And the prize for the most misleading headline this week goes toooo THIS.
This is a TeX-manager/workflow automation/MiKTeX+parrot clone. It's not a TeX engine, i.e. something that parses TeX and is extensible and faster than the alternatives (classic TeX, XeTeX, LuaTeX). To add insult to injury: putting Rust into the headline triggers, for me, a "better, faster TeX" wish. I went there curious how much of TeX they already supported, and came back a little disappointed.
This is all good and well.
But why XeTeX as a starting point instead of LuaTeX?
Most distributions started the transition from pdfTeX to LuaTeX as the main implementation. And the code from the new translation from the original Pascal to C is pretty nice.
And LuaTeX also has UTF8 support and modern font handling.
LuaTeX is a kind of successor to XeTeX. The biggest thing is that xelatex / latex / etc. all had to use fixed-size memory management, while LuaTeX supports dynamic allocation (which means no more dreaded "TeX capacity exceeded, sorry." error message).
Dynamic memory allocation is not always a desirable feature. If I were running a service like Arxiv, I would much rather have fixed memory management than dynamic.
Similarly, if I was running TeX in WebAssembly, in a tab in a browser, I'd much rather hit "TeX capacity exceeded, sorry." than have it try to scuttle off with all the swap...
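A minimal sketch of the fixed-capacity behaviour being argued for here: everything is reserved up front, and a request that does not fit fails with an error instead of growing. `FixedPool` and `CapacityExceeded` are hypothetical names for illustration; this is not how any of the engines mentioned actually manage memory.

```rust
/// Error modelled on TeX's "capacity exceeded" message.
#[derive(Debug, PartialEq)]
pub struct CapacityExceeded {
    pub requested: usize,
    pub available: usize,
}

/// Hypothetical fixed-size pool: all memory is reserved at construction
/// and allocation fails rather than growing.
pub struct FixedPool {
    words: Vec<u64>,
    used: usize,
}

impl FixedPool {
    pub fn new(capacity: usize) -> Self {
        FixedPool { words: vec![0; capacity], used: 0 }
    }

    /// Reserve `n` words and return the index of the first one.
    pub fn alloc(&mut self, n: usize) -> Result<usize, CapacityExceeded> {
        let available = self.words.len() - self.used;
        if n > available {
            return Err(CapacityExceeded { requested: n, available });
        }
        let start = self.used;
        self.used += n;
        Ok(start)
    }

    /// Number of words handed out so far.
    pub fn used(&self) -> usize {
        self.used
    }
}
```

A service operator can then size the pool once and know the worst-case footprint of every job, which is the property the two comments above care about.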
I remember reading that the Unicode-enhanced Computer Modern isn't exactly the same as the original (Latin) Computer Modern that Knuth designed, with the former having a few cosmetic issues. If that is indeed the case, is it possible to use the original Computer Modern with a UTF-8 engine?
Anyone have any experience? I see a date of a year ago on the page.
The latest TeX Users Group meeting, a month or two ago in San Francisco, featured a number of amazing demos of TeX and LaTeX systems compiling large documents basically instantly. Amazing what modern CPUs can do. The TUGboat with the papers should be out soon.
It doesn't really even take 21st century hardware. I remember running TeX on a Silicon Graphics workstation back in the early 90s. The processing was so fast, I didn't realize it had happened. If you consider the hardware restrictions that TeX and MF were written under in the late 70s/early 80s, it makes the system that much more impressive. The WEB source code of these is an education in itself.
Would be interested to see this. TeX Live could already compile documents instantly a decade ago, but only documents that were a few pages long.
When you get into dissertation-sized folders of documents with lots of TikZ images, it takes a little longer but still not that long. (My 160-page dissertation compiled in 10-15s back in the early 2010s on a Core 2 Duo with a non-SSD drive.)
It was hard to get much faster than that for large documents because LaTeX has (had?) critical paths that defy parallelization.
If it is indeed possible to achieve instantaneous compilation for large documents, that would make live-compilation a reality (live compilation already is a reality but for small documents).
Saturday, August 10
10:00am David Fuchs
What six orders of magnitude of space-time buys you
>TeX and MF were designed to run acceptably fast on computers with less than 1/1000th the memory and 1/1000th the processing power of modern devices. Many of the design trade-offs that were made are no longer required or even appropriate.
An absolute plain vanilla TeX, exactly as Knuth wrote it and as my tool chain compiles it, composes all 495 pages of The TeXbook in 0.300 seconds on a 2012 MacBook Pro laptop (the first "Retina" model). Single threaded, composing pages in 0.6 msec each, running in well under 1 Megabyte total for code and data. Back on the SAIL mainframe (a DEC PDP-10) that Knuth used to develop TeX, it was almost exactly 1000 times slower: the pages would tick by every half-second or so (at night, anyway).
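The figures in the paragraph above are easy to check; the helper below is just that arithmetic written out, with a made-up function name.

```rust
/// Average composition time per page, in milliseconds.
pub fn ms_per_page(total_seconds: f64, pages: u32) -> f64 {
    total_seconds * 1000.0 / f64::from(pages)
}
```

`ms_per_page(0.300, 495)` comes to roughly 0.61 ms, matching the 0.6 msec quoted, and a machine 1000 times slower would need roughly 0.61 s per page, which fits "every half-second or so".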
Of course, nowadays we also have lots more memory to throw around. One cool idea is to modify TeX to retain all the internal data structures for the pages it has created, and run a nice screen viewer directly from that; Doug McKenna gave a slick presentation at the recent Palo Alto / Stanford TUG meeting of such a system that he created in order to enable live viewing of his Hilbert Curves textbook, including displaying figures of space-filling fractal curves that can be arbitrarily zoomed, which is simply impossible to do via PDF.
Going further, you can additionally modify TeX so it takes snapshots of its internal state after every page, and is able to pop back to any of these states. Presto, now if the user makes an edit on page 223, TeX can quickly back up to the state of the world just before page 223, and continue on forward with the modified input. Page 223 gets recomposed and immediately redisplayed, essentially in real-time. Of course, the trick here is creating and storing the snapshots efficiently; the TUG demo I gave using The TeXbook runs in a few hundred megabytes, and does the whole "pop back, recompose a page, redisplay it" rigamarole in milliseconds.
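The snapshot idea can be sketched in a few lines. This is a toy model under loud assumptions, not the system demonstrated at TUG: `EngineState`, `Session` and `compose_page` are hypothetical, each input paragraph becomes exactly one page, and snapshots are naive full clones (the hard part mentioned above, storing them efficiently, is not attempted).

```rust
/// Hypothetical stand-in for the engine state saved at a page boundary.
/// A real engine would hold far more (macros, registers, fonts, ...).
#[derive(Clone, Debug, PartialEq)]
pub struct EngineState {
    pub input_pos: usize, // index of the next unread paragraph
    pub counter: u32,     // e.g. a running page or equation number
}

pub struct Session {
    // snapshots[i] is the state just before page i; the final entry is the
    // state after the last page, so snapshots.len() == pages.len() + 1.
    snapshots: Vec<EngineState>,
    pub pages: Vec<String>,
}

/// Toy page builder: each input paragraph becomes one page.
fn compose_page(state: &mut EngineState, input: &[&str]) -> Option<String> {
    let text = input.get(state.input_pos)?;
    state.input_pos += 1;
    state.counter += 1;
    Some(format!("[{}] {}", state.counter, text))
}

impl Session {
    pub fn new() -> Self {
        Session {
            snapshots: vec![EngineState { input_pos: 0, counter: 0 }],
            pages: Vec::new(),
        }
    }

    /// Pop back to the snapshot taken just before page `from`, then
    /// recompose that page and everything after it from `input`.
    pub fn recompose_from(&mut self, from: usize, input: &[&str]) {
        let from = from.min(self.pages.len());
        self.pages.truncate(from);
        self.snapshots.truncate(from + 1);
        let mut state = self.snapshots[from].clone();
        while let Some(page) = compose_page(&mut state, input) {
            self.pages.push(page);
            self.snapshots.push(state.clone());
        }
    }
}
```

Calling `recompose_from(0, ...)` is a full run; after an edit to the second paragraph, `recompose_from(1, ...)` reuses page 0 untouched. This sketch always continues to the end of the document; a real viewer would presumably stop as soon as the new state matches a stored snapshot.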
The bad news is that my stuff is still in the proof-of-concept stage, as there's no support for the well-established extensions to Knuth's TeX (importing graphics, using system fonts, etc.) that are required by the vast majority of LaTeX users. I don't expect any of these features to slow things down appreciably, but time will tell. I intend to do a "Show HN" by and by, with lots more details, when it's able to handle real-world documents.
My apologies for failing to successfully fly under the radar until things were ready for prime time. My premature TUG demo was intended to wow Prof. Knuth sufficiently that he'd approve of a decades-late Ph.D. for me. (Happily, he did agree, contingent on just one additional feature being added...)
> Presto, now if the user makes an edit on page 223, TeX can quickly back up to the state of the world just before page 223, and continue on forward with the modified input. Page 223 gets recomposed and immediately redisplayed, essentially in real-time.
Isn't it possible, in the worst case, that editing the source line that maps to page 223 could trigger re-rendering arbitrarily far back before page 223? Like if you wrote all 223 pages without any chapters, parts, \newpage, etc. How does your program handle this?
Sure. It seems best to redisplay quickly, then update the screen again when everything is quiescent (the user hasn’t typed anything for a few tenths of a second, and the whole document has been fully recompiled with no changes detected). Usually it’s not even noticeable, though of course there are degenerate cases where a document oscillates, which gets called out in the UI in the unlikely case it happens.
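One way to picture "quiescent" versus "oscillating" is as a fixed-point search over whatever data one pass hands to the next (cross-references, page numbers, and so on). The sketch below is only an illustration under that assumption; `Outcome`, `run_until_quiescent` and the string-typed pass data are invented for the example, and it compares 64-bit fingerprints, so a hash collision could in principle cause a false match.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// A pass reproduced its own input after this many passes in total.
    Stable { passes: usize },
    /// The data returned to an older state: the document oscillates.
    Oscillating { period: usize },
    /// No repeat was seen within the pass limit.
    GaveUp,
}

fn fingerprint(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

/// Run `pass` repeatedly, feeding each result back in as the next input,
/// until the result repeats or `max_passes` is reached.
pub fn run_until_quiescent<F>(mut pass: F, initial: &str, max_passes: usize) -> Outcome
where
    F: FnMut(&str) -> String,
{
    let mut seen = vec![fingerprint(initial)];
    let mut current = initial.to_owned();
    for n in 1..=max_passes {
        current = pass(&current);
        let fp = fingerprint(&current);
        // Look for the most recent earlier state with the same fingerprint.
        if let Some(idx) = seen.iter().rposition(|&old| old == fp) {
            let period = seen.len() - idx;
            return if period == 1 {
                Outcome::Stable { passes: n }
            } else {
                Outcome::Oscillating { period }
            };
        }
        seen.push(fp);
    }
    Outcome::GaveUp
}
```

A period of 1 means the last pass changed nothing, which is the quiescent case; a longer period is the degenerate case worth flagging in a UI.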
My incomplete ~240 page dissertation takes about 2 minutes to compile until complete (i.e., running multiple times to get references right, etc.). No TikZ at all, Intel Core i3 dual core processor, normal hard drive.
Sorry, I don't believe there are videos this year. I'm looking forward to the articles, though. Even Knuth was visibly struck by the whole thing.
And I'd also be interested to hear how big a doc. I have a 300+ page book I am developing, with many graphics. Compilation on my several-years old standard laptop is 5-10 seconds. So I wonder what is up.
I am happy somebody is working on this, but I really wish somebody would rewrite TeX and derivatives from scratch in a modern language. The syntax of LaTeX is mostly adequate, and should be kept so people don't have to relearn everything. But the list of other improvements that need to be made is a km long.
* The backend should be in a language that allows for easier editing and package development.
* Need modern bidirectional UTF8 font support.
* Need the compiler to stop producing a bunch of extra files in the same folder, which is a significant adoption barrier.
* Need a way to generate clean html output with nice css.
* Tikz is great, but it would be excellent if it was possible to include the graphical output of any language by writing inline code (org-mode style).
* Same for mathematics - if I can send input to Sage or Mathematica and print the results from within the tex files, life would be so much easier.
* Beamer is interesting, but it is hard to make anything but rather bland scientific presentation in it. A framework for rapid design prototyping in beamer would help so much.
I know it's possible to have workarounds and I have used them as needed. But the workarounds still have bugs, or are OS specific (your third one for instance), and they don't solve the problem of slow compilation.
There are also fundamental limits in the TeX language, such as the number of arguments to a function being limited to 9 [1], or function overloading being a pain [2], etc.
The advantage of rewriting from scratch in a modern language is that these issues can all be dealt with without workarounds.
It sounds like what you want is "something that isn't remotely what TeX was for, and what it's still really good at". Which is fine, of course, but not something a TeX engine should try to do. Maybe some kind of modular TeX-wrapping typeprocessing (not typesetting) environment.
tectonic is great! I find it works as a "daily driver" now; it's not just another experimental Rust project. It downloads everything as it's used, and automatically decides when to rerun compilation to deal with citations, float numbering, etc.
Rarely do I find I need to fall back on the standard programs.
I strongly recommend giving it a shot if you use LaTeX.
What I really like about tectonic is that a program can use it like a library, and embed the engine without shelling out to execute programs. There are, at this point, some system font dependencies that need to be satisfied before compilation.
But otherwise it pretty much just works without a lot of pre-installation steps.
A4 is an internationally agreed upon standard, letter size is a mostly US thing. It's the whole SI units vs imperial units debate all over again. Makes sense to have the international standard as default rather than a country specific one.
> Letter or ANSI Letter is a paper size commonly used as home or office stationery in the United States, Canada, Chile, Colombia, Costa Rica, Mexico, Panama, the Dominican Republic and the Philippines. It measures 8.5 by 11 inches (215.9 by 279.4 mm). US Letter-size paper is a standard defined by the American National Standards Institute (ANSI, paper size A), in contrast to A4 paper used by most other countries, and adopted at varying dates, which is defined by the International Organization for Standardization, specifically in ISO 216.
LaTeX was invented in the U.S. (as was Unix, as was the transistor), so it makes sense for the default to be letter size.
It makes sense for there to be a system-wide setting to override the software default, so folks who use some other standard can set it once and never worry about it again.
I've been a fan of LaTeX for some time, but I've mainly switched to using Chromium instead.
Yes, it seems a bit strange, but browser engines have become mature enough to replace most of what LaTeX can do, and there are even work-arounds for things the browsers can't do natively. For example "Paged.js" is a polyfill for implementing CSS paged media extensions.
Using various JavaScript (or WebAssembly now) libraries, I can directly render math, SVG, musical charts, all sorts of quirky text directions, etc.
There are even Knuth-Plass implementations in JavaScript. I hope that someone smarter than me figures out how to marry that algorithm to the new CSS Houdini API!
I’m afraid your desire to meld Knuth-Plass with the CSS Houdini efforts is a bit like saying “I hope someone figures out how to build a wall for a house out of paint.” Except probably even less plausible.
But there are implementations of Knuth-Plass in JS; they are just not usually used, as the speed trade-off is not worth it for the web. If you are only using Chromium to render a PDF, then the speed stops being a dealbreaker.
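For readers who have not seen the algorithm, here is a heavily simplified sketch of the "total-fit" idea behind Knuth-Plass: choose all line breaks together to minimise a global cost, rather than greedily filling each line. It is not the real algorithm (no stretchable glue, penalties, hyphenation, or active-node pruning); it just minimises the sum of squared trailing space on every line except the last, measuring width in characters. The function name is made up.

```rust
/// Break `words` into lines of at most `width` characters, minimising the
/// sum of squared trailing space over all lines except the last.
/// A single word wider than `width` is placed on a line of its own.
pub fn break_lines(words: &[&str], width: usize) -> Vec<String> {
    let n = words.len();
    // best[i]: minimal cost of setting words[i..];
    // next[i]: index of the word that starts the following line.
    let mut best = vec![u64::MAX; n + 1];
    let mut next = vec![n; n + 1];
    best[n] = 0;
    for i in (0..n).rev() {
        let mut len = 0usize;
        for j in i..n {
            // Candidate line holds words[i..=j], joined by single spaces.
            len += words[j].chars().count() + usize::from(j > i);
            if len > width && j > i {
                break; // adding word j overflows the line
            }
            let slack = width.saturating_sub(len) as u64;
            let line_cost = if j + 1 == n { 0 } else { slack * slack };
            let total = line_cost + best[j + 1];
            if total < best[i] {
                best[i] = total;
                next[i] = j + 1;
            }
        }
    }
    // Walk the chosen breakpoints from the start of the paragraph.
    let mut lines = Vec::new();
    let mut i = 0;
    while i < n {
        lines.push(words[i..next[i]].join(" "));
        i = next[i];
    }
    lines
}
```

For the words "aaa bb cc ddddd" at width 6, a greedy breaker gives "aaa bb" / "cc" / "ddddd", leaving a very loose middle line, while this sketch picks "aaa" / "bb cc" / "ddddd" because the overall cost is lower.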
When I first wrote that comment, I wrote something like “you could do it, but the end result would be absolutely terrible”. Then I decided that actually it wasn’t possible after all in any meaningful way, and changed it to “except probably even less plausible.”
In reality, with what I have in mind, it’s kind of a bit of both: what I have in mind is that you’d need to split each piece of the Knuth-Plass layout (box, &c.) into a DOM element of its own first, so that the layout can determine their sizes and shuffle things around appropriately—since the layout API is only giving you the set of CSS boxes to lay out and their sizes and any engine-decided inline breaks in them, and not the ability to inspect what’s in them or to break them up into further fragments.
Once you’re doing that, it’s probably a bad idea to use the Layout API, because you get no substantial benefit (a ResizeObserver to notify you when you need to redo the layout is just about as good), but are using a lot more DOM nodes (which is bad for performance, and I strongly suspect it’d cancel out the benefit that the Layout API version can run away from the main thread), and are using a new, less-well-supported and probably-buggier API to boot.
Also browsers have some pretty terrible bugs around how hyphenation especially works when you have zero-width characters, and they have shown no interest in fixing them. (Chromium’s are the worst, but Firefox has a couple of interesting ones as well.) Therefore you’d probably need all of your line-breaking opportunities (most notably, soft hyphens) to be in boxes of their own. And now you probably won’t get your ff ligature if a hyphen could be inserted between them, so I’m probably going to have to disqualify it as not being able to produce the same output.
In the end, I’d be surprised if a variant of https://github.com/bramstein/typeset using the Layout API as far as possible while retaining identical output (excepting this soft-hyphens-inside-ligatures case, if my guess is correct) could get down to even 20× slower than it, or using less than about 10× as much memory. In practice, I think figures like 100× slower and 500× memory are more likely. It’s possible that it would be less janky for large amounts of text, given that it operates in a worklet which may be run off the main thread by the browser; but I doubt it, due to the increase in other requirements.
This is all assuming that my understanding of what would be needed and possible is correct—I may have stated it too strongly given my lack of particularly detailed knowledge in the area.
The ultimate problem I see with breaking words into separate boxes is that they can't be rendered "in flow" anymore. So I guess the actual text has to be set outside the Layout API.
However, consider the case of the CSS Flexbox. It can also wrap boxes, and there some smarter way of looking ahead could be helpful and would be within the scope of the layout API. Not sure.
And indeed, for rendering PDFs it may not be necessary or beneficial to rely on the Houdini APIs at all.
Another thing to consider is that it isn't counting all the Rust dependencies. You can get away with a lot less code in tree when things get pushed out into crates.
All TeX rewrites seem to keep the batch oriented architecture.
What could improve usability and add the possibility of a modern UI for TeX/LaTeX is to have an incremental TeX engine, that only computes the changes in a document instead of everything from zero each time.
Plus an engine that realizes that it has to recompute the document immediately instead of making "pdflatex ..." run 2-3x necessary in order to get what you actually wanted.
It's using web2c, and linking against a cleaned-up version of the generated C sources for a combined C/Rust binary.
There is a further project which runs c2rust over the generated C sources and is cleaning up the result; it is linked in the comments above.
Going directly from WEB to Rust is expected to be difficult, and any such effort will probably have a lot to learn from the manual conversions above.
Cool, I've just started messing around with LaTeX, mostly for pgfgantt because Microsoft Project is so damned expensive. In fact, so many working project apps (like OmniPlan) seem to be priced way too high.