XeTeX: could it be TeX's saviour? (vallettaventures.com)
92 points by steeleduncan on March 28, 2012 | 64 comments



There are two problems with TeX (in general):

1. It is a local maximum. It produces great output and works now. Changing it requires a huge effort, which will result in a system that is inferior for many years to come; only then could another maximum be achieved. This means there is resistance to change — it's easier to adapt to TeX's flaws than it is to write something completely new. Many systems suffer from a similar fate (think Emacs).

2. It is a black hole, a programmer sink. People start using TeX and curse it; then they learn it, and by the time they know its limitations and are ready to take on the job of writing something better, they are proficient enough in TeX to stay within it. Many systems suffer from a similar fate (think Emacs).

I wonder if something will eventually happen to budge the TeX community from where it is now.


Many systems suffer from a similar fate (think Emacs)

That happens so much in software there needs to be a word for it, a term of its own.

After a while, you get deep enough into the rabbit hole that it seems your initial judgement of redundant complexity was wrong. Even though it's still true, you've adapted to it. You just can't see it from the viewpoint of someone new anymore. For this reason I tend to carefully write down my initial concerns with something.

More on-topic, I don't think what TeX needs is a complete, sudden break with the past. It would be better to deprecate old, flawed features, and later on disable them and provide an optional compatibility mode (which is a bigger download).


> 2. It is a black hole, a programmer sink. People start using TeX and curse it; then they learn it, and by the time they know its limitations and are ready to take on the job of writing something better, they are proficient enough in TeX to stay within it. Many systems suffer from a similar fate (think Emacs).

I don't really see this. TeX requires a huge investment to learn, but so do most programming languages. That doesn't mean the languages are bad or need replacing. I think the high cost of entry to TeX programming is not necessarily a TeX problem -- it's difficult to visualize a system with equal power being much easier to learn. (Though the rough edges could be smoothed.)

More importantly, when you say "take on the job of writing something better", I have to believe that there are very, very few people in the world who are up for this task, and it would probably require their combined efforts. It doesn't seem likely at this stage. Maybe eventually, though.


The (La)TeX ecosystem is still evolving. For instance, see:

LuaTeX - https://en.wikipedia.org/wiki/LuaTeX

ConTeXt - https://en.wikipedia.org/wiki/ConTeXt

LaTeX3 - http://www.latex-project.org/latex3.html


XeTeX-generated PDFs are not compatible with the toolchains of some academic publishers (Cambridge University Press, which publishes the Journal of Functional Programming, comes to mind, but I seem to recall Springer-Verlag having issues as well). Without full support for academic publishers' toolchains, I believe the majority of TeX users could not upgrade.


What about XeTeX-generated PDFs makes them incompatible?


Great question; I have no idea. We just get mail back from the editors that says things like, "your PDF breaks our scripts; regenerate it directly with LaTeX."

Some, like Springer-Verlag, also have an editorial process where you have to send your .tex files to them and they will merge all of the documents and run LaTeX on them. I don't even know how you could get xetex (or pdflatex) to work with that system.


Umm, have these people ever heard of backward compatibility? Admittedly, many TeX package authors haven't either, but just dropping pstricks is going to make a ridiculous number of documents that have a figure in them impossible to compile. Not to mention the fact that TikZ, while better, is not better enough that everyone will want to invest time learning it...

MikTeX's install-on-first-use has its problems, but it does help balance bloat against not removing older packages.


I agree that XeTeX doesn't provide a drop-in replacement for TeX-with-dvips, but neither does pdftex/pdflatex, and many people already use that. Generating postscript or DVI first and converting to PDF produces suboptimal PDFs.

For quite a long time, pstricks didn't work very well with pdflatex, until eventually pstricks added limited support for PDF.

Also, I do consider TikZ sufficiently better than pstricks to switch. The low-level PGF library alone wouldn't provide enough benefit, but the addition of the high-level TikZ makes it extraordinarily good.
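
For flavor, here's a minimal sketch of what TikZ looks like (the picture itself is made up for illustration):

    \documentclass{article}
    \usepackage{tikz}
    \begin{document}
    \begin{tikzpicture}
      % two labelled axes and a circle
      \draw[->, thick] (0,0) -- (3,0) node[right] {$x$};
      \draw[->, thick] (0,0) -- (0,2) node[above] {$y$};
      \draw[blue] (1.5,1) circle (0.5);
    \end{tikzpicture}
    \end{document}

The same drawing in raw PGF would take considerably more code, which is the point of the TikZ layer.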


Can't they just maintain both a bloated legacy version and a stripped down version that they focus on?


The TeX community in general has a philosophy that anything written since 1983 (TeX'83) or so should a) compile and b) produce precisely the same typeset document if it is recompiled today. This has obnoxious issues and is one of the reasons for the profusion of packages--the norm is to change the name for incompatible changes so that old users can continue on.


Which is a ridiculous goal because, in general, most documents found in the wild are portable only if transported together with the author's computer system. The chances of successfully producing typeset output from a .tex file (+ a bunch of external files) that you get from someone else are slim to none.

I'm glad there is a movement to do something. I used to use TeX but stopped a long time ago, because it was simply too much pain for too little gain. I still have to deal with it sometimes when helping my wife who publishes scientific papers. Whenever she mentions that she needs "a small modification in the BibTeX style", I get nightmares.


Actually, anything that's gotten as far as journal publication should be exactly reproducible; I remember having to include the packages I was using and everything else in one archive for my couple of journal publications. Conceded, folks are not quite so careful for off-the-cuff work; it's the same thing as having file:/// URIs in web pages.

And I'm hesitant to complain about it too greatly; my father's PhD thesis cannot be reproduced without a specific type ball for a Selectric, and I know of numerous people who have kept dragging their old text files off disk packs to magtape to 8" disks to 5 1/4" disks to 3 1/2" disks and finally to CD-ROMs, where they have to duplicate all the CDs every few years lest they become unreadable due to dye fade. I know several photographers who are downright paranoid about an almost literal bitrot—if you ever want to see someone spitting mad with rage, telling a photographer that his precious negative or RAW file cannot be located, or read, or even decoded (a nontrivial issue with RAWs and older image formats, and depending on language and system even with text files!) will get you there.

Saying "if you preserved the files, we will recreate your masterpiece exactly" might be an overreaction to this but it's not impossible to understand where people are coming from. Particularly the poor dears coming from Word where moving a document from 2002 to 2008 will change the typesetting subtly and sometimes not so subtly.


People who downvoted me have obviously never had to deal with documents that you have to process with a TeXLive version greater than something, and then only with particular fonts properly installed on the system (ever tried to install an OTF font? ever seen UTF-8 or ISO-8859-2 character encoding in a document?). Not to mention obsolete versions of packages that the author had on his system.

Well, I did have to deal with such documents, hence my comment. It was an informed rant.


>> Undoubtedly a change this severe will be painful for some, but it will be less painful than heading out to the computer shop in 3 years' time to purchase the 2TB hard drive that will be required for the exponentially expanding tex updates.

Few arguments ever gain feasibility from hyperbole, and this article is no exception. The size of his TeXLive installation is purely circumstantial evidence, since that folder also includes backups of updated packages and all sorts of other "dynamic", i.e. user-specific, data. Basing the argument on that seems... silly.

>> For those for whom adding the letters xe before typesetting is too much to bear, or for typesetting ancient documents

It isn't as easy as just adding "xe" before (La)TeX, since not all packages are integrated with it yet, and since the polyglossia package is still not fully stable either (yes, I know babel is old, but at least it's stable). Some packages have trouble dealing with polyglossia or have experimental interfaces in order to work with Xe(La)TeX; csquotes is one of the packages that comes to mind. A further problem with XeTeX is that it still does not offer a proper version of the microtype package. And on top of everything, the hyperref support for colours is spotty at times, at least for me.

For me, depending on the situation, pdflatex and xelatex live happily next to each other: both are handled in my generic template via the ifxetex package and \ifxetex...\else...\fi, so depending on what I need in a given instance, running either binary on the same file produces the corresponding output.
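
A minimal sketch of such a dual-engine preamble, assuming only the ifxetex package (the font name is just an example and must be installed):

    \documentclass{article}
    \usepackage{ifxetex}
    \ifxetex
      \usepackage{fontspec}             % xelatex: system/OpenType fonts
      \setmainfont{Linux Libertine O}   % example font
    \else
      \usepackage[utf8]{inputenc}       % pdflatex: classic 8-bit setup
      \usepackage[T1]{fontenc}
    \fi
    \begin{document}
    The same source compiles under both pdflatex and xelatex.
    \end{document}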


In all honesty I did not realise LaTeX needed saving.


Outside of mathematics and physics conference/journal publishing, there's a widespread perception that it's losing ground. In CS some conferences are slowly switching to Word, or offering an option to authors (who are slowly switching to Word, especially outside of theory-heavy areas). In book publishing it's shrunk to a very small niche, even in technical areas. Part of the problem is the cruftiness of the tools, and part is the bizarreness of the underlying languages. As far as I can tell, few people who've written significant chunks of LaTeX stylesheet code or TeX macros/packages think the language is good, and it's complex/weird enough that even the vast majority of people who've written dozens of papers in LaTeX have no idea how to make nontrivial changes to its stylesheets. Also, many of the base tools still don't have Unicode support.

Belief that something is broken and attempts to fix it are a 20-year-old refrain at this point, e.g. ConTeXt is an attempt to make TeX more usable for book publishing, LuaTeX is an attempt to reduce reliance on TeX macros in favor of a less weird scripting language, XeTeX is a project to add Unicode, etc.


> In CS some conferences are slowly switching to Word, or offering an option to authors (who are slowly switching to Word, especially outside of theory-heavy areas).

Huh? In my research areas (evolutionary computation, machine learning, artificial intelligence, multiagent systems, robotics) this isn't remotely true. CS conferences of almost all stripes have always offered non-LaTeX submission routes. But LaTeX dominance is just as strong as, if not stronger than, it was ten years ago.

Indeed, I think that the stigma surrounding Word submission in conference papers, journal articles, books, theses, even grant proposal submissions, is so strong that one must think twice before using it in the bulk of CS fields.


It varies by sub-area, but artificial intelligence is also my area, and I don't see that dominance anymore, especially in the more interdisciplinary areas (anything that overlaps with HCI, psychology, cognitive science, etc.). I also don't see the stigma anymore among younger researchers; I detect that attitude from older people mostly, and some of the "harder core than thou" people in math-heavy sub-areas, but there's a bigger mix of preferences among people under 35 who work in less math-y areas. I use TeX myself when it's my choice, but I've collaborated on Word papers as well, if I wasn't the primary author/instigator, and it seems common/expected these days. Especially if someone from industry has been the instigator (e.g. on DARPA-contract type research), or if it's interdisciplinary with someone not from CS/math, they've preferred Word.

AAAI, IJCAI, and AAMAS now provide both options, and my informal observation is that more Word papers are being submitted than used to be the case, especially but not exclusively when it comes to authors from industry. CHI recently officially deprecated LaTeX as a supported option, but still provides the old (no longer maintained) stylesheets as a courtesy. Several universities (e.g. Georgia Tech) have also stopped officially supporting LaTeX stylesheets for theses and moved to Word as the only official option, though they do distribute student-edited LaTeX stylesheets as a courtesy. I assume that one's because nobody in the IT department knows how to edit the stylesheets. The unofficial GT thesis stylesheet is a hilarious example of copy/paste cruft, too, with bits taken from 20-year old U. Texas stylesheets and various other places.


> AAAI, IJCAI, and AAMAS

These venues have had Word options for well over a decade. I recall a higher rate of Word (and HTML!) submissions in Agent97 -- the predecessor of AAMAS -- than I see in AAMAS now. At any rate, I think there have been few significant changes in LaTeX usage in those conferences.

You're right that the big place where Word shows up in CS is in HCI, software engineering, and interdisciplinary areas. Is it possible, given your mention of CHI, that you're from these areas and possibly experiencing a sample bias?


In the areas of high performance computing, systems, and languages, LaTeX is still dominant. Every conference I've submitted to has a Word template, but I've never known anyone to use it.


It depends on your particular flavor of CS. The less-theoryish flavors, like games or graphics, definitely do seem to be accepting of Word, and I've been to a couple conferences where Word was more prevalent.

Personally, I can't stand Word as an editing environment, because it doesn't offer a way of writing without worrying about formatting at the same time (seriously, if it had a markup view, I'd probably switch), and citation support is still a huge pain in the ass. However, LaTeX has many, many more pains in the ass for most people.


This has kind of been my experience as well in statistics/EE/machine learning. While IEEE/ACM have Word style files, I have seen very few submissions that use Word. In fact, I know reviewers who have a significant bias (which, whether right or not, exists) against Word submissions.


Yeah, TeX is really broken tech in 2012. When I tried to look at the base languages to edit a BibTeX file, it blew my mind. I'm lucky that LaTeX has templates that I like; I have no possible means of making one myself.

It seems like the whole article misses the point. pdftex doesn't need replacing; TeX needs replacing. Pandoc (with some better extension method) + CSS stylesheets + a layout engine built on TeX's algorithms would be bliss.

Unfortunately, the engineering required for that would need some really committed people, and the people committed enough to typography seem to have put all their eggs in the TeX basket, because that's all they know.


I agree that markup/stylesheets is what most strongly needs to be improved. I've only just started experimenting with it, but Biber, positioned as a BibTeX replacement, really does seem to be an improvement for the citation piece of the puzzle; its stylesheets look like something a mortal might be able to edit. It can also be used to generate HTML output, which has long been pretty crufty with BibTeX (I already have paper lists in BibTeX, so I should be able to generate things like my web page's publication list, customize the generated HTML, and maybe even have some sorting/grouping options, all of which Biber can do).
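
For anyone curious, Biber is normally driven through the biblatex package rather than used on its own; a minimal sketch (the .bib file name and citation key are made up):

    \documentclass{article}
    \usepackage[backend=biber, style=authoryear]{biblatex}
    \addbibresource{papers.bib}   % hypothetical bibliography file
    \begin{document}
    As shown by \textcite{knuth84}, ...   % assumes a knuth84 entry exists
    \printbibliography
    \end{document}

The compile cycle is latex, then biber, then latex again, in place of the old bibtex step.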

If I had to predict, I would guess that something like HTML+CSS, or possibly a Markdown-ish input language with some kind of CSS-ish stylesheet, will eventually overtake TeX in significant areas, once the PDF renderers get good enough. Maybe someone will even find a way to pipe it into TeX as a renderer, as you suggest. A plus of that workflow is that it also makes it easy to produce good-looking HTML versions of articles, which is getting more important. And the "math in HTML" question is (finally) converging on a constellation of acceptable solutions. Advanced typography in HTML is slowly creeping forward as well.


With docutils you can already convert reStructuredText to PDF directly (not very nice) or to LaTeX.


Thank you for informing me about the existence of Pandoc.


Is not changing the stylesheet really a bad thing? I definitely always prefer documents from people who use LaTeX without style to those using Word without style. Wasn't that one of the main reasons for creating TeX anyway? Of course it wasn't Word at the time...


> even in technical areas.

Really? What has it been replaced with?


At MIT Press they've moved to Word with the eXtyles plugin as their preferred manuscript/intermediary format, and InDesign for final layout. I'm not sure what precisely eXtyles does, but it appears to produce some kind of XML-based workflow that lets them retarget books to both InDesign and ebook formats, as well as process them through various scripts (e.g. for indexing).

They appear to still accept TeX manuscripts as a "not preferred" option, with their preferred approach to those being to convert them to Word+eXtyles ASAP, via some kind of processor they have. It looks like they still offer a full-TeX pipeline if authors insist, but it's deprecated and not done for most books anymore (and they won't produce ebook versions of your book if you choose this one).

More info on:

* The preferred pipeline: http://mitpress.mit.edu/authors/guidelines/monographs.asp#MS...

* The TeX options (I assume they put this information only in a Word .doc, when it's just plain text, out of a desire to taunt potential submitters of TeX manuscripts): http://mitpress.mit.edu/authors/guidelines/texscenarios.doc


People consider Word a replacement for LaTeX for authoring books with equations? Seriously?


If it's just equations (i.e. using no TeX features besides the $equation syntax$), that's what their conversion workflow is for. You can submit your manuscript in simple TeX, and they convert it to their preferred format with some scripts. I believe they'll also accept Markdown extended with TeX-syntax equations (not sure if that's official, but I know one author who submitted a Markdown manuscript with TeX equations... though he got the proof copies back as Word, post-conversion).


Ah, now I understand: When you wrote "In book publishing it's shrunk to a very small niche, even in technical areas", you mean the "technical areas" that don't use equations in their books.

My background is blinding me to the fact that books with equations are really a small niche market. That small niche is completely dominated by LaTeX.


I don't know. I can write LaTeX, but I must admit at times I would prefer something that could be written in a real language but still use LaTeX's typesetting, kerning, etc.


Probably OT: I haven't had much of a chance to use LyX, but it looked promising. (I know folks who swear by it.)

Any reviews here are welcome.


LyX is ok for what it is. It certainly gets some use, even in publishing.

But at least for me, a big part of the appeal of TeX is that it isn't a (%*#% GUI. It's text. I can edit it in an efficient plain text editor, "refactor" styling, etc. I've even at times generated TeX code programmatically (it's often less painful for stuff like receipts than writing a PDF generator by hand, and it usually looks better too).

So, I'd like a better input language, but I _don't_ want to fundamentally change the way it works. Just don't make me use a textual-replacement macro language.


> I can edit in an efficient plain text editor

... which is probably the real reason I don't use LyX either. Emacs + Latex are second nature to me.


Multimarkdown allows you to convert markdown into LaTeX.


Neither did I. I use LaTeX frequently and /vastly/ prefer it to the dreck of Word.

It provides a nice default, fast editing, and an incredibly large and featureful set of libraries.


Download and install LaTeX from scratch.

Write a document, add some figures, use a custom OT font with it and save it as a PDF.

If you couldn't find at least 10 things that beg to be hugely improved in the whole process, don't ever try to work in QA.
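
To be fair, the custom-OT-font step at least has a sane answer these days if you're willing to use xelatex and fontspec; a sketch (the font name is just an example, and it has to be installed system-wide):

    \documentclass{article}
    \usepackage{fontspec}        % requires xelatex or lualatex
    \setmainfont{EB Garamond}    % example OpenType font
    \begin{document}
    Body text in an OpenType font, with \textit{italics} picked up automatically.
    \end{document}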


Yes, people can continue to obliviously work with the same legacy tools and the same convoluted processes for years, until something better comes along and they wonder why they put up with that shit all along.

Oh, you meant it sarcastically? Well, I didn't.


The mission of TeXLive is to include everything and the kitchen sink. However, why should their TeXpad support everything? They could go the XeTeX+biber+TikZ route and educate their users on how to switch from pdflatex, bibtex, pstricks, etc.


Precisely; there are other people out there who use TeX in different ways. They should just target their preferred compilation chain and handle that in their app (they are not wrong in thinking that _most_ people out there just want LaTeX to spit out PDFs and use "normal" graphics).

Otherwise, has anyone used their app? How does it stack up against Emacs+AUCTeX+RefTeX+Skim?


I think one of the best moves to get this in the works would be updated tutorials and documentation -- "Using LaTeX in 2012" or whatever -- because a huge amount of the resources out there promote the use of outdated or old packages, which hinders moving forward.


Why is it necessary for TeX to keep all the libraries in source form on disk? Why not use compression, or package some pre-compiled form of the library code, or both?


From what I understand, everything above TeX is one giant text-manipulation macro layer, which means it all has to be available to the programs in text form. You could compress it, but that would mean decompressing hundreds (thousands?) of files every time you compile.


I understand, but I'm not convinced that the macros couldn't be somehow compiled down to some more compressed binary representation. Regardless, I still think keeping them on disk in a tarball or something would be more economical. Decompression can be quite fast if the right compressor is used.


> I understand, but I'm not convinced that the macros couldn't be somehow compiled down to some more compressed binary representation.

My understanding is that this is what .fmt files are for. I'm not sure where the line is drawn between what goes in a .fmt file and what is packaged as ordinary TeX code; I imagine that, like much else about TeX (which I love), it reflects the machine constraints Knuth faced while writing it (in the late '70s and early '80s).
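
If I understand it right, a format is produced by running the engine in iniTeX mode over the macro files and dumping the memory image; roughly (file names made up):

    % myformat.tex -- build once with:  pdftex -ini myformat.tex
    \input plain   % load the macro files you want precompiled
    \dump          % writes the memory image to myformat.fmt

Later runs can then load the precompiled image with something like pdftex -fmt=myformat document.tex, skipping the macro parsing.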


I just use XeTeX so I can get proper kerning and ligatures. Oddly enough, as a typography nerd, I find LaTeX alone just doesn't cut it.


I understand that XeTeX allows some fancy ligatures and access to other opentype features, but what's lacking in [La]TeX's kerning, and basic ligature support?


Why don't you use luatex?


I have used XeLaTeX for a while because of the support for freetype fonts and better support for Unicode.

Linux Libertine is my favorite font for XeLaTeX because it has better ligatures than any of the Computer Modern etc. fonts.


I didn't understand why they wanted to port LaTeX to the iPad when they first wrote about it:

http://lee-phillips.org/latexipad/


I'm sorry, but your logic here is just terrible. Your argument is that "I don't want an iPad" and "The iPad isn't a full computer" implies that "Nobody needs/wants LaTeX on an iPad". This doesn't logically follow and is instead a self-centered presumption that because you don't want something, nobody else does.

They want LaTeX on their iPads, I would like LaTeX on my iPad, and I have met other people who also want LaTeX on their iPads. When I'm taking notes, LaTeX works well for me (I'm not claiming for everyone) when I need to type up equations. I use my iPad with a bluetooth keyboard to take notes (the iPad's battery life (amongst other things) makes it very attractive for this application). I'd like to be able to render my equations to make sure I didn't make a transcription error.


That would indeed be a bad argument. I'm glad I didn't try to make it. The iPad is a superb browsing/reading device. But sometimes people insist on using the wrong tool for the job. You want LaTeX on your iPad. But you don't have it and won't get it, because it's not a computer. That was my point. You want to take notes and use LaTeX? Why didn't you buy a used Thinkpad for $100 from ebay and install linux? That's what I did. I don't have the excellent iPad screen (or the battery life - I certainly see your point there), but I don't have to say "I wish this could run LaTeX / Gimp / gcc / python / ....".


I use xelatex in my work and it's still embarrassingly fragmented and outdated. We need to start over on a new TeX-like project (also so that it can be ported to mobile).


I thought that luatex was going to be the savior?


The lack of microtype support in XeTeX is a deal breaker. That's the main reason why I continue to use pdflatex.


Microtype is partially supported (read: half of what it mainly does): protrusion sort of works, but expansion does not. Protrusion is already something, though I agree that this is one of the biggest things holding me back from switching completely.
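
Concretely, the usual workaround under XeLaTeX is to enable only the half that works, something like:

    \documentclass{article}
    \usepackage{fontspec}
    % expansion=false because font expansion isn't implemented under XeTeX
    \usepackage[protrusion=true, expansion=false]{microtype}
    \begin{document}
    Protrusion applies; expansion is skipped.
    \end{document}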


TeX, because its author didn't care to reuse XML.

Having used it for publications and during university, I really despise TeX. The language simply sucks; the output is nice, but those slashes and parentheses really suck.

With an XML syntax we could easily create beautiful editors, make it easy to parse, with schema validation, etc.

The problem is that this house of cards grew while avoiding alternatives such as DocBook, and it is and always will be a mess, since its foundations are not parseable.


XML: published 1996.

TeX: Initially released 1978.

Plus, writing XML makes me puke, so even with the occasional grossness of TeX syntax I'm happy it's not XML.


An XML-syntax language? Like XSLT? Yeah, that's thriving.

The awesome editors and tools that are supposed to grow around any XML-based syntax (after the hard part -- parsing -- is taken care of), are like the modern version of the "sufficiently smart compiler".


XML 1.0 is from 1998, more than twenty years after Knuth started TeX. Even SGML is only from 1986; I suppose he could've used GML....


Doesn't Tex pre-date XML? If so, that would make it hard, even for Donald Knuth, to reuse a non-existent technology.



