Hacker News
MathML is a failed web standard (peterkrautzberger.org)
247 points by Mathnerd314 on April 7, 2016 | 169 comments



It’s instructive to see how different MathML is from other math markup languages. Here’s the quadratic formula:

In troff, x = {-b +- sqrt { b sup 2 - 4ac}} over 2a

In TeX, x = {-b \pm \sqrt{b^2-4ac}} \over {2a}

In plain Unicode, π‘₯ = (βˆ’π‘ Β± √(𝑏² βˆ’ 4π‘Žπ‘))⁄2π‘Ž

In MathML, <mrow><mi>x</mi><mi>=</mi><mfrac><mrow><mi>βˆ’</mi><mi>b</mi><mi>Β±</mi><msqrt><mrow><msup><mi>b</mi><mi>2</mi></msup><mi>βˆ’</mi><mi>4ac</mi></mrow></msqrt></mrow><mi>2a</mi></mfrac></mrow>

MathML is simply unreasonable to write by hand. Most of the time it’s only ever used as an interchange format, automatically generated by tools.

Indeed, the only time I ever use it is with mandoc(1), the default manpage formatter on BSD, Illumos, and some Linuxes, which converts equations to MathML when converting manpages to HTML.


I audibly groaned at the XML version, but to be fair, it can be presented better:

  <mrow>
    <mi>x</mi>
    <mi>=</mi>
    <mfrac>
      <mrow>
        <mi>βˆ’</mi>
        <mi>b</mi>
        <mi>Β±</mi>
        <msqrt>
          <mrow>
            <msup><mi>b</mi><mi>2</mi></msup>
            <mi>βˆ’</mi>
            <mi>4ac</mi>
          </mrow>
        </msqrt>
      </mrow>
      <mi>2a</mi>
    </mfrac>
  </mrow>
Again, I'm not saying this is good. Compared to the brevity of TeX or troff, it's difficult to accept. But XML is easier to read when you give in to its heavyweight structure and format it appropriately.

By the way, on my system (OS X, Chrome), the Unicode version is beautiful. I hadn't realized it was a good option for math.


The big downside of the Unicode option is its poor handling of fractions. It's not so apparent with the quadratic formula, but once you have a complex divisor, the / notation starts to fall apart.

Even properly formatted that MathML version is just awful. You could help it a bit by combining some of those <mi> elements on a single line maybe, but it's way too much mental effort to parse that mess.


Lack of proper subscript and superscript support is a killer for Unicode as well. Unicode is amazing for smaller, simple equations, but anything beyond that soon becomes extremely hard to process.


I know what you're referring to, but then again, that's a matter of formatting, not encoding.


You can remove the <mrow> inside the <msqrt> too.


Shouldn't the '=' and the minus and the other signs be inside <mo>, rather than <mi>, because they are binary operators, not identifiers? TeX for example, makes a clear distinction in terms of how much whitespace it would surround operators with, vs. identifiers.

In any case, I suspect MathML was intended as an intermediate computer-readable representation, not something that anyone would write by hand (MathJax can compile your LaTeX to MathML). I don't see what's wrong with an intermediate representation that's difficult to manipulate by hand. And unless I misunderstood the article, their point is that MathML is bad as an intermediate representation.


>Shouldn't the '=' and the minus and the other signs be inside <mo>, rather than <mi>, because they are binary operators, not identifiers?

You’re right; I should have marked those up with <mo> instead.


Well, ask Peter and their alter ego, perhaps they will provide an answer.


Yeah. I think that's inevitable with an XML syntax. XML is good at some things, but representing math expressions clearly is not one of them.


> Yeah. I think that's inevitable with an XML syntax. XML is good at some things, but representing math expressions clearly is not one of them.

What is XML good at? (by good I mean better than alternatives like JSON, YAML, HAML, etc)

The only thing that might qualify is a long-term/archival-quality document format like ODF/OOXML. The inherently embeddable nature of XML does seem like a nice fit, but it gets very bloated very fast (deflated wrappers help, though).


XML grew out of SGML/HTML, and there's no denying that its age shows. It's kind of like going into a house built in the 90s and seeing all the little things that just scream "90s house - that looks so dated!"

Did you know that web browsers have native support for XML? Try fetching an XML resource and pulling the responseXML value off the XHR - you have another DOM object right in your hands, one you can treat just like a regular Node.

The reason that XML is better than the alternatives has nothing to do with the syntax itself, but the tooling around it. Browser support, XQuery, XPath, XSD/RelaxNG, XSLT - please tell me where I can find the equivalents for any other markup language. You don't have to use them, either - but if you need them, they are there in pretty much every framework. XML had the first to market advantage, and was picked up in enterprise systems and made powerful and ubiquitous. If you need batteries included, XML is right there, the others are not. There is really nothing wrong with it, the syntax is clunky for some applications but great for documents, whereas e.g. JSON would be horrible in that scenario.
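To make the "batteries included" point concrete, here's a small Python sketch: even the standard library's ElementTree can load the MathML snippet from the top of the thread and run XPath-style queries over it (lxml adds full XPath and XSLT if you need more).

```python
import xml.etree.ElementTree as ET

# The MathML example from the top of the thread, queried with nothing
# but the standard library.
mathml = (
    "<mrow><mi>x</mi><mi>=</mi><mfrac><mrow><mi>−</mi><mi>b</mi><mi>±</mi>"
    "<msqrt><mrow><msup><mi>b</mi><mi>2</mi></msup><mi>−</mi><mi>4ac</mi>"
    "</mrow></msqrt></mrow><mi>2a</mi></mfrac></mrow>"
)
root = ET.fromstring(mathml)

# XPath-style queries; results come back in document order.
tokens = [el.text for el in root.findall(".//mi")]
sqrt = root.find(".//msqrt")
under_sqrt = [el.text for el in sqrt.findall(".//mi")]
```

The same kind of query works in a browser via document.evaluate, which is the point: the tooling is already everywhere.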


> Did you know that web browsers have native support for XML?

Not only that, browsers have support for XSLT 1.0, so you can format it and style it with CSS on the fly. Even mobile browsers support that, as far as I can tell.


Great explanation. The tooling and first mover advantage were the reasons I started working with it back when it was in its infancy.


1. XML is very easy to parse (linear time). By comparison, TeX or troff, while elegant, are not that easy to parse. In the sample formula (see the first comment) a parser reading the beginning of the formula has no idea it's going to end up with a fraction until it sees \over. So it's a real parser plus a post-processor that sets things up; check "TeX: The Program" for details. And it can only parse one language. For XML it's just a very dumb, highly optimized generic loader that can load any XML. I agree the content has to come from somewhere, but that's a different story.

2. The XML data model is more sophisticated than JSON's or YAML's: it supports element ordering and mixed content, and does so rather elegantly and succinctly. It also has namespaces (and these are very good namespaces: they're not hierarchical, just long names in a single flat namespace, with convenient notation to shorten the long prefixes to a reasonable size). As a result it's very easy to define a new language, extend a language, mix multiple XML languages, etc. JSON and YAML are hopeless here.

3. XML comes with tools to define the type of the document or a fragment, so you can read a document and automatically check that it has the right syntax (and/or convert the data, such as dates, into the native format). There are three ways to do this (DTD, Schema, Relax NG) in order of increasing power and expressiveness (not just syntactic sugar, but different kinds of languages). In particular, it natively supports things like inter-element references, which is very convenient for complex documents.

4. XML comes with XSLT, which is a general-purpose tree transformer (transducer) with declarative syntax. This is an immensely valuable tool. To put things into perspective: a compiler is a special-purpose tree transformer that transforms the source tree of a program into machine code (which is also a tree, technically: sections, data, functions, etc.). Are you sure you don't need a general-purpose declarative tree transformer and prefer to write ad-hoc ones? :)

5. The specification of XML 1.0 is shorter than, say, YAML :) OK, this is only one part of XML landscape, the whole is much bigger, of course; but still this part (basic XML and DTD) is noticeably shorter than YAML. (I myself also find YAML pretty cryptic.)
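Point 2 is the one JSON people underestimate, so here's a minimal Python illustration of mixed content: ElementTree models the interleaved text with .text (text before the first child) and .tail (text after an element), slots a plain JSON object simply doesn't have.

```python
import xml.etree.ElementTree as ET

# Mixed content: text and elements interleaved in a fixed order.
frag = ET.fromstring("<p>Let <mi>x</mi> be small.</p>")
leading = frag.text       # text before the first child element
child = frag[0]           # the <mi> element
trailing = child.tail     # text that follows the element
```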


Great summary! XML really isn't a bad format; it just gets bad-mouthed by everyone who thinks it's a bitch to author -- and I agree, it is. But then just don't author it by hand. JSON or YAML of any reasonable length is also terrible to author by hand, yet you lose out on so many of the benefits of XML. And for what, better "hello world" samples?


> What is XML good at

Supporting XQuery/XPath queries and typing with DTDs are the big ones for me, including all the surrounding tooling. You can get replicas of both of these in JSON now, but I don't think they're as mature.


"What is XML good at?"

(For the purposes of this post, I'm including HTML in the XML family.)

XML/HTML is good when:

1. You have two dimensions of markup you want to do. That is, you have a clear distinction between what is a new "tag" and what is an attribute on that tag. If you can't almost instantly decide whether some feature you want to add works as an attribute or a tag, you probably shouldn't be in XML.

2. Almost every tag one way or another contains some text, the third dimension that XML supports. A proliferation of tags that never contain any text is a bad sign. A handful may not be a problem, e.g. "hr" in HTML, but they should be the exception.

3. You have a really good use case for XML namespacing, the fourth dimension of information that XML supports, in which case there's almost no competition for a well-standardized format, as long as you're also using the previous three dimensions.

There's sort of this popular myth that XML is useless, which I think isn't because it's true or because XML is bad; I think it's because, in general, most times you want to dump out a data structure #1 isn't true, let alone #2 or #3. In a lot of data sets, you've only got the two dimensions of "simple structure" and "text", not annotations on the structure itself. (Or, perhaps more accurately, they end up implicit in the format itself, and the format is constant enough for that to be just fine.) A lot of stuff in the 1990s and 2000s used XML "because XML" even though it clearly failed #1. XML is really clunky when you don't want that second dimension, because the XML APIs generally can't let you ignore it, or they wouldn't actually be XML APIs.

On the other hand, when you learn this distinction, you do come across the occasional JSON-based format that clearly really ought to be XML instead. You can embed anything you want into JSON, but when you're manually embedding a second structure dimension into your JSON document, it loses its advantages over XML fast. If you've ever seen any of the various attempts to fully embed HTML into JSON, without leaving any features behind, you can begin to see why XML or XML-esque standards like HTML aren't a bad idea. HTML is much easier to read for humans than HTML-in-JSON-with-no-compromises.
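To see how fast "HTML-in-JSON-with-no-compromises" degrades, here's a toy Python sketch. The type/attrs/children schema is invented for illustration, but any faithful encoding needs equivalent slots for attributes, child order, and bare text runs.

```python
# One short hyperlink, encoded without compromises: every text run
# becomes its own node, and attributes, children, and ordering all
# need explicit slots.
node = {
    "type": "a",
    "attrs": {"href": "/faq"},
    "children": [
        {"type": "#text", "value": "see the "},
        {"type": "b", "attrs": {}, "children": [{"type": "#text", "value": "FAQ"}]},
        {"type": "#text", "value": " first"},
    ],
}

def render(n):
    # Serialize the JSON-ish tree back to markup (no escaping; toy only).
    if n["type"] == "#text":
        return n["value"]
    attrs = "".join(f' {k}="{v}"' for k, v in n.get("attrs", {}).items())
    inner = "".join(render(c) for c in n.get("children", []))
    return f"<{n['type']}{attrs}>{inner}</{n['type']}>"
```

Thirteen lines of JSON for what HTML says in one: `<a href="/faq">see the <b>FAQ</b> first</a>`.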

And if you've truly got the four-dimensional use case, XML is really quite nice. When you need all the features, suddenly the libraries, completely standardized serialization, and XPath support and such are all actually convenient and surprisingly easy to use, for what you're getting.

Some examples: HTML is a generally good idea. SVG is a middling idea; it passes #1 and #3 but fails #2. SOAP and XML-RPC are generally bad ideas; SOAP fails #1 and #2 but sort of uses #3, and XML-RPC fails all three. XMPP I actually think is pretty solid as an XML format (mere network-verbosity problems can be solved with an alternate encoding, though admittedly that becomes non-standard), and in a lot of ways the real problem with XMPP isn't so much the format itself as that people are not used to dealing with the four-dimensional data structures that result. People expecting IRC-esque flat text are not expecting such detail. Using the fourth dimension of namespaces for extensibility is neat, but few developers understand it, or want to.


This is perhaps the best (most terse and accurate) summary of XML tradeoffs I've seen in years.

I generally don't just comment "attaboy" but there you go.


Why does this have to turn into an XML bashing thread? Writing the above equation in JSON would be just as terrible.


> Writing the above equation in JSON would be just as terrible

Not quite as bad as XML. I think the problem is more the verbose, overly nested format that was chosen for MathML than XML itself, though.

  {
    "mrow": {
      "mi": [ "x", "=" ],
      "mfrac": {
        "mrow": {
          "mi": [ "βˆ’", "b", "Β±" ],
          "msqrt": {
            "mrow": {
              "msup": {
                "mi": [ "b", "2" ]
              },
              "mi": [ "βˆ’", "4ac" ]
            }
          }
        },
        "mi": "2a"
      }
    }
  }


That's as bad as the XML. Actually it's worse, because your ordering is undefined which breaks everything. They are both nearly unusable by humans.

Basically the problem is this: either you explicitly represent the grouping in a general scheme capable of it, and you get the disaster (from the point of view of human readability and manipulability) that is XML or JSON. Or you use a domain-specific language like LaTeX or whatever, with the attendant parsing issues, etc.

If you want people to edit it by hand, the latter option is much better - but it has its pain points. You don't get to use a broad range of robust tools to manipulate them, for one thing.


JSON object properties are unordered per the spec. This is not portable.


S-expressions are the better alternative to XML.


What you have doesn't work. What if I have "mi, mo, mfrac, mo, mi" at the top level? So something like "a - b/c + d". You can't specify the same key twice in JSON. Also, keys are technically unordered, so there's no guarantee that a parser will put that top-level "mi" before the "mfrac".
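The duplicate-key point is easy to demonstrate with a quick Python check (CPython's json module silently keeps the last value; other parsers may keep the first, or reject the document outright):

```python
import json

# Two "mi" keys at the same level, as "a - b/c + d" would require:
parsed = json.loads('{"mi": "a", "mfrac": {}, "mi": "d"}')
# No error is raised, and one of the two "mi" values is silently gone.
```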

JSON is great at many things, but polymorphic substructures are AFAIK only really possible with everything being an object defining the "type" that it is. And that looks significantly uglier than what you have above:

    {
        "type": "mrow",
        "children": [
            {
                "type": "mi",
                "identifier": "x"
            },
            {
                "type": "mo",
                "operator": "="
            },
            {
                "type": "mfrac",
                "rows": [
                    {
                        "type": "mrow",
                        "children": [
                            {
                                "type": "mo",
                                "operator": "-"
                            },
                            {
                                "type": "mi",
                                "identifier": "b"
                            },
                            {
                                "type": "mo",
                                "operator": "Β±"
                            },
                            {
                                "type": "sqrt",
                                "expression": {
                                    "type": "mrow",
                                    "children": [
                                        {
                                            "type": "mi",
                                            "identifier": "b"
                                        },
                                        {
                                            "type": "msup",
                                            "expression": {
                                                "type": "mi",
                                                "identifier": 2
                                            }
                                        },
                                        {
                                            "type": "mo",
                                            "operator": "-"
                                        },
                                        {
                                            "type": "mi",
                                            "identifier": "4ac"
                                        }
                                    ]
                                }
                            }
                        ]
                    },
                    {
                        "type": "mi",
                        "identifier": "2a"
                    }
                ]
            }
        ]
    }


While this format is more generic, an abbreviated encoding can sometimes accomplish the same thing. For example, just moving the "type" to be the object key and removing the implied secondary name gets you this far:

    { "mrow": [
        { "mi": "x" },
        { "mo": "=" },
        { "mfrac": [
            { "mrow": [
                { "mo": "-" },
                { "mi": "b" },
                { "mo": "Β±" },
                { "sqrt":
                    { "mrow": [
                        {"mi": "b"},
                        {"msup": { "mi": 2 }},
                        {"mo": "-"},
                        {"mi": "4ac"}
                    ]}
                }
            ]},
            { "mrow": [
                {"mi": "2a"}
            ]}
        ]}
    ]}
It's not as general, but it works if you know your syntax is similarly bounded. I don't know how some static languages would handle serializing/deserializing it, but it makes construction via JavaScript literals much more pleasant.


FWIW, this would be a literal s-expression translation:

    (mrow (mi x)
          (mo =)
          (mfrac (mrow (mo -) (mi b) (mo Β±)
                       (sqrt (mrow (mi b) (msup (mi 2)) (mo -) (mi 4ac))))
                 (mrow (mi 2a))))
And this would be a saner one, where mrow is implied:

    ((mi x) (mo =) (mfrac ((mo -) (mi b) (mo Β±)
                           (sqrt (mi b) (msup (mi 2)) (mo -) (mi 4ac)))
                          (mi 2a)))
I think either of those is clearly and inarguably superior.
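Part of the appeal is how little machinery the s-expression form needs. A minimal Python reader (tokenize on parentheses, then recursive descent; no quoting or error handling, toy code only) fits in a dozen lines:

```python
# Minimal s-expression reader: pad the parens with spaces, split into
# tokens, then build nested lists with recursive descent.
def parse_sexpr(source):
    tokens = source.replace("(", " ( ").replace(")", " ) ").split()

    def read(i):
        if tokens[i] == "(":
            items, i = [], i + 1
            while tokens[i] != ")":
                node, i = read(i)
                items.append(node)
            return items, i + 1          # skip the closing ")"
        return tokens[i], i + 1          # an atom

    tree, _ = read(0)
    return tree

tree = parse_sexpr("(mrow (mi x) (mo =) (mi 2a))")
```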


Sure, that works as long as each type only has one property. Decoding it might be problematic; I don't know of any serializers that would handle that kind of mapping natively.
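For what it's worth, even if no serializer handles it natively, a generic decoder for the single-key convention is short to hand-roll. A Python sketch (the (tag, children) tuple shape is my own choice):

```python
import json

# Unwrap the "type as object key" convention by hand.  Each node is a
# single-key object; we turn it into a (tag, children) tuple.
def unwrap(node):
    if isinstance(node, dict):
        (tag, value), = node.items()     # exactly one key per node
        return (tag, unwrap(value))
    if isinstance(node, list):
        return [unwrap(child) for child in node]
    return node                          # leaf: plain string or number

tree = unwrap(json.loads('{"mrow": [{"mi": "x"}, {"mo": "="}]}'))
```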


Have a go at trying to convert a complex TEI or Docbook document to JSON; you will want to put a gun to your head before the day is out.


> What is XML good at? (by good I mean better than alternatives like JSON, YAML, HAML, etc)

It has a single standard way of doing schemata that all the tools support, which is great. The Maven pom.xml format is a much clearer way to specify a dependency than most of the alternatives (which often use an excessively clever, concise form), and has really good autocomplete when editing it in Eclipse (because Eclipse understands the schema and so can offer completions based on the elements that make sense at that point in the document).

If XML had just not bothered with namespaces I think it would have worked really well.


Try the same equation but then with JSON, and see how it is even more terrible. Document Object Models as in HTML etc... are where XML is actually ok, as that was its design niche.

Could have been a bit simpler but hindsight...


Whenever I need to send dates across the wire with a JSON API, I sort of miss XML a little bit. It's a lot easier there, and there are existing patterns and tools to help. JSON really only supports three scalar types - boolean, numeric, and string. Any other type needs some ad-hoc system for communicating the schema, or you just expect created_at to be a date.

XML, on the other hand, had the ability to do something about this using attributes on the element or a DTD / XSD. Occasionally I do miss that ability to communicate a data schema alongside the data. But only occasionally.
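The usual ad-hoc fix, sketched in Python: agree out-of-band that the field is an ISO 8601 string and convert at the boundary (the created_at field name is just an example):

```python
import json
from datetime import datetime, timezone

# Encode: JSON has no date type, so we serialize to an ISO 8601 string.
record = {"created_at": datetime(2016, 4, 7, tzinfo=timezone.utc).isoformat()}
payload = json.dumps(record)

# Decode: nothing in the JSON itself says this field is a date; the
# receiver just has to know the convention.
created = datetime.fromisoformat(json.loads(payload)["created_at"])
```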


I don't see a reasonable alternative for text markup (HTML, ODF, OOXML)


XML is unreadable and unwritable for any markup-heavy document, and math expressions tend to be more markup than text.


Indeed. But you could always define your own alternative syntax and write a simple translator to XML.


MathML was never intended to be written by hand any more than SVG was. It was designed to provide a standard output format for equation editors. Wolfram created MathML explicitly to prevent TeX from being adopted as a de facto standard on the web, because TeX, being concerned only with visual appearance, does not do a good job of describing the structure or semantics of equations. MathML excels in both these areas, providing a fairly simple and standardised visual model which fits well with web browsers, and a rich semantic model.

In other words, MathML is great and it's a much better fit for web browsers than anything else on offer. Just don't write it by hand!


In AsciiMath[1], x = (-b +- sqrt(b^2 - 4ac)) / (2a)

[1]: http://asciimath.org/


Good, you caught that he was only dividing by 2, not 2a


Also, thanks for the pointer to AsciiMath; it looks like a nice way to render math.


I wrote a clone[1] of AsciiMath recently that aims to be just a little bit better than AsciiMath. It only targets MathML, though.

[1]: https://runarberg.github.io/ascii2mathml


XML is not meant to be written primarily by hand (although I agree some XML languages could be more elegant). The markup part of XML is for the machine; if you remove it from your example, you'll get x=βˆ’bΒ±b2βˆ’4ac2a, which is, basically, the text content that was meant for the humans.

Now, as a machine language, XML is much better than troff or TeX because it's very easy to parse: it's basically a syntax tree (the result of parsing those other formats), so you don't even really parse it, just deserialize it. Naturally, it's a very good interchange format. Technically it would be a much better option than a full-fledged JavaScript parser and typesetter because it would've removed the parsing part. (And this is only one of the advantages.)
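That stripping is literally a three-line program. A Python sketch, deserializing the thread's MathML snippet and keeping only the text nodes:

```python
import xml.etree.ElementTree as ET

# Deserialize the MathML and join the text nodes: the markup drops
# away, leaving the human-readable residue of the formula.
mathml = (
    "<mrow><mi>x</mi><mi>=</mi><mfrac><mrow><mi>−</mi><mi>b</mi><mi>±</mi>"
    "<msqrt><mrow><msup><mi>b</mi><mi>2</mi></msup><mi>−</mi><mi>4ac</mi>"
    "</mrow></msqrt></mrow><mi>2a</mi></mfrac></mrow>"
)
text = "".join(ET.fromstring(mathml).itertext())
```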


Does anyone else prefer writing troff/tbl/pic/eqn? I always felt that the language of TeX/LaTex was a step backward in user-friendliness.


I’m also one of those who prefers troff. I read all five volumes of Knuth’s Computers and Typesetting cover to cover, and TeX is a beautiful piece of work. But in my preferred alternate universe, two things would be different: Joe Ossanna would not have died young, and AT&T would have been more open with licensing in time for Knuth to use troff as a base for his improvements to the world of typesetting.


He tried, but the syntax was ill-defined, and the typography was poor.


I found troff syntax very hard to debug for parse errors, while you can easily fit the TeX tokenizer into your head. There is also much less control in, e.g., the alignment of complex tables.


I usually use the OpenOffice / LibreOffice equation editor, which is based on eqn. Ironically, the on-disk format it uses is MathML.


Yes. I would use OOo if the equation typesetting wasn't awful. They don't seem to have put any effort into improving it over the years. MS Word typesetting has become quite good, though not up to the best you can get with LaTeX.


I've found XML is better suited to proper indented formatting, read top to bottom rather than left to right. When it's laid out vertically and indented, it is VERY readable, and the nesting of the elements becomes clear.


Blargh. It annoys me that MathML was even a thing. We already have a perfectly fine and widely used markup language for mathematical formulae; it's called TeX.

Ideally, I'd prefer that HTML5 include a <math> tag, or something similar, that takes TeX as input and produces formatted output. A side benefit would mean that every browser, and every system, in the world would have TeX installed!

At least MathJax can take TeX as input.


> A side benefit would mean that every browser, and every system, in the world would have TeX installed!

TeX doesn't have dynamic reflow, is defined by a macro system that involves essentially monkey patching the parser as it goes, lacks anything resembling a DOM, wasn't designed with Unicode or internationalization in mind, is completely unparallelizable, doesn't interoperate with CSS, has its own concept of block and inline layout that doesn't allow for text wrapping around floats without hacks, requires multiple passes to compute various things like tables of contents, and wasn't designed for interactive-level performance. Something syntactically like TeX may well be the solution, but jamming TeX itself into the Web platform wouldn't end well.


Yeah. The main thing people probably want is "TeX quality" typesetting of mathematical formulas inside web pages. Bundling the entirety of TeX inside a browser for this purpose would be kind of a hack, and suboptimal in many ways. I think that instead of just using TeX as a black box, it would be more interesting to actually learn what makes its formulas look good, and apply these lessons in modern software.

There's a related article called A Functional Description of TeX's Formula Layout [1] that describes a more modern reimplementation of a part of TeX.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39....


Considering the amount of expertise that went into making TeX, and the fact that (in my experience at least) it far surpasses other solutions, it might simply be that mathematical-notation typesetting doesn't lend itself to dynamic reflow, internationalization, parallelization, DOM representation, interoperability with CSS and its concept of block and inline layouts, wrapping around floats, single-pass processing, and interactive-level performance.


I don't see why, from a technical point of view. There's no reason we should throw out the benefits of the Web platform because TeX happens to do things a certain way. With MathJax and tools like it, we can bring the benefits of TeX to the Web platform in a way that actually meshes well with what the platform provides.


This is about how I feel. Can someone who’s actually competent explain why (La)TeX output looks so much better than what web browsers do? I’m not talking about childish colors, sticky headers, and all the other annoying β€œmodern web features”, just the relevant content: some text and inline images.


Because TeX documents are usually made by people who have something valuable to say, and the tool reflects that. On the other hand most of the web is designed to sell you stuff, and the actual content is not important. The tool, again, reflects that.


I don't think TeX output looks better than what Web browsers do if you enable all the high-quality text features like hyphenation and use modern OpenType fonts (in particular not the Core Fonts for the Web). The reason why the Web doesn't do this by default is backwards compatibility and performance: hyphenation is asymptotically more expensive.


TeX doesn't hyphenate every word; it only attempts hyphenation when it looks for where to break a line. That said, I don't see why a document can't be prehyphenated once on the server; I'd say this is the right thing to do. (Besides, in that case we don't have to rely solely on algorithms.)

It's not just that CSS is still very far from professional typesetting; what amazes me is that at the same time it's much more complex. Look at all these possible 'display' values and box models! And you still cannot do a run-in header! Or automatic numbering! (You can do the latter natively with XSLT, but it's not cool.) By comparison, TeX only has boxes, vertical and horizontal lists, glue, and a few lesser things like kerning. And it produces works of art.
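Server-side prehyphenation really is mechanical: insert soft hyphens (U+00AD) at allowed break points and let the browser break lines there. A toy Python sketch with a hand-made exception dictionary (invented for illustration; real hyphenators use Liang's pattern algorithm, the one in TeX):

```python
# Mark allowed break points with soft hyphens so the browser can break
# lines there.  The tiny exception dictionary is purely illustrative.
SOFT_HYPHEN = "\u00ad"
BREAKS = {"typesetting": "type-set-ting", "hyphenation": "hy-phen-ation"}

def prehyphenate(word):
    pattern = BREAKS.get(word.lower())
    if pattern is None:
        return word                      # unknown word: leave untouched
    return pattern.replace("-", SOFT_HYPHEN)
```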


A few things that make LaTeX docs look good:

    - quality justified paragraphs, making use of inter-word spacing and hyphenation
    - font ligatures (TeX uses quality fonts with special glyphs for sequences like fi, fl, ffi, ffl, ft, etc.)
    - good heuristics about where to break pages (e.g. flexible spacing around headings)


Web browsers implement all of this nowadays. In fact, OpenType GSUB ligatures in particular go far beyond what TeX supports.


But that raises the question of why web browsers don't implement this. Is there a technical reason, or did at some point people decide that decent text rendering is irrelevant and one should focus on other gimmicks?


The Web started as a tool for rendering text and the occasional image pretty much as soon as it came off the wire. TeX, on the other hand, was meant to be a typesetting package that compiles a textual description of the document into a beautiful printable form. That compilation takes noticeable time even today; I imagine it was unacceptably slow for the web back then. Also, in TeX you deal with paper sizes; in browsers you deal with adjustable viewports of unpredictable size.

I guess the Web just started simple and evolved from there, and then nobody bothered to remake it for prettier text.


I remember an old publishing package, Ventura Publisher; the first time I saw it was maybe 1990 or '91, when it was at v3. It was a very versatile publishing tool (all the perks of a professional typesetting package: tables, equations, floating illustrations; lots).

What's interesting is that it could separate the content from the styling information, so you could load the same content into two or more publications with different styles and get two different typeset results from the same source (e.g. a one-column small page for a book and a two- or three-column large page for a journal article). And it was very easy to change the format or paper size: just change it and it would reflow the content accordingly. It didn't even take much time, as far as I can remember, and I'm talking about an IBM PC 286 here.

So, truly, it's not a technical obstacle; the web could have a much better typesetting engine working at high speed (and I'd say it would be much better to imitate paged media instead of scrolling; scrolling is really inconvenient for reading). It's just that it grows wildly and in all directions at once; it tries to be a "semantic" storage format, a rendering medium, and an application engine at the same time, and is not particularly good at any of these.


I don't think HTML-typeset math has to look bad. e.g. http://www.deeplearningbook.org/contents/linear_algebra.html Well this particular example is auto-generated and the code looks terrible, but it's not hopeless to get nice math without TeX.


Compared to the amazing complexity of OpenType layout for complex scripts, mathematical typesetting is pretty simple. It's just not been a priority for the browser vendors, in the same way that we now have amazing technologies like WebRTC and WebGL but managing bookmarks still sucks.

It might "simply be" the case (well, in fact it is) that TeX is just really old and doesn't lend itself well to concepts it predates, such as dynamic reflow, etc...


All good points, it would be insane to include a full TeX engine in the browser.

But for just rendering equations, I don't think you can beat TeX for ease of use. And most of the criticisms then don't apply - you can avoid the insane macro system, it doesn't require multiple passes, etc. I guess lack of CSS support could still be an issue.


Good answer. TeX is an input language and is fairly good at what it does for mathematicians. From a programming language and systems point of view, it is an ancient, ad hoc mess. TeX could be used as a computer representation, but it fails for that purpose for lots of reasons, several of which you mention here.


> TeX doesn't have...

Is that due to implementation or the specification?

I used TeX a few times back in the mid-90's and something about it really clicked with me. The documents I made with it are easily the most attractive documents I've made.

Since then, I've wondered why TeX isn't used for ebooks and thanks to your comment, now I know.


There is no spec for TeX as far as I know: it's defined by its (effectively frozen) implementation.


OMG.

If your equations require concurrency to render, you're either a genius or insane. Or both.


It's not that we need parallelism for layout to work. Rather it's that, in 2016 with Moore's Law over, new standards for performance-critical things ought to consider parallelism so they can continue to scale in the future.


Thanks, now I know I don't need to spend time to study the layout system of TeX.


TeX is analogous to Presentation MathML but doesn't solve the accessibility concerns because the semantic intent of the TeX source isn't always clear. (How do you read a \phantom{}? Or how can you know that "\left( 7 \atop 3 \right)" should be read as "7 choose 3"?)


TeX has "{n \choose k}".

(Replaced by "\binom{n}{k}" in LaTeX with amsmath.)


The fact that semantic commands are available in TeX doesn't change the fact that it is, unabashedly, a display rather than a semantic language. Knuth's goal was to create a language to facilitate beautiful typography, rather than one directly for expressing mathematical meaning.


Hence amsmath.


You should define your own commands and then use them, in preference to directly using markup. For example, at the start of my essay on non-standard analysis, I define some commands:

- "\near" as a synonym for "\simeq"

- \newcommand{\hyp}[1][\mathbb{R}]{\prescript{*}{}{#1}}

- \newcommand{\powerset}{\mathcal{P}}

This way you can change the notation easily and all-at-once if you need to, as well as making the semantics clear to the source-reader.
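For example, with macros like those in the preamble, the document body reads semantically (a sketch only: \prescript comes from the mathtools package and \mathbb from amssymb, and the usage sentence is mine, not from the essay):

  % in the preamble
  \usepackage{amssymb, mathtools}  % \mathbb, \prescript
  \newcommand{\near}{\simeq}
  \newcommand{\hyp}[1][\mathbb{R}]{\prescript{*}{}{#1}}
  \newcommand{\powerset}{\mathcal{P}}

  % in the body: the source states meaning, not symbols
  If $x \in \hyp$ and $x \near 0$, then $x$ is infinitesimal,
  and $\powerset(\hyp)$ is the power set of the hyperreals.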


I think we do not need semantic representation in the HTML case at all. For example, f(x+1) can be a multiplication or a function application, but should we have to write something like \function f (x+1)? I think layout similarity with the query is enough for a math-aware search engine to identify similar math expressions. Adding too much to a math HTML standard is not helpful, just redundant.


For sighted users it may be possible to differentiate, but whether "f of x plus 1" or "f times x plus 1" is read aloud is a significant difference.


What I really want to say is that whether f(x+1) is a function or a multiplication does not matter that much for either browser presentation or math-aware search; moreover, extracting semantics can be done algorithmically from context. Considering that few authors want to annotate their expressions with semantics, and that adding semantics does not really help math-aware search, I question the necessity of bringing semantic math notation into the web.


The same way a sighted person does?


Yes, you could try to build algorithms that turn that markup into semantic meaning but now we're almost talking about computer vision. I don't think a solution like that is likely to ever work reliably, and it's better to encode the semantic meaning so that it doesn't need to be re-derived.


I think they meant that a person needs to interpret "two numbers arranged vertically between parentheses" according to context, regardless of whether they're sighted.

Having more semantic information in that scenario would help the sighted just as much as the visually impaired: imagine hovering over an unfamiliar notation with your mouse and seeing it explained.


The other reply made the same point I'm about to make, but I think it's worth clarifying.

Specifically, just like a sighted person has to determine from context whether "a(b)" is function application or variable multiplication, a blind person can be asked to determine from context whether "a left-parenthesis b right-parenthesis" is function application or variable multiplication.

No AI or CV is required for such a reading algorithm. It's unfortunate that this reading algorithm doesn't quite match what two mathematicians would say if they were conversing with each other, but there's at least one good reason to believe this is still a useful reading algorithm: it's exactly the system used by blind mathematician Abraham Nemeth with his readers, called "mathspeak":

    The speech generated by this protocol is not exactly what a 
    professor in class would use, but it is absolutely unambiguous 
    and results in a perfect Nemeth Code transcription. It avoids 
    largely unsuccessful attempts by a reader to describe the 
    notation he sees, accompanied by the shouting and gesturing that 
    such attempts at description engender.
http://www.nfbcal.org/s_e/list/0033.html

(Nemeth notably created the Nemeth Braille Code for Mathematics, which is part of Unified English Braille and is probably the most widely used Braille code for math. MathSpeak hasn't enjoyed the same level of adoption, but only because there's no standard: MathPlayer, VoiceOver, etc. all have their own ad hoc rules for how to read math.)

(Strictly speaking the example I gave was a verbose variant of the system described in the email: http://www.gh-mathspeak.com/examples/NemethBook/?rule=18 )


That's interesting, thanks for the references. I wonder if this is applicable to screen readers as well as Braille?


Oh, I was talking about screen readers the whole time, except for an aside that the speech rules system I was discussing was created by Nemeth who is known for his Braille system. Other than that though, I wasn't talking about Braille at all.


The problem is that mathematical notations can vary from topic to topic, from university to university, even from office to office inside a university department, even within the same lecture of a single professor. (Been there: "Notation will be different in this chapter since I've copied it from another author's paper.")


I'd imagine any real solution would actually be akin to ruby markup, with the author of the content needing to supply it since there will be cases where it simply cannot be usefully deduced by a browser.


> A side benefit would mean that every browser, and every system, in the world would have TeX installed!

Is running that giant heap of probably not sandboxed or security-audited code in every client actually a thing I would want to have? I mean, people are paranoid about JS doing unwanted things, but a large distribution of TeX code running inside the browser, really?


TeX's bug bounty and the credentials of its author are good reasons why it's probably one of the most reliable pieces of such code that you might have.


Knuth has an older and more narrow concept of 'bug' than the current common definition.


A "bug" in relation to the functionality of a layout software is also vastly different from battle-hardened code in the field of security.


You can already render LaTeX equations with javascript. It would be madness to include everything, but a safe subset of commands commonly used to render maths would be fine.


There is no "math subset" of TeX. You would have to define it. Or include a full TeX engine including macro capabilities into web browsers.


It wouldn't be that hard to define a "maths subset". The fact that there are javascript LaTeX equation renderers clearly means other people have done it. Here's an example:

https://github.com/Khan/KaTeX/wiki/Function-Support-in-KaTeX

You just need to standardise it. There's no need to include all of LaTeX in a standard that is just for rendering equations.


> widely-used markup language for mathematical formulae; it's called TeX.

For display yes, though it is far larger in scope than that and I wouldn't want to teach someone TeX just for a few formulas, and it is geared for display and not manipulation.

MathML was intended just for marking up math, so it is more focused in that respect, and IIRC it was intended to allow automated manipulation in an easy manner (I wouldn't want to do it by hand due to the verbosity of it all, but understanding and manipulating the structure in code must be easier than trying to interpret TeX).
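To make that point concrete, here is a rough sketch (mine, not from the comment above) of walking Presentation MathML with Python's standard-library XML parser; the markup is the quadratic-formula fragment discussed upthread, and there is no comparably simple way to do this against raw TeX source:

```python
import xml.etree.ElementTree as ET

# Presentation-MathML fragment for the quadratic formula (from upthread)
mathml = (
    '<mrow><mi>x</mi><mo>=</mo><mfrac>'
    '<mrow><mo>-</mo><mi>b</mi><mo>&#177;</mo>'
    '<msqrt><mrow><msup><mi>b</mi><mn>2</mn></msup>'
    '<mo>-</mo><mn>4</mn><mi>a</mi><mi>c</mi></mrow></msqrt></mrow>'
    '<mrow><mn>2</mn><mi>a</mi></mrow>'
    '</mfrac></mrow>'
)

tree = ET.fromstring(mathml)

# Collect every identifier (<mi>) in document order
identifiers = [mi.text for mi in tree.iter('mi')]
print(identifiers)  # ['x', 'b', 'b', 'a', 'c', 'a']
```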


You cannot compute with TeX... There are two kinds of MathML: Presentation MathML (which is like TeX), which describes only the look, and Content MathML, which describes the semantics. So you are comparing only one side of the coin with TeX ;) See here https://en.wikipedia.org/wiki/MathML#Content_MathML

But I agree TeX is easier to write and read if you only care about the look and feel.
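For the curious, here is roughly what Content MathML encodes for the discriminant b^2 - 4ac: an expression tree of operators applied to arguments, rather than a description of glyph layout (hand-written here purely for illustration, using standard Content MathML elements):

  <apply><minus/>
    <apply><power/><ci>b</ci><cn>2</cn></apply>
    <apply><times/><cn>4</cn><ci>a</ci><ci>c</ci></apply>
  </apply>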


I made a set of essentially <math> web components that render with KaTeX: https://github.com/justinfagnani/katex-elements

You use them like so:

    <katex-inline>c = \pm\sqrt{a^2 + b^2}</katex-inline>


we also use katex in beaker and it's great. very fast, not as complete as mathjax (also good and we used to use it). https://github.com/twosigma/beaker-notebook


You could put TeX code in a script tag

  <script type="application/x-tex">
    ...TeX code...
  </script>
and process it using a javascript TeX-to-DOM processor... seems likely somebody, somewhere, has done that?

Update: I just read the rest of the comments and discovered KaTeX.


TeX is not a markup language. It's a Turing-complete (actually two of them), proprietary (not defined by anything other than its implementation) programming language.

Having a context-free representation is IMHO a necessity for processing untrusted inputs (e.g. webpages). It also makes things way faster.


I don't think that proprietary applies here.

proprietary adjective 1. relating to an owner or ownership.

2.(of a product) marketed under and protected by a registered trade name.

None of the above is true to TeX. What am I missing?


In a more liberal sense the opposite of "open standard".


I wouldn't describe TeX as a "standard", but it's pretty hard to argue that it's "closed": TeX is arguably the best documented program in history!

Knuth wrote it using his novel "literate programming" technique in which the source code is embedded in the documentation. Every 4-5 lines of source code has roughly a paragraph of explanation. That source/documentation is published for anyone to read as Knuth's "TeX: The Program" book, accompanied by his "TeXBook", a couple hundred pages each.

That documentation has definitely encouraged all the people reimplementing the TeX layout algorithms, for example MathJax and matplotlib for math layout, as well as a number of alternative tex implementations.


I wouldn't call it easy reading though :) By the way, it's written in Pascal, while all current implementations, afaik, are rewritten in C.

Also, I found "TeXBook" very hard to follow; "The Program" is much more understandable. Every chapter in "TeXBook" starts with a few paragraphs of relatively simple text and then descends into a bunch of additional paragraphs marked with "dangerous bend" signs that talk about things that were never mentioned before. As the author put it, they're explained "somewhere", but it's really hard to find that somewhere.


Why not just say "closed standard" then? It is not ambiguous and exactly expresses what you would like to convey. Thanks for clarifying though.


I don't think that it's a closed standard either. Rather, like Perl 5, it seems to be a non-standardised, or, perhaps less pejoratively, implementation-defined language.

On the other hand, somewhat like Perl, it is also a language intended from its beginning to be extended. Knuth has said that he never expected things like LaTeX to be built on top of TeX; he rather thought that people would hack directly on the TeX source as necessary.


I don't know how many of the people lamenting MathML ever came to exist are active math-on-the-web developers. I for one am dealing with third-party math equations on a daily basis and can only dream of the wonder of ubiquitous MathML support in all major browsers.

The details as to why are here: http://prodg.org/blog/mathml_please/2015-09-16/MathML%20on%2...!

As things stand, math handling on the web is inconvenient at best and a nightmare at worst.

Some quick points about the comments here:

- MathML as a spec is akin to SVG: it's meant for the browser/machine first, not for direct human consumption. You don't read/write SVG by hand, and neither should you MathML.

- Calling MathML a "failed web standard" is fine by me, as long as you continue the remark with "the Chrome and IE browsers have failed mathematicians" and "the math-on-the-web developers have failed as a community".


All you've really written here is that you want a spec and support for properly displaying math content in the browser, but then you jump to happily taking MathML because it already exists in some places. None of that addresses the critiques of MathML as a bad standard, one that fits poorly with the layout and display of the rest of the web.

The author here is saying we want a spec and support for properly displaying math content in the browser, so let's make a sane spec that works with the way the web works, not as if it's in an embedded foreign object viewer.

SVG is actually a great parallel. In many instances it is easy to write by hand, until you reach any level of moderate complexity and hit the brick wall of SVG's usability. And that's when you wish it had a sane design, because there are a few aspects that are quite close.


"None of that addresses the critiques of MathML as a bad standard, one that fits poorly with the layout and display of the rest of the web."

I can't help but see these as secondary to why MathML isn't supported yet. The current state already works well enough in Firefox, but no one would use it, since why support a single browser? And we had it working nicely in Chrome before the devs pulled the plug because of lack of manpower. You could try to overrule the cruft from, say, the upcoming HTML5.1 spec, and find a good alignment with CSS. If that proves annoying or impossible, you could take the process further towards a MathML 4 that offers a CSS-compatible subset, etc. And the W3C Math group could approve an experimental note much more quickly, so that we don't sit around for years waiting for the standardization process.

The web mathematics community is a small one, and most of us have strong opinions on MathML. The way this debate is heading will tear the community into ever tinier factions and likely continue the "math-on-the-web winter" for another decade.

In my eyes, nothing has changed really - we've had browsers silently complicit in ignoring MathML, and a community tiny enough to not be able to move them in any direction. The two important bits for a math-on-the-web developer to have sanity are 1) expose the mathematics content of a page as data (the way we do tables) and 2) do so in a standard cross-browser way. Anything else leads to insane hacks, and tools such as math search engines, clipboards etc. can not be implemented directly.


- MathML as a spec is akin to SVG, it's meant for the browser/machine first, and not for direct human consumption. You don't read/write SVG by hand, and neither you should MathML.

Sadly that's not true for SVG. SVG has many parts that are only there to make writing/editing SVGs by hand easier: circle and rectangle elements, the commas in path strings (which follow relatively insane rules; almost all implementations have had bugs in the correct parsing of commas/whitespace).


Do you think other kinds of content should be included in the web spec too? So, special tags for musical notation, special tags for all forms of maps and geographic, geological features, tags to represent chemical structures, tags for all major forms of engineering notation, etc?

To me, it's not that I don't think there is a use case for MathML, it's that it's too content-specific to make sense at the browser level.

The only way it makes sense in my head is if I think of math as a language that browsers should support on the basis of internationalization. That makes some sense to me, but if that's the idea then OP is right they should use standard notation (as we do for every other language) and not XML, and it shouldn't be styleable by CSS at all.


Agreed, we do not include a 4x4 image in HTML by inserting <img><row><col><pixel r="255" b="0" g="128" a="0"> ...... At the browser level, I think we should treat a math expression as a simple, atomic component, and the only benefit of exposing DOM/XML/JSON or whatever structural information in a webpage is probably that you can manipulate/extract info from it (e.g. using JavaScript). Do we really need to manipulate a math expression? A simple "<math>\frac a b</math>" makes much more sense to me. I think it is about the trade-off on HTML granularity.


"Do we really need to manipulate a math expression?"

If you don't, please don't say no one does. Having a math DOM allows for actual interactivity with mathematics, from highlighting, copying subtrees, embedding links, having on the fly computations/simplifications, etc.

Not to mention add-on services like math indexing and search.

Yes, we need to manipulate and machine-read math expressions, if we want to finally take online math workflows to the 21st century.


What I mean is "really need". In fact, there is also the possibility that we want to highlight a portion of an image, copy a subimage, etc., but was our HTML <img> tag designed the way I mentioned? I am the author of a math search engine, OPMES (tkhost.github.io/opmes); the search engine works pretty well without knowledge of the DOM structure of math expressions. Actually, MathML caused a lot of inconvenience during OPMES development, to the degree that I chose not to support it.

BTW, if we want a <math> tag that no one will write by hand and only machines will try to understand, then ask why HTML wasn't designed as some open binary format in the first place.


So, if you can give me an ill-designed analogy, you think you're making a valid point?

Images are not mathematics; they have nothing to do with mathematics. If you take a look at SVG, you may be shocked to find you can do just as much decomposition into primitive components as you can with any DOM, just the way MathML allows you to.

Please substantiate the "works pretty well" claim about your search engine with some data. Your sense of inconvenience is far from an objective argument.

How many people write HTML by hand? Generating it from a wide range of tools, richtext editors, markdown inputs, etc etc is much more common. Your analogies are just inaccurate.


We can argue all day about whether <math> should be like <img> or <svg>, but I do not think I am wrong to ask whether we really need to manipulate a math expression. Saying that writing "<math>\frac a b</math>" makes sense to me does not necessarily mean I stand firmly against making <math> the other way. If you think there are cases where we need manipulation, and that we should indeed sacrifice HTTP payload length (Internet transmission time) and simplicity to enable math expression manipulation, that is totally fine. I accept your points and will still argue mine; I do not believe there is an evident truth in this issue we are arguing (or in this thread), and that is OK. However, I should point out that I am quite confident I can hand-write "<math>\frac a b</math>" more quickly than other people can produce its MathML alternative with whatever advanced richtext editor they want. You can still doubt how many people want to write HTML by hand, but shorter HTML is not bad at everything; many high-volume websites benefit from it. Think of a very hot math Q&A website in the future that has to handle many requests: math rendering computation on the client side is a logical solution, and in that case MathJax makes a lot of sense. I would also accept a solution that defines a short <math> form and converts it into lengthy MathML on the client side; that way neither of us has to compromise.

As for my "works pretty well", please refer to my answer in another thread below. To be concise, I use subjective words about search-engine effectiveness because NTCIR makes it difficult to compare my TeX search engine with a "MathML search engine". But I have already shown better efficiency for my engine compared to Tangent, and an important factor is that Tangent has to use LaTeXML to parse every TeX formula back into MathML. Outside of NTCIR, I am willing to make a comparison (probably after finishing the new version of my search engine) with an established open-source math engine (e.g. Tangent) on effectiveness and efficiency, using a corpus with both MathML (used by Tangent) and TeX (used by my engine) annotations.


  MathML as a spec is akin to SVG, it's meant for the browser/machine first, and not for direct human consumption. You don't read/write SVG by hand, and neither you should MathML.
That's true, but the problem is that MathML is fairly useless without a robust set of tools to manipulate it and some way to do efficient manipulation by hand. These tools didn't develop in any meaningful way and so people haven't adopted it.

MathML by itself is only half an answer without a way to easily convert to and from a more human-friendly format without loss of detail.


I am with you.

Some people write their math by hand in LaTeX or other formats, but many use visual tools and draw math similar to drawing diagrams. Then those tools generate whatever source code necessary.

In those scenarios, MathML would be much easier to deal with than most other formats. Also, Presentation MathML support is much easier to implement than full-scale support of LaTeX or other more complete math solutions including Content MathML.


And "many people" write code by dragging squares around in a GUI and entering equations into spreadsheets, but that method is completely unacceptable to a professional.

The only people who write math in a drawing program are people who don't do enough math to be bothered to learn it.


> Some people write their math by hand in LaTeX or other formats, but many use visual tools and draw math similar to drawing diagrams.

Which people?

Everyone I know who uses equations in MS Word abandoned the clicky tools as soon as the pseudolatex entry language became available.


>Which people?

Currently, high schools students taking the PARCC, SmarterBalanced, and other web-based standardized tests.

And whoa man is it terrible to watch them struggle with it.


Most of you guys seem to be suffering under a big misconception. MathML was never intended to be typed by humans. Perhaps a few programmers might need to do it but not mathematicians, students, engineers, scientists, etc. Saying things like "I hate typing mathematics in MathML" is much like saying "I hate typing my word processing documents in RTF". Just say no. Like most XML languages, MathML is a computer representation meant to be read/written by software. If you want to enter math notation, use TeX or an interactive math editor like MathType (my company's product).


One big advantage of MathML is the ability to copy-paste from webpages directly into software like Mathematica. I don't think there is currently any alternative to MathML in that respect. The author argues (rightly) about styling and presentation of MathML, but doesn't propose an alternative to this copy-paste problem. Maybe TeX, when you're using MathJax at least?


The author knows these problems... he has been dealing with them for years, because he is one of the MathJax programmers, or even its inventor?! The question is whether you want to maintain a huge standard in the browser (which is not the case for many browsers today) only for copy-and-pasting (maybe that can also be solved another way, with JS clipboard magic).


The syntax is pretty much the same in Mathematica:

- ToExpression[string, TeXForm] for expressions coming from TeX

- ToExpression[string, MathML] for expressions coming from MathML

But how do you grab the TeX formula? TeX'd up formulae usually have the TeX form in the alt-text portion for the image. For example, the first formula in https://en.wikipedia.org/wiki/Calculus#Fundamental_theorem has the alt-text \int_{a}^{b} f(x)\,dx = F(b) - F(a).
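In that case, grabbing the TeX is just a matter of collecting alt attributes; here's a minimal sketch with Python's standard-library HTML parser (the class name and the sample snippet are mine, purely illustrative):

```python
from html.parser import HTMLParser

class AltTexCollector(HTMLParser):
    """Collect the alt text (often the TeX source) of <img> tags."""
    def __init__(self):
        super().__init__()
        self.formulas = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            alt = dict(attrs).get('alt')
            if alt:
                self.formulas.append(alt)

# Hypothetical snippet in the style of Wikipedia's math images
snippet = '<p><img alt="\\int_a^b f(x)\\,dx = F(b) - F(a)" src="..."></p>'
collector = AltTexCollector()
collector.feed(snippet)
print(collector.formulas)  # ['\\int_a^b f(x)\\,dx = F(b) - F(a)']
```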


Wikipedia seems to be using a mechanism where TeX code is rendered into images, but on marking and copying into an editor becomes TeX again, wouldn't that be a sufficient solution for copying?


Isn't it the default behaviour when you put alt-text in the image tag?


That makes sense, I just mentioned it as an example of how the result is easy to achieve.


Many standards have failed for the web. Anybody remember VRML? Actually, I like that we don't have one broad, all-encompassing standard that includes MathML, which only ~0.1% of people care about in the browser. MathML can stand on its own (encapsulated with JS, like PDFs in Firefox) or you can abandon it entirely. You are free to choose. It's the best way for everybody.


Actually I like it that we have not a broad big big standard which includes MathML which only ~0,1% of people care about in the browser.

Did you read the article? That's exactly what we have right now. The author's #1 complaint is that MathML is currently a part of HTML5.

He would like it to be removed from the HTML standard so that MathML could evolve on its own, rather than being constrained by its status as a web standard that's not actually implemented by browsers.


Maybe in the W3C specification MathML is part of HTML5, and browsers will not crash when there is MathML inside (actually they also do not crash with custom tags), but many libs do not support the MathML tags, respect them, or work with them. Just one quick example: jQuery; open this in Chrome: http://sdiehl.github.io/jquery-mathml/ So for the Chrome and Safari guys MathML is already dead, and it even has flaws in Firefox. But what is written in W3C documents is not always the truth out there or common in the WWW ;) So yes, I agree... rip MathML out of the HTML5 specification.


I used VRML to render in realtime, 1994-99, using low end video game cards (overclocked) that output hi-res video, full screen Cosmo Player, record live to tape.

Retakes were trivial, frame rates were 100fps interlaced to video, silky movement - could even take live camera direction.

Otherwise rendering took weeks and most of the budget, mostly contracted for theatre backdrops & live VJing. IMHO VRML died because it was pre open source, Cosmo was locked up tight and never updated.

Realtime in browser 3D from 1994 to 1999, proto-machinima, fun times.


We still use VRML at work, lord knows why.


Since we're discussing various math rendering technologies for the web, I'd like to plug MathQuill [1]. The big advantage MathQuill has over other solutions is that the math is live editable in the browser in typeset form. We use it for input on the Desmos graphing calculator [2], and MathQuill is definitely the best WYSIWYG math editor that I have used anywhere, on the web or elsewhere.

[1] http://mathquill.com/

[2] https://www.desmos.com/calculator


Desmos's MathQuill interface is great. By far the easiest to use math input I've encountered. I haven't tried to input anything really exotic, but it's extremely intuitive for the simple to intermediate level of complexity.

Thanks for your work on Desmos, it's a pleasure to use.


"We need to get together with CSSWG/Houdini TF/etc to work out solutions that help those developers who actually solve the problem of math on the web."

Unfortunately, Houdini seems to have deprioritized font metrics info currently so this might not come soon. See this issue for more details:

https://github.com/w3c/css-houdini-drafts/issues/135

(However, for math we often use webfonts anyway so it is practical to send down glyph metrics with the fonts. That's what both MathJax and KaTeX do.)


One of my biggest complaints about Peter K's "MathML is a failed web standard" is that most people read that without taking note of "web" in the middle. MathML has been quite successful in publishing and as a computer representation. As far as whether or not MathML is a failed WEB standard, I have a lot more to say about that here: http://bit.ly/1ZLfCF8.


Seeing WEB in a comment thread with extensive references to Knuth is amusing.


Thanks for your response. I totally agree it is not useful to try to encode semantics into math markup.


I think the way forward here with math and other niche document formats (like MusicXML) is to keep them out of the HTML standards and let people create web component libraries that handle rendering.

As a practical matter this will bring implementations to all browsers much faster than waiting for four large organizations to decide that a format is important and allocate resources. Even better is the real-world experience those libraries will gather so that maybe we get something better than what a committee would have designed in isolation.

For math specifically, TeX really seems like a better way to go for now. I wrapped KaTeX in a web component so that it's easy to use in HTML: https://github.com/justinfagnani/katex-elements

You use it like:

    <katex-inline>c = \pm\sqrt{a^2 + b^2}</katex-inline>
I imagine the equivalent for music will be better than MusicXML.


MathJax looks excellent.


MathJax is very flexible and its output is certainly high-quality. However, it is difficult to configure and package in my experience, and it's pretty slow.

We used it for a few years at Khan Academy before building KaTeX (https://khan.github.io/KaTeX/) which is around 50x faster to render in our experience, not even counting download size. We ran an A/B test for MathJax vs. KaTeX on the Khan Academy site and people completed measurably more exercises with the latter, even though they look identical. (KaTeX doesn't support everything MathJax does though – we still fall back to MathJax for a small fraction of our content.)


Thanks for making KaTex. Love the ease of installation, API and performance!


thanks for pointing out KaTex. Did not know it before


Why do you open the KaTeX source, but not the source for your mobile apps?


KaTeX is more likely to be useful out of the box to other developers, whereas the source of the mobile apps is not going to be a drop-in component that's useful; it'd have to be used more as a learning opportunity by interested devs. That's fine, but it's also a different market. In practical terms, the KA mobile apps also are much more KA-specific and need to be cleaned up before open-sourcing, and there's a worry that people would take the entire app and repackage it as their own (this happens all the time to open-sourced mobile apps, unfortunately). That said, open-sourcing the mobile apps is still something we want to do, so it may happen someday.


Sorry to hijack this thread and I appreciate your response.

Please keep in mind I don't ask about open sourcing the mobile apps to help other developers; rather, it's to help KA. There are lots of contributions and improvements that would be made by the community if given a chance, which would benefit everyone. Contributions would have to be managed, yes, but that's something that's done all the time for other projects.


We've open-sourced several other mostly-KA-specific parts of the site, including these:

https://github.com/Khan/khan-exercises

https://github.com/Khan/perseus

https://github.com/Khan/live-editor

Contributions are few and far between; good contributions are even more so. It hasn't been worth the time investment, unfortunately.


MathJax is great, but you may also want to look at KaTeX, which is currently much faster:

http://www.intmath.com/cg5/katex-mathjax-comparison.php


MathJax works reasonably well, but I can almost read the TeX source out loud faster than the browser renders it (together with PDF.js, this has been my go-to counterexample whenever someone claims that JavaScript is performant).

With formula-heavy text, I'm forced to split documents artificially into multiple short pages to make rendering reasonably fast.

KaTeX looks quite promising, but it was missing support for quite a bit of basic markup the last time I checked. Perhaps it's better now.


Is there a way to pre-render MathJax, say, for static blogs?


Apparently SVG output is an option since MathJax v2.0: https://docs.mathjax.org/en/v2.5-latest/output.html


MathJax is slow and doesn't work in RSS.

I'd love to let people read my blog entirely in their RSS reader. But unfortunately most of my posts won't render there.


I assume most RSS readers don't support CSS (and webfonts) fully? If they did you could use KaTeX's server-rendering feature to embed the actual HTML, but otherwise images are probably your best bet.


I'll have to check out KaTeX's server rendering. Might be exactly what I'm looking for.


"... MathML is effectively preventing mathematics from aligning with today’s and tomorrow’s web."

I'm probably one of the few that still likes to print things out or buy a physical copy of a book, but I also like reading/producing math for the internet and I often wish there were a better standard out there. I really appreciate how you've laid out several points that demonstrate the situation clearly.


I am building a project and doing research on math-aware search (hosted at https://github.com/t-k-/the-day-after-tomorrow). For math search engines, it is a pity that MathML has become the standard "input" for mainstream research. The best-known venue for math search, NTCIR, actually publishes its main dataset/corpus in MathML.

Converting MathML back into LaTeX is possible but error-prone for most moderately complex expressions (I tried it with Haskell's pandoc). This forces math-aware search engines to include a MathML parser. And since the most popular digital math documents are still mostly written in LaTeX, a math search engine also needs another tool (e.g. LaTeXML) to convert LaTeX into much lengthier MathML. As a researcher in this field, all I see is MathML adding a lot of overhead to our lives.

I think LaTeX is still the ideal way to "input" math expressions: it is human-friendly and the most commonly used math input, while a web standard should focus on "rendering" LaTeX. I should point out that I am pretty happy with what MathJax provides, but if there has to be a web standard for math, I wish that some day the standard way to write a math expression in HTML were something like this: <math> x = \frac{-b \pm \sqrt{b^2 - 4ac}} {2a} </math>

Instead of: <math display="block"> <mrow> <mi>x</mi> <mo>=</mo> <mfrac> <mrow> <mo>βˆ’</mo> <mi>b</mi> <mo>Β±</mo> <msqrt> <mrow> <msup> <mi>b</mi> <mn>2</mn> </msup> <mo>βˆ’</mo> <mn>4</mn> <mi>a</mi> <mi>c</mi> </mrow> </msqrt> </mrow> <mrow> <mn>2</mn> <mi>a</mi> </mrow> </mfrac> </mrow> </math>


I am the person behind generating the original NTCIR math datasets, and probably most of the research-produced MathML out there. We've recently presented that we have more than 350 million formulas from arXiv converted over to MathML, together with the rest of the papers as HTML5.

As someone who has stared at arXiv TeX/LaTeX for years, I can testify that you don't want to be looking at TeX math in actual LaTeX documents; there is a lot more going on in there than the toy formula syntax used on the web.

As someone who has also worked on math search engines and math-rich NLP for a few years, complaining that you have a structured, machine-parseable representation for mathematics and wanting TeX instead sounds naive. On one hand, the MathML formulas in the datasets could already preserve the source TeX (the TeX annotations may even be there; I can't remember right now), should you need it directly. On the other hand, you can use any structured method, such as those used by content-based search engines like MathWebSearch, or handpick any relevant information from the MathML tree to feed back into a statistical algorithm, as done for example by the WebMIAS search engine.

The most fundamental bit to understand if you're doing research on automated processing of human mathematics is that formulas are two-dimensional objects best represented as trees, be they layout trees describing the presentation, operator trees describing the content, or some hybrid tree that tries to do both (such as LaTeXML's XMath spec).
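To make the layout-tree vs. operator-tree distinction concrete, here is a minimal sketch (my own toy encoding, not any library's actual data model) of the two tree views of b² − 4ac: a presentation tree mirroring MathML's <msup>/<mo> nesting, and a content tree with operators at internal nodes.

```python
# Layout tree: nodes named after Presentation MathML elements,
# describing how b^2 - 4ac *looks*.
layout = ("mrow",
          ("msup", ("mi", "b"), ("mn", "2")),
          ("mo", "-"),
          ("mn", "4"), ("mi", "a"), ("mi", "c"))

# Operator tree: the same formula by *meaning* rather than appearance.
operator = ("minus",
            ("power", "b", 2),
            ("times", 4, "a", "c"))

def leaves(tree):
    """Collect the leaf tokens of a tuple-encoded tree, in order."""
    if not isinstance(tree, tuple):
        return [tree]
    out = []
    for child in tree[1:]:  # tree[0] is the node label
        out.extend(leaves(child))
    return out
```

Both trees bottom out in the same symbols, but a search engine matching on the operator tree can treat b² − 4ac and, say, c·a·4 subtracted from b² as structurally related, which the layout tree alone cannot express.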


1. In the NTCIR (main) dataset, I see many cases where <m:math> does not contain an alttext (and thus no TeX). I asked LaTeXML's author Bruce Miller <bruce.miller@nist.gov> about this; he said LaTeXML will always put the same TeX string as an alttext attribute on the <m:math>. So I assume you are using an outdated LaTeXML version? I really want to urge NTCIR to ensure the original LaTeX annotation is kept in the main dataset, or to provide both MathML and LaTeX versions of the corpus for researchers to choose from freely. This would allow LaTeX-only math search engines to compare results with other MathML search engines. You know it is hard to convert all of them back into LaTeX correctly.

2. I wish the NTCIR corpus were not so difficult to download (I once wrote a request for the NTCIR corpus, but no one replied). Please make it publicly accessible, just like MIaS does: https://mir.fi.muni.cz/mias/

3. My search engine (http://tkhost.github.io/opmes) actually uses a structural method, but I still gave up on MathML and parse TeX directly instead. Why? In TeX I can simply omit irrelevant commands like "\color" and "\mbox" and focus on a handful of math-related TeX commands, and the result is great. My search engine only handles "toy formula syntax", but it may well be better than MathWebSearch (https://zbmath.org/formulae/) and even beat Tangent (http://saskatoon.cs.rit.edu/tangent/random) on long queries. With MathML, I have no idea why I should read its lengthy spec, and I see no reason to write a MathML parser.

The NTCIR-math conference (and its unfriendly website) makes me unwilling to submit a single paper.


1. Correct, the dataset was generated back in 2013 and will probably be regenerated for the next NTCIR issue.

2. There are annoying copyright issues with making the datasets available for public use. We're working with arXiv to resolve that, it's out of our control for now. It's a long-lasting frustration of mine that the datasets can't be simply made public.

3. You can omit anything you like from the MathML; it is no worse than omitting from TeX. "But maybe it is better than MWS" - prove it: submit to NTCIR and beat everyone. Also, being better than MWS is not an argument that MWS should be denied the very data it needs to run. At the same time, you can still obtain whatever degraded form you need from the presentation MathML. Failing to recognize any claim to correctness other than your own, without any substantive proof, is not a reasonable position, and I urge you to reconsider.

"I have no idea why I need to read its lengthy spec, and I see no reason to write a MathML parser."

You don't need to write a parser; you can use an off-the-shelf XML/HTML5 parser and handle the MathML reliably and appropriately. In fact, you can reuse that part from any open-source math search engine, MWS included. Writing a TeX parser, on the other hand, is something I will always roll my eyes at, since real-world TeX is not something you can "parse", or do anything with reliably, unless you have a full TeX implementation underneath. That is 1000x harder than using a parser to deal with MathML.
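The off-the-shelf-parser point is easy to demonstrate. As a sketch (using Python's stdlib ElementTree rather than any particular search engine's code), the Presentation MathML for the quadratic formula quoted upthread parses in one call, and a couple of lines of tree-walking pull out whatever tokens a search engine cares about:

```python
import xml.etree.ElementTree as ET

# The quadratic-formula MathML quoted earlier in the thread
# (namespace prefixes omitted for brevity).
mathml = """<math display="block"><mrow><mi>x</mi><mo>=</mo><mfrac>
<mrow><mo>-</mo><mi>b</mi><mo>&#xB1;</mo><msqrt><mrow>
<msup><mi>b</mi><mn>2</mn></msup><mo>-</mo><mn>4</mn><mi>a</mi><mi>c</mi>
</mrow></msqrt></mrow><mrow><mn>2</mn><mi>a</mi></mrow></mfrac></mrow></math>"""

root = ET.fromstring(mathml)

# Identifiers and numbers, in document order, with no hand-written
# MathML parsing at all.
identifiers = [el.text for el in root.iter("mi")]
numbers = [el.text for el in root.iter("mn")]
```

From here, indexing subtrees (e.g. every <msqrt> or <mfrac>) is a matter of more `iter()` calls, which is the kind of handpicking from the MathML tree described above.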

Finally, whining about NTCIR's UI being imperfect as a reason not to submit is just childish.


Thank you for the answers to my first two questions; now I understand NTCIR's situation.

At first I tried to compare my results (MAP, recall, precision) with the NTCIR participants, but it took a lot of effort to get the dataset, after which I found I could not convert MathML back into TeX with much confidence. Most importantly, my parser-generated tree structure is fine-tuned and very dependent on TeX input; I cannot just take the MathML tree structure directly, and doing so would need much more effort than just importing an existing XML parser. Because of this, I cannot compare my results with mainstream NTCIR researchers. I definitely tried very hard, but sadly I gave up. If NTCIR can some day provide TeX data for the competition (even if a request is needed), I will consider (and be able, and willing) to compare my results with NTCIR participants (in order to "prove" it).

Writing a TeX parser only for math search is not that difficult; I have written one, and it parses most user-created documents on math.stackexchange.com. Although I cannot convince you that I get better results, I can argue that parsing a search-relevant TeX subset is effortless (if you only care about math-related TeX), and I have even open-sourced my search engine's TeX parser. Again, the problem is not as easy as grabbing an XML parser and reusing it in my project: I believe a good math-aware search engine needs a tree structure very different from the one MathML represents. You get a tree by reusing the MWS parser, so what? That tree is not the tree I want, and converting it would take a lot of effort. The easier path for me is converting MathML back into TeX (since I already build my trees from TeX), but sadly that turned out to be too complicated to be worth the shot.


Lastly, call me more than childish for complaining about NTCIR and refusing to submit a paper, but I gave up putting unworthy, duplicated effort into implementing a MathML parser that generates the expression tree I need (that step is the most difficult part, not just parsing the XML) and instead focused on finding another venue to publish my work. It turns out my paper (a demo) got accepted at ECIR 2016, so I am glad I did not waste too much time on NTCIR; otherwise I would have missed ECIR.


> MathML is a failed web standard

Well, the line's over there.


Any recommended toolbars for KaTeX or other equation-output tools that are simple to integrate into a site?


I am not sure if the verbosity of XML brings anything to the table here. Why not just implement a TeX based extension?


Because requiring an extension defeats the purpose of being able to run across all browsers.


Sorry, I was thinking of something like MathML but closer to a TeX-based markup, built into the browser so there's no extension to install. Might not be feasible to implement, though.


What is wrong with using TeX?

MathML is reinventing the wheel. Only, the MathML wheel is square, flat and breaks when used.


pcwalton has a good response for that above:

https://news.ycombinator.com/item?id=11445369



