Show HN: Convert screenshots of equations to LaTeX (mathpix.com)
456 points by slbenfica on March 7, 2018 | 98 comments



Immediately made me think of a question on the state of handwriting to LaTeX from a few years ago, and all the massive challenges involved:

https://tex.stackexchange.com/questions/1443/what-is-the-sta...

Under API... you're already doing handwriting? This is uh, nontrivial work to say the least. Really impressive.

The endorsements are a nice touch. :)

Made me really curious how far the system goes, what cases break it.

Oh... never mind. You have a PDF of examples here: https://docs.mathpix.com

It's homing in on equations without getting distracted by nearby Hanzi or Cyrillic, or even pictures of dogs. Wow.

I keep going back to dig through your resources and getting more impressed.

EDIT: I guess my only constructive criticism is that you should brag more. I like a simple landing page, but I think you've earned a short list of examples of corner cases you tackle well, if the whole API is packed into that free app, because they're really impressive.


The perfection in those examples makes me suspect that they are cherry-picked or part of the training data. Especially the handwritten text is not always clear and could reasonably be interpreted differently. I'd expect a machine-learning model to get at least some things wrong some of the time.

If I wanted to use this in an application, I'd definitely want to see some accuracy figures on validation data as well as a few failure cases to see whether the output remains reasonable even when it is wrong.


The examples are actually very simple compared to a lot of crazy stuff Mathpix can recognize so it's an honest representation of its capabilities. Mathpix is built for perfection because 99% isn't good enough.


> the handwritten text is not always clear and could reasonably be interpreted differently

Digital pen input contains more info than the resulting bitmap; strokes are lost while rasterizing.

That info is the reason old devices were able to reliably recognize characters written with a stylus. It worked well even on prehistoric hardware, such as the 16 MHz CPU + 128 kB RAM in the first Palm PDA.


In the OCR world this is known as offline OCR (OCR on the bitmap) vs online OCR (OCR on the stroke information).

Offline is way harder than online.
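
A toy illustration of why stroke data carries more than the bitmap, sketched in Python. The `rasterize` helper and the two sample glyphs are made up for this example; they do not come from any real recognizer.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]
Stroke = List[Point]  # pen positions in the order they were drawn


def rasterize(strokes: List[Stroke]) -> Set[Point]:
    """Collapse pen strokes into the set of inked pixels (the "offline" view)."""
    return {point for stroke in strokes for point in stroke}


# Two hypothetical ways of writing the same plus-shaped glyph on a 3x3 grid.
# Version A: horizontal bar left to right, then vertical bar top to bottom.
plus_a = [[(0, 1), (1, 1), (2, 1)], [(1, 0), (1, 1), (1, 2)]]
# Version B: vertical bar bottom to top, then horizontal bar right to left.
plus_b = [[(1, 2), (1, 1), (1, 0)], [(2, 1), (1, 1), (0, 1)]]

# Online input can tell the two apart: stroke order and direction differ.
strokes_differ = plus_a != plus_b
# Offline input cannot: both rasterize to the same pixels.
bitmaps_match = rasterize(plus_a) == rasterize(plus_b)

print(strokes_differ, bitmaps_match)  # prints "True True"
```

Once the strokes are flattened, order and direction are gone, so an offline recognizer has to work from the pixels alone.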


This is really awesome OP! Thank you for sharing :)

One note I should make: it was not entirely clear (to me), upon a cursory view of the website, that the purpose of Mathpix was to convert handwritten text into LaTeX. For some reason (maybe my coffee hasn't kicked in yet) I thought it was strictly intended to take screenshots of equations in an existing PDF document, on a website, etc. and convert those to LaTeX.

My thought at that point was "I wonder if they could do this for handwritten text" and then I looked at the docs and facepalmed..


"Screenshot" is a very odd word to use in this context.


How? "Screenshot" is used correctly by the OP.

For reference:

  screenshot
    (verb) an image of the data displayed on the screen of a computer or mobile device.


Their usage does not match that definition. When recognising writing, they are recognising an image of the page, not an image of what was on the screen at some other point.

Also, your definition describes a noun but claims it's a verb.


The promotional material shows the application interpreting a screenshot of an equation, not handwriting.

I also meant noun, not verb. Thanks.


I see absolutely nothing wrong with the usage here.


Screen grab?


Photograph or picture, surely?


You should reread OP's paragraph. Also, if you go straight to the website, it shows you the desktop app first, which takes screenshots from whatever source and turns them into LaTeX.


Stupid question, how well would this work on a PDF of a latex document?

This would be great for blind people, as PDFed LaTeX is extremely inaccessible, and I have to email the authors of papers to get the original LaTeX from them, which is often lost.


As someone who writes with Latex, thanks for pointing out this problem, I hadn't considered it.


> This would be great for blind people, as PDFed LaTeX is extremely inaccessible, and I have to email the authors of papers to get the original LaTeX from them, which is often lost.

Many years ago I "translated" course materials into a form which was accessible to a blind grad student. It was a really interesting job and taught me a lot about accessibility.

I was effectively doing latex, but without all the leading \ characters. It made learning latex comparatively easy.

What interface do you use to read equations? A screen reader speaking the straight LaTeX, or do you have some middleware to make it more digestible when listened to?


I'm not blind, but I have a blind PhD student. He seems happiest with just straight LaTeX. I suspect this is because it requires little work from him or from others, as the LaTeX source usually already exists.


Isn't that the whole point of this utility? Or am I missing something?


Well, I was hoping it could take a whole page, rather than require me to cut out the equations.


InftyReader [1] does this. Its results are good but definitely not perfect. Unfortunately it is seriously expensive, at US $400 per computer.

[1] http://www.inftyproject.org/en/software.html#InftyReader


I may be naive here, but is there anything preventing running regular OCR over a page, and then feeding whatever it couldn't deal with into this? Sure, the plumbing for this is probably missing, but it sounds more like a matter of picking up the shovel rather than inventing something.
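
The plumbing could look roughly like the sketch below. Everything in it is hypothetical: `text_ocr` and `math_ocr` are stand-in callables and the confidence threshold is an arbitrary assumption; none of it is a real OCR or Mathpix API.

```python
from typing import Callable, List, Tuple

# Hypothetical recognizer signature: takes a region (here just a string
# standing in for image data) and returns (recognized_text, confidence).
Recognizer = Callable[[str], Tuple[str, float]]


def transcribe_page(
    regions: List[str],
    text_ocr: Recognizer,
    math_ocr: Recognizer,
    min_confidence: float = 0.8,  # arbitrary cut-off, chosen for illustration
) -> List[str]:
    """Run regular OCR on every region; fall back to math OCR when it is unsure."""
    output = []
    for region in regions:
        text, confidence = text_ocr(region)
        if confidence < min_confidence:
            # Regular OCR could not deal with this region: treat it as an equation.
            latex, _ = math_ocr(region)
            text = f"${latex}$"
        output.append(text)
    return output


# Stub recognizers so the sketch runs without any OCR engine installed.
def fake_text_ocr(region: str) -> Tuple[str, float]:
    return (region, 0.95) if region.isalpha() else ("???", 0.1)


def fake_math_ocr(region: str) -> Tuple[str, float]:
    return (region.replace("*", r" \cdot "), 0.9)


print(transcribe_page(["Hello", "a*b"], fake_text_ocr, fake_math_ocr))
```

What the sketch skips is the hard part: splitting the page into regions in the first place, and trusting the confidence score of the regular OCR.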


I've done some work cleaning up documents for use by a blind student. OCR starts to fail really badly when math is involved. Common errors like O versus 0, any accents or additions to characters, small formulas embedded in sentences, graphs with captions, etc. can all throw things off considerably.

Best case I could copy and paste paragraphs at a time from a PDF of the textbook (with copy protection removed). Worst case I was retyping or fixing every few words in a sentence.

I was working on this from about 2011-2013. Advances with image processing and machine learning have been significant since then, so there may be much better software available now.

If anyone has ideas or packages they'd recommend, I'd be interested.


It's not a stupid question. You need equation detection + equation OCR.

Mathpix only does the equation OCR part.

I've worked on this (for a PDF to HTML application), mail is in profile if you're interested.


I'm curious whether the developers are fans of The Big Bang Theory TV series. The characters were using a smartphone app... which of course was less useful, it being fiction...

https://www.springfieldspringfield.co.uk/view_episode_script...


Wow!

What kind of sorcery is this!?

Is this using deep learning or "regular" OpenCV or similar?

I would assume it's a highly tuned deep learning algo, but I'm not knowledgeable enough to distinguish a deep learning algo from a pile of rocks...

Edit: Aha, someone already asked this and got an answer.

https://news.ycombinator.com/item?id=16535467


Suggestion: instead of making me download a pdf to see examples of what the results look like, maybe put them on the page directly. You can have a couple. Then put the details in the pdf.

Great software otherwise


This is fantastic!!

Bug report: it appears that multiline summation subscripts are not recognized correctly. For example, Eq. 8 of [1]. These are often created using \substack as part of amsmath.
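
For reference, a multiline subscript of the kind meant here might look like the following. This is a made-up example, not Eq. 8 from the linked paper, and it assumes amsmath is loaded.

```latex
\[
  \sum_{\substack{1 \le i \le n \\ i \ne j}} a_{ij}
\]
```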

Awesome tool!

[1]: https://arxiv.org/pdf/1802.01194.pdf


Good catch!!! We're working on it, should be fixed by April 1st


I assume you just got a lot of installs from India, because the large publishing houses contract out many re-typesetting jobs that are basically to take scans of technical texts and convert them back into LaTeX.

I strongly suggest you talk to the publishers about integrating your tech into their TeX.


Want a math-ish PDF and some LaTeX source for training on possible edge cases? Think I might get someone (or something) to read my dissertation this way...


What's your dissertation about?


yes please! nico@mathpix.com


This is awesome work. What's the process of vision -> latex conversion?


We published some research on one approach:

http://lstm.seas.harvard.edu/latex/

Here's how to do it with OpenNMT/PyTorch:

http://opennmt.net/OpenNMT-py/im2text.html


I am curious to know as well.


Yup! Please!!


I really appreciate the testimonials. All software deserves personality and a soul like this!


Any way you could make this available outside the Mac App store? Apple seems to have decided I did something horrible and unforgivable by moving to a different country after creating an account, thus making it impossible for me to use the store.


Good to know, thanks for posting this. We'll post the dmg file on our landing page this week or next.


Just a reminder, I'm still eagerly awaiting this-- I check your site every day! :)


Awesome! Is the plan for it to be free forever, or what might the pricing look like? Maybe you would consider open sourcing the model?

Also, some info on the process would be nice. Does it work entirely locally, or are images uploaded to the cloud?


from the site: $0.005 per request, first 1000 requests are free
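
As a back-of-the-envelope sketch of that pricing in Python, assuming the free tier is a one-time allowance of 1000 requests (the site may define it differently, e.g. per month); `api_cost_usd` is a hypothetical helper:

```python
def api_cost_usd(requests: int, price_per_request: float = 0.005,
                 free_requests: int = 1000) -> float:
    """Estimated cost in USD, assuming the first `free_requests` are not billed."""
    billable = max(0, requests - free_requests)
    return billable * price_per_request


print(api_cost_usd(500))     # within the free allowance
print(api_cost_usd(11_000))  # 10,000 billable requests
```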


Where does it say that? All I see is the (free) MacOS App.


It's at the bottom of https://mathpix.com/api.html


We used their API to make a simple screenshot2latex tool (select screen region -> puts latex formula in clipboard). From my experience it still fails on a couple of fairly common things like:

- \mathcal letters (recognized as non-mathcal)

- long equations (not recognized at all)

- multi-line equations (not recognized at all)

The screenshot2latex tool: https://github.com/rmst/screenshot2latex/blob/master/scripts...


All three points have seen big improvements in the last week, especially the first two, check again


That's awesome, OP. I can't imagine the number of times I've wished for something like this.

Coming from a grad student who hates writing equations in latex. I will probably try this out.


Is there any chance of a Linux client?


They have an api. You can write your own.


I was looking for an API that provides math OCR. Great, going to integrate it into our app soon :-) Let me know if you want to add us to your "trusted by" section.


sure! send the link to support@mathpix.com


Impressive. Any particular reason that you're using

    \left\{ \begin{array} ... \end{array} \right.

instead of

    \begin{cases} ... \end{cases}

?
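
For comparison, here is the same piecewise definition written both ways. The function is an arbitrary example and the `{ll}` column spec is an assumption about what the array form would use; `cases` and `\text` require amsmath.

```latex
% array form, closed with an invisible right delimiter
\[
  |x| = \left\{ \begin{array}{ll}
    x  & \text{if } x \ge 0 \\
    -x & \text{if } x < 0
  \end{array} \right.
\]

% cases form
\[
  |x| = \begin{cases}
    x  & \text{if } x \ge 0 \\
    -x & \text{if } x < 0
  \end{cases}
\]
```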


No particular reason, we were thinking of starting to return cases syntax soon!


Mathematicians use operator overloading all the time. It would be nice to have a tool that explains to me what an equation actually means in a given context.


You are talking about the semantics of an equation, while this tool is content with correctly understanding the syntax (in LaTeX).

There are actually a number of ongoing research projects to establish standards for semantic mathematical representations. Probably one of the best-funded running projects (budget ~10 MEUR) with a work package on this topic is http://opendreamkit.org/ . To my knowledge, the work is going on at https://mathhub.info/ . I would like to provide a deep link, but the site seems to be in a broken state. Apparently people are working on it right at the moment.


I’ve long had this dream of building a system that would allow for symbolic algebra manipulations, where the representation knows about the semantics. The idea is to replace long and tedious pen-and-paper symbolic calculations, while still having direct control over each step of the calculation. In looking for something like this I’ve found specialized proof verifiers, which are doing something different. At the other end of the spectrum there’s Mathematica, which does symbolic manipulation but doesn’t understand enough semantics.

Are these projects aiming at something like what I’m describing? Or are they more about something else like verifying proofs?


Well, in principle a CAS like Mathematica (or any mature alternative) allows you to implement such a workflow (i.e. naming mappings or an algebra on your objects). I think what Wolfram has been concentrating on for a decade is accumulating "knowledge" into their system, i.e. exactly the kind of semantic information we are talking about. However, as an end user I don't really see that; it seems to be more well-groomed behind Wolfram Alpha.

On the other hand, there are these research projects, which however seem to concentrate on standards rather than on actually accumulating semantic knowledge.

I would love to see an adoption of hypertext and semantic mathematical notation in scientific papers. Instead of writing $E=m c^2$ in a (LaTeX) paper, we would instead define the symbols machine readably with a code like

   set E = physics/Energy
   set m = physics/Mass
   set c = physics/constants/speed-of-light

I have never seen actual scientific papers which do this kind of stuff, i.e. which are machine readable.
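
A minimal Python sketch of how such declarations could be read by a machine. The `set X = path` syntax is the informal notation from this comment, not an existing standard, and `parse_declarations` is a hypothetical helper.

```python
from typing import Dict


def parse_declarations(source: str) -> Dict[str, str]:
    """Map each declared symbol to its concept path, e.g. 'E' -> 'physics/Energy'."""
    symbols = {}
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, rest = line.split(maxsplit=1)
        if keyword != "set":
            raise ValueError(f"unexpected declaration: {line!r}")
        name, path = (part.strip() for part in rest.split("=", 1))
        symbols[name] = path
    return symbols


declarations = """
set E = physics/Energy
set m = physics/Mass
set c = physics/constants/speed-of-light
"""

symbols = parse_declarations(declarations)
print(symbols["c"])  # prints "physics/constants/speed-of-light"
```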


Isn't that usually what the document is for? :)

The verbalization of most LaTeX commands can help you learn to read the equations. Sometimes.


This is insanely impressive. Great work. Wish tools like this existed when I was still in school...almost makes me want to go back and do some more math :)


I tried this on some of my (fairly neat) handwritten physics notes and it was mostly pretty impressive. Failures: a lot of lowercase deltas became 8s and it had no idea what to make of hbar (converting it to n, pi, k, or most often refusing to convert the equation entirely). A fair number of little typos, but you'd want to check all its work anyway.


It's not reliable at complex handwritten math yet, but it does handle printed advanced math pretty reliably, as well as simple handwritten math.


It didn't recognize hbar as 1?


Looks like it could be really useful. Have you considered not requiring billing info to try the API?


we do it to protect against spamming... send me a note at nico@mathpix.com if you want to try the API


The Newton quip is a missed opportunity to make a reference to Diamond, his favorite dog who burned up all his papers...

https://en.wikipedia.org/wiki/Diamond_(dog)


This is gold, great job. The one clear bug that I've noticed thus far is that it seems to correctly identify \hat{} but not finish it correctly when it also has an index, so that: \hat { y _ { i } } erroneously becomes: \hat { y } _ { i }
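
Side by side, the two forms look like this. The difference is whether the subscript sits under the hat or outside it:

```latex
\[
  \hat{y_i}   % hat spans the subscripted symbol (the original)
  \qquad
  \hat{y}_i   % hat over y only, subscript attached afterwards (the output)
\]
```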


This changes everything about my university life. I was always on the verge of doing everything digitally, but it was always cumbersome to type out (La)TeX by hand and convert handwritten notes to digital versions.

Thank you so much!


This is something I would have definitely used if I were still a student, although I'm not a fan of it being a Chrome extension. I'm still curious enough to test it out.


There's also a macOS app.


Very cool! I had this exact idea for a service a few years back, but it was one of those things I never found the time or motivation to actually do. The results seem very nice.


This is great! It is very useful for writing papers. I tried it with some of my equations! It actually improved them by adding more appropriate braces than I had.


The example on the front page thinks the nu is a v. Still cool, though.


Read the text. The example on the front page contains both a \nu (frequency) and a v (velocity), and the tool distinguishes between them correctly.


Gr8! Would love a way to apply this to PDF and web eqns in Windows.


What is your use case?


I maintain an idea notebook in LaTeX via LyX, and presently, while reading a math PDF, I have to paste a screenshot into Paint, clip an equation there, and paste the image into LyX. Very clumsy.


Can you send me an email? I might have something like that working soon.


Cool! Any plans to port to Windows and Linux also?


Just tried it out and it worked great. Nice work!


Add code generator to complete the product. ;-p


Sorry if this is off topic, but the testimonials really cracked me up. “If I had known about Mathpix earlier, perhaps I would have had enough time to work out the Grand Unified Theory.” - Albert Einstein


My favorite is Alan Turing's quote:

> Mathpix's AI definitely passes THIS Turing test!


Me too, lol. Really clever way to get attention.


This looks neat. Also, the Testimonials section made me laugh. Well done sir.


So I guess I can expect to see more low quality mathematical typesetting. The technology is great, but I don't see how it will help with high quality typesetting.

I can type an equation into LaTeX more quickly than I can photograph it and then go in and manually correct all the spacing issues. And there are spacing issues. The examples PDF has things that just look horrible. No small spacing or negative spacing to space out things like matrices and integrals etc. If I'm going to manually tweak it anyway, I might as well do it manually from the start. Typing it up is something that gets quicker with practice, like everything else.

But what I can actually see happening is people not tweaking the output manually. Either train yourself to use TeX properly, or let someone do it for you.


You may be correct, but your comment comes off sounding a bit elitist.

Typesetting does not always need to be absolutely perfect, sometimes it just needs to work.

For someone who isn't 100% fluent in TeX, making small edits is much easier than having to look up all the syntax and symbol names required.


It's not elitist at all. Anyone can learn TeX.


> Either train yourself to use TeX properly, or let someone do it for you

I will let you do it for me. I usually work late at night just before the deadline. What’s your phone number?


Is that a serious offer? I charge 50GBP per hour. Still interested?


If you can match $0.005 per request, I'll hire you.


I think the point is that for 50 GBP you will get a much higher quality typesetting than spending 1 GBP on the API.

As usual, it depends on the use case. If you are producing a textbook or something that requires high quality typesetting, you will probably pay for it.

If not, you can use a tool like this one, but you will have to accept a certain margin of error.


"Either train yourself to use TeX properly, or let someone do it for you."

The same complaint could be raised about using a biro (vs calligraphy), using a printing press (vs handwriting) or using a wysiwyg editor (vs the old word processing paradigm).

In my view, this sort of accessibility is a great advance.


I disagree. Technology should either make things easier or better. Making things easier but worse is not progress. Your analogies are incorrect.

Calligraphy is not handwriting. It's an art form and very slow to execute.

The printing press isn't just an "easier handwriting". It's actually harder to use. But it's better. It's closer to calligraphy than handwriting, in fact.

I'm not sure what your argument about WYSIWYG text editors is. In my experience they produce absolutely horrible results because, ironically, they require the user to know more about typesetting than TeX does if high quality output is desired.


thanks for your comments, can you share what you want your Latex to look like? how do you want to space out matrices and integrals? we're working on the Latex formatting so we can improve this


I would have put a small space (\,) between the upright "det" and the matrix. An integral sign is usually followed by a negative small space (\!) and the "dx" is separated by a small space (\,).
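
As a sketch of that spacing (an illustration of this suggestion only, not a universal rule; the matrix and integrand are arbitrary, and `pmatrix` assumes amsmath):

```latex
\[
  \det\,\begin{pmatrix} a & b \\ c & d \end{pmatrix}
  \qquad
  \int\! f(x)\,dx
\]
```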

Trouble is I'm not sure if these are absolute rules. I'm sure you can find pathological cases where the spacing will be wrong. Maybe the best thing would be to allow the user to modify the spacing easily after the automatic version is made.

I trust you guys have read The TeXbook? Knuth dedicates two chapters to the subtleties of mathematical typesetting.


Do you have a guide that you can point to for the correct typesetting in LaTeX?


The best treatment is in the original manual for TeX: The TeXbook by Donald Knuth.



