Hacker News
Symbolic mathematics finally yields to neural networks (quantamagazine.org)
275 points by Zuider on May 26, 2020 | 99 comments



About 50 years ago I was fascinated to see a program that could do symbolic integration. Now, programs like Mathematica can do much harder, more complex integrals.

Professor Patrick Winston pointed out that most of the AI programs we were going to look at in his class (chess, natural language, 3D vision in block world, etc.) ended up, like integration, as simple code plus a database of facts.

Back then, there was general optimism about the future of AI. No one anticipated how slow progress would be while the capabilities of the hardware grew a million-fold.

The work referenced in the article is interesting. Like almost every other advance in AI, it is a small but difficult step, and I admire the work done by those working in this field.

The problems of solving simple differential equations and symbolic integration at the first year calculus level are not really advanced math. Humans solve these problems with a relatively small bag of tricks that transform a symbolic integral into a simpler form. A program can do the same thing with an even more detailed database of transforms that can be attempted at each point in the search tree until a simple solution is reached.
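
To make that concrete, here is a toy sketch of the "bag of tricks" idea as Mathematica-style rewrite rules; `int` is a made-up placeholder head (not Mathematica's Integrate), and the rule set is deliberately tiny:

(* a toy "bag of tricks": constants, the power rule, and linearity *)
rules = {
  int[c_ /; FreeQ[c, x], x] :> c x,                               (* constant rule *)
  int[x^n_. /; FreeQ[n, x] && n =!= -1, x] :> x^(n + 1)/(n + 1),  (* power rule *)
  int[a_ + b_, x] :> int[a, x] + int[b, x],                       (* sum rule *)
  int[c_ f_ /; FreeQ[c, x], x] :> c int[f, x]                     (* pull constants out *)
};

int[3 x^2 + 5, x] //. rules   (* => x^3 + 5 x *)

A real system just has a far larger rule table plus a search over which transform to try next.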

The article claims that the new program can solve difficult integrals. This is interesting because hard-to-solve integrals are often associated with real physical phenomena. See for example the triple integrals of W. F. van Peype which arose while he was studying magnetism in different materials. These relatively plain-looking definite integrals stumped some of the world's most famous mathematicians. See [1] and/or [2] for their interesting history.

[1] Paul J. Nahin, Inside Interesting Integrals, Springer, 2015. Section 6.5, The Watson/van Peype Triple Integrals.

[2] I.J. Zucker, 70+ Years of the Watson Integrals, http://www.inp.nsk.su/~silagadz/Watson_Integral.pdf


Even that first integral is a doozy! Mathematica spent minutes on it and spat out an answer that looks much more complex than the one in the paper: Integrate[Pi * Csc[x/2]^2 * EllipticK[(-Cos[x]) * Csc[x/2]^4], {x, 0, Pi}]/Pi^3

It can't even evaluate the answer numerically; it just freezes for minutes!

Edit: about 10 minutes later it spat out 1.393203930, so it can solve the integral, but not easily.

Update: This paper reminds me why I vehemently hate modern mathematical notation, where subscripts and superscripts can mean anything at all and everyone's idea of "convention" is different. On the second page there's an algebraic expression for the solution, which at first glance seems to be:

4/Pi^2 * EllipticK[Sqrt[1/2 - Sqrt[1 - w^(-2)]/2]]^2

But this evaluates to 1.76351, not 1.39320 as given in the paper. The "K^2" in the paper is not the "complete elliptic integral of the first kind" squared; it's some other function that squares its input somehow. Or something. But that raises the question of why you would formulate this in terms of a function of a square root, squared!?

Helpfully, they provide the definition in terms of the hypergeometric function a bit lower down, but that appears to be wrong as well, giving yet another result when evaluated numerically.


The difference lies in the convention for `EllipticK`: Mathematica's `EllipticK[m]` takes the parameter m = k^2, whereas the paper's K(k) takes the modulus k, so you need `EllipticK[k^2]` to reproduce the paper's value. Also, symbolic integration in Mathematica is much slower; use `NIntegrate` instead to skip straight to a numerical value.

Print[ NIntegrate[ 1/(1 - Cos[x] Cos[y] Cos[z]), {x, 0, Pi}, {y, 0, Pi}, {z, 0, Pi}, Method -> "LocalAdaptive"] / Pi^3 // Timing]

{0.121543, 1.3932012917627028}

Print[ NIntegrate[ Pi Csc[x/2]^2 EllipticK[-Cos[x] Csc[x/2]^4], {x, 0, Pi}] / Pi^3 // Timing]

{0.021917, 1.3932039076998044}

Print[ Gamma[1/4]^4 / (4 Pi^3) // N // Timing]

{0.000029, 1.3932039296856769}

Print[ 4 / Pi^2 EllipticK[1/2]^2 // N // Timing]

{0.000375, 1.3932039296856764}

For `NIntegrate` the default method is `GlobalAdaptive`. You can try different methods and rules to optimize for the function and bounds.


My guess was that the symbolic output of Integrate could then be evaluated to much higher numerical precision in a reasonable time, but I couldn't transform the result into something tractable using simple Mathematica operations.

It's a bit of a challenge to get either of the first two forms you provided to converge when requesting 10 or 15 digits of precision. However, your third form involving the Gamma function evaluates to very high numerical precision quickly. I can get 1,000 digits in under 50 milliseconds, which is "good enough"!
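
For anyone following along, a minimal version of that check (using the Gamma closed form from the parent comment):

N[Gamma[1/4]^4/(4 Pi^3), 1000] // Timing
(* 1.3932039296856... to 1,000 digits, essentially instantly *)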

Your usage of NIntegrate tuning options shows a few things:

1) That these days I only use Mathematica as a glorified calculator, evoking a mental image of a 500-ton press being used to crack walnuts.

2) Even the best symbolic CAS in the world isn't magic, and requires hand-holding.

3) With the right knowledge even a very hard nut can be easily cracked!

I wonder why Wolfram Research hasn't added some basic multi-threading capabilities to Mathematica to try different "flags" in parallel, racing the various approaches on each CPU core to see which one wins...
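
For what it's worth, you can build a user-level version of that race with ParallelTry, which evaluates the alternatives on the parallel kernels and returns whichever finishes first. A rough sketch, assuming I'm remembering ParallelTry's behavior right and the parallel kernels are launched:

ParallelTry[
  NIntegrate[1/(1 - Cos[x] Cos[y] Cos[z]),
    {x, 0, Pi}, {y, 0, Pi}, {z, 0, Pi}, Method -> #]/Pi^3 &,
  {"GlobalAdaptive", "LocalAdaptive"}]

Having Integrate itself race its internal strategies automatically is another matter, of course.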


Wouldn't any solution just yield a new trick to add to the bag of tricks, letting the symbolic programs for integral calculation pull ahead again?


Honest question. Are you fearful of the moral implications of AI? Whenever I hear someone who is fascinated by AI and thinks about it only as an intellectual pursuit, I'm curious whether they are thinking at all about the consequences of powerful AI.

From where I sit, the bad outweighs the good.


I'm fearful, but I also know that if I don't build it, someone else will. And I'd prefer it would be me.


Things looked very different in the early 1970s; that's when I took my first AI class. Now, naturally, I worry about technology that can have such deep and broad impacts on humans. Back then most of my programming was on punch cards submitted to mainframe operators behind a glass window.


These types of worries always strike me as worrying about keyhole surgery going wrong before we are capable of making scalpels, anaesthetic, or even video cameras -- or hell, before we even know what a tendon is or what the purpose of blood is. Or worrying about the Challenger explosion when we can't even make gunpowder. Or worrying about the logistics of flying cars in three dimensions and traffic crashes before we are even able to build a steam engine to drive a train.

We are so far away from GAI at the moment that I don't for one second actually worry about the moral implications of General Artificial Intelligence.

I don't see the point in worrying about something when we know literally nothing about that thing and barely have a path to making it. It's very likely that by the time we are capable of making GAI (setting aside the very probable scenario that we will be able to simulate a brain, but not at any useful speed -- see the three-body problem and the challenges of simulating virtually any other physical system), there will be half a dozen problems we do need to worry about that we cannot foresee. There will also be half a dozen limitations that mean our current worries are essentially worthless. It's the same with all new technology.

It's also interesting that people who tend to worry about GAI never worry about current levels of AI, especially in a military context. They seem entirely unconcerned about crappy, half-baked neural networks being deployed for use in drones. They seem entirely unconcerned about the lack of proper dataset balancing and curation needed to ensure that current AI models do not have racial bias (or, indeed, other types of bias).

Just last year I saw a Twitter post about a startup that was re-creating literal phrenology, using AI to try to profile whether people were criminals based on facial shape. The typical Less Wrong / MIRI folks never seem to be worried about that; no, they spend their time in fear of Roko's Basilisk and other currently impossible scenarios. They literally purged posts, threads, and comments that made any mention of it, out of the utter and complete fear that maybe, in the far-flung future, a very bad simulation of them (unless their brains are cryogenically frozen, I guess, though brain structure would very likely degrade over that immense timespan anyway) would be tortured for their current actions by a good AI that had apparently gone so insane it thought that torturing low-fidelity simulations of people in the future could affect the past and cause it to be created faster.

Speculating about the future can be a positive thing, but I don't see how this is at all useful or healthy.


You can’t imagine any reason to worry? At all?

Worrying about the future of surgical technology is very different. The end goal of surgery is to save a life or improve the quality of life, and it involves restoring a single person back to working order.

The end goal of AI is to _think_. The upper bound on that is horrifying. Once something can think it can build. Once something can build it can multiply. The upper bound on AI is replacing the human species.

I'm not saying I'm nervous about this happening next year. I know how terribly inept we are at true GAI. I'm thinking purely abstractly, and in that light I think we should be more serious about ground rules for AI.


Can you re-read my post more closely and actually critique it? You chose one of my points, arguably the weakest (partly because it's an analogy -- analogies are mostly for flavour; they don't make a good argument, but they help you appreciate where I am coming from), and ignored the stronger criticisms I posted after that.


Why is the upper bound on thinking horrifying? We are currently the upper bound within our own domain, and on the whole, we've been getting better as we lifted that bound.


This is one of the most exciting things I've seen come out of AI research for my own interests. The way they represented the mathematical equations is called an abstract syntax tree (maybe they mentioned this, but I was excited-skimming), and it is also how computer code is represented. It also happens that there is _a lot_ more computer code lying around in computer-readable format than high school and college math problems. With this, you get: metamorphic code, better optimizers, more abstract programming languages (maybe), and probably lots of other steps forward.
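
A quick way to see the "code is a tree" point without leaving Mathematica (whose expressions are their own AST; Hold just stops evaluation so we can look at the tree):

Hold[If[x > 0, Sqrt[x], 0]] // FullForm
(* Hold[If[Greater[x, 0], Sqrt[x], 0]] -- the program as a nested prefix tree *)

TreeForm[Hold[Sin[x]^2 + Cos[x]^2]]   (* the same idea, drawn as a tree *)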

If anyone is working on AI+programming languages, please send me a message. I'd love to work with others on this problem.


We have been working on this recently on the Google Brain team.

We are working both on synthesizing programs from scratch (see https://arxiv.org/abs/2002.09030 for example) and on understanding computer programs using machine learning (see e.g. https://arxiv.org/abs/1911.01205).

I'm always happy to correspond with people about these topics.


Have you considered attempting to develop a "differentiable programming language" so that it's possible to do gradient descent directly on the space of programs? There was some work on building a "differentiable lambda calculus", https://core.ac.uk/download/pdf/82396223.pdf, but I haven't heard of anyone trying to use something like that in modern program synthesis.


Late on the thread but you might be interested in this: https://www.tensorflow.org/swift


That's something different: it allows doing gradient descent on the space of parameters for programs, not on the space of programs.


Love to chat. Sent you an email (former HFT and kernel guy).

One topic which I feel is neglected is a good GCN (or any GNN) to operate on existing code trees. Most approaches seem to prefer sequence or, at most, tree inputs.

Is this simply a matter of not yet having found a good network architecture, or is it a performance issue?


I don't see any email listed on your profile. I'd like to have a chat to work on and learn about this stuff.


I'm planning on applying for PhD programs this fall to work in this area. There are only a few places in the world right now that I know of working on these types of problems. They are:

* Martin Vechev, ETH Zurich

* Dawn Song, University of California Berkeley

* Eran Yahav, Technion

* Miltiadis Allamanis, Microsoft Research Cambridge

If anyone knows other advisors looking for graduate students in this area, please let me know. Due to personal circumstances I can most likely not apply to ETH Zurich or Technion (I don't speak Hebrew anyway), which leaves me with only one potential advisor in a program that I really want.

There is also the Python-writing model that OpenAI showed recently at the Microsoft Build conference, so maybe there is some interest growing at other places as well.

I was also recently working on a deep learning decompiler but was unable to get my transformer model to learn well enough to actually decompile x64 assembly. I have the source code for the entire Linux kernel as training data, so it's not an issue with quantity. If anyone is interested in helping out with this project, please let me know in a comment.


> I have the source code for the entire Linux kernel as training data, so it's not an issue with quantity

The Linux kernel is only ~30M LOC. That's a really small dataset. For comparison, the Reddit-based dataset for GPT-2 is 100 times larger. Try using all the C code posted on GitHub.

> decompile x64 assembly

You can't "decompile" assembly. Either you decompile machine code, or you disassemble it into assembly; the latter is easier than the former. So if you're trying to decompile executables, then perhaps you should train two models: one to convert machine code to assembly, and the other to convert assembly to C. Assembly code produced by an optimizing compiler might differ significantly from assembly code which closely corresponds to C code.


> perhaps you should train two models: one to convert machine code to assembly, and the other to convert assembly to C.

Is the step of going from machine code to gcc-produced assembly not trivial? Is gcc actually producing assembly code that an assembler needs to do more with than convert to the corresponding opcodes?


There are two kinds of assembly: 1. assembly that corresponds to optimized machine code, and 2. assembly that closely corresponds to the original C code. As I said, these two assembly versions might look very different depending on optimizations performed by the compiler. You can reduce the difficulty of learning the conversion from machine code to assembly at the expense of increasing the difficulty of learning the conversion from assembly to C code (and vice versa).


As part of the short list:

* Swarat: https://blog.sigplan.org/2020/04/15/synthesizing-neurosymbol... <- probably heaviest on the math side wrt PL people

* Ras Bodik (Berkeley -> UW): esp. w/ Pedro Domingos and all the MSR collaborators (Sumit Gulwani, ...) <- a bit biased b/c I was in the group while at Berkeley; Ras + Dawn are crazy creative

* Percy Liang (Stanford): Coming from the ML side and w/ a long-running interest here


UCL is another good choice for this area; I did some research on it back in undergrad. There was a workshop called NAMPI (Neural Abstract Machines & Program Induction) on this subfield, so I'd recommend going through its accepted papers and seeing where all the faculty come from.

Also general advice, after finding a few papers you like go through the papers they cite that are relevant to the subfield and also papers that cite them. That's one of the best ways of finding other related research that interests you.


I've listened to some talks by Song, and my impression is that she does not seem to have a strong technical grasp of what's going on, for what it's worth. There is a lack of clarity in her communication style. I'd try to work with Vechev, Liang, UW PLSE group if I were you...


If the language barrier is the only reason keeping you from applying to the Technion, or any other university in Israel, graduate courses in Israel will generally be given in English if a non-Hebrew speaker is present. It's standard practice, just email the instructor in advance. Besides, Haifa is a very nice city.

You should also check out professors at Ben Gurion University, Tel Aviv University and the Hebrew University who might have similar interests, IIRC. Feel free to hit me up if there's some page in Hebrew that doesn't translate well.


Another person to talk to would be Brooks Paige, who recently joined UCL: https://tbrx.github.io/ This is totally not my area of research (I'm a PL person), but I know Brooks from the Alan Turing Institute and I think he'd be a great advisor for this kind of work.


Premkumar Devanbu at UC Davis works in this area. I've taken his graduate course on the subject.


Maybe don't discount the Technion so easily. All of the professors are fluent in English as is most of the student body. There are many graduate students who start studying there without knowing Hebrew.


There are a lot of groups working in this area. I'm going to pitch ours first, and then also point you to some others!

My colleagues and I run the SEFCOM lab at Arizona State University (https://sefcom.asu.edu/). Most relevant to your interests, Fish Wang (fellow SEFCOM professor) and I (Yan Shoshitaishvili) founded the angr program analysis framework (https://angr.io/) back in our grad school days and have continued to evolve it together with our students in the course of our research at ASU. We're actually currently undertaking a concerted push into decompilation research, using both PL and ML techniques. This research direction is a passion of ours, and there's plenty of room for you here if you're interested!

Of course, we also do quite a bit of work in other areas of program analysis (including less overtly "mathy" techniques, like fuzzing) as well as other areas of cybersecurity. We are also quite active in the Capture the Flag community, if that is something that interests you!

Other places that do research in program analysis off the top of my head:

- Chris Kruegel (https://sites.cs.ucsb.edu/~chris/) and Giovanni Vigna (https://sites.cs.ucsb.edu/~vigna/) at UCSB (disclaimer: I got my PhD from them!)

- Davide Balzarotti (http://s3.eurecom.fr/~balzarot/) and Yanick Fratantonio (https://reyammer.io/) at EURECOM

- Antonio Bianchi (https://antoniobianchi.me/) (and, soon, Aravind Machiry) at Purdue

- Alexandros Kapravelos (https://kapravelos.com/) at NCSU

- Taesoo Kim (https://taesoo.kim/) at Georgia Tech

- Yeongjin Jang (https://www.unexploitable.systems/) at O(regon)SU

- Zhiqiang Lin (http://web.cse.ohio-state.edu/~lin.3021/) at O(hio)SU

- Brendan Dolan-Gavitt (https://engineering.nyu.edu/faculty/brendan-dolan-gavitt) at NYU

- Wil Robertson (https://wkr.io/) and Engin Kirda (https://www.ccs.neu.edu/home/ek/) at Northeastern

If you have questions about the PhD process or this research area, feel free to reach out: yans@asu.edu or @Zardus on twitter!


They don't actually use trees in the paper though. https://arxiv.org/abs/1912.01412

They express everything in Polish (prefix) notation, precisely so the transformer network can work on flat token streams and not care about the nesting levels.
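
For anyone curious what that flattening looks like, here is a rough sketch in Mathematica, treating its own expression tree as the AST; `toPrefix` is a made-up helper for illustration, not anything from the paper:

toPrefix[e_?AtomQ] := {e}
toPrefix[e_] := Flatten[{Head[e], toPrefix /@ (List @@ e)}]

toPrefix[Sin[x^2]]    (* {Sin, Power, x, 2} *)
toPrefix[Sin[x]^2]    (* {Power, Sin, x, 2} -- the nested tree becomes one flat token stream *)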

It's a bit odd that this article doesn't talk about that.


Forgive me for being entirely ignorant about the field, but I've thought about this frequently and would be grateful for a decent answer:

How theoretically feasible would it be to create Meta-AI?

My naive (incredibly over-simplified) notion goes something like this:

- You convert open-source code into a generic/universal AST/IR format, and figure out a means to represent function inputs and outputs (behavior)

- You use all of the OSS code in the world as data for this, to train a neural network to start making predictions about what kind of AST inputs in a program lead to what kinds of outputs (function synthesis)

- You tell the program to look at other deep learning/ML code, and to attempt to optimize its own code, while running, inside of isolated processes/VMs.

Something like Erlang/Elixir or Lisp should be capable of this meta-programming and self-code modification while still running. And the BEAM would theoretically be able to isolate actors in the event that the epoch doesn't pan out very well and just kill that branch/lineage off.

Is this insane? I've wondered about this for years.


No, it isn't; in fact this line of work already has a tradition, as seen here:

https://en.wikipedia.org/wiki/Genetic_programming#Meta-genet...

Although it uses genetic programming instead of neural networks, those approaches are equivalent in power; the difference is only that neural networks are more amenable to parallelization and come with a very good baseline optimization algorithm (gradient descent and friends).

The problem is that coming up with an optimizer/architecture designer is hard, so the benefit of running a meta-optimizer to solve a problem is difficult to realize versus a handmade neural network. The problem needs to be so large that the network would need to re-engineer itself to solve it -- but think how many resources have already been spent engineering neural networks (by humans, no less, who are themselves quite powerful optimizers!). Unless the problem is truly titanic (w.r.t. other current problems), the yields might be small.

What you can do in a similar vein is gather a large set of problems and train a 'neural architect' or something like that, which can then be applied many times to new problems. This lets the cost be shared over many new networks. I think it could make sense for governments and large organizations to get together and create this sort of thing. If you know the cost of training one large neural network, imagine the cost of training hundreds of millions of them to train some kind of neural architect (NA).

There are milder versions of this, where the architecture search itself isn't trained:

https://en.wikipedia.org/wiki/Neural_architecture_search

https://en.wikipedia.org/wiki/Automated_machine_learning

Of course even this approach has limitations (even if you theoretically allow it to create essentially arbitrary architectures), because it will still have difficulty "thinking outside the box" the way we do using huge amounts of mathematical expertise and intuition (see EfficientNet -- the insight in that paper would be unlikely to arise from an NA network). It's not the thing that will solve all of our problems forever, but it would be pretty significant (perhaps towards making large AI-based systems with multiple components, say self-driving cars and robots of the future).


Wow, thank you for the answer and the links. I had tried to search for something like this but I didn't have the right terminology.

I really hope to see something happen in this area before I die, just for the sake of seeing it happen.

I often wonder about whether neural networks might need to meet at a crossroads with other techniques.

Inductive Logic / Answer Set Programming or Constraint Programming seems like it could be a good match for this field, because from my ignorant understanding you have a more "concrete" representation of a model/problem in the form of symbolic logic or constraints, and an entirely abstract "black box" solver with neural networks. I have no real clue, but it seems like they could be synergistic?

There's a really oddball repo I found that took this approach:

https://github.com/921kiyo/symbolic-rl

"Symbolic Reinforcement Learning using Inductive Logic Programming"


> You convert open-source code into a generic/universal AST/IR format, and figure out a means to represent function inputs and outputs (behavior)

I think this alone is a herculean task.


Definitely, it's _practically_ impossible, but I am just curious whether the idea/theory of it isn't.


> Is this insane?

No, in fact I believe some variant of this approach is what will eventually lead to AGI. (You can probably do a similar type of solution but approaching from the direction of neural networks, where you can use backprop.)


This vid from the MS Build conf is pretty amazing (skip to around the 30m mark for the mindblowing stuff): https://twitter.com/i/broadcasts/1OyKAYWPRrWKb


"With this, you get: ... more abstract programming languages (maybe)."

That would be great. If you could guess, how would you go about using symbolic math and neural nets to create a more abstract programming language?


If you'll entertain my clueless guesswork:

If it could be used to greatly accelerate SMT solving/constraint solving, perhaps it would be possible to run a formal model as if it were code, i.e. fast and scalable constraint-based programming. That's about as high-level as you can get, short of a requirements document. I'm not sure if that's the kind of symbolic mathematics that's being studied here, as they seem to be looking at mathematically interesting expressions, rather than boring-but-enormous constraints.

Or, somewhat related, perhaps it could be used to help scale formal verification with languages like SPARK and ZZ.


In general, I think neural networks really struggle with NP-hard problems. The current state of the art is bottlenecked by even mild generalization requirements. The strong baselines have been variations on the pointer-network architecture for some time now. Even though they can do reasonably well on restricted problems (e.g. instances of SAT or traveling salesman with a fixed number of nodes), they struggle to generalize to instances with a variable number of nodes -- not to mention varying the problem itself. Some popular approaches also include applications of graph CNNs and reinforcement learning. I don't really have much more than a cursory idea of the current directions, though.

To me, more exciting is some kind of joint collaboration between modern ml systems & formal solvers.


You might be interested in the line of work around differentiable relaxations of formal solvers that was kicked off by Zico Kolter et al. They add a MAXSAT solver as a differentiable layer. This allows the neat trick of solving handwritten Sudoku puzzles by composing a ConvNet with the differentiable MAXSAT layer.

[1] https://arxiv.org/abs/1905.12149


Thanks - that's great! I'm familiar with Kolter's work on adversarial robustness.


I found this video interesting. It takes comments and generates code.

https://www.youtube.com/watch?v=fZSFNUT6iY8


I believe Jacob Andreas at MIT is working on compositional neural net architectures for applications in programming language theory at the moment


To me, the article somewhat misses the point of what's interesting here. Using ASTs to represent equations, or even whole programs, has plenty of precedent in ML/AI. I'd have liked to know how exactly they translate these trees into a representation suitable for an ANN. Fortunately, the paper seems to be easy to find and access (it's [1], I guess).

[1] https://arxiv.org/abs/1912.01412


It looks like they go from tree -> sequence via prefix notation. I'm curious why Lample decided on this seq2seq approach when it seems there might be models which could be applied more naturally to the tree structure directly [1, 2].

[1] https://arxiv.org/abs/1902.07282 (an AST translation system)

[2] https://arxiv.org/abs/1609.02907 (GCNN)


For the same reason people use huge transformers for sentences in natural language (which are also tree structured): they scale really well. If you have enough data, huge transformers have huge capacity. If you notice, this paper is entirely about how to cleverly generate a massive dataset. There is no novelty in the model -- they just use a standard approach described in two paragraphs.


From your linked article, it seems they just use the trees to generate prefix notation.


Symbolic mathematics (or computer algebra, as it is more commonly known) was one of the original driving forces for AI. Differentiation turned out to be quite easy once things like ASTs and other data structures were developed to represent polynomials and other elementary functions. Integration is way more difficult, and while AI research made great inroads, the problem was eventually solved by algebraic methods. But the full algorithm (the Risch algorithm) is about 100 pages long and has never been fully implemented. The Axiom computer algebra system is the closest AFAIK, but the premature loss of Manuel Bronstein set it back a bit (that system is fascinating as well; it's a literate program in Lisp).

The AI approaches always had one great problem: if they can't find an integral you still have no idea whether an integral exists or not. The Risch algorithm, on the other hand, can tell you for sure if an (elementary) integral doesn't exist. Axiom is fully capable of saying "no", but can't always tell you what the integral is if it does exist.
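
You can see the "no answer is not a no" problem directly in Mathematica: when it gives up, it simply returns the input unevaluated, with no verdict either way.

Integrate[x^x, x]
(* comes back unevaluated -- no statement about whether an elementary antiderivative exists *)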

Using an AST to represent expressions isn't novel, by the way. I implemented such a system as an undergrad computer science student (I also implemented complete integration of rational functions).


Discussion about the paper a while back: https://news.ycombinator.com/item?id=21084748. Conclusion seemed to be that the comparison against Mathematica was unfair since Mathematica's execution time was capped at 30s.


Interesting, since in my experience Mathematica's execution time is also lower-bounded by 30s.


I don't think that was why it was considered unfair.

The problem was rather that they tested on the same distribution the model had just seen 80 million examples from.


Here's a somewhat more technical critique of the approach (and of the evaluation cases the authors used): https://arxiv.org/pdf/1912.05752.pdf


> It is important to emphasize that the construction of LC is entirely dependent on the pre-existing symbolic processors developed over the last 50 years by experts in symbolic mathematics. Moreover, as things now stand, extending LC to fill in some of its gaps (e.g. the simplification problems described in section 3) would make it even less of a stand-alone system and more dependent on conventional symbolic processors. There is no reason whatever to suppose that NN-based systems will supercede symbolic mathematics systems any time in the foreseeable future.

That's the gem of the review.


And this paragraph

> It goes without saying that LC has no understanding of the significance of an integral or a derivative or even a function or a number. In fact, occasionally, it outputs a solution that is not even a well-formed expression. LC is like the worst possible student in a calculus class: it doesn't understand the concepts, it hasn't learned the rules, it has no idea what is the significance of what it is doing, but it has looked at 80 million examples and gotten a feeling of what integrands and their integrals look like.


From the article: “You need to be confident that it’s going to work all the time, and not just on some chosen problems,” says mathematician Frédéric Gibou.

If I understand correctly, symbolic solutions can be verified, while neural network solutions would be very hard, if not impossible, to verify in reasonable time.


There are three issues I can think of that he might be talking about:

1) If you get a solution, can you be sure that it's right? Here, the neural network might spit out a wrong solution to the integral. However, such a solution is easy to verify by just differentiating (which they do in the paper), so I don't think there's a problem here. (A minimal version of this check is sketched below.)

2) Will the method always return a good answer? I don't think this is a requirement; Mathematica sometimes fails to integrate, but people accept that.

3) Does the method work for larger, more interesting families of functions than the limited families tried in the paper? I think this is what Gibou is talking about, and it seems like the strongest objection. This technique might just fail in some circumstances for reasons that are hard to anticipate.
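
Regarding 1), the check is literally a couple of lines in a CAS. A minimal sketch in Mathematica, with a made-up "model guess" standing in for the network's output:

integrand = x Cos[x];
guess = x Sin[x] + Cos[x];         (* pretend this came from the model *)
Simplify[D[guess, x] - integrand]  (* 0 means the guess really is an antiderivative *)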


How does Mathematica fail? Does it return false positives, or does it return "no idea how to do it"?


no false positives (except for rare bugs); it will usually either return some partial solution (like a simplified version of the input), or else just keep running for a very long time until you give up.


It depends on the problem.

For integration, you can just differentiate the answer.

For infinitely many other problems... verification is way harder.


Even for integrals, differentiating isn't a silver bullet. You need to differentiate and also test for equality. Determining whether two expressions are the same (or the same almost everywhere, like a being the same as a^2/a except at a=0) can be really hard.


Can't you just check numerical equality at a few million random points?
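
Something like this sketch, say (a thousand points rather than millions, and ignoring branch cuts and complex arguments, which is where it gets tricky):

diff = Sin[2 x] - 2 Sin[x] Cos[x];
And @@ Table[Abs[diff /. x -> RandomReal[{-10, 10}]] < 10^-8, {1000}]
(* True -- though agreement on samples is evidence, not a proof *)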


is it usually the case that derivatives are easier to compute than integrals?


The general algorithm for calculating integrals [1] is rather complex and, I guess, not suitable for humans, so calculating integrals sometimes looks much more like a black art than calculating derivatives does. On the other hand, one could argue that there are algorithms for doing both, and so there is no real difference.

[1] https://en.wikipedia.org/wiki/Risch_algorithm


Yes. Most of the time, the reason is that there is a general form for computing derivatives of products ((xy)' = x'y + xy') and composite functions (chain rule) while no such general rules exist for integrals.
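
You can see the asymmetry directly in a CAS: differentiating a symbolic product expands by the general rule, while the corresponding integral has nothing to expand into (Mathematica here, with f and g left undefined):

D[f[x] g[x], x]            (* g[x] f'[x] + f[x] g'[x] -- the product rule *)
Integrate[f[x] g[x], x]    (* comes back unevaluated: no general product rule for integrals *)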


Symbolically, yes that's almost always the case. Numerically it's the other way around.


It seems to me that this just kicks the can down the road--from calculation to verification.


Sometimes verification is much easier than calculation.


> The Facebook researchers compared their method to only a few of Mathematica’s functions —“integrate” for integrals and “DSolve” for differential equations — but Mathematica users can access hundreds of other solving tools.

> [...] it only included equations with one variable, and only those based on elementary functions. “It was a thin slice of possible expressions,”

> The neural net wasn’t tested on messier functions often used in physics and finance, like error functions or Bessel functions. (The Facebook group said it could be, in future versions, with very simple modifications.)

> Other critics have noted that the Facebook group’s neural net doesn’t really understand the math; it’s more of an exceptional guesser.

> Still, they agree that the new approach will prove useful.


> exceptional guesser

Considering that neural networks inherently work by maximizing probabilities over statistical descriptions of data, this should come as no surprise. This work has not dissolved the dichotomy between rules-based and statistical methods, but rather transmuted the syntax of rules-based expressions into a representation that can be exploited by statistical machines in a way that makes "guessing" more fruitful.

There are some examples near the end of the paper showing how the authors take an initially intractable expression and are able to simplify it with their approach so that Mathematica can actually perform the integral for them. It seems much more appropriate to market this method as a preprocessor that cuts massive expressions down to a more chewable size.


> exceptional guesser

This IMO describes how mathematics itself moves forward... A mathematician is an extremely well-trained 'guesser' who is also able to sink a lot of time into formal verification.

The process is essentially: a) Find an interesting conjecture that you've got a strong guess to be true. b) Check for obvious (or less obvious) counterexamples, or conflicting theorems. c) Prove the thing is true.

A large part of the art of being a working mathematician is in part (a): you need to make a really good guess. An ideal conjecture is correct AND provable AND leads to other interesting results, or says interesting things about bigger problems.

So what happens when we apply really good versions of current AI to this area? Picking out an 'interesting' conjecture is still Strong-AI-Complete: it requires lots of domain knowledge, and an understanding of what this particular conjecture would 'unlock.' But we could perhaps come up with good 'guessers' which quickly tell us whether a given idea might work out, perhaps saving a bunch of effort. Perhaps we could even get to the point of generating a proposed proof which can be fed to an automated proof checking system, allowing for inspection and modification by the human in the loop.


I like this prospect:

> Another unsolved problem where this approach shows promise is one of the most disturbing aspects of neural nets: No one really understands how they work. Training bits enter at one end and prediction bits emerge from the other, but what happens in between — the exact process that makes neural nets into such good guessers — remains a critical open question.

> Symbolic math, on the other hand, is decidedly less mysterious. “We know how math works,” said Charton. “By using specific math problems as a test to see where machines succeed and where they fail, we can learn how neural nets work.”


> “By using specific math problems as a test to see where machines succeed and where they fail, we can learn how neural nets work.”

The XOR and spiral benchmarks have been studied in neural networks since the '70s.


The criticisms read as good starting points for subsequent research.


So what's the earliest known AI solver for calculus textbook problems? I bet it goes all the way to the 1950s.

This quote seems to be massively misleading:

But it’s clear that the team has answered the decades-old question — can AI do symbolic math? — in the affirmative. “Their models are well established. The algorithms are well established. They postulate the problem in a clever way,” said Wojciech Zaremba, co-founder of the AI research group OpenAI.

“They did succeed in coming up with neural networks that could solve problems that were beyond the scope of the rule-following machine system,” McClelland said. “Which is very exciting.”


Grace Hopper was experimenting with symbolic differentiation of functions in her A-2 compiler in the 1950s, and she may well not have been the first. My reference for this is her paper in the proceedings of a 1957 British conference on "Mechanisation of Thought Processes" -- if you set up an account at archive.org, they may let you page through it here: https://archive.org/details/mechanisationoft02nati/page/n9/m...


Not AI, though (or almost every computer program would be "AI"). Differentiation can be done with simple deterministic rule following.


Earliest reference I'm aware of for integration is James Slagle's MIT Ph.D. thesis work, from 1961. https://dspace.mit.edu/handle/1721.1/11997


It seems that the neural net just spits out an answer, rather than deriving it step-by-step like humans do. That's interesting, but would still get you an F on a real calculus test.


I think the idea with this kind of thing is that ML can make pretty-good guesses really quickly, and then a formalized process can verify them (usually much more quickly than it could derive them). This hybrid model fits lots of different kinds of problems.


Yes, and this is particularly problematic IMHO in sequence-to-sequence translators, as were used here. When fed enough training data, they do an amazing job at aping their training, but they are prone to spout utter nonsense in edge cases, and they don't really (to my knowledge) have any indication that they should have less confidence in one result than in the other.

So best case this system gives black box answers that may or may not hold up in verification. That does not seem to be a very useful way to do mathematical research.


One could view each layer of the neural net as doing "steps". They are rather opaque steps, though.


I think you'll notice that great human mathematicians and integrators also take pretty big jumps in their derivations, if they don't jump directly to the solution.

Going algorithmically step by step is the crutch you use until you have built a strong enough intuition.


Mathematica doesn't give you a step-by-step derivation for its integration either (although Wolfram Alpha sometimes does). That doesn't bother most people using the software.


I think it's able to verify that the solution is correct, which is more than a lot of students manage.


You might find the Sledgehammer tool for Isabelle quite interesting. It has been using machine learning techniques to find proofs automatically since at least 2013. It uses previous proofs to learn how to select facts to send to off-the-shelf automated provers in order to discharge a goal. See e.g. http://isabelle.in.tum.de/~blanchet/mash2.pdf

One issue I have with it, and one that automated proof tools based on ML are going to have to solve, is that it's quite unpredictable. Even if it finds stunning proofs from time to time, it is hard to use an unpredictable tool efficiently.


How about using http://us.metamath.org/ as a database of math theorems/definitions and doing some heavy data mining there?


There's some work on this already, including Holophrasm and work by OpenAI on proof shortening. These efforts are linked from the Metamath wiki:

https://github.com/metamath/set.mm/wiki/Automated-proving


very cool, thanks for sharing


OpenAI has managed to derive a number of Metamath proofs, some of which are smaller than the human-created proofs, using machine learning.

Here is a gource visualization of Metamath proofs over time in the set.mm database: https://m.youtube.com/watch?v=LVGSeDjWzUo

Note that near the end, one of the contributors is OpenAI, who is not a human contributor.


This looks really interesting! Although I'm not sure this was the "first": I know of another approach tried by Hinton and Sutskever in their paper "Using matrices to model symbolic relationships" [0]. I don't see a date, but I remember Hinton mentioning it in a talk from a few years ago.

[0] https://www.cs.toronto.edu/~hinton/absps/ilyamre.pdf


Just need to build a tool that predicts the next version of a program from an earlier version and suggests improvements (ideally bug fixes).

Then you have http://www.scp-wiki.net/scp-914 for programs.


It would be really interesting to see code simplification via neural network.

While working in enterprise I see plenty of code that, after some massaging, collapses into a form that is much simpler for the programmer and equivalent for the computer.

(fix typo)


What would a confidence interval for this type of prediction look like?


A confidence interval is a sort of meaningless notion in this case. You can just do the derivative of the algorithm's answer, and see if you get back an expression that is equivalent to the input. If you did, then the answer is correct.


I don't believe a model (neural network or otherwise) is capable of knowing that its predictions are correct.


Since ML outputs are usually "fuzzy", how does one check that the output is actually correct when using a technique like this in the field?



