When you get good enough at mathematics, you can tell if your proofs are correct or not without asking a TA to grade them for you. Most mathematicians reach this level before they finish undergrad (a rare few reach it before they finish high school). While AI hasn't reached this level yet, there is no fundamental barrier stopping it from happening - and for now, researchers can use formal proof-checking software like Lean, Coq, or Isabelle to act as a grader.
(In principle, it should be also be possible to get good enough at philosophy to avoid devolving into a mess of incoherence while reasoning about concepts like "knowledge", "consciousness", and "morality". I suspect some humans have achieved that, but it seems rather difficult to tell...)
> When you get good enough at mathematics, you can tell if your proofs are correct or not without asking a TA to grade them for you.
This is simply not true - you can get a very good sense of when your argument is correct, yes. But having graded for (graduate, even!) courses, even advanced students make mistakes. It's not limited to students, either; tons of textbooks have significant errata, and its not as if no retraction in math has ever been issued.
These get corrected by talking with other people - if you have an LLM spew out this synthetic chain-of-reasoning data, you probably get at least some wrong proofs, and if you blindly try to scale with this I would expect it to collapse.
Even tying into a proof-checker seems non-trivial to me. If you work purely in the proof-checker, you never say anything wrong - but the presentations in proof checking language is very different from textual ones, so I would anticipate issues of the LLM leveraging knowledge from, say, textbooks in its proofs. You might also run into issues of the AI playing a game against the compiler rather than building understanding (you see elements of this in the proofs produced by AlphaProof). And if you start mixing natural language and proof checkers, you've just kicked the verification can up the road a bit, since you need some way of ensuring the natural language actually matches the statements being shown by the proof checker.
I don't think these are insurmountable challenges, but I also don't think its as simple as the "generate synthetic data and scale harder" approach the parent comment thinks. Perhaps I'm wrong - time will tell.
The error rate of human mathematical work is not zero, but it does go down exponentially with the amount of time that the mathematician spends carefully thinking about the problem. Mistakes tend to be the result of typos, time pressure, or laziness - showing your work to others and having them check it over does help (it's one of the reasons we have peer review!), but is absolutely not necessary.
If the error rate is low enough - and by simply spending a constant factor more time finding and correcting errors, you can get it below one in a million - then you do get a virtuous feedback loop even without tying in a proof-checker. That's how humans have progressed, after all. While you are right to say that the proof-checker approach certainly is not trivial, it is currently much easier than you would expect - modern LLMs are surprisingly good at converting math written in English directly to math formalized in Lean.
I do think that LLMs will struggle to learn to catch their mistakes for a while. This is mostly because the art of catching mistakes on your own is not taught well (often it is not taught at all), and the data sets that modern LLMs train on probably contain very, very few examples of people applying this art.
A tangent: how do human mathematicians reliably manage to catch mistakes in proofs? Going line-by-line through a proof and checking that each line logically follows from the previous lines is what many people believe we do, but it is actually a method of last resort - something we only do if we are completely lost and have given up on concretely understanding what is going on. What we really do is build up a collection of concrete examples and counterexamples within a given subject, and watch how the steps of the proof play out in each of these test cases. This is why humans tend to become much less reliable at catching mistakes when they leave their field of expertise - they haven't yet built up the necessary library of examples to allow them to properly interact with the proofs, and must resort to reading line by line.
I prefer his explanation to most other explanations because he starts, right away, with an analogy to ordinary probabilities. It's easy to understand how linear algebra is related to probability (a random combination of two outcomes is described by linearly combining them), so the fact that we represent random states by vectors is not surprising at all. His explanation of the Dirac bra-ket notation is also extremely well executed. My only quibble is that he doesn't introduce density matrices (which in my mind are the correct way to understand quantum states) until halfway through the notes.
MIT's PRIMES program does exactly this - they give advanced high school students a mentor who picks out a problem, gets them up to speed on what is known, and then they work on the problem for a year and publish their results. It tends to work best with problems which have a computational aspect, so that the students can get some calculations done on the computer to get the ball rolling.
If the Hanoi pieces alternate in color, there is another very easy algorithm: always avoid putting two pieces of the same color directly on top of each other. I noticed this by accident while watching someone else solve a Hanoi puzzle with alternating colored pieces. (The person I was watching didn't know this trick either. I suspect the manufacturers of the alternatingly colored puzzle also didn't know about it. So where did the knowledge come from?)
Have you tried to read any of the literature on the Risch algorithm? If you haven't, you might want to get started by taking a look at the paper "Integration in Finite Terms" by Rosenlicht [1] and chasing down some of the references mentioned in [2].
Of course, in the real world we don't give up on integrals just because they can't be expressed in terms of elementary functions. Usually we also check if the result happens to be a hypergeometric function, such as a Bessel function. If you want to get started on understanding hypergeometric functions, maybe try reading [3] (as well as the tangentially related book "A = B" [4]).
In 2017, Andrei Bulatov and Dmitriy Zhuk independently proved that every CSP template defines a problem which is either NP-complete or can be solved in polynomial time (they shared best paper at FOCS for this). However, the number of people who actually understand either of their proofs is still tiny, and no one has managed reduce the (enormous) running times of their algorithms. Zhuk just put out a much simpler argument with a cleaner abstraction of the crucial algebraic properties which are used in his proof.
evince recently added some features that I now find it hard to live without. The biggest is that if I mouseover an internal link in a pdf, it shows a little preview of where that link would take me - so if I'm reading a long document and the author says "by Theorem 3.3 it holds..." and if the pdf is recent enough for "Theorem 3.3" to be an internal link, then I can mouseover to remind myself of the statement of Theorem 3.3!
I've been reading it over the past few days (picked it up after seeing it mentioned here). It's fantastic, much better than many similar attempts to make quantum field theory comprehensible to mathematicians.
(In principle, it should be also be possible to get good enough at philosophy to avoid devolving into a mess of incoherence while reasoning about concepts like "knowledge", "consciousness", and "morality". I suspect some humans have achieved that, but it seems rather difficult to tell...)
reply