
> Existence problems are not optimisation problems

Several of the problems were existence problems, such as finding geometric constructions.

> It needs an optimisation function that can be incrementally improved in order to work towards an optimal result, not a binary yes/no.

This is not correct. The evaluation function is arbitrary. To quote the AlphaEvolve paper:

> For example, when wishing to find largest possible graphs satisfying a given property, ℎ invokes the evolved code to generate a graph, checks whether the property holds, and then simply returns the size of the graph as the score. In more complicated cases, the function ℎ might involve performing an evolved search algorithm, or training and evaluating a machine learning model.

The evaluation function is a black box that outputs metrics. The feedback that you've constructed a graph of size K with some property does not tell you what you need to do to construct a graph of size K + M with the same property.
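
To make that concrete, here is a toy sketch of such an evaluator (all names are mine, not from the paper): it runs the evolved code, checks the property, and returns the size as the score. Nothing in the returned number says how to construct a bigger graph.

```python
# Toy AlphaEvolve-style evaluator (illustrative names throughout):
# run the evolved code, check the property, return a numeric score.

def has_property(graph):
    """Illustrative stand-in for the property: triangle-free."""
    edges = {(u, v) for u, nbrs in graph.items() for v in nbrs}
    return not any(
        (a, b) in edges and (b, c) in edges and (a, c) in edges
        for a in graph for b in graph for c in graph
    )

def evaluate(candidate_program):
    """Black-box scorer: metrics out, no gradients or construction hints."""
    graph = candidate_program()      # evolved code builds a graph
    if not has_property(graph):
        return 0.0                   # infeasible candidates score zero
    return float(len(graph))         # score = number of vertices
```

The score ranks candidates; it carries no instructions for improving them.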

> a research mathematician is not trapped in a loop, mutating candidates for an evolutionary optimiser loop like the LLM is in AlphaEvolve.

Yes they are in a loop called the scientific method or the research loop. They try things out and check them. This is a basic condition of anything that does research.

> They have the agency to decide what questions to explore

This is unrelated to the question of whether LLMs can solve novel problems.

> most of which (as the article says) can be approached using traditional optimisation techniques with similar results.

This is a mischaracterization. The article says that an expert human working with an optimizer might achieve similar results. In practice that's how research is done by humans as I mentioned above: it is human plus computer program. The novelty here is that the LLM replaces the human expert.





> finding geometric constructions

Finding optimal geometric constructions. Every problem is an optimisation because AlphaEvolve is an optimiser.

> This is not correct. The evaluation function is arbitrary.

You say this and then show details of how the score is calculated. AlphaEvolve needs a number to optimise, because it is an optimiser. It can't optimise true/false.

> The feedback that you've constructed a graph of size K with some property does not tell you what you need to do to construct a graph of size K + M with the same property.

The feedback that you've constructed a graph of size K tells you that you've constructed a bigger graph than a competing solution that only constructed a graph of size K-1, and that yours is therefore a more promising starting point for the next round of mutation.

If you're trying to solve a "does there exist an X" problem, the information that none of your candidates found (or was) an X doesn't give you any information about which of them you should retain for mutation in the next step. You need a problem of the form "find the best X" (or, rather "find a good X") and for that you need a score of how well you've done. If you can find a score that actually improves steadily until you find the thing you're trying to prove the existence of then great, but generally these problems are "find the best X" where it's easy to come up with a load of competing Xs.
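
The loop being described can be sketched as a toy (μ+λ)-style selection step (all names mine): candidates are ranked by the numeric score and the best half survive to be mutated. With a graded score this climbs; swap in a constant "not found yet" score and the ranking is vacuous, which is the point above.

```python
import random

# Toy evolutionary loop (illustrative, not AlphaEvolve's actual code):
# rank by score, keep the best half, refill with mutated survivors.
# The score is the ONLY selection signal the loop ever sees.

def evolve(score, mutate, population, generations=50, rng=None):
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: len(population) // 2]
        population = survivors + [
            mutate(rng.choice(survivors), rng) for _ in survivors
        ]
    return max(population, key=score)
```

Here an integer stands in for "graph size"; with `score=lambda x: x` the maximum ratchets upward, while a 0/1 "found it?" score that is always 0 makes the sort a no-op.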

> The novelty here is that the LLM replaces the human expert.

That's not the claim at all. Tao said the benefits are scaling, robustness and interpretability, not that it can be operated by someone who doesn't know what they're doing.


> not that it can be operated by someone who doesn't know what they're doing.

It's operated by an LLM, not a human. There is no human in the loop.


The LLM doesn't write the objective function. It only generates candidate solutions to be evaluated by it. The human writes the objective function. That's how you define the optimisation to perform.

The objective function is the problem. The objective function is given in all problems.

In "find an X such that Y" the objective function is "is this an X? is Y satisfied?". In induction you have P and n to ratchet up the proof by increasing n such that P(n) holds.

Combining this reply with your previous one, it sounds like you're setting up a situation where the LLM can neither know the problem nor whether it's making progress on the problem. With those constraints no human could do mathematics either or really anything else.


What's the objective function for the Langlands Program?

It seems like the person you're responding to has decided a type checker that outputs 1 when the proof object is valid according to the rules & 0 otherwise should be considered a scoring function. The objective function in this case is not differentiable so the usual techniques from deep learning will not work & genetic search algorithms like AlphaEvolve can in theory be used in these cases. Someone still has to come up w/ the type system & verify the soundness of the rules b/c there is no finite codification of all valid & sound type systems like there are for problems in the linked blog post.

An evolutionary algorithm doesn't need a differentiable objective function, but it can't work with a simple "1 if you got it right, 0 otherwise". It is an iterative approach and at each step it needs to be able to distinguish "better" from "worse" candidates. You certainly can't just say to AlphaEvolve "off you go, I'll tell you when you got it right".

Taking the Collatz Conjecture I used as an example just now, you can trivially write an objective function that outputs 1 for a correct Lean proof of it and 0 for an incorrect one, but AlphaEvolve won't be able to work on that. It needs to be able to assess a collection of incorrect proofs to identify the most promising ones for the next step. I don't know how you'd even start on that, and it's certainly not what they've been doing with AlphaEvolve.
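
To illustrate the point: if the scorer is a bare verifier and no candidate is correct yet, every score ties at zero, so "most promising" is undefined. A toy sketch (every name made up):

```python
# Toy: a 1-or-0 verifier gives identical scores to all failing
# candidates, leaving nothing to rank between iterations.

def verifier_score(candidate_proof):
    return 1.0 if candidate_proof == "the-correct-proof" else 0.0

candidates = ["attempt-a", "attempt-b", "attempt-c"]
scores = [verifier_score(c) for c in candidates]
# every score ties at 0.0, so selection degenerates to random choice
```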


It can and it does work w/ such objective functions. Lots of people have used evolutionary algorithms to evolve chess playing neural networks¹ & they have been successful w/ very sparse reward signals where the final trajectory is scored w/ 0 or 1 according to a win condition. You can say this is not likely to work for proof search & I'd be inclined to agree but the strategy has proven to work in simpler settings so whether it can be used in more complex settings is yet to be determined. If Collatz is not independent of existing axiomatic foundations then a brute force search will find a solution so any heuristics added on top of it that cut out paths to unsuccessful attempts will increase the probability of finding the proof object.

¹https://arxiv.org/abs/1711.08337


From that chess paper:

> Each position in the ECM test suite has a predetermined “best move”. Each chromosome processes all of the 879 positions, and for each position it attempts to find this predetermined best move as fast as possible.

> Instead of counting the number of correctly “solved” positions (number of positions for which the organism found the best move), we used the number of nodes the organism had to process in order to find the best move.

That isn't a 1-or-0 objective function, and the paper isn't an example of using an evolutionary loop in which the objective function doesn't give you any information on which candidates are better in a given iteration. Because that isn't possible.

Re brute forcing by evaluating the countable set of correct proofs within a given formal system, people have been trying that since computers were invented and it hasn't resulted in the magic proof factory yet. People continue to work on better and better heuristics for trimming the search and I understand some of the stuff people have been doing in that direction with Lean is actually useful now, but there hasn't been a huge breakthrough in it and nobody expects a system like that to spit out a proof of the Collatz Conjecture any time soon. More to the point of this discussion, it's not what AlphaEvolve does.

Anyway, I need to go to bed. It's been fun.


The same applies to proof search. Once you fix a finite foundational set of axioms the game proceeds exactly as in chess.

You can't use minimax with alpha-beta pruning for proof search, but that alone is sufficient to play chess at a high level. I don't see what you're seeing. Chess and mathematics are completely different kinds of problem.

In other words, proofs of existence are measure zero in the space of evolution?

> It seems like the person you're responding to has decided a type checker... should be considered a scoring function

To clarify, I didn't decide this; it is a valid scoring function in AlphaEvolve. The scoring function is generic and can even be an LLM writing prose giving feedback on the solution followed by another LLM scoring that prose numerically. There needs to be a numeric score to rank solutions. Typically type checkers give more output than 1 or 0, though. For example, they'll often give you information about where the first error occurred.

That doesn't mean it's a great scoring function or even a good one. But it is a scoring function and without any scoring function at all progress would be impossible. To the extent that math is about writing proofs, it's a valid and essential scoring function for any problem. In practice, to make progress you need more than just the ability to write a logical proof, you need to build on previous results, add extra conditions, compute examples, etc. But in the context of the discussion, the point is that there is always some way to measure progress, which is why AlphaEvolve includes this mechanism.
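
As a hedged sketch of what a richer-than-binary checker signal could look like (purely illustrative; `check_line` stands in for whatever the real checker reports about where the first error occurs):

```python
# Toy scorer built on checker output (names invented): rank candidate
# proofs by how far the checker gets before the first failure.

def progress_score(proof_lines, check_line):
    """Fraction of lines accepted before the first error (toy metric)."""
    for i, line in enumerate(proof_lines):
        if not check_line(line):
            return i / len(proof_lines)
    return 1.0  # every line checked out
```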

> Someone still has to come up w/ the type system & verify the soundness of the rules b/c there is no finite codification of all valid & sound type systems like there are for problems in the linked blog post.

This is true, but it's also true that typically mathematicians fix a logic or universe before getting down to work. So AlphaEvolve and human mathematicians are on equal footing in that respect.


> it's also true that typically mathematicians fix a logic or universe before getting down to work

This is true for programmers; it's not true for mathematicians. You can say programming is a subset of mathematics, but mathematics is more than programming, so proof search does not exhaust all the activities of a mathematician, though it does exhaust all the activities of a programmer.


> it's not true for mathematicians

It actually is true for mathematicians.

Most of us work in ZFC. Some people choose to work in constructive logic. Some people choose to only use countable choice. But we always know what logic we're working in and which theorems we can invoke. E.g. if you choose not to accept the axiom of choice you also have a sense of which results depend on Zorn's lemma and have to work around them. All of this is normal background for mathematicians.

So there's no need to allow the foundations of mathematics to vary unless you're working in logic and need to quantify over foundational systems or compare them. You would certainly never start a problem and then switch the logic midway through. That sort of thing wouldn't even be coherent.


You seem to have a very limited understanding of mathematics. You're welcome to stick to it but I'd recommend learning what mathematics is actually about before thinking you can reduce it all to proof search in a single & fixed finite axiomatic system.

> You seem to have a very limited understanding of mathematics. You're welcome to stick to it but I'd recommend learning what mathematics is actually about

You'd think I'd know what mathematics is about since I have a Ph.D. in it...

> fixed finite axiomatic system

ZFC is not a finite axiomatic system. Nowhere did we specify that axiomatic systems have to be finite.

> single...fixed

As I said above, there are people who study different logics. However, the topic of conversation is individual problems. An individual problem does not switch logics midstream, and anyone working on a problem works in the context of a logic.


You don't need a degree to understand mathematics but plenty of people w/ degrees think they understand what it's all about. Seems a bit odd you have a degree in mathematics but lack the basic understanding of what counts for mathematical activity. I don't think you should be proud of that but then again I don't know what they teach in PhD departments these days so maybe that's what they told you you needed to think to get the degree.

I see. It sounds like you're coming at this from an outsider perspective as someone who hasn't formally studied math and doesn't know what research level mathematicians are taught.

There's nothing wrong with taking an interest in a subject as an amateur. I thoroughly support it. But knowing what is taught in PhD departments and what mathematicians do is a precondition for understanding whether what AlphaEvolve is doing is similar to it.

If I wanted to know whether some AI tool was doing what humans do for, say, song writing, the first thing I'd do is figure out how song writing is taught, how professionals think about song writing, etc. And this is true regardless of the fact that I enjoy, support and often prefer outsider musicians.


I think you're making too many assumptions. Something they should have taught you while you were getting your degree but again, I'm not sure what they teach these days b/c it's been a while since last time I was listening to a graduate lecture on mathematics in a classroom.

The Langlands Program is a research program, not a single mathematical problem. It consists of many open conjectures with various levels of progress toward them. However, I think it's quite a victory that the goal posts have moved from "LLMs can't solve problems" to "LLMs probably can't solve all open conjectures in the Langlands Program in one shot".

But as I said way up above, if you have the statement of any particular problem you can just use the function that evaluates proofs as your objective function. If you were to do this in Lean, for example, you'd get compiler output that contains information you can use to see if you're on the right track.

In addition to the output of the proof program you'd probably want to score sub-computations in the proof as metrics. E.g. if you want to show that a map has finite fibers, you may want to record the size of the fibers, or a max over the size. If you need to know an element is not contained in some Levi subgroup then you may want to record information about the relevant Levi decomposition. This mimics things that humans know to score their progress as they're doing computations.
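
A toy sketch of what such a composite score might look like (every name invented; `fibers` stands in for the recorded sub-computation):

```python
# Toy composite score (illustrative only): combine the verifier bit with
# an auxiliary metric, here the maximum fiber size of a candidate map,
# so that partial progress between failing candidates is still comparable.

def composite_score(fibers, proof_ok):
    max_fiber = max((len(f) for f in fibers), default=0)
    verifier_part = 10.0 if proof_ok else 0.0
    # smaller maximum fiber => closer to a "finite fibers" style goal (toy)
    return verifier_part + 1.0 / (1 + max_fiber)
```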


If you want a simple, well defined problem that a child can understand the statement of, how about giving me the objective function for the Collatz Conjecture?


