
> AlphaEvolve did not perform equally well across different areas of mathematics. When testing the tool on analytic number theory problems, such as that of designing sieve weights for elementary approximations to the prime number theorem, it struggled to take advantage of the number theoretic structure in the problem, even when given suitable expert hints (although such hints have proven useful for other problems). This could potentially be a prompting issue on our end,

Very generous of Tao to say it could be a prompting issue. It always surprises me how easy it is for people to say that the problem is not the LLM, but them. With other types of ML/AI algorithms we don't see this. For example, after a failed attempt or a lower score in a comparison table, no one writes "the following benchmark results may be wrong, and our proposed algorithm may not be the best. We may have messed up the hyperparameter tuning, initialization, train/test split..."
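
For illustration (my own sketch, not something from the thread): a minimal scikit-learn example of how the number that ends up in a comparison table can move with exactly the knobs listed above, the train/test split, the random seed, and a single hyperparameter. The dataset and model here are arbitrary choices, just to make the variance concrete.

    # Illustrative sketch: score spread across seeds, splits, and one hyperparameter.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)

    scores = []
    for seed in range(5):                      # different initializations / splits
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=seed)
        for n_estimators in (10, 100):         # a single hyperparameter choice
            clf = RandomForestClassifier(n_estimators=n_estimators,
                                         random_state=seed)
            clf.fit(X_tr, y_tr)
            scores.append((seed, n_estimators, clf.score(X_te, y_te)))

    for seed, n, acc in scores:
        print(f"seed={seed} n_estimators={n} accuracy={acc:.3f}")
    # The spread across these rows is the kind of variance a single-number
    # comparison table quietly hides.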





Even without such acknowledgments it is hard to get past reviewers ("Have you tried more extensive hyperparameter tuning, other initializations, and train/test splits?"). These are essentially lab notes from an exploratory study, so (with absolutely no disrespect to the author) the setting is different.

Of course people don't say it, but there are many cases where reported algorithmic improvements are attributable to poor baseline tuning or shoddy statistical treatment. Tao is exhibiting a lot more epistemic humility than most researchers, who probably have stronger incentives to market their work and publish.


