
That's fair. But look up the recent experiment testing SOTA models on the then-just-released USAMO 2025 questions. Highest score was 5%, supposedly SOTA last year was IMO silver level. There could be some methodological differences, e.g. the USAMO paper required correct proofs and not just numerical answers. But it strongly suggests that, even within limited domains, it's cheating. I'd wager a significant amount that if you tested SOTA models on a new set of ICPC questions, actual performance would be far, far worse than their supposed benchmarks.





> Highest score was 5%, supposedly SOTA last year was IMO silver level.

No LLM got silver last year. DeepMind had a highly specialized AI system that earned it.



