
This simply isn’t the case. Humans actually perform better on ARC-AGI-2, according to their website: https://arcprize.org/leaderboard


The 100.0% you see there just verifies that every puzzle was solved by at least 2 people on the panel; ARC-AGI-2 was calibrated to make that so. The human panel averages for ARC-AGI-1 and ARC-AGI-2 are 64.2% and 60% respectively. Not a huge difference, sure, but it is there.
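
To make the distinction concrete, here is a rough Python sketch of how "every puzzle solved by at least 2 panelists" and "average panelist score" can diverge. The results grid is made up for illustration and is not ARC Prize data; the function names and the assumption that the panel average is a plain mean of per-person solve rates are mine, not something the leaderboard page specifies.

```python
# Hypothetical results: rows are panelists, columns are puzzles (True = solved).
# These values are invented for illustration; they are not ARC Prize data.
results = [
    [True,  True,  False, True,  False],
    [True,  False, True,  False, True],
    [False, True,  True,  True,  False],
    [True,  False, False, False, True],
]


def coverage(results, min_solvers=2):
    """Fraction of puzzles solved by at least `min_solvers` panelists."""
    num_puzzles = len(results[0])
    covered = sum(
        1
        for p in range(num_puzzles)
        if sum(row[p] for row in results) >= min_solvers
    )
    return covered / num_puzzles


def panel_average(results):
    """Mean of each panelist's individual solve rate (an assumed definition)."""
    rates = [sum(row) / len(row) for row in results]
    return sum(rates) / len(rates)


print(f"coverage:      {coverage(results):.1%}")
print(f"panel average: {panel_average(results):.1%}")
```

With this toy grid, coverage comes out at 100% even though the average panelist solves only 55% of the puzzles, which is the same kind of gap as between the 100.0% figure and the 60% average.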

I've played around with both, and yes, I'd also personally say that v2 is harder. Overall, it's a better benchmark. ARC-AGI-3 will be a set of interactive games. I think they're moving in the right direction if they want to measure general reasoning.



