
Reasoning models do a lot better on AIME than non-reasoning models, with o3-mini getting 85% and 4o-mini getting 11%. It makes some sense that the same gap would show up in small models as well.

