Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I would absolutely take you on a blind test between 4.0 and 4.5 - the improvement is significant.

And while I do want your money, we can just look at LMArena which does blind testing to arrive at an ELO-based score and shows 4.0 to have a score of 1318 while 4.5 has a 1438 - it's over twice likely to be judged better on an arbitrary prompt, and the difference is more significant on coding and reasoning tasks.



Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: