It's weird that Opus4 is the worst at one-shot, it requires on average two attem... | Hacker News

XCSme 23 days ago | on: Claude 4

It's weird that Opus 4 is the worst at one-shot: it requires, on average, two attempts to generate a valid query.

If a model is really that much smarter, shouldn't it lead to better first-attempt performance? It still "thinks" beforehand, right?

riwsky 23 days ago

Don’t talk to Opus before it’s had its coffee. Classic high-performer failure mode.
