
I mean for this particular benchmark, yes.

You'd have to put it in an agentic loop to perform corrections otherwise.
