I have no proof, but these deep thinking modes feel to me like an orchestrator agent plus sub-agents, with the orchestrator RL'd to just keep going instead of being conditioned to stop ASAP.