
Reinforcement learning changes this, though. Remember Move 37?

The issue is that you need verifiable rewards for that (and a good environment setup), and it's hard to get rewards that cover everything humans want (security, simplicity, performance, readability, etc.).
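To make the coverage problem concrete, here is a rough sketch of what a reward for generated code might look like. Everything in it is an assumption for illustration: the `Candidate` fields, the budgets, and the 0.6/0.2/0.2 weights are made up, not taken from any real training setup. The point is that test pass rate is easy to verify, speed and size can be crudely proxied, and security and readability get no term at all because there is no cheap automatic check for them.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one generated solution (all fields illustrative)."""
    tests_passed: int
    tests_total: int
    runtime_s: float    # measured wall-clock time
    lines_of_code: int  # crude stand-in for simplicity


def pass_rate(c: Candidate) -> float:
    """Verifiable part of the reward: fraction of unit tests passed."""
    return c.tests_passed / c.tests_total if c.tests_total else 0.0


def combined_reward(c: Candidate, runtime_budget_s: float = 1.0,
                    loc_budget: int = 50) -> float:
    """Test pass rate plus two rough proxies (weights are arbitrary).

    Security and readability have no term here, so the optimizer gets
    no signal about them.
    """
    speed = min(1.0, runtime_budget_s / c.runtime_s) if c.runtime_s > 0 else 0.0
    brevity = min(1.0, loc_budget / c.lines_of_code) if c.lines_of_code > 0 else 0.0
    return 0.6 * pass_rate(c) + 0.2 * speed + 0.2 * brevity


# Two hypothetical solutions that both pass every test.
clean = Candidate(tests_passed=10, tests_total=10, runtime_s=0.5, lines_of_code=40)
bloated = Candidate(tests_passed=10, tests_total=10, runtime_s=2.0, lines_of_code=200)

print(pass_rate(clean), pass_rate(bloated))
print(combined_reward(clean), combined_reward(bloated))
```

A tests-only reward can't tell the two candidates apart. The proxies separate them, but they are also easy to game, and anything without a term in the reward is invisible to it.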


