Hacker News new | past | comments | ask | show | jobs | submit login

This is not human feedback reinforcement learning, it is just traditional supervised reinforcement learning where the finetuning sets consist of problems and the correct answers. They do not call it supervised though because they have to say it is different than how they were finetuning until now.



Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: