freehorse 70 days ago | on: OpenAI Reinforcement Fine-Tuning Research Program

This is not reinforcement learning from human feedback (RLHF); it is just traditional supervised reinforcement learning, where the fine-tuning sets consist of problems and their correct answers. They do not call it supervised, though, because they have to say it is different from how they were fine-tuning until now.
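
A minimal sketch of the distinction, assuming a toy setup: the reward comes from comparing the model's output to a known correct answer rather than from a human preference label. The dataset, `reward`, `mean_reward`, and `toy_policy` are all hypothetical names for illustration, and the exact-match grader is a guess, not how OpenAI actually scores outputs.

```python
from typing import Callable

# Hypothetical fine-tuning set: problems paired with their correct answers.
DATASET = [
    {"problem": "2 + 2", "answer": "4"},
    {"problem": "3 * 5", "answer": "15"},
]

def reward(model_output: str, correct_answer: str) -> float:
    """Return 1.0 on an exact match (ignoring surrounding whitespace), else 0.0."""
    return 1.0 if model_output.strip() == correct_answer.strip() else 0.0

def mean_reward(policy: Callable[[str], str]) -> float:
    """Average reward of a policy (a function from problem to answer) over DATASET."""
    scores = [reward(policy(item["problem"]), item["answer"]) for item in DATASET]
    return sum(scores) / len(scores)

def toy_policy(problem: str) -> str:
    # Stand-in for a model: always answers "4", whatever the problem is.
    return "4"

if __name__ == "__main__":
    print(mean_reward(toy_policy))
```

No human rates anything here: the "supervision" is the answer key, and an RL update would push the policy toward outputs that score higher against it.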