
There's no traditional human reinforcement.

Models like GPT-3 get turned into models like ChatGPT through a pipeline that includes RLHF (reinforcement learning from human feedback). The first stage is supervised fine-tuning: the model is trained further on prompt/response pairs written in the style we'd like it to respond in, typically

User: question

Bot: response

This data is handcrafted by labelers or adapted from places like Stack Exchange. RLHF proper then builds on top of that: labelers rank the model's candidate responses, those rankings train a reward model, and the model is optimized against the reward model's score rather than against live human judgment.
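
For concreteness, here's a minimal Python sketch of the two kinds of training data involved. The function names are made up for illustration, and the pairwise loss follows the Bradley-Terry formulation from the InstructGPT paper; this is a toy sketch, not OpenAI's actual pipeline:

    import math

    def format_sft_example(question: str, answer: str) -> str:
        # Render a Q&A pair (e.g. adapted from Stack Exchange) into the
        # User:/Bot: template used for supervised fine-tuning.
        return f"User: {question}\n\nBot: {answer}"

    def reward_pair_loss(reward_chosen: float, reward_rejected: float) -> float:
        # Pairwise loss for training a reward model from human rankings:
        # -log(sigmoid(r_chosen - r_rejected)). The loss shrinks as the
        # reward model scores the human-preferred response higher.
        return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

    print(format_sft_example("How do I reverse a list in Python?",
                             "Use reversed(xs) or xs[::-1]."))
    print(reward_pair_loss(reward_chosen=1.3, reward_rejected=0.2))  # ~0.29: ranking agrees with humans
    print(reward_pair_loss(reward_chosen=0.2, reward_rejected=1.3))  # ~1.39: ranking disagrees

During the final RL stage (PPO in InstructGPT's case), the reward model's score stands in for human feedback entirely, which is why no human is reinforcing the model directly.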


