
RLHF is a band-aid for not having enough data that matches your own biases and the answers you want the model to give.



It won't give answers at all if you don't train it to. It will output more questions because that's a more obvious completion to an incoming question.



