Like RLHF, but the HF (human feedback) part comes from GPT-4 instead.


How do you ensure the student model learns robust generalizations rather than just surface-level mimicry?


No idea, as I don't work on that, but my guess would be that the higher the 'n', the more model A approaches model B.
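A toy illustration of that guess, assuming 'n' means the number of teacher-generated samples the student is trained on. Here model B (the "teacher") is a hypothetical fixed next-token distribution over a five-word vocabulary, and model A (the "student") is fit by maximum likelihood on n sampled tokens, with add-one smoothing. The KL divergence from teacher to student shrinks as n grows. The vocabulary, probabilities, and helper names are all made up for the sketch, and it only covers matching the teacher on a single fixed context, so it says nothing about whether the student generalizes beyond what it was shown.

```python
import math
import random
from collections import Counter

# Hypothetical "model B" (teacher): a fixed next-token distribution over a toy vocabulary.
TEACHER = {"the": 0.4, "a": 0.25, "cat": 0.15, "dog": 0.15, "sat": 0.05}


def sample_teacher(n, rng):
    """Draw n tokens from the teacher distribution."""
    tokens = list(TEACHER)
    weights = [TEACHER[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)


def fit_student(samples, alpha=1.0):
    """Fit a hypothetical "model A" (student) by maximum likelihood on teacher
    samples, using add-alpha smoothing so no token gets zero probability."""
    counts = Counter(samples)
    total = len(samples) + alpha * len(TEACHER)
    return {t: (counts[t] + alpha) / total for t in TEACHER}


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q share the same keys."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


def mean_kl(n, trials=20, seed=0):
    """Average KL(teacher || student) over several independent runs with n samples each."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        student = fit_student(sample_teacher(n, rng))
        total += kl_divergence(TEACHER, student)
    return total / trials


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        print(f"n={n:>6}  mean KL(teacher || student) = {mean_kl(n):.5f}")
```

Averaging over several trials keeps a single unlucky draw from hiding the trend. The smoothing term also biases small-n students toward uniform, which matters less as n grows.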
