Like RLHF but the HF part is GPT4 instead. | Hacker News

moralestapia 7 months ago | on: The Illustrated DeepSeek-R1

Like RLHF but the HF part is GPT4 instead.

KarraAI 7 months ago

How do you ensure the student model learns robust generalizations rather than just surface-level mimicry?

moralestapia 7 months ago

No idea, as I don't work on that, but my guess would be that the higher the 'n', the more closely model A approaches model B.
