Has this been tried with reinforcement learning (RL)? As the OP notes, it is plausible from an RL perspective that such a bootstrap can work, because it would be (quoting the OP) "exploiting the generator-verifier gap, where it is easier to discriminate than to generate (eg laughing at a pun is easier than making it)." However, the hit ratio may be tiny, so doing this well would be very expensive.
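To make the loop concrete, here is a minimal, purely illustrative sketch of that bootstrap: a weak generator proposes many candidates, a cheap verifier keeps the rare good ones, and the kept set would become fine-tuning data for the next round. Everything here is a toy stand-in (`generate`, `verify`, and `bootstrap_round` are hypothetical names, not any real API); the random generator and the sortedness check merely play the roles of an expensive model and an easy discriminator.

```python
import random

def generate(rng):
    """Weak generator: propose a random candidate (stands in for sampling a model)."""
    return [rng.randint(0, 9) for _ in range(4)]

def verify(candidate):
    """Cheap verifier: checking is easier than generating.
    Toy criterion: keep candidates whose digits are non-decreasing."""
    return candidate == sorted(candidate)

def bootstrap_round(n_samples, rng):
    """One round: sample many candidates, keep the few the verifier accepts.
    In the real setting, the kept set would be fed back as training data."""
    return [c for c in (generate(rng) for _ in range(n_samples)) if verify(c)]

if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    kept = bootstrap_round(n, rng)
    # The low acceptance rate illustrates why doing this at scale is expensive.
    print(f"hit ratio: {len(kept) / n:.2%} ({len(kept)} of {n} kept)")
```

The cost argument falls out of the numbers: if the verifier only accepts a small fraction of samples, each round of usable training data requires many times that much generation compute.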
Running ML in a for loop, in any combination or form, to bootstrap higher-order improvement is one of the most obvious avenues. If it worked, you would have heard about it a long time ago.