
The question is: How do we get LLMs to have "Eureka!" moments, on their own, when their minds are "at rest," so to speak?

The OP's proposed solution is a constant "daydreaming loop" in which an LLM does the following on its own, "unconsciously," as a background task, without human intervention:

1) The LLM retrieves random facts.

2) The LLM "thinks" (runs a chain-of-thought) on those retrieved facts to see if there are any interesting connections between them.

3) If the LLM finds interesting connections, it promotes them to "consciousness" (a permanent store) and possibly adds them to a dataset used for ongoing incremental training.

It could work.
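The three steps above can be sketched as a loop. This is a toy illustration, not the OP's implementation: `think` and `is_interesting` are stand-ins for real model calls.

```python
import random

def daydream_step(fact_store, think, is_interesting,
                  permanent_store, training_data):
    # 1) Retrieve random facts (here: a random pair).
    a, b = random.sample(fact_store, 2)
    # 2) Run a chain-of-thought over the pair (stubbed out;
    #    a real system would call a model API here).
    thought = think(a, b)
    # 3) Promote interesting connections to "consciousness"
    #    (a permanent store) and to the incremental-training dataset.
    if is_interesting(thought):
        permanent_store.append(thought)
        training_data.append(thought)
```

The whole proposal is just this step run forever as a background task, with the promoted connections feeding back into training.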



Step 3 has been shown not to work over and over again; "find interesting connections" is the hand-wavy magic at this point. LLMs alone don't seem to be particularly adept at it either.


Has this been tried with reinforcement learning (RL)? As the OP notes, it is plausible from an RL perspective that such a bootstrap can work, because it would be (quoting the OP) "exploiting the generator-verifier gap, where it is easier to discriminate than to generate (eg laughing at a pun is easier than making it)." The hit ratio may be tiny, though, so doing this well would be very expensive.
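The bootstrap amounts to rejection sampling: a weak generator proposes candidates, a cheap verifier filters them, and only the rare hits are kept. A toy instance, using factor-finding as the stand-in task (verifying a factor is trivial; guessing one is not):

```python
import random

def generate(n):
    # Weak generator: propose a random candidate factor.
    return random.randint(2, n - 1)

def verify(n, cand):
    # Cheap, reliable verifier: multiplication/division is easy.
    return n % cand == 0

def search(n, tries=10000):
    # Keep only generations the verifier accepts. The hit ratio
    # may be tiny, which is why the loop is expensive -- but the
    # accepted hits could seed further training.
    hits = []
    for _ in range(tries):
        cand = generate(n)
        if verify(n, cand):
            hits.append(cand)
    return hits
```

For "interesting connections" the verifier would itself be an LLM judge, which is exactly the step the sibling comment calls hand-wavy.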


Running ML of any combination and form in a for loop to get higher-order behavior is one of the most obvious avenues. If it worked, you would have heard about it a long time ago.



