And also the breakthrough that let AlphaGo and AlphaStar make the leaps that they did.
The trouble is that those games (Go, and in AlphaStar's case StarCraft II) don't translate well to other domains. But if the game space can operate through the realm of language and semantics, then the hope is that we can tap into the same adversarial growth curve for LLMs.
Up until now, everything that we've done has just been imitation learning (even RLHF is only a poor approximation of "true" RL).