It's trying to maximize a reward function. It's not just predicting the next wor...

		charcircuit 5 months ago \| parent \| context \| favorite \| on: A new Google model is nearly perfect on automated ... It's trying to maximize a reward function. It's not just predicting the next word.