> Language models definitely do generalize to some extent, and they're not "stochastic parrots" as previously thought, but there are some weird cases where we expect them to generalize and they don't.

Do you have any good sources that explain this? I always thought LLMs were indeed stochastic parrots, but that language (that is, the unified corpus of all languages in the training data) already inherently contains the "generalization". So the intelligence is encoded in the language humans speak.



> Do you have any good sources that explain this?

The most famous result is OthelloGPT, where they trained a transformer to complete lists of Othello moves, and the transformer developed an internal model of where the pieces were after each move.
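
If you want the shape of the method: they froze the trained transformer, cached its activations, and trained small probes to read the board state back out. A minimal sketch of that probing setup in PyTorch (the stand-in tensors, shapes, and hyperparameters here are my assumptions, not the paper's actual code):

    import torch
    import torch.nn as nn

    # Stand-in data: in the real experiment `hidden` would be cached
    # activations from one layer of the move-prediction transformer, and
    # `board` the true 8x8 board (0 empty, 1 black, 2 white) after each move.
    hidden = torch.randn(1024, 512)             # (examples, d_model)
    board = torch.randint(0, 3, (1024, 64))     # (examples, squares)

    # A linear probe per square, trained to decode piece state from the
    # frozen activations. The transformer itself is never updated.
    probe = nn.Linear(512, 64 * 3)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(200):
        logits = probe(hidden).view(-1, 64, 3)  # (examples, squares, states)
        loss = loss_fn(logits.reshape(-1, 3), board.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

    # In the real experiment, high probe accuracy on held-out games is the
    # evidence that the activations encode the board itself, not just
    # surface statistics of move sequences.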

The rough consensus is that if you train a model to predict the output of a system for long enough with weight decay, and some nebulous conditions are met (see the "lottery ticket hypothesis"), your model eventually develops an internal simulation of how the system works. That simulation uses fewer weights than "memorize millions of patterns found in the system", and weight decay "incentivizes" the lower-weight solution.
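
A toy way to see that "incentive" (my own sketch, not from any of these papers): give an optimizer with decoupled weight decay a loss that only constrains a few weights, and watch the rest get flushed to zero.

    import torch

    # The loss below only constrains the first 10 weights, so weight decay
    # steadily shrinks everything the task doesn't defend.
    w = torch.randn(1000, requires_grad=True)
    opt = torch.optim.AdamW([w], lr=1e-2, weight_decay=1.0)

    for step in range(2000):
        loss = (w[:10].sum() - 1.0) ** 2   # only 10 weights matter
        opt.zero_grad()
        loss.backward()
        opt.step()

    # The unused weights decay toward zero; only the sparse "circuit" the
    # task needs survives. The same pressure, scaled up, is the argument
    # for why a compact simulation beats rote memorization.
    print(w[:10].abs().mean().item(), w[10:].abs().mean().item())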


I don't have explanations, but I can point you to one of the papers: https://arxiv.org/pdf/2309.12288 which calls it "the reversal curse" and does a bunch of experiments showing that models that succeed at questions like "Who is Tom Cruise’s mother?" (Mary Lee Pfeiffer) are far less successful at answering "Who is Mary Lee Pfeiffer’s son?"
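
The evaluation itself is easy to sketch. Roughly like this, where complete() is a hypothetical stand-in for whatever model API you're testing (the paper's real harness is more careful about prompting and scoring):

    def complete(prompt: str) -> str:
        # Hypothetical stand-in for a real LLM call; wire up your API here.
        raise NotImplementedError

    pairs = [
        ("Tom Cruise", "Mary Lee Pfeiffer"),
        # ... more (child, parent) celebrity pairs
    ]

    forward_hits = sum(
        parent in complete(f"Who is {child}'s mother?")
        for child, parent in pairs
    )
    reverse_hits = sum(
        child in complete(f"Who is {parent}'s son?")
        for child, parent in pairs
    )

    # The paper's finding: forward accuracy is far higher than reverse,
    # even though both prompts query the same underlying fact.
    print(forward_hits, reverse_hits)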


Isn't that specific case just a matter of not having enough data _explicitly_ stating the reverse? Seems as if they are indeed stochastic parrots from that perspective.
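
If so, the fix would be mechanical: augment the corpus so every fact appears in both directions, and no reversal is ever required of the model. A minimal sketch of that kind of augmentation (the templates are mine, not from the paper):

    facts = [("Tom Cruise", "Mary Lee Pfeiffer")]

    def augment(facts):
        rows = []
        for child, parent in facts:
            rows.append(f"{parent} is {child}'s mother.")
            rows.append(f"{child} is {parent}'s son.")  # the explicit reverse
        return rows

    # Both orderings now appear verbatim in training data, consistent
    # with the stochastic-parrot reading.
    print(augment(facts))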


You know, I'm not sure that humans are so good at that kind of reverse task. Information can be very easy to access from one direction but very hard to reach from others. We're not databases.


Yes, and your conclusion is correct.


> language already inherently contains the "generalization"

The mental gymnastics required to handwave away language model capabilities are getting funnier and funnier every day.



