> Language models definitely do generalize to some extent, and they're not "stochastic parrots" as previously thought, but there are some weird cases where we expect them to generalize and they don't.

Do you have any good sources that explain this? I always thought LLMs were indeed stochastic parrots, but that language (that is, the unified corpus of all languages in the training data) already inherently contains the "generalization". So the intelligence is encoded in the language humans speak.



> Do you have any good sources that explain this?

The most famous result is OthelloGPT, where they trained a transformer to complete lists of Othello moves, and the transformer developed an internal model of where the pieces were after each move.
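
If you want the shape of the method: they froze the trained transformer, cached its activations, and trained small probes to read the board state back out. A minimal sketch of that probing setup in PyTorch (the stand-in tensors, shapes, and hyperparameters here are my assumptions, not the paper's actual code):

    import torch
    import torch.nn as nn

    # Stand-in data: in the real experiment `hidden` would be cached
    # activations from one layer of the move-prediction transformer, and
    # `board` the true 8x8 board (0 empty, 1 black, 2 white) after each move.
    hidden = torch.randn(1024, 512)             # (examples, d_model)
    board = torch.randint(0, 3, (1024, 64))     # (examples, squares)

    # A linear probe per square, trained to decode piece state from the
    # frozen activations. The transformer itself is never updated.
    probe = nn.Linear(512, 64 * 3)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(200):
        logits = probe(hidden).view(-1, 64, 3)  # (examples, squares, states)
        loss = loss_fn(logits.reshape(-1, 3), board.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

    # In the real experiment, high probe accuracy on held-out games is the
    # evidence that the activations encode the board itself, not just
    # surface statistics of move sequences.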

The rough consensus is that if you train a model to predict the output of a system for long enough with weight decay, and some nebulous conditions are met (see the "lottery ticket hypothesis"), your model eventually develops an internal simulation of how the system works. That simulation uses fewer weights than "memorize millions of patterns found in the system", and weight decay "incentivizes" the lower-weight solution.
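
A toy way to see that "incentive" (my own sketch, not from any of these papers): give an optimizer with decoupled weight decay a loss that only constrains a few weights, and watch the rest get flushed to zero.

    import torch

    # The loss below only constrains the first 10 weights, so weight decay
    # steadily shrinks everything the task doesn't defend.
    w = torch.randn(1000, requires_grad=True)
    opt = torch.optim.AdamW([w], lr=1e-2, weight_decay=1.0)

    for step in range(2000):
        loss = (w[:10].sum() - 1.0) ** 2   # only 10 weights matter
        opt.zero_grad()
        loss.backward()
        opt.step()

    # The unused weights decay toward zero; only the sparse "circuit" the
    # task needs survives. The same pressure, scaled up, is the argument
    # for why a compact simulation beats rote memorization.
    print(w[:10].abs().mean().item(), w[10:].abs().mean().item())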


I don't have explanations, but I can point you to one of the papers: https://arxiv.org/pdf/2309.12288 which calls it "the reversal curse" and does a bunch of experiments showing that models that succeed at questions like "Who is Tom Cruise’s mother?" (Mary Lee Pfeiffer) are far less successful at answering "Who is Mary Lee Pfeiffer’s son?"
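
The evaluation itself is easy to sketch. Roughly like this, where complete() is a hypothetical stand-in for whatever model API you're testing (the paper's real harness is more careful about prompting and scoring):

    def complete(prompt: str) -> str:
        # Hypothetical stand-in for a real LLM call; wire up your API here.
        raise NotImplementedError

    pairs = [
        ("Tom Cruise", "Mary Lee Pfeiffer"),
        # ... more (child, parent) celebrity pairs
    ]

    forward_hits = sum(
        parent in complete(f"Who is {child}'s mother?")
        for child, parent in pairs
    )
    reverse_hits = sum(
        child in complete(f"Who is {parent}'s son?")
        for child, parent in pairs
    )

    # The paper's finding: forward accuracy is far higher than reverse,
    # even though both prompts query the same underlying fact.
    print(forward_hits, reverse_hits)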


Isn't that specific case just a matter of not having enough data _explicitly_ stating the reverse? Seems as if they are indeed stochastic parrots from that perspective.
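
If so, the fix would be mechanical: augment the corpus so every fact appears in both directions, and no reversal is ever required of the model. A minimal sketch of that kind of augmentation (the templates are mine, not from the paper):

    facts = [("Tom Cruise", "Mary Lee Pfeiffer")]

    def augment(facts):
        rows = []
        for child, parent in facts:
            rows.append(f"{parent} is {child}'s mother.")
            rows.append(f"{child} is {parent}'s son.")  # the explicit reverse
        return rows

    # Both orderings now appear verbatim in training data, consistent
    # with the stochastic-parrot reading.
    print(augment(facts))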


You know, I'm not sure that humans are so good at that kind of reverse task. Information can be very easy to access from one direction but very hard to reach from others. We're not databases.


Yes, and your conclusion is correct.


> language already inherently contains the "generalization"

The mental gymnastics required to handwave away language model capabilities are getting funnier and funnier every day.



