The author (bafflingly) seems to have completely missed the point, since nothing they state up to page 15 (at which point I stopped reading) refutes Chomsky's points at all. The author talks about LLMs and how they generate text, and then goes on to claim this refutes Chomsky's claim about syntax and semantics. However, it does not, since Chomsky's primary claim is about how HUMANS acquire language.
The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way. [edited page = 15]
> The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way.
Also, the LLMs are cheating! They learned from us. It's entirely possible that you do need syntax/semantics/sapience to create the original corpus, but not to duplicate it.
Let's see an AlphaZero-style version of an LLM, that learns language from scratch and creates a semantically meaningful corpus of work all on its own. It's entirely possible that Chomsky's mechanisms are necessary to do so.
Humans learn from the structure of the world -- not the structure of language.
LLMs cheat at generating text because they do so via a model of the statistical structure of text.
We're in the world, it is us who stipulate the meaning of words and the structure of text. And we stipulate new meanings to novel parts of the world daily.
What else is an 'iPhone', etc.? There's nothing in `i P h o n e` which is at all like an iPhone.
We have just stipulated this connection. The machine replays these stipulations to us -- it does not make them, as we do.
There's nothing qualitatively less "in the world" about a language model than a human. Yes, a human has more senses, and is doubtless exposed to huge categories of training data that a language model doesn't have access to - but it's false to draw a sharp dichotomy between knowing what an iPhone looks like, and knowing how people talk about iPhones.
Consider two people - one, a Papua New Guinea tribesperson from a previously uncontacted tribe who is allowed to handle a powered-down iPhone, and told it is an "iPhone", but is otherwise ignorant of its behavior - the other, a cross-platform mobile software developer who has never actually held a physical iPhone, but is intimately familiar with its build systems, API, cultural context, etc. Between the two of them, who better understands what an iPhone "is"?
You make a good point about inventing words to refer to new concepts. There's nothing theoretically stopping a language model from identifying some concept in its training data that we don't have a word for, inventing a word for it, and using it to give us a perspective we hadn't considered. It would be very useful if it did that! I suspect we don't tend to see that simply because it's a very rare occurrence in the text it was trained on.
LLMs don't have any senses, not merely fewer. LLMs don't have any concepts, not merely named ones.
A concept is a sensory-motor technique abstracted into a pattern of thought developed by an animal, in a spatio-temporal environment, for a purpose.
LLMs are just literally an ensemble of statistical distributions over text symbols. In generating text, they're just sampling from a compressed bank of all text ever digitised.
We aren't sampling from such a bank; we develop wholly non-linguistic concepts which describe the world, and it is these which language piggy-backs on.
The structure of symbols in a book has nothing to do with the structure of the world -- it is we who have stipulated their meaning: there's no meaning to `i`
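To make concrete what "sampling from statistical distributions over text symbols" means here, a toy sketch (the tokens and probabilities are invented, and a real LLM learns the distribution with a neural network rather than a lookup table):

```python
# Toy sketch only: "generation" as repeated sampling from a conditional
# distribution over next tokens. The table and probabilities are made up;
# a real LLM learns this distribution, it does not store a lookup table.
import random

model = {
    ("the",): {"cat": 0.5, "dog": 0.3, "iPhone": 0.2},
    ("the", "cat"): {"sat": 0.7, "ran": 0.3},
    ("the", "dog"): {"barked": 0.6, "slept": 0.4},
    ("the", "iPhone"): {"rang": 0.8, "died": 0.2},
}

def sample_next(context):
    """Draw one next token according to the model's conditional distribution."""
    dist = model[tuple(context)]
    tokens, weights = zip(*dist.items())
    return random.choices(tokens, weights=weights, k=1)[0]

text = ["the"]
for _ in range(2):
    text.append(sample_next(text))
print(" ".join(text))  # e.g. "the cat sat"
```

Whether this picture exhausts what LLMs do is exactly what's in dispute in this thread; the sketch just pins down what the claim means.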
> A concept is a sensory-motor technique abstracted into a pattern of thought developed by an animal, in a spatio-temporal environment, for a purpose.
Hi, since human linguistics is the sole repository of linguistic conceptualism, can you please show me which of the neurons is the "doggie" neuron, or the "doggie" cluster of neurons? I want to know which part of the brain represents the thing that goes wag-wag.
If you can't mechanically identify the exact locality of the mechanism within the system, it doesn't really exist, right? It's just a stochastic, probabilistic model, humans don't understand the wag-wag concept, they just have some neurons that are weighted to fire when other neurons give them certain input stimuli tokens, right?
This is the fundamental problem: you are conflating the glue language with the implementation language in humans too. Human concepts are a glue-language thing, it's an emergent property of the C-language structure of the neurons. But there is no "doggie" neuron in a human just like there is no "doggie" neuron in a neural net. We are just stochastic machines too, if you look at the C-lang level and not the glue-language level.
There's a pile of work on multimodal inputs to LLMs, generally finding that less training data is needed as image (or other) data is added to training.
Text is an extremely limited input stream, but an input stream nonetheless. We know that animal intelligence works well enough with any of a range of sensory streams, and different levels of emphasis on those streams - humans are somehow functional despite a lack of ultrasonic perception and only a primitive sense of smell.
And your definition of a concept is quite self-serving... I say that as a mathematician familiar with many concepts which don't map at all to sensory motor experiences.
Sensory-motor expression of concepts is primitive, yes, and they become abstracted --- and yes, the semantics of those abstractions can be abstract. I'm not talking semantics, I'm talking genesis.
How does one generate representations whose semantics are the structure of the world? Not via text token frequency, this much is obvious.
I don't think the thinnest sense of "2 + 2 = 4" being true is what a mathematician understands -- they understand, rather, the object 2, the map `+`, and so on. That is, the proposition. And when they imagine a sphere of radius 4 containing a square of length 2, etc. -- I think there's a 'sensuous, mechanical depth' that enables and permeates their thinking.
The intellect is formal only in the sense that, absent content, it has form. That content however is grown by animals at play in their environment.
> Consider two people - one, a Papua New Guinea tribesperson who is allowed to handle a powered-down iPhone, and told it is an "iPhone", but is otherwise ignorant of its behavior - the other, a cross-platform mobile software developer who has never actually held a physical iPhone, but is intimately familiar with its build systems, API, cultural context etc. Between the two of them, who better understands what an iPhone "is"?
But then also consider the following: a human being from 2006, and an LLM that has absorbed an enormous corpus of words about iPhones that is also granted access to a capacitive-touchscreen friendly robot arm and continuous feed digital camera (and since I'm feeling generous, also a lot of words about the history and architecture of robot arms and computer vision). There is no doubt the LLM will completely blow the human out of the water if asked trivia questions about the iPhone and its ecosystem.
But my money's on the 2006 human doing a lot better at switching it on and using the Tinder app...
No doubt. I don't think anyone's arguing that LLMs have richer, deeper understanding of anything just yet. On the other hand I also don't think it would prove much to vaguely connect a language model to a robot arm and then ask it to do non-language tasks.
> Humans learn from the structure of the world -- not the structure of language.
You'd be surprised. Many researchers believe that "knowledge" is inseparable from language, and that language is not associative (labels for the world) but relational. For example, in Relational Frame Theory, human cognition is dependent on bidirectional "frames" that link concepts, and those frames are linguistic in nature. LLMs develop internal representations of those frames and relations, which is why they can tell you that a pool is bigger than a cup of water, and which one you would want to drink.
In short, there's no evidence that being in the world makes our knowledge any different from an LLM. The main advantages we have at the moment are sensory learnings (LLMs are not good at comparing smells and flavors) and the ability to continuously train our brains.
The co-occurrence frequency between text tokens in everything ever written is a limited statistical model of however language is used by humans.
It almost doesn't matter what your theory of language is --- any even remotely plausible account will radically depart from the above statistical model. There isn't any theory of language which supposes it's an induction across text tokens.
The problem in this whole discussion is that we know what these statistical models are (models of association in text tokens) -- yet people completely ignore this in favour of saying "it works!".
Well "it works" is NOT an explanatory condition, indeed, it's a terrible one. If you took photographs of the night sky for long enough, you'd predict where all the stars are --- these photos do not employ a theory of gravity to achive these.
LLMs are just photographs of books.
There's a really egregious pseudoscience here that the hype-cycle completely suppresses: we know the statistical form of all ML models. We know that via this mechanism arbitrarily accurate predictions, given arbitrarily relevant data, can be made. We know that nothing in this mechanism is explanatory.
This is trivial. If you videotape everything and play it back, you'll predict everything. Photographing things does not impart to those photographs the properties of those things -- they serve only as a limited associative model.
It’s odd to see people doomwaving two general reasoning engines.
It’s especially hard to parse a dark sweeping condemnation based on…people are investing in it? It doesn’t have the right to assign names to things? Idk what the argument is.
My most charitable interpretation is "it can't reason about anything unless we already said it", which is obviously false.
> one of which is an average 14 year old, the other an honors student college freshman
The point is that they're not those things. Yes, language models can produce solutions to language tests that a 14 year old could also produce solutions for, but a calculator can do the same thing in the dimension of math - that doesn't make a calculator a 14 year old.
Yes, the AI isn't literally a 14 year old, and we should do an anthropomorphization acknowledgement. Thank you for pointing it out; it can waste a lot of time when you get sloppy with language in AI discussions.
I removed the reference, in retrospect, it’s unnecessary. No need to indicate the strong performance, we’re all aware.
You may not have said it directly, but you implied it. For example, if we said A goes to B, and B goes to C, the model would have learned the relation and will tell you A goes to C; that doesn't mean it can suddenly reason. It's all already in the language: once it has learned enough of the numerous forms of "A to B, B to C", the relations it has built make it give you "A to C". Yet "A to C" may very well be some epiphany that we have never thought of. One advantage is the model never gets sloppy: it remembers everything. It may overreact or overthink (hence hallucination), but it doesn't overlook things or carry biases like humans do (until alignment, of course). This is why we're often surprised by the model; we probably knew it too, but we're sometimes blind to certain things and never made the connection.
The heavy investment is what makes this truth uncomfortable - it does not make this truth true (or false).
The point is not so much that we already said it, more that the patterns it encodes and surfaces when prompted are patterns in the written corpus, not of the underlying reality (which it has never experienced). Much like a list of all the addresses in the US (or wherever) will tell you very little about the actual geography of the place.
Sure you did; all animals do. Without language, humans would live just fine. Evidently all animals live this way, and deaf people can live, can reason, can triage. It may not be sophisticated, but they all have the underlying reality in their heads, probably gained from trial and error and experience.
>Humans learn from the structure of the world -- not the structure of language.
No we don't. Humans don't experience or perceive reality directly. We perceive a nicely modified version of it, and that's after excluding all the sense data we simply aren't capable of perceiving at all.
Your brain is constantly shifting and fabricating sense data based on internal predictions, and that forms the basis of what you call reality. You are not learning from the structure of the world. You are learning from a simplified model of it that is fabricated in parts.
Structure in the “world”? You mean the stream of “tokens” we ingest?
This just comes down to giving transformers more modalities, not just text tokens.
There is nothing about “2” that conveys any “twoness”, this is true of all symbols.
The token “the text ‘iphone’” and the token “visual/tactile/etc data of iphone observation” are highly correlated. That is what you learn. I don’t know if you call that stipulation, maybe, but an LLM correlates too in its training phase. I don’t see the fundamental difference, only a lot of optimizing and architectural improvements to be made.
Edit: and when I say “a lot”, I mean astronomical amounts of it. Human minds are pretty well tuned to this job, it’ll take some effort to come close.
Children raised in isolation will spontaneously develop language
I don't have a reference handy now (someone can probably do better), but I believe one way to see this is via the hearing impaired, or the hearing and sight impaired.
The classic example which I think you’re referring to is Nicaraguan Sign Language, which developed organically in Nicaraguan schools for deaf children where neither the children nor the teachers knew any other form of sign language. It’s a fascinating story, a complex fully developed language created by children. Seems to indicate that this is indeed a very innate capability among humans in larger groups: https://en.wikipedia.org/wiki/Nicaraguan_Sign_Language
Yeah because all the other things are innate; visual/spatial awareness, touch, smell, vocalization…
Hominids spent millions of years literally learning to navigate 3D space and sense "enough heat, food, water", etc.
Nomadic tribes had built shared resource depots millennia before language.
I can see the color gradients of the trees and feel muscles relax without words.
Human language beyond some utilitarian labels just instills mind viruses that bloom into delusions of grandeur.
90% of human communication is unspoken. Neuroscience shows our brains sync behavior patterns through touch and simply by being in the same room.
Reality is full of unseen state change every moment that we have no colloquial language for; human language is hardly the source of truth and the “North star” of human society in reality.
Of course it is; it’s just a comment on a social media forum.
There's just as little science showing that language is what motivates me to work. Most of the language society relies on is hallucination: fiat currency, nation states, constructs like "Senate" and "Congress", corporatism, brands, copy-pastes of historical terminology. None of that is evidence they're immutable features of reality.
What we recite has nothing to do with what we are. I find the appeals to non-existent political truisms primate gibberish.
It seems perfectly clear to me that many "facts" of society are just memorized and recited prompt hacks. Language is the go-to tool for propagandists, used to obscure sensory connection to reality.
There is over 100 years of propaganda research available, too much for me to sort through, but the scientific measurement of such things is not new; it's new to anyone unaware of it, but not to humanity.
Nonetheless, it's the primary reason people are being called back to the office: in-person face time is valued much more than remote chat/email. It's not a coincidence that all leaders favor face time, and that hiring is done via the process of an "interview", literally asking for a "view". Need more proof?
For obvious ethical reasons, we have never perfectly conducted this experiment, but Nicaraguan Sign Language comes close. The government created a school for deaf children who did not have a common language. Many of the children did not know any language prior to attending. They did, however, have "home sign" systems, which are basic signing systems spontaneously developed between deaf children and their speaking caretakers.
We can also look at creole languages. When distinct linguistic groups come into contact without a common language they will, in some cases, develop "pidgin" languages to communicate. These languages are created by the adults who are already fluent speakers of their native language. What is interesting is what happens with children born into such communities. They grow up hearing the pidgin language, but they do not learn the pidgin. Instead, the language they learn is a creole. It is based on the pidgin, but more complete and consistent with the way human language works in general. In effect, like all children, they are relying on their innate knowledge of language. The pidgin fills in the language-specific parameters, but when the pidgin contradicts an aspect of innate language, the children (for the most part) just don't acquire that portion of the language.
The common thread in both cases is (as you say) children: in both creoles and NSL, the early versions of the language are somewhat primitive (being developed by older children and adults), but when children are exposed to it from an early age, the languages acquire much more complexity and sophistication, becoming every bit as rich as "regular" languages. It fits well with the hypothesis that the early years of life are crucial for language development, and that there is indeed something innate in most humans for deep structural understanding of language. It's not just statistical pattern recognition.
The best example I'm aware of is Nicaraguan Sign Language [1].
> Steven Pinker, author of The Language Instinct, claims that "The Nicaraguan case is absolutely unique in history ... We've been able to see how it is that children—not adults—generate language, and we have been able to record it happening in great scientific detail. And it's the only time that we've actually seen a language being created out of thin air."
Individual humans aren't, but we're talking about the emergent properties of a swarm of humans. 6 gigabytes of DNA describes an entity that, if copied 60 times and dropped on an island, will produce a viable language in only a few decades. We haven't found an NN architecture with this property for language, but we have for, e.g., chess, StarCraft, and Go.
This might be too overarching of a statement, but "kind" learning environments like games that have set rules and win conditions are very far from the Turing completeness of human language.
Humans are, in the sense that we're working off the checkpoint passed to us from our parents: the bare minimum language centers our brain is evolved to produce regardless. The sequence of checkpoints evolves effectively from scratch, but individual humans adapt the checkpoint for their environment from birth.
I like to think of the brain as the general model we've trained with evolution, and the person's experiences as the specialization.
I mean have you seen the whole "encode a poem in made up emoji" followed by "decode the poem from emoji" stuff? I think it's not unreasonable to think with the right prompts LLMs could do this.
The LLMs that are doing this are trained on trillions of examples of human language. This is not remotely a counterexample. Now if an LLM can invent a new full language from scratch, without any training data of existing languages (like how AlphaZero learned to play Go), then that will be impressive, and a difference in kind.
Um, but you have the example of English. Modern English was based on Middle English, which in turn was based on Old English, but greatly influenced by Norman on account of the invasion, as well as by Norse.
"In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.
“What are you doing?”, asked Minsky.
“I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.
“Why is the net wired randomly?”, asked Minsky.
“I do not want it to have any preconceptions of how to play”, Sussman said.
Minsky then shut his eyes.
“Why do you close your eyes?”, Sussman asked his teacher.
“So that the room will be empty.”
At that moment, Sussman was enlightened."
That’s a red herring. Ex post facto the activity of language use as a construction must supersede some enaction by capable actors.
This is a complete tautology in terms, and any dispute must therefore be over the ontology in which it is expressed. To elaborate how something came to be is not required to express that something is.
What makes capable actors capable in the first place is not derived from thin air. There's a reason a capable actor came to be where there was none generations prior. That reason did not come about without reaction to some outside vector.
Evolution optimized for language learning abilities before the full capabilities came. Evolution is not thin air.
The «air» is here an allegory for «large language corpus». It is self-evidently true that as we move backwards through history we eventually pass the first language utterance for any given definition of language or utterance. From here on out, the «air» is indeed «thin». There are still other things, but no «air» (i.e. «language corpora»).
No, humans learn their languages through observation and generation - watching older people speak, and then imitating it. They get corrections, too, when they mispronounce or misuse words.
They mean at one point there were no older people to listen to in order to learn language because language did not exist yet. We somehow transitioned from grunts to words.
We either have the capacity to make boats and traverse the oceans or we don’t. These days almost nobody makes boats, but we sure used to and still can. What’s your point?
Haha I love how you stay so polite while being a pompous a#$. A real double bind! I would apologise for being obtuse but I must say the Gloves Are Off. Good day to you sir!
A kind of corollary that I'm sure others have thought of: if LLMs are so smart and human thought is nothing more than a big language model, why can't they make up their own training data? Any discussion about how they are "thinking" the way we do is BS; I don't know how so many people who know better have been conned.
> I don't know how so many people who know better have been conned.
The simple reason is: because they don't actually "know better". Maybe they are knowledgeable and skilled in some area, but that doesn't mean they are knowledgeable and skilled in everything.
Absolutely a false blanket statement, especially if someone is trying to make a tabula rasa AI that can invent its own language. Far from being punished for it, this would be the AI's reward function!
It's been a while since I've engaged with the ideas, but it's my understanding that Chomsky was discussing the ability for children to learn the language of the society they're raised in. LLMs are not cheating at that task in the manner you describe.
They do seem to need a significantly larger corpus, though, so it's not clear that it actually refutes Chomsky.
And let’s not forget that they are trained on a giga cluster of machines!
I am not sure how brute-forcing chess using Monte Carlo Tree Search, or solving checkers via exhaustive search, would refute a theory about how people, with efficient, low-power-consumption brains that grow organically, are able to master chess.
Eh, it's certainly true that we're throwing tremendously more hardware and power at the problem with an LLM than with a toddler; that's only relevant to whether LLMs refute Chomsky to whatever degree his argument relied on hardware or power consumption (explicitly or implicitly) and my impression is that it didn't.
Wait so then what is his argument? Because you can always postulate that a large enough computer can simulate every human and therefore can learn stuff too — thus you don’t need a human to learn language, nyeh!
Obviously, all that stuff ChatGPT says about feelings and emotions came from humans writing it!
As I understand it, the question was whether humans learn language starting from a "blank slate" or whether there's some meaningful structure built in (that's therefore common across all human languages). Chomsky's argument was that children are not exposed to enough example sentences to learn all the rules they learn. It's an argument from the information content of the corpus made available to the language learner, irrespective of the capabilities of the learner.
Therefore, if we're able to do the same thing by simply applying more resources, that would undermine his argument in a way that doing the same thing with a vastly larger corpus (whatever the resources we throw at it) doesn't.
I should note that this is based on recollections from 20+ years ago and no serious engagement with the article at hand, so, uh, appropriate salt.
That makes no sense. A “vastly larger corpus” would not have the property of “not being exposed to enough example sentences”. If anything, the amount of sentences is far far more than any human would ever encounter — let alone a child.
Right, that's the point - success with a huge corpus doesn't tell us it would have been possible with a realistic corpus, so doesn't refute the argument.
Man I remember when people downplayed AlphaGo because it didn't teach itself unsupervised. "Nothing to see here". Only took them a few months to do AlphaZero.
AlphaZero works on chess, shogi, go, and other perfect-information games with discrete moves and board states.
LLMs need pairwise linear input that is composed of independent and identically distributed data.
Feedforward neural networks are effectively DAGs, and thus semi-decidable.
LLMs require a corpus; that data is generated by humans and isn't a perfect-information game.
If you dig into how many feedforward neural networks can be written as a single piecewise linear function in lower dimensions, you can build an intuition for how they work in the higher dimensions that are beyond our ability to visualize.
AlphaZero being able to build a model without access to opening books or endgame tables in perfect information games was an achievement in implementation, it was not a move past existential quantifiers to a universal quantification.
LLMs still need human produced corpus because the search space is much larger than a simple perfect information game. The game board rules were the source of compression for AlphaZero, while human produced text is the source for LLMs.
Neither have a 'common sense' understanding of the underlying data, their results simply fit a finite subset of the data in the same way that parametric regression does.
As there are no accepted definitions for intelligence, mathematics is the only way to understand this.
VC dimensionality and set shattering are probably the most accessible to programming backgrounds if you are interested.
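On the "piecewise linear" point above, here's a minimal sketch (the weights are arbitrary, chosen only for illustration) showing that a tiny 1-D ReLU network is just a piecewise linear function with kinks where units switch on or off:

```python
# A 1-D feedforward ReLU network written out explicitly: one hidden layer
# with two units, then a linear readout. Weights are arbitrary.
import numpy as np

W1, b1 = np.array([1.0, -1.0]), np.array([0.0, 2.0])
W2, b2 = np.array([1.0, 0.5]), 0.0

def relu_net(x):
    h = np.maximum(0.0, W1 * x + b1)  # hidden activations
    return W2 @ h + b2                # linear readout

# The output is a continuous piecewise linear function of x, with kinks
# where a ReLU switches on or off (here at x = 0 and x = 2).
xs = np.linspace(-3.0, 5.0, 9)
print([round(float(relu_net(x)), 2) for x in xs])
```

The same decomposition holds in higher dimensions, just with vastly more linear regions, which is the intuition being pointed at.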
It's entirely possible that making AlphaGo teach itself with no training examples is much, much easier than doing so for an LLM. Not all problems in AI have the same hardness!
> Also, the LLMs are cheating! They learned from us.
Ding ding ding, right answer!
Now I leave you with a thought: given that the AI is learning from us, is it possible to have an AI that's much more intelligent than their "source of truth"?
I just asked an LLM to create a language and provide a demonstration and this is what it said. Call it a stochastic parrot if you want, but I’m pretty sure a linguist can prompt it to properly invent a language.
Sure, I can invent a new language for you! Let's call it "Vorin" for the purposes of this demonstration.
Vorin is a tonal language with a complex system of noun classes and a relatively simple verb conjugation system. It is spoken by the Vorin people, who live in a mountainous region of an imaginary world. Here is a sample conversation in Vorin:
Person 1: Ní lé màrì?
Person 2: Lé vàlà, nì síkòní.
Person 1: Vàlà síkòní?
Person 2: Àmà, síkòní là nì tékàlá.
Translation:
Person 1: How are you?
Person 2: I'm fine, thank you.
Person 1: Are you working?
Person 2: Yes, I'm working right now.
In Vorin, the tone of each word is very important, and can change the meaning of the word entirely. For example, "lé" with a high tone means "you", while "lé" with a low tone means "fine". In addition to tone, Vorin has a system of noun classes, which are indicated by prefixes on the nouns themselves. For example, "màrì" means "you" if it is in the second person singular class, but it can also mean "house" if it is in the third person singular class.
Vorin's verb conjugation system is relatively simple, with only a few tenses and moods. Verbs are also marked for subject agreement, which means that the verb ending changes depending on the subject of the sentence. For example, "nì síkòní" means "I'm fine", while "là síkòní" means "he/she/it is fine".
Overall, Vorin is a complex and interesting language with a rich grammatical structure and unique tonal system.
1."lè" with a low tone means fine, but "nì síkòní" means "I'm fine", while "là síkòní" means "he/she/it is fine"?
2. The example of "noun prefixes" doesn't include different noun prefixes?
3. The example of "verb ending changes" doesn't include different verb endings?
4. Since the second clause of the second phrase apparently means "I'm fine", the first word must mean "you", but then the second word is repeated in the next sentence which has nothing to do with thanking someone? And "síkòní" gets sprinkled into three sentences with no common concepts?
This just looks, predictably, like gibberish spit out by an LLM.
You're missing the point though, which is that that LLM was already trained on trillions of examples of human languages. Once it already knows many languages, inventing one more isn't hard. (Though I don't think it's actually "invented" this language to any great depth.)
Let's see it invent language from scratch without being presented any examples of languages to use as training data. That's what humanity did. Can LLMs do it? I doubt it, not yet.
The results were... not all that impressive. There were significant issues getting it to consistently apply the rules of the language it had created, even from one prompt to the next -- and after a certain point, it decided to just give Arabic translations instead of the conlang it was supposed to be making up.
Perhaps a more dedicated "prompt engineer"/linguist-type might be able to get better results, but the problem here seems to be similar to the problem trying to get ChatGPT to do arithmetic and other extreme sports. When trying to get it to do anything other than generating one-off syntactically-correct responses to simple prompts in already-existing human languages, it falls down horribly.
I don't see how this proves anything. This isn't how human languages came about, there's a lot of conlang content online, and this doesn't describe a complete language anyway.
This sounds like a backwards explanation to explain why the same words have different meanings.
Maybe the LLM should write the tones too, so it sounds like a child inventing a language and when you point out its logical inconsistencies it invents new rules to fit.
Thank you! It is like arguing that a human engaging in the creation of a landscape portrait using the traditional method of oil painting has been "refuted" by a computer generating vector graphics from statistical descriptions of the same scene. Both yield art but they are clearly different in interesting ways. Neither refutes or outmodes the other. Or maybe I'm wrong and actually trees have refuted mushrooms!
That is my point though: the notion of one plant refuting another is absurd if they both still thrive. Likewise the underpinnings of how LLMs work does not refute the conceptual framework of universal grammar. Especially as refutation means to disprove (not improve, increment, evolve or augment). One more analogy: jiu jitsu does not refute aikido even if it is better in an MMA context. When reaching for refutation prefer synthesis.
What about the fact that trying to use Chomsky's approach to universal grammar does not result in working language models yet a statistical approach does?
The Norvig-Chomsky debate is kind of old at this point:
Chomsky has been adding parameters to his theory to handle exceptions in a way that mimics the endless series of conditional statements appended to knowledge systems of yore.
In neuroscience, predictive processing has gained immense favor and can explain language in ways that have nothing to do with innate grammar.
Exactly how well did "building a bird" work for building flying machines? Birds use the same principle as a fixed wing when it comes to soaring flight. "Building a bird" without the principles of an airfoil and just mimicking the flapping wings does not result in flight.
…he has? Isn’t his modern term “minimalism”, where he tries to simplify things as much as possible? Regardless, continuing to study the field in no way implies that he’s backed down or meaningfully evolved his basic theories of Universal Grammars. He’s very much still confident in them.
Re: predictive processing, in what way does that relate to language…? Even if you apply it to language in a way not mentioned in the linked article at all, I don’t see even the rough shape of how it would refute (/be mutually exclusive with) generative grammars. Maybe I’m just missing something because I don’t know much neuro?
Re: building a bird… yeah that’s their point, you don't try to build a bird, you try to study birds. Chomsky cares about what we are, not building machines to do our drudgery. I don’t think I agree entirely with that singular focus, but you see the appeal, no?
Gina Kuperberg at Harvard has done extensive work on how predictive coding operates in language processing. Her lab has explored how people generate predictions during language comprehension and how this is reflected in neural activity.
I'll point out that neuroscientists have yet to find the "generative grammar" part of the brain but have seen evidence of a very large network of neurons...
Re: building a bird… I didn’t come up with the analogy and if we all want to stop talking about building birds I would be the first to agree.
But I can't help but think about all of the wacky ideas, like antigravity vital forces, that biologists contrived to explain how birds could fly, and how it took Bernoulli and the rigorous study of those principles to arrive at the airfoil… which is how birds actually soar through the air.
BTW, what the fuck happened to these forums in the last few years? It seems like most people base their opinions on how opposite they are to Sam Altman and Elon Musk’s as opposed to any geeky principles of discovery. I highly doubt that most of y’all would have been ardent supporters of generative grammar five years ago… but slap the word LLM on something and boy howdy!
It’s kind of nice, I learn quite a bit defending good ideas. All y’all get is fake internet points.
I fully understand any and all downvotes! Have fun!
You got me, I believe in the most cited living academic’s theories on cognition to stick it to Elon Musk. So glad you used your geeky principles of discovery to divine that one…
The only reason I was talking about "building a bird" in the first place is because that is what the person I was responding to was talking about. At this point I don't even understand what this analogy is supposed to be. Is the bird supposed to be "language"? Or "humans"? Or "human language"? What's the point of this study, just to chit-chat about things? What proof do we have that Chomsky's theories of language are true? Neuroscience has actual research that backs the claims of predictive coding in language acquisition and comprehension! Some names in case you're curious: Gina Kuperberg, Ellen Lau, Florian Jaeger and Roger Levy, just to name a few of the more influential researchers.
He starts with the original Transformational-generative grammar in the late 50s, and then as more and more exceptions and complexities of language were found that didn't fit neatly into the transformational-generative model, the theory was extended to account for these exceptions, creating the Extended Standard Theory (syntactic constraints, X-bar theory), later revised to Revised Extended Standard Theory (restrictions upon X-bar theory, complementizer). Then we've got the addition of the Principles and Parameters theory (head-initial, head-final) in the 80s, which led to the Minimalist program as things had started to get out of hand.
> It's much easier to build an airplane to fly than a bird.
Not really. You could probably put together a flapping bird toy (can buy these mass manufactured too) in about a couple of months of trial and error. Not quite as sophisticated as feathers but the principles are the same. You probably couldn't build an airplane.
For people who don't understand this, the reason humans refer to "Alex" much later in a story is not because they are statistically recalling that they said "Alex" dozens or hundreds of words earlier (as the LLM is described doing in the paper), but because they have a world-model they are actively describing, where "Alex" refers to an entity in that world-model. We know that the LLM is only saying "Alex" because it appeared earlier, but we also know humans don't work like that, so how can the LLM's generation of language say anything about how humans acquire and use it?
Chomsky was saying specific things had to be in the brain because it was impossible to do things otherwise. LLMs shoot this argument down even if they aren't how the brain does it.
I wouldn't be shocked to find that humans don't learn from syntax and semantics, all told. We certainly aren't doing that with our kids, as they learn. And when they start picking up language, it is rapid and impressive. Note that it comes before they can speak, too. Seeing kids ability to understand some complicated directions when they can only do rudimentary sign language is eye opening.
> The fact that you can replicate coherent text from probabilistic analysis and modeling of a very large corpus does not mean that humans acquire and generate language the same way.
We actually don't know what is inside the LM either, so it is possible the LM statistically learns syntax and semantics, and that this is a major part of its output quality.
Imagine being told that all you need to do to learn Spanish is to read a 300,000-word Spanish dictionary end to end so that you can probabilistically come up with 1000 conversational phrases. Anyone who has learned a language can tell you it just doesn't work like that. You don't work by accumulating a massive dataset and training on it. No one can hold such a massive dataset of anything in their head at once.
If you give it a small enough training set or a big enough neural network, it will directly memorize the whole thing. You have to intentionally make its brain too small to do that in order to force it to find patterns in the data instead.
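As a rough analogy for that capacity trade-off (curve fitting, not an actual neural network): give a model as many parameters as data points and it can interpolate the training set exactly; shrink it and it is forced to find the underlying pattern instead.

```python
# Memorization vs. compression, illustrated with polynomial fitting on a
# tiny made-up "training set" (this is an analogy, not an LLM).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(8)

memorizer = np.polyfit(x, y, deg=7)  # 8 coefficients for 8 points: exact fit
smaller   = np.polyfit(x, y, deg=3)  # fewer coefficients: forced to compress

print("max train error, memorizer:", np.max(np.abs(np.polyval(memorizer, x) - y)))
print("max train error, smaller:  ", np.max(np.abs(np.polyval(smaller, x) - y)))
# The degree-7 fit hits every training point (near-zero error) but behaves
# wildly between and beyond them; the smaller fit misses points slightly
# while capturing the overall shape.
```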
We could use programming languages as a counterpoint.
LLMs can code in the same way they can use natural languages. But we know that programming languages have structure, we made them that way, from scratch, using Chomsky's theory no less.
Saying that, because LLMs can learn programming languages using a different approach, they therefore disprove the very theory they are built on, is absurd.
Anyways, the paper is long and full of references, I didn't analyse it, does it include looks inside the model? For example, for LLMs to write code correctly, the structure of programming languages must be encoded somewhere in the weights of the model. A way to more convincingly disprove Chomsky's ideas would be to find which part of the network encodes structure in programming languages, and show that there is nothing similar for natural languages.
It is far, far more likely that the way humans learn language resembles LLMs than it does Chomsky’s model, however.
Biology is intrinsically local. For Chomsky’s model of language instinct to work, it would have to reduce down to some sort of embryonic developmental process consisting entirely of local gene-activated steps over the years it takes for a human child to begin speaking grammatical sentences. This is in direct contrast to most examples of human instinct, which disappear very quickly as the brain develops.
Really, the main advantage that Chomsky’s ideas had is that no one could imagine how something simpler could possibly result in linguistic understanding. But large language models demonstrate that no, actually, one simple learning algorithm is perfectly sufficient. So why invoke something more complex?
In order to believe this, you'd need to be able to imagine a specific test of something that an LLM could not do under any circumstances. Previously, that test could have been something like "compose a novel sonnet on a topic". Today, it is much less clear that such a test (that won't be rapidly beaten) even exists.
You could use a Markov chain to generate poetry with rhyme and meter[1]. Granted, it wouldn't be a very good one, but that just makes an LLM a refinement to older probabilistic methods.
As for something LLMs are unlikely to do under any circumstances, there's already a fairly obvious example. They can't keep a secret, hence prompt injections.
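For reference, the kind of Markov chain text generator being alluded to really is this small (the corpus below is a stand-in I made up, not the one from [1], and rhyme/meter constraints are left out):

```python
# A bigram Markov chain: pick the next word by sampling from the words
# observed to follow the current word in the corpus.
import random
from collections import defaultdict

corpus = "the rose is red the violet is blue the rose is sweet and so are you".split()

transitions = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    transitions[a].append(b)

def generate(start, length=10):
    word, out = start, [start]
    for _ in range(length - 1):
        followers = transitions.get(word)
        if not followers:
            break
        word = random.choice(followers)
        out.append(word)
    return " ".join(out)

print(generate("the"))  # e.g. "the violet is red the rose is blue"
# Adding rhyme and meter would mean filtering candidate words by syllable
# count and end-rhyme, but the generative core stays this simple.
```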
Do you really believe that an LLM that can keep a secret cannot be made? I suspect that we could do this trivially and the "LLMs can't keep a secret" is a specific product of finetuning for helpfulness.
I think AGI is a questionable concept. We still don't have a good definition of what intelligence really is, and some people keep moving the goal posts. What we need is AI that fills specific needs we have.
If we simply define AGI to be "general purpose AI", then my argument is: maybe the approach of LLMs works fine enough for textual generation, but it is not a path towards "general purpose AI", and what we are going to have is different approaches for different niche use cases.
I'm less convinced there's any unified solution for "general purpose AI" before us here.
And I'm convinced we don't even want "general purpose AI". We want AI for a variety of specific purposes. Admittedly these LLMs are a lot broader than I ever imagined, but they're still limited to generating text. I wouldn't want ChatGPT to drive my car.
> Wordcels are people who have high verbal intelligence and are good with words, but feel inadequately compensated for their skill. The term "cel" denotes frustration over being denied something they feel they deserve. Shape rotators are people with high visuospatial intelligence but low verbal intelligence, who have an intuition for technical problem-solving but are unable to account for themselves or apprehend historical context. The use of the terms has skyrocketed online in the past few months, especially in the last few days. The term "wordcel" is derived from incel and is used to describe someone who has high verbal intelligence but low "visuospatial" intelligence, whose facility for and love of complex abstraction leads them into rhetorical and political dead-ends.