"Here's a phenomenon I was surprised to find: you'll go to talks, and hear various words, whose definitions you're not so sure about. At some point you'll be able to make a sentence using those words; you won't know what the words mean, but you'll know the sentence is correct. You'll also be able to ask a question using those words. You still won't know what the words mean, but you'll know the question is interesting, and you'll want to know the answer. Then later on, you'll learn what the words mean more precisely, and your sense of how they fit together will make that learning much easier. The reason for this phenomenon is that mathematics is so rich and infinite that it is impossible to learn it systematically, and if you wait to master one topic before moving on to the next, you'll never get anywhere. Instead, you'll have tendrils of knowledge extending far from your comfort zone. Then you can later backfill from these tendrils, and extend your comfort zone; this is much easier to do than learning "forwards". (Caution: this backfilling is necessary. There can be a temptation to learn lots of fancy words and to use them in fancy sentences without being able to say precisely what you mean. You should feel free to do that, but you should always feel a pang of guilt when you do.)"
Reminds me of the attention mechanism in transformers!
And for any parents with toddler-age children: seeing the way that toddlers relate to language, and the way that people relate to toddlers about language, leads to lots of fun observations that remind me of LLM-related concepts.
The important part is in the parens at the end of course:
> There can be a temptation to learn lots of fancy words and to use them in fancy sentences without being able to say precisely what you mean. You should feel free to do that, but you should always feel a pang of guilt when you do.
GPT - as far as we know - feels no guilt pangs whatsoever.
'Temperature' would probably already be close: it's the sampling parameter that controls how much weight GPT gives to low-probability tokens (which could reasonably be interpreted as low confidence?).
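For anyone unfamiliar, here's a minimal sketch of what temperature does during sampling (the logit values and temperatures below are made up purely for illustration, not taken from any real model):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, softmax, then sample a token index."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                      # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

logits = [4.0, 2.0, 0.5]          # hypothetical next-token scores
# Low temperature sharpens the distribution (almost always picks token 0);
# high temperature flattens it, so low-probability tokens show up more often.
for t in (0.2, 1.0, 2.0):
    picks = [sample_with_temperature(logits, t) for _ in range(1000)]
    print(t, np.bincount(picks, minlength=3) / 1000)
```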
I have made a point over the years of hanging out with people who are far more intelligent and talented than myself, many of whom are in completely different fields from my own... and I realise that I've always done this!
Whether it's art, music, or the future of power generation, I've been able to hold many conversations that have an aha moment halfway through, where some nugget clicks and backfills the conversation to that point.
And yes, I feel a pang of guilt when entertaining these conversations, but I've made solid friends off of a number of these interactions, so I figure I can't be a completely unbearable bore!
Fascinating; it's how I feel about multidimensional topological manifolds: I understand them, I cannot math them... Interacting frequently with an AI is changing me. It's encouraging me to approach tasks in a more structured way, to use concise and accurate language, and to consider the context and apparent vagueness of communication in a human way, with its inflections, facial expressions and demeanour. One cannot use technology without being affected; in the same way most of us no longer train the part of our brain that remembers phone numbers, this will change us.
I wish I could include Fig. 1 of the paper here (https://www.nature.com/articles/s41598-023-33384-9/figures/1). The result amounts to "the ANN performs similar nonlinear time-domain filtering to the human brainstem". There seems to be nothing at all about the learning process, just that ABR recordings of English and Spanish speakers hearing a confusing syllable differ, and that an ANN trained on English vs. one trained on Spanish shows a similar difference.
...
That is honestly a much more interesting result than the title would suggest. We know the brain can't do backprop (neurons are one-way), but the fact that there is convergence in the algorithm is very fun.
I suppose. Equivalent results about natural images and edge detection were reported in the (classical, not deep) image-processing ML literature 20 years ago...
Oh yeah for sure. But as neat as edge detection in the eye is, it isn't learned. I'm enjoying seeing similar results in highly unstructured models like transformers.
Technically true! But one, very few neurons have connections going both ways to each other- any two neurons will be lined up axon to synapse one way or the other. And two, the specific structures that compose most of the cortex are remarkably unidirectional. The neocortex consists mostly of little chunks of neurons arranged in layers, themselves hooked up snout to tail. There are loops, eventually, but only on the macro scale.
Outside the cortex, and even moreso outside the brain, there is a far greater diversity of structures. Two neurons plugged into each other form an excellent basis for a timer, damper, clock, or even primitive stimulus response. There are structures taking proprioceptive info from the joints and performing integrals on them. It's neat because the circuits are so small we can actually understand them completely! But within the brain things definitely are not laid out right for backprop.
To be fair, I believe the graphic in the article is a stylized version of that figure, and my takeaway from the text is similar to your summary. I think it's only the headline that's overselling it a bit.
I'm probably flaunting my ignorance here, but isn't this an extremely tenuous connection? The graphs are unconvincing beyond a "... maybe? I guess?", and comparing brain activity to NN activity seems dubious.
I'd be curious what other sounds look like for both.
> "While it’s still unclear exactly how the brain processes and learns language, the linguist Noam Chomsky proposed in the 1950s that humans are born with an innate and unique capacity to understand language. That ability, Chomsky argued, is literally hard-wired into the human brain. The new work, which uses general-purpose neurons not designed for language, suggests otherwise. “The paper definitely provides evidence against the notion that speech requires special built-in machinery and other distinctive features,” Kapatsinski said."
I do not recall Chomsky contesting the final statement. It does not strike me as contradictory, either.
To the best of my knowledge, Chomsky argued quite the opposite: if you are going to contend that language acquisition works the same in humans and "New AI", show that "New AI" is incapable of learning "impossible languages", just as humans are incapable of learning them.
The paper is about the processing of isolated syllables; it might find the same thing in dog brains, since it is all at the level of the brainstem. It doesn't seem to have anything to do with Chomsky's work or the fMRI stuff Chomsky has talked about.
This line of thinking may be confusing “sufficient” with “necessary”. I don’t believe Chomsky’s thesis and Kapatsinski’s statement are mutually exclusive.
They could both be true. Chomsky didn’t appear to have made a general statement about language acquisition by any and every possible mechanism. And the existence of language acquisition via other mechanisms does not say anything definitive about humans. The use of the word “neuron” is not enough to define how an actual neuron might work, beyond its first-order activation behavior.
And Chomsky’s thesis implies a genetic ability of language acquisition that is outside the scope of wiring up hardware and software neurons.
Note that transfer learning is very loosely analogous to inheritance of language capability and the expected widespread use of such models in future may actually validate not disprove Chomsky.
Is there anything at all that could disprove Chomsky using your definition? And does Chomsky rise beyond the level of trivialness using this definition ('humans can learn language and have some specializations in the form of neurons to do so')?
Chomsky was contradicting Skinner's thesis, which was that children acquire language through reward/punishment mechanisms like in reinforcement learning.
What he showed was that children don't get nearly enough examples of language use to learn it from an a priori blank slate.
Thus, the way to disprove his argument would be to take a blank slate model and give it the kind of input a human child would receive, and show that it can learn to the same level as the human child. Ideally, you would also give it some language that is completely non-human and show that it learns that as well, to prove that nothing in its structure accidentally biased it towards human language (since part of the argument is that children always pick up a certain type of language that obeys certain rules of Universal Grammar).
Of course, we are very far away from any experiment of this kind, given that current models actually require many more examples of language than any human is given in an entire lifetime. Which itself tends to lend credence to Chomsky's argument.
It's also very unlikely that Chomsky is wrong, given that all other parts of human and animal ability work via ingrained structures in the brain - look at how horses are able to run a few minutes after being born. So the right prior is that language acquisition, same as sight, spatial awareness, balance, movement, digestion, hormone balance, pain avoidance etc., is not learned from scratch by each newborn; it is only fine-tuned based on an existing structure (which was itself trained by evolution, of course, over millions of years).
>Of course, we are very far away from any experiment of this kind, given that current models actually require many more examples of language than any human is given in an entire lifetime.
Doesn't take much to imbue grammar, coherent completions and basic reasoning.
This is a very cool effort that I hadn't heard about, thanks for sharing it!
It's still a large amount of data in the training set compared to what children get (3GB of pure text is many more words than can be said in a lifetime), but it's a tiny sliver of what GPT-3 was trained on, so it's a very, very interesting step in the direction I was thinking of.
You could get away with smaller data by making the model larger, though I don't know how far you can push that before global overfitting. It could make a good Tiny Stories 2: how small can the data be before language models learn coherent English?
Still, the paper has me wondering if we could train a physicist model as brilliant as Einstein with much less compute if we arranged the data into a curriculum and restricted it to a physics/physics-adjacent dataset.
So if we take the Universal Grammar argument - not merely some specialization but specialization towards underlying structure - I don't see how your experiment would disprove that. Chomsky could always argue the model had 1000x the processing capabilities of a human (how could we compare CPU power when the underlying strata is so different?). He'd say it learns language, but not like a human. Moreover, the ability to learn some non-human language would be used _against_ the model.
He won't accept machine learning or counter-examples from existing languages, so I wonder if anything could disprove Chomsky save for building an artificial human brain or building a closed form version of human learning - things we hopefully won't do for ethical reasons.
As for whether it's true: IMHO, there are definitely human specializations, but there doesn't seem to be a single universal grammar. The underlying model is much lower-level, and it doesn't necessarily show in the high-level language details.
I can't of course speak for how Chomsky might react and/or rationalize his own biases, so I won't try to. Given that he's an almost 100-year-old man with a lifetime of biases and doubts, you may very well be right that he would not accept such an argument. I will say that he often complains that AI research is not used more to study these types of human hypotheses where human experimentation is deeply unethical, so I'm not as convinced as you are that he would be unhappy with such an experiment.
However, speaking about the argument itself, I don't think processing power would matter. The basis of the poverty of the stimulus argument is that the amount of examples given to a child is mathematically insufficient to uniquely determine a particular language in the space of all mathematically possible languages. Even with an infinity of processing power, if I give you only two sentences you can't uniquely deduce the rest of the language I intended unless you're making a good many other assumptions about it: many different rule sets will accept those two sentences as correct. Chomsky is arguing that it is those assumptions which are the built-in component.
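To make that underdetermination concrete, here is a toy sketch (the "observed sentences" and candidate rule sets are made up for illustration): several very different grammars all accept the same two observed strings, so the observations alone cannot pick one out.

```python
import re

# Two "observed sentences" over a toy alphabet.
observed = ["ab", "aabb"]

# Candidate "grammars": each is a membership test for a different language.
candidates = {
    "a^n b^n":              lambda s: bool(re.fullmatch(r"a+b+", s)) and s.count("a") == s.count("b"),
    "equal number of a, b": lambda s: s.count("a") == s.count("b"),
    "any a's then any b's": lambda s: re.fullmatch(r"a*b*", s) is not None,
    "any string at all":    lambda s: True,
}

for name, accepts in candidates.items():
    # Every candidate is consistent with the observations...
    assert all(accepts(s) for s in observed)
    # ...yet they disagree wildly on unseen strings such as "ba" or "aab".
    print(name, bool(accepts("ba")), bool(accepts("aab")))
```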
If you showed that an infinitely powerful supercomputer can pick the same language a baby does from the space of all languages then one of two things must be true: either the assumptions snuck into the computer's construction accidentally, or the argument is wrong. If you then additionally show that, with the same amount of sentences chosen from a non-human language, the computer can pick out the intended non-human-like language, then that proves that the computer is not relying on the same assumptions as humans are in the other case. So, the only remaining possibility is that the argument itself is wrong.
Of course, it would be interesting to imagine what could replace this argument. Perhaps the amount of input given to a toddler is actually large. Perhaps the assumptions do exist, but they are of a more fundamental nature and not specific to humans (e.g. perhaps human languages are the simplest possible rule sets, in some quantifiable mathematical sense, that match those sentences). Either way, the answer could be quite interesting.
You've given me some things to think about. I see how your experiment would disprove Poverty of Stimulus (if we could roughly assume how much input a baby gets) but I don't see how it would disprove Universal Grammar. If the computer could learn a non-human language, wouldn't that lead to the charge that while our computer is a competent language learner, its methods are fundamentally unlike those of a human?
As far as I understand, the argument is: it's impossible to learn human language from such poor stimulus without some kind of genetically determined universal grammar. If we prove that it is actually possible (even with non-human methods) to learn human language with such poor stimulus while not possessing a built-in universal grammar, that makes universal grammar unnecessary, or even contingent.
So, IF such a machine could be built, it would open the possibility that universal grammar, even if it exists, is only an accident of history (just as much as Indo-European influence on most European languages), not one of human genetics.
> which uses general-purpose neurons not designed for language,
I'm not sure about this. We've probably designed general-purpose "neurons" to talk to us, even if we didn't think of it that way. They aren't emulators of physical neurons, they're abstractions of speculative neurons. The way we figure out if they work is by making them talk to us.
"Language" for Chomsky is an abstraction that is intentionally designed to exclude anything statistical. His most likely response to this would be that neither the neural network nor the brain data collected reflects anything that could be called language.
Anyone know why a GAN was used here? The discriminator makes sense, but what is the purpose of the generator for the given experiment? Why not cut out the generator, since the discriminator is trained separately on speech data anyway?
Usually discriminators aren't trained separately. Training them together makes the initial task for the generative side easier, since it doesn't have to be near-perfect right off the bat. The generator can learn from its obvious mistakes at the same time the discriminator is learning to detect them.
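For illustration, here's a minimal sketch of that joint training loop (the network sizes, optimizers, and random stand-in "real" data are placeholders, not the paper's actual architecture):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 32
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, data_dim)           # stand-in for real speech features
    fake = G(torch.randn(64, latent_dim))

    # Discriminator step: tell real from fake.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the current, still-learning discriminator.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

Because both networks start out untrained, the generator only has to beat an equally unskilled opponent at first, which is the point the comment above is making.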
Quote: "The results not only help demystify how ANNs learn, but also suggest that human brains may not come already equipped with hardware and software specially designed for language."
I thought this was common knowledge. I mean, if we came with already-specialized hardware for language at birth, we'd speak right away, just as a newborn puppy barks. Or if we had specialized software, then children of geniuses would be geniuses themselves. And neither of those is what happens in real life.
The opposite is common knowledge: we must have specialized hardware/software for language, otherwise we'd never be able to learn a language with the meager amount of information we can pick up in a year or two of occasional examples.
There are plenty of examples of clearly specialized hardware/software that nevertheless needs some fine-tuning and is not immediately available in newborn children. Newborns also can't digest solid foods, don't have teeth, aren't able to produce offspring, don't have breasts etc. And yet no one is arguing these systems, many of which will only be apparent months and years later, are not built in.
Even for other cognitive abilities: newborns have fully functioning retinas, yet are unable to see at all for a few days or weeks, and are unable to notice color for a good few months. However, the visual processing areas of the brain are virtually identical in all adults later: they are clearly part of our genetic makeup.
Also, you're confusing the ability to acquire language with general intelligence in your argument about geniuses. This argument is doing the opposite: it's saying that acquiring language is not some feat of intelligence, it is merely a specialized evolutionarily-acquired capacity that all humans share, just like visual and auditory and motor processing.
"Here's a phenomenon I was surprised to find: you'll go to talks, and hear various words, whose definitions you're not so sure about. At some point you'll be able to make a sentence using those words; you won't know what the words mean, but you'll know the sentence is correct. You'll also be able to ask a question using those words. You still won't know what the words mean, but you'll know the question is interesting, and you'll want to know the answer. Then later on, you'll learn what the words mean more precisely, and your sense of how they fit together will make that learning much easier. The reason for this phenomenon is that mathematics is so rich and infinite that it is impossible to learn it systematically, and if you wait to master one topic before moving on to the next, you'll never get anywhere. Instead, you'll have tendrils of knowledge extending far from your comfort zone. Then you can later backfill from these tendrils, and extend your comfort zone; this is much easier to do than learning "forwards". (Caution: this backfilling is necessary. There can be a temptation to learn lots of fancy words and to use them in fancy sentences without being able to say precisely what you mean. You should feel free to do that, but you should always feel a pang of guilt when you do.)"
Reminds me of the attention mechanism in transformers!
http://math.stanford.edu/~vakil/potentialstudents.html
And for any parents with toddler age children, seeing the way that toddlers relate to language, and that people relate to toddlers about language, leads to lots of fun observations that remind me of LLM related concepts.