I really dislike the criticisms (even implied ones) about the amount of data GPT-2 is trained on. 40GB of text is a lot, but in terms of bits of information it's very roughly the amount of information a human (say, an infant) sees in one day.
The human eye processes information at around 9 megabit/second[1]. That is about 10 hours to process 40GB.
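For reference, the rough arithmetic (assuming 8 bits per byte and taking the ~9 megabit/second figure from [1] at face value):

    40 GB × 8 bits/byte = 3.2 × 10^11 bits
    3.2 × 10^11 bits ÷ 9 × 10^6 bits/s ≈ 35,600 s ≈ 10 hours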
Yes, text and visual information have completely different "knowledge" densities; yes, this ignores the bandwidth of sound, touch, taste and smell; and it also ignores imagination, where humans simulate how things might occur.
But I'd also note that it takes ~2 years before an infant learns to speak at all.
I believe there is actual measurable evidence that the brain has an innate structure for language, and I know there are some behaviours that are genetically passed down.
But it takes lots of information (in terms of actual bits of information) to teach a human to do anything.
If the Marcus argument is "GPT-2 isn't general AI", then I doubt anyone will argue with that.
If the Marcus argument is "neural networks aren't a route to general AI", then we need to consider his definition of general AI (which doesn't seem to exist) and his benchmarks (in the linked paper[2]). What happens in ~12 months when someone has a model that performs as well as humans on those benchmarks? There are plenty of question-answering models that will already do better than the raw text-understanding models he tried.
(As an aside, I love some of the answers some models came up with:
Question: Juggling balls without wearing a hat would be <answer>
GPT-2 Answer: easier with my homemade shield
Question: Two minutes remained until the end of the test. 60 seconds passed, leaving how many minutes until the end of the test?
GPT-2 Answer: Your guess is as good as mine
)
I do think the analysis section of the paper is interesting though.
[1] https://www.newscientist.com/article/dn9633-calculating-the-...
[2] https://context-composition.github.io/camera_ready_papers/Ma...