I think he could have wrapped his paper up after showing this one example:
> (input) I put two trophies on a table, and then add another, the total number is (GPT-2 continuation) five trophies and I'm like, 'Well, I can live with that, right?
GPT-2 correctly inferred that the continuation should be a number of trophies, based on bazillions of similar sentences. But it had no understanding that arithmetic was called for. Despite the giant clues of "add" and "total", it didn't add 2+1 and continue "three trophies". It was mindlessly oblivious to the clearly implied request for a sum. Therefore it did not "understand" the input at all, in any sense whatever.
Before anyone says that example (or any of the other fluent but _completely nonsensical_ continuations in the article) shows some sort of "understanding," please explain what you would accept as a definition of understanding.
I would (and I think anyone would) offer an operational definition: there is some class of questions to which this system could reply with sensible, actionable responses. Obviously the present system is not able to "understand" and answer simple arithmetic problems that a first-grader could answer instantly. Given that, would there be any point in expecting it to answer any other logical query that could be of use in one's work? (See the "medical" example in the article, about how to drink hydrochloric acid.)
The only question it appears to answer is, "given some words, what are other words that are likely to follow them in a typical blog post?" The fact that the words are syntactically correct is unimportant, when the fluent words convey no information relevant to the input.
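For what it's worth, that "what words are likely to follow" question is easy to poke at yourself. Here's a rough sketch using the Hugging Face transformers library (the model name and sampling settings are my own arbitrary choices, not whatever setup the article used):

```python
# Rough sketch: sample a continuation from GPT-2 for the trophy prompt.
# Assumes the Hugging Face `transformers` library; settings are arbitrary choices.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "I put two trophies on a table, and then add another, the total number is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample rather than take the argmax, so repeated runs give different continuations.
output = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 20,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Run it a few times and you get the same kind of fluent continuation; whether the number that comes out happens to be "three" is essentially a matter of chance.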
> The only question it appears to answer is, "given some words, what are other words that are likely to follow them in a typical blog post?" The fact that the words are syntactically correct is unimportant, when the fluent words convey no information relevant to the input.
You say that like it's a bad thing. That's literally all it's been trained to do.
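To make that concrete, the entire training signal is next-token prediction scored with cross-entropy. A toy sketch (my own illustration in PyTorch with made-up numbers, not GPT-2's actual training code):

```python
# Toy sketch of the next-token objective: predict token t+1 from tokens up to t,
# score with cross-entropy. The token ids and logits here are made up.
import torch
import torch.nn.functional as F

vocab_size = 8
tokens = torch.tensor([[3, 1, 4, 1, 5, 2]])           # a toy "sentence" of token ids
logits = torch.randn(1, tokens.shape[1], vocab_size)  # stand-in for the model's per-position predictions

# Shift by one position so the prediction at t is scored against the token at t+1.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss.item())  # minimizing this is "all it's been trained to do"
```

Nothing in that objective asks for arithmetic; arithmetic only matters to the model to the extent that it helps predict the next token.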
However, I don't understand why we're leaping to "first-grader" as a low level of intelligence. That level of general intelligence in a machine would be a monumental achievement, I would think.
I also don't understand why you think responding to arithmetic problems, via parsing natural language, while neither being designed to perform arithmetic nor trained on it directly, would be "simple".
It's not fair to say the words convey "no information"; they do convey information, just not information that is useful to you. There is a ton of information in the structure of the words it generates, and the output is often semantically and syntactically correct; both of those properties carry information.
This is clearly not particularly useful, but it does demonstrate some sort (and I would argue your sort) of understanding.
The question is not whether it understands math and language as well as a first grader, it's whether it understands anything at all.
Why doesn't the fact that it responds with a number represent some level of understanding?
It looks to me like you are pointing out one thing that GPT-2 can't understand and declaring "It doesn't understand anything!" while completely ignoring all the things it can be said to understand.
It absolutely did understand it in some sense, as it completed it with a correctly structured sentence instead of random bytes.
I'm not sure we should expect it to know math, since in a dataset of all the text on the internet, the rules of math are not a particularly important thing for it to learn. They're certainly in the dataset in various ways, but it has a limited memory, and it is not surprising that it knows other things much better.
We know math has hard rules because we learn that in other ways, ways this system does not implement. I think it's at least remotely plausible that similar systems, with human-scale (i.e. orders of magnitude larger) datasets as input, taught in a similar way to humans, could demonstrate similar capabilities.
There are people with lesions affecting only very specific regions of their brain who can carry on regular conversations normally, but when asked numerical questions, they give seemingly nonsensical answers. When asked how much one cup plus two cups is, they could very well say five. Yet you wouldn't infer that they lack an understanding of the world.