Hacker News

Thanks for your reply. A couple of responses to advance the conversation.

As a side note: judging from other responses in this thread, it seems we have little idea how much arithmetic GPT-3 has actually learned, and it may not be much.

Anyway, I think the important distinction between my perspective and the "Overconfident Pessimism" you attribute to me is that I'm not talking about the (im)possibility of achievement; I'm talking about scientific methodology, or the lack thereof.

In other words, I'm not saying (here) that some NLP achievements are impossible. I'm saying that we are not rigorously testing, measuring, and verifying what we are actually achieving. Instead, we throw out superficially impressive examples and invite, or provoke, speculation about how much achievement must surely have happened somewhere to produce them.
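To make "rigorously testing and measuring" concrete, here is a minimal sketch of the kind of systematic probe I mean, using two-digit addition as the task. The "model" is a toy stand-in that adds digit-wise without carrying (purely illustrative; a real evaluation would call the actual model). The point is to test the entire problem space and slice results by input properties, rather than quoting a few impressive examples:

```python
# Systematic probe: evaluate EVERY two-digit addition problem, not a
# handful of cherry-picked examples. toy_model is an invented stand-in
# that adds digit-wise without carrying; swap in a real model call.
def toy_model(a: int, b: int) -> int:
    tens = (a // 10 + b // 10) % 10
    units = (a % 10 + b % 10) % 10
    return tens * 10 + units

def probe_addition(model, lo=0, hi=99):
    results = [(a, b, model(a, b) == a + b)
               for a in range(lo, hi + 1)
               for b in range(lo, hi + 1)]
    accuracy = sum(ok for _, _, ok in results) / len(results)
    # Slice by a property of the input, so failures are attributable:
    carry = [ok for a, b, ok in results if a % 10 + b % 10 >= 10]
    no_carry = [ok for a, b, ok in results if a % 10 + b % 10 < 10]
    return accuracy, sum(carry) / len(carry), sum(no_carry) / len(no_carry)

acc, acc_carry, acc_no_carry = probe_addition(toy_model)
print(f"overall {acc:.4f}, with-carry {acc_carry:.4f}, no-carry {acc_no_carry:.4f}")
# -> overall 0.3025, with-carry 0.0000, no-carry 0.5500
```

The sliced numbers immediately tell you *what* the model can and cannot do (here: it fails exactly when a carry is required), which is the kind of claim a cherry-picked demo can never support.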

We have seen several years of this pattern, so this is not a GPT-3-specific criticism; it's just that this particular quote so neatly captures the lack of scientific rigour we have seen repeatedly by now.

Probably the first example was image recognition. Everyone was amazed by how well neural nets could classify images. There was a ton of analogous speculation -- along the lines of 'we're not sure, but the speculation is the networks figured out what it really means to be a panda or a stop sign and encoded it in their weights.' The terms "near-human performance" and then "human-level performance" were thrown around a lot.

Then we found adversarial examples and realized that, e.g., if you rotate the turtle image slightly, the model becomes extremely confident that it's a rifle. So obviously it has no understanding of what a turtle or a rifle is. And obviously we as researchers didn't understand what those neural nets were doing under the hood, and that speculation was extremely over-optimistic.
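A toy version of that brittleness can be demonstrated without any deep net. The sketch below (entirely invented for illustration; no relation to the actual turtle/rifle models) uses a template-correlation "classifier" that scores its memorized image near-perfectly, yet loses almost the entire score under a modest rotation, because it matched pixels rather than anything like the concept:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "classifier": score each class by correlating the input against a
# stored template. Three random 32x32 "images" stand in for learned classes.
templates = rng.standard_normal((3, 32, 32))

def scores(image):
    return np.array([(t * image).sum() for t in templates])

def rotate_nn(image, degrees):
    """Nearest-neighbour rotation about the image centre."""
    h, w = image.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = np.deg2rad(degrees)
    out = np.zeros_like(image)
    for i in range(h):
        for j in range(w):
            # Map each output pixel back to a source pixel.
            y, x = i - cy, j - cx
            sy = int(round(cy + y * np.cos(theta) - x * np.sin(theta)))
            sx = int(round(cx + y * np.sin(theta) + x * np.cos(theta)))
            if 0 <= sy < h and 0 <= sx < w:
                out[i, j] = image[sy, sx]
    return out

image = templates[0].copy()           # exactly the class-0 template
s_orig = scores(image)
s_rot = scores(rotate_nn(image, 10))  # a 10-degree rotation
print(s_orig[0], s_rot[0])            # class-0 score collapses under rotation
```

A model that had encoded "what it really means to be" class 0 would be invariant to a small rotation; a pixel matcher is not, and only a probe like this distinguishes the two.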

Engineering cool things can absolutely be a part of a scientific process. But we have seen countless repetitions of this pattern (especially since GANs): press releases and impressive-looking examples without rigorous evaluation of what the models are doing or how; invitations to speculate on the best-possible interpretation; and announcing that the next step is to make it bigger. I think this approach is both anti-science and misleading to readers.


