>> But GPT-3 is much more successful, including at giving correct answers to arithmetic problems that weren't in its training set.
That's not exactly what the GPT-3 paper [1] claims. The paper reports that a search of the training dataset for instances of, very specifically, three-digit addition returned no matches. That doesn't mean there were no such instances; it only means the search didn't find any. It also says nothing about the presence of instances of other arithmetic operations in GPT-3's training set (and the absence of reported "spot checks" for the other operations suggests that such instances were in fact found but not reported, in the time-honoured fashion of not reporting negative results). So at best we can conclude that GPT-3 gave correct answers to three-digit addition problems that weren't in its training set, and even then only for the 2,000 or so problems that were specifically searched for.
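To make the limitation of such a check concrete, here is a minimal sketch of what an exact-match contamination search amounts to (the corpus and problem list are made-up stand-ins, not the paper's actual procedure): it can only report problems whose literal "a + b = c" form it matches, and says nothing about rephrased or differently formatted occurrences.

```python
import re

def find_contamination(corpus: str, problems: list[tuple[int, int]]) -> list[str]:
    """Return the problems whose literal 'a + b = c' form appears in the corpus."""
    hits = []
    for a, b in problems:
        # Only this exact surface form is searched for; "the sum of 123 and 456"
        # or "123 plus 456" would silently be missed.
        pattern = rf"{a}\s*\+\s*{b}\s*=\s*{a + b}"
        if re.search(pattern, corpus):
            hits.append(f"{a} + {b} = {a + b}")
    return hits

corpus = "Trivia: 123 + 456 = 579 is easy to check by hand."
problems = [(123, 456), (700, 111)]
print(find_contamination(corpus, problems))  # only the first problem is found
```

A negative result from a search like this is evidence of absence only for the exact patterns tried, which is the point being made above.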
In general, the paper tested GPT-3's arithmetic abilities on addition and subtraction of one- to five-digit numbers and multiplication of two-digit numbers. It also tested a composite task of one-digit expressions such as "6+(4*8)". No division was attempted at all (or no results were reported).
Of the attempted tasks, all but addition and subtraction of one- to three-digit numbers had accuracy below 20%.
In other words, the only tasks that were at all successful were exactly those tasks that were the most likely to be found in a corpus of text, rather than a corpus of arithmetic expressions. The results indicate that GPT-3 cannot "perform arithmetic" despite the paper's claims to the contrary. They are precisely the results one should expect to see if GPT-3 was simply memorising examples of arithmetic in its training corpus.
>> So what changed? We aren't sure, but the speculation is that in the process of training, GPT-3 found that the best strategy to correctly predicting the continuation of arithmetic expressions was to figure out the rules of basic arithmetic and encode them in some portion of its neural network, then apply them whenever the prompt suggested to do so.
There is no reason why a language model should be able to "figure out the rules of basic arithmetic", so this "speculation" is tantamount to invoking magick.
Additionally, language models and neural networks in general are not capable of representing the rules of arithmetic because they are incapable of representing recursion and universally quantified variables, both of which are necessary to express the rules of arithmetic.
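For reference, the textbook (Peano-style) rules of addition are a recursive definition ranging over all naturals, which is exactly the combination at issue. A sketch, transcribing the standard axioms directly:

```python
# The standard rules of addition, for all naturals m and n:
#   m + 0       = m
#   m + succ(n) = succ(m + n)
# The definition is recursive and its variables are universally
# quantified over an infinite domain -- the two features claimed
# above to be unrepresentable by a fixed-size network.
def add(m: int, n: int) -> int:
    if n == 0:
        return m              # base case: m + 0 = m
    return add(m, n - 1) + 1  # recursive case: m + succ(n) = succ(m + n)

print(add(4, 3))  # 7
```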
In any case, if GPT-3 had "figure(d) out the rules of basic arithmetic", why stop at addition, subtraction and multiplication of one- to five-digit numbers? Why was it not able to use those learned rules to perform the same operations on numbers with more digits? Why was it not capable of performing division (i.e. the inverse of multiplication)? A very simple answer is: GPT-3 did not learn the rules of arithmetic.
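The point about digit length can be made concrete: the schoolbook carrying rule is identical at every column, so anything that has genuinely internalised it generalises to operands of any length for free. A minimal sketch:

```python
# Schoolbook addition, column by column with a carry. The per-column
# rule is the same for 3-digit and 30-digit operands, so a system that
# had truly learned it should not degrade as numbers get longer.
def schoolbook_add(a: str, b: str) -> str:
    a, b = a.zfill(len(b)), b.zfill(len(a))  # pad to equal length
    carry, digits = 0, []
    for x, y in zip(reversed(a), reversed(b)):  # rightmost column first
        total = int(x) + int(y) + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(schoolbook_add("123", "456"))           # "579"
print(schoolbook_add("99999999999999", "1"))  # "100000000000000"
```

The same dozen lines handle five digits or fifty; a model whose accuracy collapses beyond three digits is behaving like a lookup table, not like this procedure.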
_________
[1] https://arxiv.org/abs/2005.14165