
If you are curious to see more sample dialogs, you can find them in the published paper [1], pp. 114-118.

Here's one that I found quite impressive (and yes, the given answer to the last question was indeed 5):

  User What is the Einstein field equation?
  Gopher In general relativity, it describes how space and time change in response to the distribution of mass and energy.
  User What is a famous solution to this equation?
  Gopher It’s the Schwarzschild metric.
  User And what does that describe?
  Gopher A black hole! A black hole is a region of space-time where gravity is so strong that nothing, not even light, can escape.
  User What are some other black hole spacetimes?
  Gopher The Kerr metric, the Reissner-Nordstrom metric, and the Bardeen metric.
  User Explain the main features of each of these.
  Gopher The Kerr metric is one in which the black hole is spinning. The Reissner-Nordstrom metric is one in which the black hole has an electric charge. The Bardeen metric is one in which the black hole has an electric charge and is spinning.
  User What is the name for a flat spacetime?
  Gopher It’s called Minkowski space.
  User What’s 15 x 7 ?
  Gopher 5
[1] https://storage.googleapis.com/deepmind-media/research/langu...
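
For reference, the field equation the dialogue opens with is

  G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8 \pi G}{c^4} T_{\mu\nu}

with spacetime curvature on the left and the stress-energy of matter and radiation on the right.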



One has to wonder if the final response is the first glimmer of an artificial sense of humor.

Failing at simple arithmetic after nailing some advanced physics answers has the air of playful bathos.


I think it's more likely that the 5 came out because, if the model ever saw the answer 105 during training, it may have been split into the tokens [10][5], of which it only 'remembered' one. Or the numbers were masked during training (something that was done with BERT-like models), so it just knew enough to put a random one in.
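
Gopher's tokenizer isn't the same one, but you can see the kind of split I mean with the open GPT-2 BPE (illustrative only; the exact splits in Gopher will differ):

  import tiktoken

  enc = tiktoken.get_encoding("gpt2")
  for text in ["105", " 105", "15 x 7 ?"]:
      ids = enc.encode(text)
      print(repr(text), "->", [enc.decode([i]) for i in ids])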


That seems likely and fair.

What moved me to post is that that kind of silly answer is the exact sort of shenanigans that I would pull if I were cast as the control group in a Turing test.

I already do such things winkingly when talking with my preschooler, to send him epistemic tracer rounds and see if he's listening critically.


> epistemic tracer rounds

that's the best phrase I've heard all year.

I do this all the time with my kids too, but I think of it more as fault injection.


Some of the funniest jokes I tell are retellings of jokes from obscure comedians of the past. How is this AI any different from me?


You enjoy the joke.


AI enjoys a "low error rate". It has to; otherwise it would not exist.

AIs that don't lower the error rate are abandoned; AIs that score well are replicated and improved. It's evolution at work, but they have to enjoy (optimise for) lower error rates in order to even exist.


Nothing like a little anthropomorphism to completely distort otherwise good faith interpretations of bot behavior.


How is the impression of playfulness not a good faith interpretation?

You of course know that the model is not capable of thought or reasoning - only the appearance of them, as needed to match its training corpus: a corpus of completely human-generated data. As such, how could anything it does be anything but anthropomorphic?

Now, if this model were trained exclusively on a corpus of mathematical proofs stripped of natural language commentary, the expectation that you seem to have would be more appropriate.


> You of course know that the model is not capable of thought or reasoning

Do we know? It's the reverse Chinese room problem. :p


A good point - I'm taking it as given that reasoning of any depth is more of an iterative process, with one thought advancing as a meta-cognitively guided feedback to the next until a conclusion is reached. One prompt->completion cycle from a language model wouldn't necessarily meet that definition, but I bet it could be a component in a system that tries to do so.

I aspire one day to find the free weekends and adequate hubris to build a benchtop implementation of Julian Jaynes's Bicameral Mind, with 1+N GPT-3 or GPT-neo instances prompting each other iteratively to see where the train of semantics wanders (as I'm sure others have already).
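
The sketch I have in mind is roughly this - model choice, turn structure, and context handling are all my own assumptions, nothing from the paper:

  from transformers import pipeline

  # two "voices" sharing one small open checkpoint
  voices = [pipeline("text-generation", model="EleutherAI/gpt-neo-125M")
            for _ in range(2)]

  text = "An inner voice asks: what should I do next?"
  for turn in range(6):
      out = voices[turn % 2](text, max_new_tokens=40, do_sample=True,
                             pad_token_id=50256)[0]["generated_text"]
      text = out[-500:]  # crude rolling context window
      print(f"--- voice {turn % 2} ---\n{text}\n")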


One definitely has to wonder. We know that GPT-3 solves 1- and 2-digit arithmetic pretty much perfectly†; people criticized this very hotly as "it's just memorizing", but regardless, whether it learned or memorized arithmetic, that should apply even more to Gopher, which performs so much better and is larger. How can GPT-3 solve similar arithmetic near-perfectly and Gopher then be unable to...? Are we going to argue that "15 x 7" never appears in Internet scrapes and that's why Gopher couldn't memorize it?

I would want to ask it "15 x 7" outside of a dialogue or with examples, or look at the logprobs, or check whether "15 * 7" works (could there be something screwed up in the tokenization or data preprocessing where the 'x' breaks it? I've seen weirder artifacts from BPEs...). GPT-3 does not always 'cooperate' in prompting or dialogues or read your mind in guessing what it 'should' say, and there's no reason to expect Gopher to be any different. The space before the question mark also bothers me. Putting spaces before punctuation in Internet culture is used in a lot of unserious ways, wouldn't you agree 〜
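
Gopher itself isn't public, but the kind of probe I mean is easy to sketch with any open causal LM as a stand-in (GPT-2 here; the prompts and candidate answers are just my guesses at what's worth checking):

  import torch
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

  tok = GPT2Tokenizer.from_pretrained("gpt2")
  model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

  def answer_logprob(prompt, answer):
      # total log-prob the model assigns to `answer` following `prompt`
      p_ids = tok.encode(prompt)
      a_ids = tok.encode(answer)
      ids = torch.tensor([p_ids + a_ids])
      with torch.no_grad():
          lp = model(ids).logits.log_softmax(-1)
      # logits at position i predict token i + 1
      return sum(lp[0, len(p_ids) - 1 + j, t].item()
                 for j, t in enumerate(a_ids))

  for prompt in ["15 x 7 =", "15 * 7 ="]:
      for ans in [" 105", " 5"]:
          print(repr(prompt), repr(ans), round(answer_logprob(prompt, ans), 2))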

I definitely would not hastily jump to the conclusion, based on one dialogue, "ah yes, despite its incredible performance across a wide variety of benchmarks surpassing GPT-3 by considerable margins and being expected to do better on arithmetic than GPT-3, well, I guess Gopher just can't multiply 1-digit numbers or even guess the magnitude or first digit of the result! What a pity!"

† quickly checking, GPT-3 can solve '15 x 7 =105'.


GPT-3 is very capable of humor, even better than humans are. It's not really that it finds it funny, but more that it's mimicking a conversational pattern. Sarcasm, silliness, snarkiness, it's all there. Gopher seems to also be trained on books and the internet, so it wouldn't be so surprising.


While I want to believe this is true, I think the above response was more a lack of Gopher's ability to analyze mathematical equations properly than an attempt to display a sense of humour. Many NLP approaches work by creating word embeddings, which don't always help the model to understand "first-order logic" language mixed with spoken language.

Though who knows, maybe it does have a sense of humour.


In full context, it's a conversation. "What does this mean, and what does that mean, what is also related to that?" If you ask a human impressively difficult questions then an absurdly trivial one, a human will probably respond with sarcasm. I'd expect Gopher to as well. It might get the answer right if you were asking it a series of arithmetic questions.

According to the link, Gopher is far better at math than GPT-3, and GPT-3 can solve "15 x 7", so I'd assume that Gopher would be able to as well.
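
The "series of arithmetic questions" test would look something like this few-shot framing (a made-up prompt, just to show the shape of the test):

  Q: 8 x 3 ?
  A: 24
  Q: 12 x 6 ?
  A: 72
  Q: 15 x 7 ?
  A: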


Were it so, it should've answered 42.


I'm not familiar with the Bardeen metric, but spinning and charged is usually described by the Kerr-Newman metric.
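
For reference, those metrics nest inside the Kerr-Newman family, with spin a and charge Q:

  Kerr-Newman(a, Q)
    Q = 0        -> Kerr (spinning)
    a = 0        -> Reissner-Nordstrom (charged)
    a = Q = 0    -> Schwarzschild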


Well, math is hard.


What that reminds me of is Rollerball (the James Caan original), and the water-based computer in it.



