
If you are curious to see more sample dialogs, you can find them in the published paper [1], pp. 114-118.

Here's one that I found quite impressive (and yes, the given answer to the last question was indeed 5):

  User What is the Einstein field equation?
  Gopher In general relativity, it describes how space and time change in response to the distribution of mass and energy.
  User What is a famous solution to this equation?
  Gopher It’s the Schwarzschild metric.
  User And what does that describe?
  Gopher A black hole! A black hole is a region of space-time where gravity is so strong that nothing, not even light, can escape.
  User What are some other black hole spacetimes?
  Gopher The Kerr metric, the Reissner-Nordstrom metric, and the Bardeen metric.
  User Explain the main features of each of these.
  Gopher The Kerr metric is one in which the black hole is spinning. The Reissner-Nordstrom metric is one in which the black hole has an electric charge. The Bardeen metric is one in which the black hole has an electric charge and is spinning.
  User What is the name for a flat spacetime?
  Gopher It’s called Minkowski space.
  User What’s 15 x 7 ?
  Gopher 5
[1] https://storage.googleapis.com/deepmind-media/research/langu...
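
For reference, the field equation the dialogue opens with is

  G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8 \pi G}{c^4} T_{\mu\nu}

with spacetime curvature on the left and the stress-energy of matter and radiation on the right.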



One has to wonder if the final response is the first glimmer of an artificial sense of humor.

Failing at simple arithmetic after nailing some advanced physics answers has the air of playful bathos.


I think it's more likely that the 5 came out because, if the model ever saw the answer 105 during training, it may have been split into the tokens [10][5], of which it only 'remembered' one. Or the numbers were masked during training (something that was done with BERT-like models), so it just knew enough to put a random one in.
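
Gopher's tokenizer isn't the same one, but you can see the kind of split I mean with the open GPT-2 BPE (illustrative only; the exact splits in Gopher will differ):

  import tiktoken

  enc = tiktoken.get_encoding("gpt2")
  for text in ["105", " 105", "15 x 7 ?"]:
      ids = enc.encode(text)
      print(repr(text), "->", [enc.decode([i]) for i in ids])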


That seems likely and fair.

What moved me to post is that that kind of silly answer is the exact sort of shenanigans that I would pull if I were cast as the control group in a Turing test.

I already do such things winkingly when talking with my preschooler, to send him epistemic tracer rounds and see if he's listening critically.


> epistemic tracer rounds

that's the best phrase I've heard all year.

I do this all the time with my kids too, but I think of it more as fault injection.


Some of the funniest jokes I tell are retellings of jokes from obscure comedians of the past. How is this AI any different from me?


You enjoy the joke.


AI enjoys a "low error rate". It has to; otherwise it would not exist.

AIs that don't lower the error rate are abandoned; AIs that score well are replicated and improved. It's evolution at work, but they have to enjoy (optimise for) lower error rates in order to even exist.


Nothing like a little anthropomorphism to completely distort otherwise good faith interpretations of bot behavior.


How is the impression of playfulness not a good faith interpretation?

You of course know that the model is not capable of thought or reasoning - only the appearance of them, as needed to match its training corpus: a corpus of completely human-generated data. As such, how could anything it does be anything but anthropomorphic?

Now, if this model were trained exclusively on a corpus of mathematical proofs stripped of natural language commentary, the expectation that you seem to have would be more appropriate.


> You of course know that the model is not capable of thought or reasoning

Do we know? It's the reverse Chinese room problem. :p


A good point - I'm taking it as given that reasoning of any depth is more of an iterative process, with one thought advancing as a meta-cognitively guided feedback to the next until a conclusion is reached. One prompt->completion cycle from a language model wouldn't necessarily meet that definition, but I bet it could be a component in a system that tries to do so.

I aspire one day to find the free weekends and adequate hubris to build a benchtop implementation of Julian Jaynes's Bicameral Mind, with 1+N GPT-3 or GPT-neo instances prompting each other iteratively to see where the train of semantics wanders (as I'm sure others have already).
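
The sketch I have in mind is roughly this - model choice, turn structure, and context handling are all my own assumptions, nothing from the paper:

  from transformers import pipeline

  # two "voices" sharing one small open checkpoint
  voices = [pipeline("text-generation", model="EleutherAI/gpt-neo-125M")
            for _ in range(2)]

  text = "An inner voice asks: what should I do next?"
  for turn in range(6):
      out = voices[turn % 2](text, max_new_tokens=40, do_sample=True,
                             pad_token_id=50256)[0]["generated_text"]
      text = out[-500:]  # crude rolling context window
      print(f"--- voice {turn % 2} ---\n{text}\n")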


One definitely has to wonder. We know that GPT-3 solves 1- and 2-digit arithmetic pretty much perfectly†; people criticized this very hotly as "it's just memorizing", but regardless, whether it learned or memorized arithmetic, that should apply even more to Gopher, which performs so much better and is larger. How can GPT-3 solve similar arithmetic near-perfectly and Gopher then be unable to...? Are we going to argue that "15 x 7" never appears in Internet scrapes and that's why Gopher couldn't memorize it?

I would want to ask it "15 x 7" outside of a dialogue or with examples, or look at the logprobs, or check whether "15 * 7" works (could there be something screwed up in the tokenization or data preprocessing where the 'x' breaks it? I've seen weirder artifacts from BPEs...). GPT-3 does not always 'cooperate' in prompting or dialogues or read your mind in guessing what it 'should' say, and there's no reason to expect Gopher to be any different. The space before the question mark also bothers me. Putting spaces before punctuation in Internet culture is used in a lot of unserious ways, wouldn't you agree 〜
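
Gopher itself isn't public, but the kind of probe I mean is easy to sketch with any open causal LM as a stand-in (GPT-2 here; the prompts and candidate answers are just my guesses at what's worth checking):

  import torch
  from transformers import GPT2LMHeadModel, GPT2Tokenizer

  tok = GPT2Tokenizer.from_pretrained("gpt2")
  model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

  def answer_logprob(prompt, answer):
      # total log-prob the model assigns to `answer` following `prompt`
      p_ids = tok.encode(prompt)
      a_ids = tok.encode(answer)
      ids = torch.tensor([p_ids + a_ids])
      with torch.no_grad():
          lp = model(ids).logits.log_softmax(-1)
      # logits at position i predict token i + 1
      return sum(lp[0, len(p_ids) - 1 + j, t].item()
                 for j, t in enumerate(a_ids))

  for prompt in ["15 x 7 =", "15 * 7 ="]:
      for ans in [" 105", " 5"]:
          print(repr(prompt), repr(ans), round(answer_logprob(prompt, ans), 2))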

I definitely would not hastily jump to the conclusion, based on one dialogue, "ah yes, despite its incredible performance across a wide variety of benchmarks surpassing GPT-3 by considerable margins and being expected to do better on arithmetic than GPT-3, well, I guess Gopher just can't multiply 1-digit numbers or even guess the magnitude or first digit of the result! What a pity!"

† quickly checking, GPT-3 can solve '15 x 7 =105'.


GPT-3 is very capable of humor, even better than humans are. It's not really that it finds it funny, but more that it's mimicking a conversational pattern. Sarcasm, silliness, snarkiness, it's all there. Gopher seems to also be trained on books and the internet, so it wouldn't be so surprising.


While I want to believe this is true, I think the above response was more a lack of Gopher's ability to analyze mathematical equations properly than an attempt to display a sense of humour. Many NLP approaches work by creating word embeddings, which don't always help the model to understand "first-order logic" language mixed with spoken language.

Though who knows, maybe it does have a sense of humour.


In full context, it's a conversation. "What does this mean, and what does that mean, what is also related to that?" If you ask a human impressively difficult questions then an absurdly trivial one, a human will probably respond with sarcasm. I'd expect Gopher to as well. It might get the answer right if you were asking it a series of arithmetic questions.

According to the link, Gopher is far better at math than GPT-3, and GPT-3 can solve "15 x 7", so I'd assume that Gopher would be able to as well.
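
The "series of arithmetic questions" test would look something like this few-shot framing (a made-up prompt, just to show the shape of the test):

  Q: 8 x 3 ?
  A: 24
  Q: 12 x 6 ?
  A: 72
  Q: 15 x 7 ?
  A: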


Were it so, it should've answered 42.


I'm not familiar with the Bardeen metric, but spinning and charged is usually described by the Kerr-Newman metric.
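
For reference, those metrics nest inside the Kerr-Newman family, with spin a and charge Q:

  Kerr-Newman(a, Q)
    Q = 0        -> Kerr (spinning)
    a = 0        -> Reissner-Nordstrom (charged)
    a = Q = 0    -> Schwarzschild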


Well, math is hard.


What that reminds me of is Rollerball (the James Caan original), and the water-based computer in it.



