The GPT models do not reason or hold models of any reality. They complete text by imitating their training corpus, and they're amazingly good at it because they capture consistent relations between semantically and/or syntactically similar words.
My best guess about this result is that mentions of "mirror" often occur near opposites (syntax) of direction words (semantics), which does make it sound like a good trick question for these models.
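To make the "autocomplete" framing above concrete, here is a minimal sketch of statistical text completion: a bigram model that extends a prompt using only next-word co-occurrence counts from a tiny made-up corpus. The corpus, the function name, and the greedy decoding are all my own assumptions for illustration; actual GPT models are transformers trained on vastly more data, so this only gestures at the "predict a plausible next token from observed statistics" idea, not at how GPT is implemented.

    # Toy bigram "language model": completes text purely from next-word
    # co-occurrence counts in a tiny, made-up corpus. An illustration of the
    # autocomplete framing above, not GPT's actual architecture.
    from collections import defaultdict, Counter

    corpus = (
        "in the mirror left becomes right . "
        "in the mirror up stays up but left becomes right . "
        "push the door to open it . pull the door to close it ."
    ).split()

    # Count which word follows which.
    following = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        following[prev][nxt] += 1

    def complete(word, length=6):
        """Greedily extend a prompt word with the most frequent next word."""
        out = [word]
        for _ in range(length):
            candidates = following.get(out[-1])
            if not candidates:
                break
            out.append(candidates.most_common(1)[0][0])
        return " ".join(out)

    print(complete("mirror"))  # prints something like "mirror left becomes right . in the"

The point of the toy is that the completion is stitched entirely from surface co-occurrences: the debate in this thread is whether that framing still describes what much larger models are doing, or whether something more model-like emerges at scale.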
If you (or anyone else) are interested in the topic, I'd highly recommend giving some of these a look:
Bubeck, Sébastien, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, et al. “Sparks of Artificial General Intelligence: Early Experiments with GPT-4.” arXiv, March 27, 2023. http://arxiv.org/abs/2303.12712.
Geoffrey Hinton recently gave a very interesting interview in which he specifically wanted to address the "auto-complete" topic: https://youtu.be/qpoRO378qRY?t=1989

Here's another way Ilya Sutskever recently described it (comparing GPT-4 to GPT-3): https://youtu.be/ZZ0atq2yYJw?t=1656
I'd also recommend this recent Sam Bowman article, which does a good job reviewing some of the surprising recent developments and properties of the current crop of LLMs; it's pretty fascinating:
Word completion can't explain it. I gave ChatGPT a puzzle: I'm in a room with a crackling fire in a fireplace, a sandwich on a plate, and a wilting plant. My stomach is rumbling, and I see an ember on the floor and a watering can by the plant. What should I do? ChatGPT nailed the answer, getting the ordering correct. It even said that (after attending to the fire hazard, my hunger, and the plant) I should "sit down, relax, enjoy the fire and the peaceful atmosphere in the room". There is no way to autocomplete that puzzle correctly. There is reasoning and a world model in ChatGPT, let alone GPT-4.
LLMs demonstrably model their training data, which has a correspondence to the structure of the world[1][2]. The question is what that means regarding understanding; I go into that in some detail here[3].
How do you know what the representations they infer contain? Why would those be devoid of a model? Why should the way they were trained determine the limits of their abilities?
Yeah, my feeling is that language is the framework by which we developed reasoning, and we used an organic NN to do it. At a scale and complexity approaching the human brain, we get similar results.
I think this is the answer. Just tried this (on free ChatGPT 3.5 though)
Q: A glass door has ‘push’ written on it upside down. Should you push or pull it?
A: If the word "push" is written on the glass door upside down, it is likely that the sign is intended for people on the other side of the door. Therefore, if you are approaching the door from the side with the sign, you should pull the door instead of pushing it. However, if there are no other signs or indications on the door or its frame, it may be helpful to observe other people using the door or to try both pushing and pulling to determine the correct method of opening the door.
GPT-4: If the word "push" is written upside down, it might have been a mistake during installation. Regardless of its orientation, the sign still indicates that you should push the door. So, you should try pushing the door first. If it does not open, then you can try pulling it, as it might be an error in labeling the door.