Human: A glass door has 'push' written on it in mirror writing. To open the door should you 'push' or 'pull' it?
Assistant: Since the word "push" is written in mirror writing on the glass door, you should actually "pull" the door open instead of "push" it. Mirror writing is a writing method where the characters are reversed, so when you see the word "push" written in mirror writing, it is actually "pull" in the normal writing orientation.
I asked GPT4 to draw a dog or a skull in OpenSCAD, and even though the end result was buggy, by commenting out bits of the code here and there and making some volumes transparent I figured out it got it roughly right. For instance, the dog had two eyes, two ears and one long nose (all potatoids). It understood the symmetry of both pairs but was unable to put them in the right place. They weren't just misaligned: things were in the wrong planes, but they were there. As if it hadn't understood that a face is like one face of a cube.
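To make the "one face of a cube" point concrete, here is a rough Python sketch of the constraint I mean. This is not what GPT4 produced; the head size, the coordinates and every name in it are made up for illustration. All features sit on the same plane (the front face of a cube-shaped head), and the paired ones are mirrored across the head's centre plane.

```python
# Illustrative sketch only: sizes, coordinates and names are assumptions.
HEAD_SIZE = 10.0            # edge length of a cube-shaped head centred on the origin
FRONT_Y = -HEAD_SIZE / 2    # the front face is the plane y = -5


def mirrored_pair(x_offset, z):
    """Return two points on the front face, symmetric about the plane x = 0."""
    return [(-x_offset, FRONT_Y, z), (x_offset, FRONT_Y, z)]


def face_features():
    """Place eyes, ears and nose on the front face of the head."""
    return {
        "eyes": mirrored_pair(2.0, 1.5),
        "ears": mirrored_pair(3.5, 4.5),   # near the top edge of the face
        "nose": [(0.0, FRONT_Y, -1.0)],    # single feature on the centre line
    }


def all_on_front_face(features):
    """Check that every feature point lies in the front plane."""
    return all(p[1] == FRONT_Y for pts in features.values() for p in pts)
```

The output I got had the symmetry part (the `mirrored_pair` idea) but would have failed the `all_on_front_face` check.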
I think things like this (or simpler things like asking ChatGPT for ASCII art of a circle) really show the difference between LLMs and humans. The issue is that it’s a language model rather than an image one, so it doesn’t understand the concept of ‘looks like a dog’.
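For contrast, this is roughly what the geometric route to an ASCII circle looks like: a minimal Python sketch that tests each character cell against x² + y² ≤ r². The function name and the choice to double each cell horizontally (terminal characters are usually taller than they are wide) are my own assumptions, not anything ChatGPT produced.

```python
def ascii_circle(radius=5):
    """Render a filled circle as text by testing each cell against x^2 + y^2 <= r^2."""
    rows = []
    for y in range(-radius, radius + 1):
        row = ""
        for x in range(-radius, radius + 1):
            inside = x * x + y * y <= radius * radius
            # Two characters per cell to roughly compensate for tall glyphs.
            row += "##" if inside else "  "
        rows.append(row.rstrip())
    return "\n".join(rows)


if __name__ == "__main__":
    print(ascii_circle(5))
```

A language model asked for the same picture has to produce it token by token, with no such test to check each character against.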
Image models don't understand it either. They only know the typical "look" of something, not the correct proportions or number of parts. If you have the word "wheel" in the prompt, they might turn every circle-like shape in the image into a car wheel, because they cannot selectively apply parts of the prompt to parts of the image.
At least the few models I tinkered with all had this issue, and without some additional guidance that understands scene composition and anatomy/proportions in three dimensions this probably won't fundamentally improve.
Maybe, but consider the post above where GPT4 gets confused about the blind man on the other side of the door, while Vicuna-13b seems to figure it out. I accept that GPT4 gave a better answer in this case, but its level of understanding about the scene under different scenarios still seems limited.