
This is what I get out of Vicuna-13b:

Human: A glass door has 'push' written on it in mirror writing. To open the door should you 'push' or 'pull' it?

Assistant: Since the word "push" is written in mirror writing on the glass door, you should actually "pull" the door open instead of "push" it. Mirror writing is a writing method where the characters are reversed, so when you see the word "push" written in mirror writing, it is actually "pull" in the normal writing orientation.



The difference for me with GPT-4 is its ‘understanding’ of the scene and its explanation of WHY you should push or pull.

It talks about a door with people approaching from different directions. It has some idea of what those people would be thinking.

That seems different to just ‘mirror writing means do the opposite’.


I asked GPT-4 to draw a dog or a skull in OpenSCAD, and even though the end result was buggy, by commenting out parts of the code here and there and making some volumes transparent I could tell it got it roughly right. For instance, the dog had two eyes, two ears, and one long nose (potatoids). It understood the symmetry of both pairs but was unable to place them in the right spots. It's not that they were just misaligned; things were in the wrong planes, but they were there. As if it hadn't understood that a face is like one face of a cube.
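
For reference, a correctly composed potatoid dog head in OpenSCAD might look something like the sketch below. This is my own guess at the target, not GPT-4's actual output; all the numbers and feature placements are invented for illustration. The eye/ear pairs are mirrored across x = 0 by the loop over side = ±1, and every feature sits on or just outside the front (negative-y) face of the skull:

    // Hypothetical target shape, not GPT-4's code.
    module dog_head() {
        sphere(r = 20);                        // skull (potatoid)
        // Long snout: a tapered cylinder rotated to point forward (-y).
        translate([0, -15, -3]) rotate([90, 0, 0])
            cylinder(h = 18, r1 = 8, r2 = 4);
        for (side = [-1, 1]) {                 // mirrored pairs across x = 0
            translate([side * 8, -17, 7])      // eyes on the front face
                sphere(r = 3);
            translate([side * 13, -4, 16])     // ears on top of the head
                scale([1, 0.6, 2]) sphere(r = 5);
        }
    }
    dog_head();

The ‘wrong planes’ failure described above would correspond to the translate() offsets here pointing along the wrong axes, so a feature ends up on the side or top of the skull instead of its front face.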


I think things like this (or simpler things like asking ChatGPT for ASCII art of a circle) really show the difference between LLMs and humans. The issue is that it's a language model rather than an image one, so it doesn't understand the concept of ‘looks like a dog’.


Image models don't understand it either; they only know the typical "look" of something, not its correct proportions or number of parts. If the word "wheel" appears in the prompt, they might turn every circle-like shape in the image into a car wheel, because they cannot selectively apply parts of the prompt to parts of the image.

At least the few models I tinkered with all had this issue, and without some additional guidance that understands scene composition and anatomy/proportions in three dimensions this probably won't fundamentally improve.


I got it to extrude a cylinder into a sinusoidal shape, guiding it by feeding back screenshots of the scene converted to ASCII.
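
In OpenSCAD terms that could come out something like the sketch below (my reconstruction, assuming ‘extrude a cylinder into a sinusoidal shape’ means sweeping a circular cross-section along a sine curve; the variable names and values are mine):

    amplitude = 15;   // height of the sine wave
    radius    = 4;    // tube radius
    step      = 10;   // sample spacing in degrees (OpenSCAD's sin() takes degrees)

    // Sweep a circular profile along y = amplitude * sin(x) by hulling
    // consecutive samples into short solid segments.
    for (a = [0 : step : 360 - step])
        hull() {
            translate([a / 3, amplitude * sin(a), 0])
                sphere(r = radius);
            translate([(a + step) / 3, amplitude * sin(a + step), 0])
                sphere(r = radius);
        }

Hulling spheres gives round caps rather than a true cylinder cross-section, but it's the usual OpenSCAD idiom for sweeping along a curve, since the language has no built-in path extrusion.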


Maybe, but consider the post above where GPT-4 gets confused about the blind man on the other side of the door, while Vicuna-13b seems to figure it out. I accept that GPT-4 gave a better answer in this case, but its level of understanding of the scene under different scenarios still seems limited.


There's also a comment where GPT-4 was able to answer the question correctly. Seems like there's some randomness in play.



