Maybe it's sarcasm, but the author seems to downplay these images. Are we really that jaded? This seems like criticizing a dog that learned to speak only basic English... the fact that any of this is even possible is stunning.
Could not agree more. We keep moving the goalposts for what constitutes intelligence. Perhaps that is a net good, but I can't help feeling that we are taking incredible progress for granted. Both DALL-E and large language models (GPT-3, PaLM) demonstrate abilities that, 20 years ago, the majority of people doubted a computer would ever have.
Imagine telling someone in 2002 that they can write a description of a dog on their computer, send it over the internet and a server on the other side will return a novel, generated, near-perfect picture of that dog! Oh - sorry, one caveat, the legs might be a little long!
True if you had asked a nerd. If you had asked a person on the street, they would have answered, “Why, of course. Can’t computers do more complicated stuff already today?”
This isn’t true. The images generated by DALL-E are really good, but they are an incremental improvement based on a long chain of prior work. See e.g. https://github.com/CompVis/latent-diffusion
Also Make-A-Scene, which in some ways is noticeably better than DALL-E 2 (faces, editing & control of layout through semantic segmentation conditioning): https://arxiv.org/abs/2203.13131#facebook
Just about all the AI achievements are really impressive in the talking-dog sense, as you say.
The thing is that current AI has been producing more and more things that stay at the talking-dog level (as well as other definitely useful things, yes). So the impressiveness and the limits are both worth mentioning.
I don’t think people are jaded. I just think it’s going to take a while for these types of generated images to pass the Turing test, and that’s why it doesn’t feel that impressive yet. It’s very clear they’re made by an AI. Which isn’t bad. It’s just obvious in a way that isn’t fooling anyone. There’s a very specific way that AI generates things like hands, other limbs, eyes, facial features, etc. Whoever figures out how to fix that, and then fix the “fractal”-like style that’s a hallmark of AI-generated images, will win the AI race. Maybe it’ll be OpenAI. Maybe it’ll be someone else.
The images already look better than what like 99.9% of humans would be able to produce, and it produces them orders of magnitude faster than any human could ever hope to, even when equipped with Photoshop, 3D software and Google image search.
The only real problem with them is that they are conceptually simple. It's always just a subject in the center, a setting and some modifiers. It doesn't produce complex Renaissance paintings with dozens of characters or a graphic novel telling a story. But that seems more an issue of the short text descriptions it gets than a fundamental limit in the AI.
As for the AI-typical image artifacts, I don't see that as an issue. Think of it like upscaling an image. If you upscale it too much, you'll see the pixels. In this case you see the AI struggling to produce the necessary level of detail. Scale the image down a bit and all the artifacts go away. It's really no different than taking a real painting done by a human and getting close enough to see the brush strokes. The illusion will fall apart just the same.
Given this is a LessWrong article, my default assumption would be "author is being matter-of-fact while talking about a subject they're ridiculously excited / concerned about".