DALL-E 2 generates images of Kermit The Frog in various films (twitter.com/hvnslstangel)
390 points by jayalammar on June 3, 2022 | 192 comments



How is this so good? If you told me this was done by a very talented human, I’d still be surprised at how good they are. That must count as a sort of Turing test, no?


DALL-E is indeed superhuman in its ability to create images from simple prompts.

This shows the limits of the Turing test. To pass it a program must not only be smart enough, it must be dumb enough too.

Pulling off what DALL-E does is a tell-tale sign it’s most likely not human, and would make it fail the test.


Well, most of the pictures (if not all), while astoundingly good as an idea, have the tell-tale signs of being AI-generated, the kinds of mistakes a human would never make.

For example, looking at the WALL-E one [0], you can clearly see that the hands and feet aren't actually separated properly. There is also plenty of missing "logic" around the armpits. These are the kinds of mistakes a human can't make - especially one that is so adept at drawing the other parts so perfectly.

[0] https://twitter.com/HvnsLstAngel/status/1531512163738669057/...


How long does it take to generate these pics? No human is that good at producing art of this quality.

There may still be a few anatomical mistakes, but many artists also make some. The picture quality, lighting, and the way it captures the graphic essence and mood of these movies are just amazing and beyond what even the most talented artists can pull off in the same time frame.


2021: This machine fails the Turing test, it's too dumb. Look at how crappy this generated art is.

2022: This machine fails the Turing test, it's way too smart! No human could be this good at creating art.


The Kermit Test.


Love this name. ;-)


I'd guess these are curated, but I have no idea how many tries were discarded, as I'm still waiting for the invitation. Even in the best examples you can find fused fingers, deformities, melting, and slight errors that a human artist would need to touch up before it truly passes as a human work.


The amount of creativity here is astounding. Just imagine all the decisions the AI made in incorporating Kermit into the movies: the clothing it's chosen, how the character wears the clothing, the facial expressions, how to make Kermit himself look similar to the other movies' characters. Should he be lanky? Pudgy? Even for seemingly simple decisions, like the fact that Kermit in Wall-E is obviously going to be a robot, it has to figure out what he looks like as a robot, what his mouth looks like, that his eyes should be enlarged.

It gets a lot of things wrong, like I'm not sure why Kermit has a plastic texture in many of the pictures. If you showed me ten pictures of Kermit and ten frames of Total Recall, and for some reason 8 of your pictures had a plastic Kermit, and asked me to combine them in my head, I'd probably imagine something on par with or worse than what DALL-E has managed to do. But I wouldn't be able to show anyone what I'd made!


Sorry to be this guy but that is not creativity. It’s using what already exists, not conjecturing something new.

Contrast with real creativity (what people can do but machines currently cannot) where you conjecture something completely new.

For example, Copernicus conjecturing the idea that the Earth revolves around the Sun. No machine learning model would have gotten there because it would have been trained on a bunch of data that said the Earth was the center of the universe.


These are fun discussions because words like "artistic creativity" have a colloquial meaning that could only apply to humans since the dawn of humanity. Now you have an image of Kermit in Wall-E. I have never seen or conceived of an image of Kermit in Wall-E. Let's assume that adorable robot Kermits do not exist in the training data to be spit out like a search algorithm.

The image is new, it did not previously exist. It is a creation, a very vague idea of a few words that was created in full realization.

So it seems like the only difference between the "Not creativity" that Dall-E is doing and "Real Creativity" that humans do is that humans are the ones doing it?

I agree there's this concept of expanding the frontiers of human aesthetic capability that has slow-marched from cave paintings till post-modernism. That there are a very few artists that invent completely new styles that the rest of us copy and remix. It's questionable that Dall-E can do that, but I'm also not sure that it can't do that.


>So it seems like the only difference between the "Not creativity" that Dall-E is doing and "Real Creativity" that humans do is that humans are the ones doing it?

The differentiator is whether the result is worthy to look at for humans, that's all.

In case of the OP, you forget that the human had to predict that the combination of two would be interesting for other humans, and then construct the prompt, possibly selecting the best pictures. That's who did most of the work here, and it was effortless for the neural network of a human. Could DALL-E analyze the world autonomously without human intervention at all? No, it's an open loop system.

Novelty hinges on the ability to conceptualize things, not the execution. Sure, DALL-E 2 shows a glimpse of conceptualization internally as it works with compressed descriptions (concepts, abstractions) of things it draws. But it's super limited and not flexible enough to create new ideas, it doesn't have either short-term or long-term memory, it doesn't change, it has all knowledge about Kermit and Blade Runner pre-baked, and so on. You have to re-train it from scratch every time you want it to remember something truly new, there's no feedback loop to do that. Human ability is still much more powerful.

DALL-E 2 is almost at the point where it can supplement the human conceptualization with AI execution, though. Possibly in a couple iterations it will be there, with more believable results. In very limited cases, of course - as it's temporally unstable (it makes a totally new image each time), cannot correct the output from new details provided by a human (like a professional concept artist could), etc.


> you forget that the human had to predict that the combination of two would be interesting for other humans, and then construct the prompt, possibly selecting the best pictures. That's who did most of the work here

If the Twitter user claimed that the text prompts themselves were generated by asking GPT-3 for "An interesting sequence of text prompts to feed an image generation AI" or something like that, I would have believed them.

Presumptively, I imagine it's harder to create a model that generates images matching a certain human-language input prompt than to create an image generation model with no language component and have it pick its own scenarios internally. I don't think the former is done to palm off "most of the work" to humans, but rather because people want an easy way to see their own ideas created, so there's more demand for it.

> You have to re-train it from scratch every time you want it to remember something truly new, there's no feedback loop to do that.

As far as I'm aware, this isn't true. Deep learning is perfectly compatible with fine-tuning an existing model using new data. OpenAI/MS have been doing this with Codex to improve it based on Copilot telemetry and code from new languages/libraries.


Humans don't generate new ideas from nothing. Your "real creativity" doesn't exist. Everything is dependent on what came before, and therefore derivative to some extent.

Copernicus got his idea after gathering a lot of data, explicitly and implicitly, training his internal model of the world.


Even when it may appear that a human brain "generated something from nothing" (that is: you completely fail to account for where it could have possibly derived its output), you can always fall back to "genetic memory". Even a newborn has a lot going on that was "pre-programmed" by genetics, like a form of hardcoded training. :p


The scientific and creative methods are both entirely centered around evaluating data or existing creative elements and combining them in novel ways. Science adds the goal of making and testing hypotheses but nothing about Copernicus's conjecture would have been possible without observing existing information. I'm including e.g. "new" astronomical observations in the realm of existing information because even a human is simply observing and measuring it.

I fail to see what the difference is between 99% of existing "creativity," which is essentially arranging existing ideas into novel combinations, and what DALLE2 does.


"Good artists borrow, great artists steal"

Creativity is a very vague word, I'm sure we can come up with definitions of it that let humans keep sole domain over it. But breakthroughs often come from combining domains and concepts, very very rarely do we ever jump out of one local maximum into another, and I'm not even convinced that Copernicus counts as that. There's a reason why there are so many examples of the same breakthrough happening in multiple places in the world independently - innovation is a slow gradual collaborative process and not plateaus waiting for men of genius to have a spark of inspiration.

Also I'm not convinced that a computer couldn't have discovered the earth revolves around the sun - it's hard to make machine learning jump out of local maxima, but it does happen, and I can see some hidden layers becoming far more efficient at predicting outcomes by stumbling across a model that centered the sun. That being said - there likely are examples of things that computers couldn't have theoretically figured out the model for, but I'm hard pressed to think of one.


That is actually a misquote from people who stole.


That's great.


This is a common fallacy.

Call it moving goalposts, no true Scotsman, the AI effect, whatever. The behavior is as follows: an argument over whether an ill-defined attribute is possessed by a computer is defended or attacked with useless semantics since nobody can agree on what any of the words mean anyway.

Creativity, intelligence, consciousness.

It doesn't matter what you say, you cannot define these concepts with the same clarity you use to defend that the concept is missing.

Saying there is no creativity because it's just a neural net extrapolating from data is like saying there's no god because it's all just atoms: what is god, and why would the existence of atoms have anything to do with it?

Learn from Wittgenstein: Worüber man nicht sprechen kann, darüber muss man schweigen. ("Whereof one cannot speak, thereof one must be silent.")


This single quote from Wittgenstein might just be the dumbest statement ever made. How would we ever make progress if we did not discuss what we don't understand, and critique and debate new creative ideas?


By not taking the quote literally, but trying to understand it in its context.

The quote is a judgement about what can be logically achieved through discourse, and what cannot (and should not). He draws this boundary to separate the things we can reason about and the things that fall beyond reason and so cannot be reasoned about at all. If you cannot reason about something, you should not bring it up in (philosophical) discourse.

He does this to settle philosophical debates around, for instance, mystical subjects that were fought over to exist or not exist. The conclusion for him is: we do not have the logical tools and concepts to reason about the problem, so we should stop wasting time talking about it.

It's not about lack of knowledge or intelligence, it's about acknowledging the limits of reason and discourse, and forgoing any conclusions made beyond those limits. It's a pursuit of truth.

At least, this is how I understand it, I'm not a philosopher.


It’s not controversial to say we have to stick to reason and logic in the pursuit of truth.

But there is nothing about understanding the difference between human and machine intelligence that is forbidden by any reason or logic that I am aware of.

Certainly there is a lot we don’t know about that topic. But unless that knowledge is specifically blocked by logic or the laws of physics, it is knowable. The only thing missing is the knowledge required to understand it.

The only source of knowledge is conjecture and criticism, and discussion with others is a great tool for that process. The more the better.

So unless you believe we know all there is to know about a topic right now, there is always the possibility that some new knowledge could help us get closer to the objective truth about it. And it’s guaranteed that new knowledge will come from creative new ideas and criticism of existing ideas.

Saying something is unknowable is akin to believing in a mysterious god creator, or fairies. But more importantly, it is a way to stop or limit the progress of our understanding of the universe. It’s like when parents say “because I said so”. It shuts down discourse and understanding and leads to closed societies.


You're entirely misunderstanding me and the quote, as they both agree and align with what you're saying completely.

Again, the quote is not about avoiding or ignoring valid logical discourse, criticism, hypotheses, or any other logically coherent verbalization. The quote says that if the words are ambiguous, don't use them in logical arguments. Meaningless here does not mean hypothetical or conjectural or wrong, but simply ill-defined. This interpretation leaves plenty of room for everything you have described and are afraid would be criticized by the quote.

So, to get back to why I brought it up: creativity is an ill-defined concept in the context of logical reasoning, and so should not be used in logical reasoning as if it was an attribute that we can empirically or theoretically judge any entity to be in possession of.

This is why all replies to the above comment get lost in semantics: what is creativity? This is ill-defined, and hence based on your personal preference you may arrive freely at any conclusion you wish. Hence using the word "creativity" during a logical argument or truthful statement is guaranteed to produce conflict when multiple interpreters are present. Multiple logical interpreters can never be in conflict about purely logical reasoning, this is why we can have math.

Again, this is NOT saying that anything is unknowable, it is simply saying that whatever creativity is, or is composed of, is not clearly defined. This means it cannot be knowable or unknowable until it is clearly defined, and specifically only if done so in a rigorous enough definition to use it during reasoning.

To get back to your last point: doing what I described above is specifically to further the progress of our understanding of the universe, and specifically to attain new knowledge and understanding, because it cuts down on the discourse that does not contribute to that goal. It's the difference between arguing about whether machines have creativity and building DALL-E.



An artist uses ideas that already exist and creates something new out of them. It has always been this way since cave wall art.

I am 100% sure Copernicus was not the first to suggest a heliocentric system, but he was the one who put enough energy into proving and defending that theory.


>Sorry to be this guy but that is not creativity. It’s using what already exists, not conjecturing something new.

Such a cute point of view, completely wrong but cute. Please go find the original images of Kermit in Blade Runner and Wall-E that were just copied here.

>For example, Copernicus conjecturing the idea that the Earth revolves around the Sun. No machine learning model would have gotten there because it would have been trained on a bunch of data that said the Earth was the center of the universe.

If the model were trained on actual observations of planetary trajectories it would trivially recreate Kepler's laws, Newton's laws etc.
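A minimal sketch of the kind of experiment being claimed here, assuming rounded textbook values for six planets' semi-major axes and orbital periods (the numbers and the `fit_power_law` helper are illustrative, not taken from the thread or from any particular paper). It uses plain least-squares regression in log-log space rather than a neural network, so it only illustrates that the exponent in Kepler's third law is recoverable from observations; it says nothing about Newton's laws.

```python
import math

# Semi-major axis (AU) and orbital period (years) for six planets.
# Rounded textbook values, included only for illustration.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_power_law(pairs):
    """Least-squares fit of log(T) = k * log(a) + c; returns (k, c)."""
    xs = [math.log(a) for a, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = mean_y - k * mean_x
    return k, c


exponent, intercept = fit_power_law(list(PLANETS.values()))
print(f"T is proportional to a^{exponent:.3f}")
```

The fitted exponent should land close to 3/2, i.e. T² ∝ a³. Note that a human still chose the power-law form and the log-log transform here, so this is a much weaker result than a system discovering the law unaided.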


The smugness of your reply annoyed me. But I feel somewhat satisfied that you’re just misinterpreting what I said.

I never said copying existing images, I said it’s using what already exists for inspiration. That is not the same as creativity.

I want to see what Dall-E comes up with when you ask it to create something new. Maybe “a new type of animal” or “an apartment building made to withstand radiation”. Basically anything that it can’t use existing images and ideas to create. Hell, I’d like to see someone ask GPT-3 for a prompt that Dall-E would fail at and pit them against each other.

The point is a human could come up with those trivially. I think this system would struggle. Because it’s not capable of creating anything new, only combining things that already exist.


>The point is a human could come up with those trivially. I think this system would struggle. Because it’s not capable of creating anything new, only combining things that already exist.

This is the flaw in your thinking. Humans are doing the same thing, I don't know why some people think the brain is so magical. We live in a materialist reality and to think the brain is anything other than a complex computer is farcical.


>If the model were trained on actual observations of planetary trajectories it would trivially recreate Kepler's laws, Newton's laws etc.

Should be easy to prove.



from your link:

"Renner stresses that although the algorithm derived the formulae, a human eye is needed to interpret the equations and understand how they relate to the movement of planets around the Sun."


Even if that had been the only example rather than just one among them, it would still demonstrate the claim that an AI can discover various laws of physics just from observing data.


> like I'm not sure why kermit has a plastic texture in many of the pictures

That might be the known bug with low-resolution textures: the DALL-E 2 paper notes that the details in very complex scenes can be bad, and suggests it's because you start with a 64px image, which is necessarily bad for details (64px is really small!), and upscale with dumber models from there (https://cdn.openai.com/papers/dall-e-2.pdf#page=17). I think this explains the issues with images where the 'skin' or 'fur' looks really creepy (e.g. all the semi-nude bears).


Loved the David Lynch ones. I'm disappointed the image it generated for Eraserhead[1] didn't have Kermit as the baby[2]. I'm curious as to what it would generate for "Kermit the Frog in David Lynch's Dune".

[1]: https://twitter.com/HvnsLstAngel/status/1531774195234791424?...

[2]: https://duckduckgo.com/?t=ffab&q=eraserhead+baby&iax=images&...




> Kermit the Frog in David Lynch's Dune

DALL-E would just shoot back a still with Kyle MacLachlan in it. He's already so Kermit-like!


Does that mean the developers put The Big Lebowski through the model during training? Where did they get all the movies and TV shows from? And does copying so directly from the source material open them and users up to copyright infringement liability?


I was assuming it was just movie posters and screenshots from online articles/reviews/imdb etc, and not any analysis of the video itself (I might be wrong though - using video would make the number of available input frames grow by orders of magnitude, not that there is a lack of pictures online).

E.g. The Shining picture of Jack Nicholson with the door isn't representative of the "look" of the film, but very much an iconic still frame and basically what you see in a Google image search for "The Shining".


They probably did, but even if they didn't it may have come from the terabytes of data they scraped from the internet. OpenAI doesn't care. They claim that it's derivative enough to go under fair use. And whether it is or isn't, I guess their calculation is that the risk is worth taking to be the first to develop these algorithms, which is a huge head start if the courts decide that it does count as fair use.


Hilarious parodies like these are copyright infringement, yes, but also open-and-shut fair use defense. (You're confusing the issue of transformativeness and fair use defense.)


I'm imagining the defense will be: "Your honor, it can't be infringement because I ran it through an ML model first."

Especially if the movie(s) that are eventually generated this way are ripping whole scenes or sequences out of other films, a la copilot.


I feel like if they don’t have licenses for all of their source material, then the model should be required to be released into the public domain.

It’s like extremely expensive piracy that is bad for artists and bad for the environment.

I wonder if the reason OpenAI, Google, etc don’t release these things isn’t so much that they’re worried about racist/offensive output, but instead they’re worried about people using it to create images of, say, Mickey Mouse and drawing the attention of his lawyers? It’d be better for AI companies to keep all of this stuff in a legal gray area for as long as possible.


Being bad for artists and the environment is not illegal.

If you look at a movie poster, your brain does not need to be released in the public domain. Even if you sketch it from memory.


> Being bad for artists and the environment is not illegal.

Yet we have copyright laws and environmental protection laws to protect both.

> If you look at a movie poster, your brain does not need to be released in the public domain. Even if you sketch it from memory.

Because I’m not a machine. I’m constrained by physics, whereas these models are not.

Copyright laws were made to protect artists from IP theft. If I make a sculpture, it’s not trivial to copy that and steal from me, so creating copyright laws to protect sculptors would be a hard sell.

But a painting, a book, a song, etc are easy to steal, especially with technology. Copying and selling someone else’s painting is similar to copying and selling someone else’s sculpture, yet the scale of theft is obviously different (mass producing sculptures is much harder than making unlimited copies of an mp3)

These AI models are a new type of theft, and likewise need new types of legal protections for artists.


> Yet we have copyright laws and environmental protection laws to protect both.

Yes, but we don't have copyright laws to protect the environment. So harm to the environment is not a copyright argument.

> Because I’m not a machine. I’m constrained by physics, whereas these models are not.

I don't even know what that means.

> But a painting, a book, a song, etc are easy to steal, especially with technology.

The AI does not merely duplicate training samples. That said, effort is also unrelated to copyright.


> These AI models are a new type of theft, and likewise need new types of legal protections for artists.

Meh. This means that most artists will also struggle to make something as good as these models and will have to find something else to do for a living. It's not the first time in history that an occupation has been made obsolete. Projectionists in cinemas have mostly disappeared.


If these models are trained on human artists, what’s going to happen if humans stop making original art? It’s obvious from the OP that even a single film is enough to add a new stylistic capability to DALL-E, so even though it might have been trained on hundreds of years of human artwork, a single modern piece can still influence it significantly.

> This means that most artists will also struggle to make something as good as these models

This makes me think that you’re lacking a fundamental understanding of what art is. The “demand” for art is never going to disappear, but these models are simply going to disrupt the supply by flooding the market with derivative works at a massive scale.

The internet posed a similar threat back when it first came out, but laws were changed to combat that (e.g. the DMCA). I fail to see how this situation is any different?

A country that doesn’t get on top of this issue will doom itself to cultural irrelevance in the long term, IMO.


I would hope that the spirit of the law would be considered in this. This is a clear application of fair use. Are the owners of the IP losing money on letting someone generate new characters in their movie/show?


> Are the owners of the IP losing money on letting someone generate new characters in their movie/show?

Typically, creators are very protective of this sort of thing, unless it stays in the area of fan art. If anyone tries to seriously monetize this kind of output, I'm sure we'll see a lot of cases.

Imagine what Disney would do if you used DALL-E to create an animated feature film in the style of Mickey Mouse, but with cats instead of mice, and they found out you used actual footage from, say, Fantasia to train an AI model. No idea if they would win, but I'm certain they'd sue.


Wouldn't the images be a transformative work? But then again there were those recent music cases where the infringement was only a small number of notes in sequence. This will almost inevitably end up in court because I'm not sure there is a comparable case.


Dall-E still blows my mind every time. To someone who is plugged into these things, how is this possible?

I understand computers and I understand back-propagation but this... it feels like magic to me.

Can someone indulge me in a short explanation of how this works and how is it this good?



More eye candy here: https://np.reddit.com/r/dalle2/top/?sort=top&t=year

Very good one: https://np.reddit.com/r/dalle2/comments/u5kkty/a_fluffy_baby...

“a masterful impressionist portrait painting of a little doggey who is worried he may not be a good boy”: https://nitter.net/MarkRich388/status/1532482006809866240


These are mind bogglingly good


If you ran it 1000 times and picked the best, you might get all good ones. I would want to see all 1000. It's like stock picker ads (person who called the market says XXX) where you only show the lucky ones.


Are you aware that this is how human artists work, too? They make countless works that aren't comparable to their top works. Even more so if you count the practice when they were just starting out. I think some people just refuse to believe that real art can be generated without humans and selectively look for things that confirm their pre-determined conclusion. Witnessing this always feels like witnessing someone before the wheel was invented, when things were rolled on logs, assuring everyone that technology will never beat manual labor because look at how cumbersome it is to work with logs.


There is a crucial difference: the human artist does this choosing themselves.

If DALL-E just spits out 1000 images and then a human goes through them and picks the best 2-3, and those are good - it's impressive, but the human was still a crucial part of the process. On the other hand, if DALL-E were to generate 10 billion images, and choose the best 2-3 itself and give those as output, and if at least one of those 2-3 would be consistently great, then DALL-E could be indeed considered to be creating (good) art.


> On the other hand, if DALL-E were to generate 10 billion images, and choose the best 2-3 itself and give those as output, and if at least one of those 2-3 would be consistently great, then DALL-E could be indeed considered to be creating (good) art.

It's worth noting that the OpenAI samples for DALL-E 1 used CLIP to rank generated samples, and got a big boost from that. For many model architectures, you can also run them in reverse to do 'image -> caption' and score the caption's quality: if the caption is bad, that indicates your image was screwed up and low-quality (an approach introduced by CogView). DALL-E 2 doesn't use either approach, or finetuning on user choices like InstructGPT, and I dunno if OA is going to implement any of these, but there is a wide universe of techniques applicable here to improve quality and we should keep that in mind (https://www.gwern.net/Forking-Paths) if we are going to make any assertions more sweeping than "this specific model, at this very instant, with this particular interface, is only at this level of quality".
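As a rough illustration of the ranking idea, here is a minimal best-of-N reranking sketch. Everything in it is a stand-in: `generate_candidates` and `score` are hypothetical placeholders for an image generator and a CLIP-style image-text scorer, not real OpenAI or CLIP APIs, and the "images" are just dictionaries with a random quality number.

```python
import random


def generate_candidates(prompt, n, seed=0):
    """Hypothetical stand-in for an image generator: returns n fake 'images'."""
    rng = random.Random(seed)
    return [{"id": i, "prompt": prompt, "quality": rng.random()} for i in range(n)]


def score(prompt, image):
    """Hypothetical stand-in for a CLIP-style image-text similarity score."""
    return image["quality"]


def best_of_n(prompt, n, k, seed=0):
    """Generate n candidates and keep the k that the scorer ranks highest."""
    candidates = generate_candidates(prompt, n, seed)
    ranked = sorted(candidates, key=lambda img: score(prompt, img), reverse=True)
    return ranked[:k]


top = best_of_n("Kermit the Frog in Blade Runner", n=1000, k=3)
for image in top:
    print(image["id"], round(image["quality"], 3))
```

The point of the sketch is only that the selection step can be automated once some scoring model exists; how well that matches human curation depends entirely on the scorer.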


> the human artist does this choosing themselves.

I disagree. The human artist's tastes at least partially originate in other people, both individuals and general societies/cultures, and oftentimes the artist directly incorporates feedback into future work. Are you aware that students in art school, music conservatories, etc constantly get feedback from instructors and peers? I reject your premise entirely unless you can give me an example of a human that created art without ever having been influenced as a human being by any other human being. Otherwise I believe it's just what I said before: concluding first that AI can't create art and finding reasons second.


Does DALL-E incorporate (or even receive) feedback about which of the pictures it generated were better? It of course does not, and it currently has no function to do so. IF it incorporated this feedback and changed its weights based on it, I would agree with you that the situation could be comparable.

Until then, my point remains: DALL-E is currently like an (extraordinarily good) hat that you can put words in and extract phrases out of. A human chooses what words to put in and which of the phrases they take out are better. Unlike pulling words out of a hat, the network has some criteria by which it produces phrases, but that's not enough to call it an artist.

This is not meant to minimize how good the achievement of this network is. The level of fidelity and even understanding of the prompts is extraordinary. But its purpose is not to be creative, it is to find a point on a hyperplane that matches the input it received. It is currently at the level of a tool - though there are potential advancements that could yet turn it into an artist in its own right.


We're basically just arguing semantics at this point. You define art/artist differently than I do. We won't make further progress with this discussion, but regardless it's great to see that technology is coming so far


Who's to say that Dall-E 3.0 won't do exactly that? This is not the peak of AI art generation and it will only continue to improve.


Sure, it might. But until then, I wouldn't say it makes sense to consider the AI as "being the artist".

When DALL-E x.0 does that, and when it also generates similar quality from much higher-level prompts ("paint a sad picture", or "social commentary on BLM" or something like this, instead of a description of what the picture should show and in what style), then I for one will be in complete agreement that it's indeed an artist in itself.

Personally, I don't expect this to happen in the next few decades, as I don't think the current approaches are very promising for the type of intelligence that you would need to actually do this type of reasoning, but that remains to be seen, and I am fully confident that it will happen some day.


>paint a sad picture

Fairly certain you can get results using this prompt with no issues.

Dalle 1.0 -> 2.0 was a shocking improvement, I expect the jump to 3.0 to be equally jarring if not more so


It doesn’t matter if these images were one in 10,000, the fact that any exist that are this good is crazy.


Yeah exactly. The fact that a person who could never create such a picture on their own, now just has to go through a bunch of images and select one to get this result is already amazing. The goal posts keep shifting, glass half empty.


It reminds me of pop articles about the P=NP question which simplify it down to, "is being able to recognize creativity the same as being creative?" Turns out for practical purposes, yes (the simplification question not the mathematical question), as long as you can offload the work to an AI on a powerful compute cluster!


I bet I could give a phrase to a collection of art students in some university and DALL-E, and the human art would likely be more creative than a single run of DALL-E. What distinguishes the humans is that you can tell them to make art of their own choosing and they will, but DALL-E is unlikely to create anything interesting with no input.

Van Gogh invented Starry Night without any prompting despite it not being a real scene (much less anything he had ever seen and such abstraction was very rare in the 1880s). Picasso made Les Demoiselles d’Avignon in 1907; it was so radical even his fellow artists were unable to comprehend it.

It doesn't change the fact that DALL-E is pretty amazing tech, but it's still as far behind human ability as any AI is today. It is way way better than what came before, but that's true of most technologies.


It's amazing and wonderful, but it's not the AI that's "creating art", it's the human choosing the best AI output that's doing the artistic part (often together with the person who input the prompt to the AI).

It's just like Dada poets did 100 years ago: you use a mechanical process to generate quasi-random output, and then you choose some of this output to present to other humans. The way you provide input to the mechanical process (e.g. what words you choose to put in the hat) and the curation are the real creative part of the art, not the mechanical process generating the text itself.


And in reality it's up to the user to pick 1 of 8 for Dalle-2, usually they're all at least decent.


Agreed that people should pay attention to cherry-picking of model outputs.

For this one in particular, here are a few more results for Battlestar and The Office:

https://twitter.com/Miles_Brundage/status/153247388947686195...


Imagine a crowd-sourced metadataset of which model output is "good". What happens when we try to incorporate that knowledge into the pre-existing model? Will it learn to generate more "good"-looking output?


Yes. This is already done in research and commercially (https://openai.com/blog/instruction-following/).


Yes, and it still feels a lot like Searle's Chinese Room. It's as if it skips a dozen steps. Well, that's exactly what happens, of course. But it does show that the network can match linguistic descriptions to images extraordinarily well.


Do you find the Chinese Room argument convincing?

Do you feel that the human mind is more than an "appropriately" trained "biological" neural network?

What do you consider the limits of a DALL-E like system compared to a "true" mind?

My personal opinion is that the Chinese Room argument is fancy handwaving that crucially relies on never being explicit about what it means by "understanding", combined with an appeal to intuition.

I strongly believe that there is nothing "magical" about the human mind or brain (that could not be replicated artificially), and thus that a comparably trained, appropriately designed system ("DALL-E successor") OR a copy OR a simulation of a human brain would be all just as capable and "understanding"/"conscious" as another human...


> My personal opinion is that the Chinese Room argument is fancy handwaving that crucially relies on never being explicit about what it means by "understanding", combined with an appeal to intuition.

bingo


There doesn't need to be anything "magical" about the human mind for human "understanding" not to be particularly closely approximated by performing mathematical transformations on a huge corpus of raster data, even though the latter approach produces very useful results (see also the utility of a pocket calculator vs attempting to skip a billenium or so of natural evolution to grow an organism motivated to learn multiplication in a petri dish, or parse human minds to figure out the elements of brain state associated with performing a calculation)

GPT-3 is far, far better at generating descriptions of loving humanity than the average dog. But it's pretty obvious that the dog's goal formation and excitement hormones aren't particularly similar to letter-by-letter ASCII output probability calculations on a database, and that GPT-3 has no more grasp of the dog's love of humanity in general and this human in particular than the dog has of Shakespeare. "Thought" and intelligible results are essentially orthogonal, which limits the feasibility of training the former...


> My personal opinion is that the Chinese Room argument is fancy handwaving

It isn't hand-waving. It's against it, really. It's a thought experiment that encourages a sceptical attitude towards jumps in understanding mental processes. The operator in the Chinese Room doesn't understand Chinese. While the translations are excellent, he or she wouldn't be able to go out in the street and ask for a glass of water if their life depended on it. Hence, a computer that mechanically translates Chinese cannot be automatically assumed to understand Chinese.

The argument doesn't need to explain exactly what understanding means. We all (sort of) know what it means. The same goes for e.g. attention. That's what makes it so hard to define what strong AI is and how to verify it. The Turing Test famously tries to decide this (without defining anything, I might add), and the Chinese Room is a good argument against it being the proper test.


> The argument doesn't need to explain exactly what understanding means. We all (sort of) know what it means.

Then what does "understanding" a language mean? Your "asking for a glass of water on the street" example implies to me that you demand a system with: a) capability for internal intent b) ability to express intent in target language to grant it "understanding".

Basically, first you deliberately construct a system not capable of intent (pure stateless query-response algorithm), then you deny that it "understands" Chinese on the basis of being unable to express intent. That does not hold.

Give me a precise definition for what you mean by "understanding" and I'll dismantle the Chinese Room for you.


> Give me a precise definition for what you mean by "understanding" and I'll dismantle the Chinese Room for you.

You probably can't define "chair" in an exact enough sense. It would merely show the shortcoming of the definition rather than refuting the Chinese Room.

> a system not capable of intent

If I now would require a definition of "intent" I would be just as childish.

> That does not hold.

First, the system isn't stateless. It has the operator who has to remember what has happened before in order to come up with a correct translation.

Second, your argument implies that understanding requires intent. That alone is a tall order to prove. But it's the point of the Chinese Room: the mechanism doesn't understand, doesn't have intent, nothing. Yet it performs exactly as an intelligent, understanding, intentional translator. So would you not agree with the conclusion: you can't judge intent by only looking at I/O behavior?

Note that Searle (probably, this is just my understanding) doesn't mean that the Room is useless or "dumb" (in most senses of the word).


The sleight-of-hand in the Chinese room is that Searle asks us whether the man in the room understands Chinese. Of course not. This is like asking whether my CPU knows how to decode h264. The real question is whether the embodied process instantiated by the actions of the man, along with the other involved components in the room, understands Chinese. But the argument doesn't touch this claim.


Kermit in Sopranos did it for me. Hilariously accurate.



There are browser extensions to do this for you. Specifically ‘Redirector’ for FF.


If these pictures are not touched up this is superhuman. My mind hasn’t been this blown in a long time. What a time to be alive.


This is THE technology of the 2020s. It's insane. I'm in absolute awe at how amazing this AI is.


Disagree — GPT-3 type AIs are.

But people who do viral news and posts don't...read. So, their impact will continue to go unnoticed in comparison to DeEpFaKeS and Dall-E.


This is amazing.

We are at the precipice of someone releasing a $100M blockbuster movie just based on the language in the script with zero cost beyond compute.

What will this mean for the future of entertainment…


I suspect at some point in the next 5-15 years we will begin to see AI generated entertainment perfectly tailored to a person's preferences.


I can already envision everyone talking over each other trying to say how their personal AI generated TV show is the best show they have ever seen.

Just imagine how much lonelier the world is going to feel when people don't even have entertainment in common anymore.


We are kind of already there (not having entertainment in common). It is hard to find a TV series or film that everyone in my friendship circle knows equally well -- the last one was Breaking Bad, and that was over 10 years ago. Everyone finds their niche interest I suppose. It's almost the same with music -- the last decade where our musical interests converged was the 90s.


Interesting point. I would love to see my personal perfect TV show.

But I would also love to see the perfect show for a combination of people, like my loved ones, so we can watch together.

And lastly, I would love to see the perfect show for a large majority of the population, because there is something special about a shared experience like Game of Thrones or Friends. To this day I can shout “pivooooot” and most everyone gets it.


Maybe the loneliness is a good thing? It seems like everyone (in America at least) hates each other, so extreme isolation may be just what we need in this country.


I think it's probable that everyone hates each other because we're so isolated that we fail to form empathetic bonds with the people around us. More isolation doesn't seem like the solution.


>> I suspect at some point in the next 5-15 years we will begin to see AI generated entertainment perfectly tailored to a person's preferences.

Kermit in Debbie Does Dallas. Kermit in The Graduate. Kermit with 2 broken flippers. Oh the depravity. I'm not sure getting high quality visualizations of any random passing thought is a good idea ;-)


When I think about such a possibility, https://c.tenor.com/emURWFXTpvIAAAAC/pearl-jam-do-the-evolut... comes to my mind.


On the one hand, the ability for Star Trek's Holodeck to create large amounts of content from a few terse natural language instructions is looking less and less implausible.

On the other hand, I feel like this will ultimately be kinda like traditional procgen algorithms: once you've seen enough of what it produces it all starts feeling very bland and same-y. Sure, the AI may be able to produce a feature-length movie based on the input "What if Nicolas Cage had played The Terminator and Aaron Sorkin wrote the script?", but somehow none of it would be surprising or interesting to you, it would lack the novelty and playfulness of a good human creative work, and it likely would be very shallow in its themes.

On the gripping hand though, perhaps in achieving that level of sophistication we inadvertently create something more alive and aware than we intended and instead of merely trying to produce satisfactory results it actually attempts to express itself in ways that resonate with us.


> it would lack the novelty and playfulness of a good human creative work, and it likely would be very shallow in its themes.

Simply tune the parameters associated with novelty and playfulness and you’ll get the desired result.

There’s nothing inherent in human creativity that can’t be replicated by an AI. Most creative work is derivative and remixes prior art.

This is a good short video on the phenomenon of remixing https://m.youtube.com/watch?v=MZ2GuvUWaP8


> There’s nothing inherent in human creativity that can’t be replicated by an AI.

I didn't say otherwise, I'm just skeptical the hypothetical system in question would be sophisticated enough to produce it in faster-than-human time scales.


> Simply tune the parameters associated with novelty and playfulness

Nah, you should be able to add “…but with some novelty and playfulness” to the prompt.


> The Terminator and Aaron Sorkin wrote the script?

You can't handle the future! We live in a world that has time machines, and those time machines have to be manned by robots! Who's gonna do it? You?


Waiting for the next iteration of Copilot to use this technology.

"Like Facebook but like make it not suck"

AI: "Here you go!"


In this thread:

1. People thinking it's amazing (me)
2. People thinking it's not creative enough e.g. "It’s using what already exists, not conjecturing something new."
3. People thinking it's too creative e.g. "This looks nothing like Kermit"


What about ... Kermit in the Muppet Show?


Not to detract from the accomplishment, but none of these are Kermit. The "Kermit" part of the query seems mostly to have accomplished querying for "humanoid frog"


Some of them look a lot like Kermit! But many do not, depending on how far away the target aesthetic is from "muppet." A smarter Dalle2 could do a better job at preserving the "essential Kermit-ness" perhaps. But maybe it's more impressive to adapt to a completely new aesthetic?


That's actually one of the more interesting and impressive things: the AI has substituted in "humanoid frog" and in some cases given it movie-appropriate features like Matrix sunglasses or Pixar-style eyes, which are probably unique to the image, rather than simply pasting photographs of the original muppet into movie poster settings, which would be an acceptable lower-level response to the brief.


Seems like it adapted Kermit's features to fit with the world of the movie, as if he was actually a character from that movie.

I don't have access to DALL-E 2, but I wonder if a prompt like "A cameo from Kermit the Frog in ..." would give more literal Kermits.


No those are clearly Kermit the Frogs


If I hadn't been prompted with the word Kermit, no way would I have guessed that they were supposed to be Kermit. For example, Kermit has a characteristic shape of his pupils. None of the examples have that.


Instead of arguing about "what is" in re: subjective judgements, it can sometimes be useful to phrase things more literally, as in:

"To me those are clearly Kermit the Frogs."

"To me those are clearly not Kermit the Frogs."

Then there's nothing really to argue about. Instead we can discuss what we see and how that affects our subjective perceptions.

For example, Kermit the Frog doesn't have eyelids, but most of these images show a frog with eyelids.


These are all Kermit. The identifying caricatural features are all here.


I am finishing “Love, Death + Robots” on Netflix and between the quality of CGI and things like DALL-E and Imagen, movie, design, and illustration industries are in for an upheaval.


Video is probably a whole other ballpark regarding required training and resources.


Is there an easy way to try this model on my computer?


As far as I know, the actual trained system is proprietary, and you can only use it by requesting access to their online system for generating imagery: https://openai.com/dall-e-2/

There are open-source efforts to implement it and make trained models available, but I don't imagine they are yet at the same scale of ingested data / model size as OpenAI's system: https://github.com/lucidrains/DALLE2-pytorch


Not Dalle2 specifically, it's proprietary and there's a waitlist. Older open source alternatives include https://huggingface.co/dalle-mini


I see a distinct lack of "Kermit The Frog in Dragon Ball Z".


Here is the result for the prompt "A still of Kermit The Frog in Dragon Ball Z (1989)":

https://user-images.githubusercontent.com/1332366/171921054-...


Oh man. Now I want to apply for access just for that. Thanks for the idea.


If this is legit it would be one of the most impressive things I've seen in many years.


It's legit, just variations / iterations.

Here's what I got for "A still of Kermit The Frog in Blade Runner 2049 (2017)":

https://imgur.com/a/y7t3RKx


None of them scream Blade Runner 2049 to me.


In this case it's the overall "tone" and color palette. Very uniform diffuse lighting, muted color, etc.


? DALL-E is a legit project from OpenAI


I think "legit" in this context might mean more of the Twitter post author's representation of their use of DALL-E 2:

- Were the prompts shown the ones fed to DALL-E 2 or were there more complex details described in the prompt?

- Were these the first images generated for the prompt, or did the author generate many images and cherry-pick the best example, and if so from how many?


The second one is how everyone does it, by the way. AI may as well be a cherry orchard. There’s nothing wrong with it.

The first one, I’d have a huge problem with. Lying about prompts is a no no. Thankfully there’s not much incentive to lie.


That doesn't mean the images in this tweet thread are output from Dalle though…

Although if an individual created all of these then that's about the same amount of impressive


It's funny that 10 years ago "a human did it" would be, by far, the most plausible explanation. Today, it's "AI did it." Put another way, if a human made these then it'd be as shocking as seeing DALL-E 2 in 2012.


Kermit the Frog in Behind The Green Door

Kermit the Frog in Salò, or the 120 Days of Sodom

Kermit the Frog in Pink Flamingos

----

I actually might have Dalle2 access soonish. Honestly this is the best demonstration I've seen that we are about 2 years away from maybe not "general ai" but some pretty wild shit that is going to make most of what we do and value as humans very different.


Followup: "Big Bird Throughout Cinematic History": https://www.reddit.com/r/dalle2/comments/v4q5rh/big_bird_thr...


When will they get back to me with my Dalle2 access! I need to make a ton of kermit images.


“All the impressive achievements of deep learning amount to just curve fitting” - Judea Pearl

https://archive.ph/89lqw


If the results are actually impressive, then the dismissive rhetoric doesn't amount to a hill of beans.


Is that Fozzie Bear in the mirror/backdrop of the Stranger Things one? Or am I seeing what I expect to see...


These are so good I am scared.


Good, that is the correct response. The progress in AI recently has been staggering


The Big Lebowski’s one is clearly a guy in a frog costume… hilarious!


Not a single image shows Kermit. Green frogs yes. Kermit no. Fail.


A lot of art-adjacent jobs are about to become toast.


Kermit in Eraserhead is everything I expected it to be.


Looking better than those pixel art NFTs.


The Matrix version looks like TMNT. :)


"Various films." I hope you're not thinking what I think you're thinking.


You can bet your life savings that OpenAI could generate some really interesting Kermit porn.


man look at this amazing curve fitting from training samples


man all this is is a fancy Photoshop. i bet you could find all of these in Google Images or something.


None of those look like Kermit the Frog.


I don’t understand how this isn’t infringing some copyright. Anyone do any research on this?

If we accept that a model trained on copyrighted material does not infringe on the material's rights, then circumventing all copyright can be as simple as creating a sufficiently close derivative and giving it away.

Not to say that copyright is good to begin with.


Everything that you, a person with a paintbrush, could paint "in the style of" something else is informed by what your model (your brain) has been exposed to. There's no getting around that, and commissioning you to paint something "in the style of postwar authoritarian England" does not infringe the copyright of V for Vendetta (even if I told you "make it look just like the movie"); it's an original painting.

Stylistic inspiration is not an infringement of copyright, in either that case or the "do it on a computer" case here.

The Kermit the Frog aspect though is interesting - it applies equally for both the human and machine made works - if an argument could be made that the subject of the work sufficiently resembles the character, maybe there's a trademark issue at hand?

But in any scenario, nothing legally novel about the work being created by machine.


> But in any scenario, nothing legally novel about the work being created by machine.

…except for the fact that it was created by a machine.

Just like copyright law had to be revised to deal with software and the internet, it will need to be revised to deal with AI.


> …except for the fact that it was created by a machine.

We've already had this for years. The photos you take on any modern smart phone are partially the invention of AI (doubly so if you use something like portrait or night mode). It's not just raw CCD output, and yet, the photographer retains the copyright.

DALL-E's terms could require users to assign copyright. If not, I don't see any reason it wouldn't go to the person who came up with the prompt and picked one particular generation.

If I take a picture of a mountain with a camera, neither the mountain nor the camera hold copyright. DALL-E's just another tool in the toolbox.


That’s obviously very different. The contribution of the AI processing my phone camera does is not significant, and can be substituted relatively easily with Photoshop or similar software. It’s just a convenience feature.

Dall-E is the complete opposite. My contribution is insignificant (a prompt), whereas Dall-E's is 99% of the work.

> If I take a picture of a mountain with a camera, neither the mountain nor the camera hold copyright. DALL-E's just another tool in the toolbox.

Not sure how this contradicts my point. Nobody “owns” that mountain, so nobody could call your photo of it a derivative work.

If you take a photo of a frame of a movie, then who owns the rights to that photo? It’s not a raw frame, it’s a photo of a screen. Does that mean you didn’t just steal someone’s IP?

These questions need answers, and copyright laws try to answer them in as fair a way as possible. Dall E raises new questions, and thus we need to update copyright laws with new answers.

My personal view is that obtaining licensing for every single work used to train the model is impossible. So instead of simply making this type of AI research illegal, the fairest solution is to put that model into the public domain.

There are still financial incentives to perform this research, as the discoveries could be used later to train a commercial model with proper licensing.


> My contribution is insignificant (a prompt), whereas Dall-E's is 99% of the work.

I disagree entirely with this. Picking a prompt and potentially going through dozens, hundreds, or thousands of variations and re-generations has artistic value.

Just like the photograph of a mountain. Nature did most of the work, but a human selecting an angle, lighting, etc. from the nearly infinite combinations available matters.

We've also established in court that monkeys can't hold copyright (https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...), and they're quite a bit more sentient than DALL-E.


>circumventing all copyright can be as simple as creating a sufficiently close derivative and giving it away.

That's correct. People do this all the time, sans the giving it away part.

Also, there is no way that you can argue these images are not transformative.


I should clarify - I’m not talking about the Kermit images, I’m talking about Dalle2 itself. If you had it render Neo from the Matrix, albeit imperfectly, would that be transformative use?


It probably depends on how close it looks to the original.


If I draw "Neo from the Matrix" does it matter how accurate I get Neo?


If you drew it yourself there’s precedent under fair use. If you made something that drew it when prompted for “The Matrix” presumably it knows what that is and therefore is more ambiguous


For copyright purposes, the tweeter "made it themselves."


That remains to be seen. If I take a photograph of a still from the Matrix and print it, that's not the same as me photo-realistically drawing the same still from memory, which is itself not the same as me photo-realistically drawing it while looking at the still itself.

Copyright law is way more nuanced than you think, especially around fair use.



It's established that you can't assert copyright over a picture generated by an ML algorithm. I don't think it's established whether a picture generated by an ML algorithm can infringe someone else's copyright (e.g. if you ask DALL-E to produce a picture of Mickey Mouse, and it does a perfect rendition of him, is this picture in the public domain, or does it infringe Disney's copyright?).


Hang on, I don't see where that said that you can't assert copyright over an ML-generated picture. I think the decision says that the ML algorithm itself cannot assert copyright?


The decision says that no one can be considered the author of that picture, as copyright only protects human creative works. It is possible that the work infringes someone else's copyright (the decision doesn't address this point at all), but unless that happens, the image is in the public domain and you can freely copy it.

Note that, unlike patents and trademarks, copyright doesn't require registration. The court could have found that, while the AI can't assert copyright, the copyright still exists and belongs to the owner of the AI or the designer of the AI etc. Instead, they found that the work can't be protected by copyright, since it is not the product of human creativity.

Edit: here is a citation from the article that I'm basing the previous assertions on:

> Both in its 2019 decision and its decision this February, the USCO found the “human authorship” element was lacking and was wholly necessary to obtain a copyright, Engadget’s K. Holt wrote. Current copyright law only provides protections to “the fruits of intellectual labor” that “are founded in the creative powers of the [human] mind,” the USCO states.


>> Also, there is no way that you can argue these images are not transformative.

Exactly. Kermit has been very much transformed, and "in the style of" is not copyright infringement AFAIK.



These are honestly not very impressive (no sarcasm here) and further convince me that the next AI Winter will come with this coming recession.

Don't get me wrong, they are still impressive in the quality of the visual they produce, but just like Markov Chain demos of old, they're neat but way miss the mark.

None of these capture the "feel" of Kermit the Frog. Most of them look like weird designs for the Ninja Turtles movie in the 90s.

There are several distinctive features of Kermit that are missing from nearly all of these.

- For any of the "live action" ones, Kermit should still always be a puppet.
- Kermit notoriously has lanky arms.
- Kermit never has eyelids.
- His eyes sit way on top of his head.
- He often has his weird neck decoration.
- His eyes have a very distinctive pupil shape.

None of these get Kermit correct, they all just look like frogs (maybe Dalle2 isn't trained on copyrighted/trademarked material?)

There are fan made versions of some of these which show just how different Dalle2 is from human imagination:

Kermit actually has been on Family Guy: https://static.wikia.nocookie.net/muppet/images/7/71/Famguys...

There are several "Kermit in Star Wars" examples; here are two: https://i.kym-cdn.com/entries/icons/original/000/021/668/ker..., https://i.ytimg.com/vi/6MebZx-4950/maxresdefault.jpg

Again if this was done on someone's laptop it would be really impressive. However the fact that so much talent and resources were poured into pushing AI to its limits and this is what we get tells me we've hit another brick wall as far as research goes.


I completely disagree. I think these are taking it a step further than your examples. Dalle2 is not just using the existing Kermit and pasting it in different environments, it's modifying Kermit to fit in that world.

For example, your Star Wars example...

https://i.ytimg.com/vi/6MebZx-4950/maxresdefault.jpg

It's clearly just an existing photo of Kermit pasted over an image from the film. There are even two sets of arms. I could Photoshop that in a few minutes.

Then, the Dalle2 image...

https://pbs.twimg.com/media/FUEDDm2UEAAO8yb?format=jpg&name=...

I think it's impressive. It looks like Kermit is a character in the Star Wars universe. There are a few issues with the eyes and feet, and it's also hard to tell if it's a creature or a person in a frog suit. However, it gets 90% of the way there, and the pose is great for a frog/human hybrid.

The most exciting thing is how this could be used as a starting point for design. I could take the Dalle2 Kermit image above, fix the eyes/feet, add a few distinctive Kermit features, and have a great piece of concept art in an hour, rather than taking a day or two to create something from scratch. Obviously it can't be applied to all workflows, but for those it's suited for, it'll save vast amounts of time and costs. For that reason, it's already something of real value in its current state. The same can't be said about the Star Wars examples you provided.


I genuinely can't tell if you're trolling. This isn't impressive because the AI model doesn't accurately capture the "feel" of Kermit!?


The computer was asked to produce photos of Kermit the frog. It failed spectacularly at rendering anything resembling Kermit the frog.


Yet, it did produce things which "look like weird designs for the Ninja Turtles movie in the 90s."

In other words, it has done as good a job of costume design, lighting and photographing a live action Kermit as New Line Cinema paid $13.5m to accomplish for TMNT in 1990.

And you know who they got to do those creature designs?

Jim Henson

So maybe we shouldn't be so dismissive.


These are crazy takes honestly


Agreed. They are self-evident, high-quality representations of KtF. When faced with disagreement on that, it's hard to know what to say in response to convince someone other than to point to another canonical representation of KtF and say, "These are the same."


The OP pointed out specific examples:

> - For any of the "live action" ones, Kermit should still always be a puppet.
> - Kermit notoriously has lanky arms.
> - Kermit never has eyelids.
> - His eyes sit way on top of his head.
> - He often has his weird neck decoration.
> - His eyes have a very distinctive pupil shape.


This is something akin to isolated demands for rigor. Apparently these features are not essential to KtF-ness, because most people look at these pictures and see unambiguous KtF.


When it completely does capture the feel of Kermit.


I disagree that this is unimpressive, but do largely agree about AI winter. Dall-E 2 is probably the most impressive AI implementation I can recall in the past 5-10 years and it's still a highly specialized problem being solved, and it's unclear what market it really can go after other than freelancing digital artists online who work for tiny commissions. I guess it's also gonna be great for NFTs but I consider that market illusory and expect it to disappear within a few years.


I am definitely going to bet against the "AI winter is returning" idea by investing huge amounts of my time into understanding these algorithms. History doesn't have to repeat, sometimes that's the foolish prediction (the Apple Newton was made fun of by The Simpsons but when the iPad came out the timing was perfect). I don't like overinflating OpenAI's already enormous ego but these are incredible images.


Agree there's no reason it has to come again, just things got so hyped the past 5-10 years I wouldn't be surprised if a reality check on the horizon involves a "winter" of sorts and paring down of investment, as it becomes clear that current techniques are great at highly specialized problems where there is a lot of data. There's lots of firms that have both though so maybe it just keeps chugging along as ROI is found.


Well, the investment landscape in general is kinda drying up (innovation is getting harder) and AI is one of the few areas that could bear or is bearing fruit, so that's a big part of it too (even hardware and software are kinda stagnating). If investment is going to go anywhere, AI is still one of the few places in the world that could have 100x returns (despite many frauds, yes).


While I think many are over-interpreting the quality of these results, yours is sounding like a clear case of a No True Kermit fallacy.

There are many ways to define what "Kermit the Frog in $MOVIE" means, and the choice the AI made is absolutely valid. There are of course various other valid choices, but this doesn't invalidate the ones presented.

Furthermore, judging by some other examples in this HN thread, it seems that the fact most of the pictures are not puppets is more of a choice of the human choosing the photos, as in other cases DALL-E was indeed adding puppet-like characters in movie-like decors.


I thought I was taking crazy pills; none of them look like Kermit, but rather they look like a generic frog. They don't even have the same pattern around his collar.


It is odd, isn't it? It captures "essential" characteristics of all those films in an honestly brilliant way - but it doesn't capture any of the iconic characteristics of Kermit himself!


Your take on Kermit is too literal. Allow some artistic license. And you neglect all of the other thematic elements from the prompt.


> These are honestly not very impressive (no sarcasm here) and further convince me that the next AI Winter will come with this coming recession.

"Sure, this AI can produce high-resolution realistic images leaps and bounds above anything that's been shown before... but there's an aspect which could use improvement. Obviously, this proves that the current AI technology will never amount to anything and we should just give up on it now."


> Again if this was done on someone's laptop it would be really impressive. However the fact that so much talent and resources were poured into pushing AI to its limits and this is what we get tells me we've hit another brick wall as far as research goes.

You might be missing the point of what OpenAI is doing. The point is to show off the capability of their models in a way that's likely to go viral and lead to more business for OpenAI. Some people laughed at GPT-3's silly demos, but when they launched GitHub Copilot...


... And it's a decent tool?

If people say DALL-E can improve the workflow of digital artists, sure, but Copilot hasn't revolutionized programming either; you still have to be a good programmer to finish whatever you are doing:

> A paper accepted for publication in the IEEE Symposium on Security and Privacy in 2022 assessed the security of code generated by Copilot [...] The study found that across these axes in multiple languages, 39.33% of top suggestions and 40.73% of total suggestions lead to code vulnerabilities. Additionally, they found that small, non-semantic (i.e., comments) changes made to code could impact code safety.[14]


> but when they launched GitHub Copilot...

What happened next? Is anyone using Copilot for serious work? Has it changed programming in a fundamental way?

I personally have zero use for Copilot, since for the type of code I write the actual code writing is not a bottleneck, so automating that process is of no value to me. On top of that, getting the details exactly right is essential, so the ratio of boilerplate to real code is very, very low for me.


I agree with what you say in re: Kermit. Most of these images look to me like a frog that looks like Kermit the Frog but isn't. Metaphorically (and literally) Jim Henson isn't in these images.

However, I don't think you're correct in your assessment of the import of this sort of thing: it's an imagination machine. This isn't a brick wall, it's a foundation on which to build.



