What's kind of crazy is how the images tend to have similarities in small features that become very apparent when flipping back and forth between images, but which aren't otherwise obvious.
For example, I flipped back and forth between Beatrix Potter and Paulus Potter. A rounded white bonnet in one picture becomes a couple of blossoms in the other. The roof of a house becomes some shadowy wall with plants in the other. Two flower pots are very similar, just with slightly different coloring.
It makes it more apparent that the algorithm etches the images out of noise, and if the seed is the same for two images with different prompts, you're likely to see traces of that noise represented differently but recognizable in both images.
The positioning seems very consistent, almost to the point where I wonder if that was part of the selection process to demonstrate the differences in style. There are only four per style, where the position of a subject could be a selection factor. Hard to tell if the position similarities are driven by the Stable Diffusion model or by the selection of representative images.
The composition and positioning come from the original seed. If the same seed is used, the same initial background noise is applied, and that image is then transformed into all the different styles.
So the similarities you see would make sense if the same seed was also used for these tests.
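To make that concrete, here's a minimal sketch using the Hugging Face diffusers library (the model id and prompts are just illustrative placeholders): fixing the generator's seed means both prompts start denoising from the exact same latent noise, which is why compositions line up.

```python
# Minimal sketch (diffusers StableDiffusionPipeline; model id and prompts
# are illustrative). Same seed -> same initial latent noise, so two
# different prompts get "etched" out of an identical noise canvas.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

for prompt in ["a farmyard by Beatrix Potter", "a farmyard by Paulus Potter"]:
    gen = torch.Generator("cpu").manual_seed(42)  # identical noise for both runs
    image = pipe(prompt, generator=gen).images[0]
    image.save(prompt.replace(" ", "_") + ".png")
```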
Yes, if you generate a bunch of images of waves or the sea, or other repeating patterns, with the same seed, you can see how all the 'peaks and troughs' of those patterns line up in the same place.
incredible job! i was just showing my own experiments with sd to a friend and this just takes the cake. thank you so much for mixing in that artist list!!
If you play with Stable Diffusion enough this behavior becomes very apparent. Changing the seeds will give different results, but even relatively significant changes in the prompt will still find similar themes or layouts.
Arguably it could be different from our experience. It could even be a superior and more efficient methodology than the one our brain uses to imagine things.
The blue woman in Sam Bosma's style looks strikingly similar to the one in Noah Bradley's style. They even have the same 3 patches of pink on their cheeks.
Very cool! I recently made a game kind of like AI "pictionary" where the user has to guess the "artist", subject, and description of a piece of art generated by stable diffusion:
It would help to include a few real sample pictures from the artist, or a link to a Google image search of the artist's name, so you can check whether the generated style matches.
Funny that the Bob Ross version just makes them look like Bob Ross. Maybe there are more pictures of Bob Ross in the training set than his actual paintings.
While it's true the dataset isn't well curated or perfectly labeled, it could just mean that grammar is not understood: the labels could be perfectly clear to a human about whether an image is a picture of Bob Ross or a painting by him, but the training misses that relationship. Even with poorly labeled data, I suspect AI will eventually figure out which labels are more likely to be poor and deal with them appropriately.
In the reverse direction, you can try:
A horse rides an astronaut
And you will probably still get an astronaut riding a horse. It's not that the description of what we want is poor; it's that our assumptions about how grammar should work aren't being honored.
Nice experiment! I would only change two things: (1) sort the artists in alphabetical order, and (2) allow users to write the name of the artist and show if it's in the list.
I'm saying that because it's a little bit tedious to search for the artist you're looking for.
Apart from that, I find the idea super interesting :)
I’m seeing that too. Results don’t match the artist at all. Also, the names at the top are alpha by firstname, and all the Japanese artists are bunched up at the bottom.
I lecture in painting, and to me some of these are truly impressive. Not surprisingly, artists who are not predominantly painters (e.g. Joseph Beuys) or whose work is predominantly linear (Aubrey Beardsley) do not fare so well.
There is a lot of talk in our department as to how we might prepare our students for this technology. It is scary how fast it is growing, and how it is spreading to things like 3D and texturing.
One of my team is already using it in production. It used to take his artists three days to come up with five visual development ideas. Now he can get fifty overnight to choose from.
I agree, they are nice results. But Joseph Beuys is better known for performance art. In his most famous piece, he covered himself in gold foil, and explained art to a dead hare (below). To my mind, it makes little sense to apply this method to his art.
The Aubrey Beardsley results are a bit better, but if I wanted Beardsley-ish drawings I would likely be disappointed. Beardsley's lines were very fine... 'filigree', like a spider crawling over the paper, and he almost never made art in colour. Also, a lot of his work was as sexy as hell. NSFW last link (bottom of page) for relatively mild examples.
I'd be interested in hearing your take because you are actively involved in this field. From my perspective outside of it, it seems that it's going to be an absolute bloodbath in terms of opportunities for people to actually live as artists (excluding those who work in mediums that can't be represented on a 2D screen).
Yep. A bloodbath is certainly on its way. Our illustration program will be first in the firing line.
We think that there may be some room for our students as 'high class' art directors. What will give them unique merit is their deep knowledge of pictorial formalities. Anyone can give the text hint 'flowers in a vase'. But what about...
'Move the camera down to avoid the strong coincidence line between the edge of the vase and the edge of the table. Change the saturation value of the vase to emphasize background/foreground contrast. Increase the amount of negative spaces around the periphery of the flower mass' etc.
Tech like Stable Diffusion may also lead to a resurgence of interest in natural media, like oil paint, watercolour and suchlike.
I suspect that we'll end up with a split similar to amateur and professional photography, with generative models for the latter trained not on plain English but on something much more stringently structured, with ability to unambiguously specify many important but non-trivial parameters in the prompt, including for specific well-defined areas of the output etc. Probably with a GUI on top where you can literally highlight specific objects etc and adjust parameters, and it'll construct the query for the next iteration.
The concept of "derivative work" is pretty important in copyright law. I wonder if anyone has thoughts on this, in terms of this type of project. Should there be legal implications to this?
I know someone -- a completely unknown artist -- who used to make a fair portion of their living by drawing D&D characters for people. Unfortunately, orders slowed down, because someone can input one of his images into software like this, and generate endless variations in the same style. Should this be allowed?
Are images created "in the style of" a certain artist completely dependent on images created by that artist? If so, should that artist be compensated? Why or why not?
A human artist can freely paint something in the style of another artist. It's not considered a derivative work. You can't copyright a style.
A derivative work is an adaptation, translation, or modification of a particular, existing copyrighted work.
If you asked Stable Diffusion for "Vincent van Gogh's Starry Night with a cat looking at the sky", you'd get a derivative work (although Starry Night is in the public domain, so you wouldn't be violating its copyright).
I’ve been idly working on a similar list but with a much more basic prompt.
The results vary wildly, even run by run, but I might put them in a few buckets:
1. Similar enough that someone who doesn’t know much about art could be fooled
2. Amateur knockoff but recognizable style
3. Influence is there if you know what to look for
4. Artist probably not in the training data at all
The last one kinda surprised me, for artists whose work is online and who have unusual names. I would have thought those cases would be really good. Maybe they ran out of disk space with all the porn?
Also interesting that it gets much closer for figurative painters than for abstract painters.
What frustrates me about Stable Diffusion is there doesn't seem to be any documentation as to what artists or vocabulary it understands. Generally people say "look at existing prompts or use various prompt generators" but that doesn't really solve the problem. I don't want to just look at what other people have randomly discovered; I want to know what the program really knows.
You can't really debug an AI. The dimensions of its understanding are quite literally beyond human interpretation, which makes it both smarter/more efficient than humans and at the same time extremely dumb and context-unaware. Most of our attempts at adding a 'memory' to AI have been hacks so far, which is why all of these prompts consist of people force-feeding word salad down the AI's throat for roughly reproducible results.
It's the nature of ML models--nobody is 100% sure what it understands until they try something and get results.
It was given a lot of tagged data: 600 million captioned images from LAION-5B. So if you want to know what it might support, you could try any one of the captions from those 600 million images.
But why isn't the list of words from those captions available anywhere (at least as far as I can tell)? There may be 600 million captions, but the number of unique words would probably be 10 or 20 thousand at most, completely feasible to browse or grep.
SD isn't a person you can converse with. It's just a program trained on captions and can do no more than what's in them. It's like those old adventure games that would always complain "I don't know that word", except even worse, because SD will happily make a picture with words it doesn't know and not tell you.
I think anyone who has both played an IF game and has played with stable diffusion knows there is a world of difference between the two.
The main difference is that coming up with a word SD doesn't know that's not contrived is really difficult. In an IF game, you are constantly guessing the correct word.
I haven't downloaded the database myself, but I imagine if you did it wouldn't be too hard to get that data. Looks like you can get the torrent here https://laion.ai/blog/laion-400-open-dataset/
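As a rough sketch of what that would look like (this assumes the metadata ships as parquet shards with the caption in a "TEXT" column; the filename and column name may differ from the files you actually download):

```python
# Rough sketch: count unique caption words in one LAION metadata shard.
# Assumes a parquet shard with captions in a "TEXT" column; adjust the
# filename and column name to whatever you downloaded.
import pandas as pd

df = pd.read_parquet("laion-shard-00000.parquet", columns=["TEXT"])
words = set()
for caption in df["TEXT"].dropna():
    words.update(caption.lower().split())
print(f"{len(words)} unique words in this shard")
```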
I don't think the underlying model is word based, but character based. You could download the caption data for LAION and grep that, but it's not strictly 1:1 with what SD was trained against.
Huh, interesting, I had just ... assumed CLIP's tokenizer was character based, like GPT's was. At least, I think GPT's is character based?
Is there any reason it couldn't be character based, besides the (presumably very large) increase in resources needed to train and run inference? This is all way out of my league, but it seems like you could get interesting results from this, since (by my caveman understanding) this hypothetical transformer could make some sense of words it had never seen before, such as spelling variants or neologisms.
I started a proper reply but had to board a plane.
It's actually a byte-pair encoded list of things that includes words (BPE is better than character encoding but can still do the things you mentioned). You can find common English suffixes listed separately in it too.
Thanks for the responses, I really appreciate the help. My only background with ML is playing with LSTMs and simple sequence-to-sequence models back before transformers, and the last few days I've been trying to deep dive as much as I can into the "state-of-the-art". I dislike treating the technology as a magical black box...
GPT (and many other modern NLP models) use byte-pair encoding. Your summary of the benefits is correct - it can deal with novel words much better.
Byte-pair encoding (BPE) is better than character encoding because it can deal with unicode (and emojis).
CLIP uses a BPE encoding of the vocabulary: "The transformer operates on a lower-cased byte pair encoding (BPE) representation of the text with a 49,152 vocab size."
So strictly this vocabulary is NOT (just) words, it is common sequences of byte pairs. You can see this if you examine the vocabulary - you'll find things like "tive" which isn't a word but is a very common English suffix.
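If you want to see this for yourself, here's a small sketch using the Hugging Face transformers CLIPTokenizer (openai/clip-vit-large-patch14 is the text encoder SD v1 uses). The exact splits depend on the vocabulary, so the example output is only indicative.

```python
# Sketch: inspect how CLIP's lower-cased BPE tokenizer splits a prompt.
from transformers import CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
print(tok.tokenize("A generative portrait by Aubrey Beardsley"))
# Common words come out as whole tokens ending in "</w>"; rarer words get
# split into sub-word pieces (suffixes like "tive" live in the vocab too).
```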
Thank you. This is really helpful. Yes, you don't know exactly how SD will respond, but for example you can grep celebrity names and can know whether SD has any chance of drawing a picture with them in it or not rather than just randomly guessing.
It's a word list, so as I'm sure you've already figured out, you have to grep first and last names separately. For example, "jennifer" as a first name is token 19786, while "garner</w>" is token 20340. If you want "james garner" instead, looks like that's tokens 6963 and 20340. Except, since it's a word list, there's still no guarantee that either celebrity is necessarily represented until you try.
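A quick sketch of that kind of lookup (vocab.json is the BPE vocabulary file mentioned above; the fragment strings are just examples):

```python
# Sketch: search the CLIP BPE vocabulary for name fragments.
import json

with open("vocab.json") as f:
    vocab = json.load(f)          # maps token string -> token id

def find(fragment):
    return {tok: idx for tok, idx in vocab.items() if fragment in tok}

print(find("jennifer"))           # first names and surnames are separate
print(find("garner"))             # tokens, so grep each piece on its own
```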
49,407 tokens, many of which are not useful. It's an arduous process to narrow down, so this link decided to go in the other direction, working from zero up rather than 49,407 down.
The simple answer is that there is no clean cut list of artists that it "understands". The model has no explicitly programmed concept of artist or style -- just the CLIP based text encoding used to train the conditional autoencoding part of the denoiser network, trained on (AFAIK) caption data recorded with the image.
So in practice asking for art "in the style of <x>" is sort of limiting the denoiser to statistical pathways resembling other images captioned "in the style of <x>". At least, that's my understanding. Still trying to grok ML and diffusion models.
Honest question, would a solid understanding of the open training data help?
Having the art vocabulary down as well.
In effect, knowing what is present and how it's tagged, so you can "invoke" it more readily in the prompt.
Maybe I'm out of my depth. I know the corpus of tagged images used for training is enormous... but I still think that would help the user (a prompt-crafter).
I downloaded the vocab.json mentioned above and I think it helps. For example it explains why I can get SD to make pictures involving Einstein but not Feynman. Feynman simply isn't in the training set, but Einstein is.
thanks for the follow up, that was my understanding but your example is telling.
I will have to check myself.
I tried, with minor success, to have Stable Diffusion draw lesser-known non-US personalities.
It always kind of works, but the palette is limited.
For instance, I tried Charles de Gaulle and you get something that looks like him. But he's depicted talking on a radio or waving his arms around like a politician.
I tried to make him go grocery shopping or play volleyball, and it does not really work, while Michael Jackson or Dennis Rodman get a way better treatment.
edit :
that vocab file is smaller than I thought. "De Gaulle" is not in there, but neither is Einstein or Feynman? I think I'm missing something.
I can find "obama", "trump" or "macron". Unclear about Michael Jackson. No Beyonce or Dennis Rodman. Hmm, weird, I had great results with all of them. Like... recognizable details like tattoos or silly glasses.
Whatever else this emergent "creative AI" phenomenon may or may not do, it's definitely touching nerves in people who still believe there's something ineffable and transcendent about the creative experience.
Stable Diffusion is literally copy-pasting existing artwork. It's not creating anything new.
If anything, this sort of "AI" only makes human ineffable creative experiences even more valuable, because without it there would be no training set and no "AI".
Whether or not it's creating something "new" is debatable, but it's clearly not copy-pasting. It bases the artwork on a combination of other sources; nothing is created de novo, but that is very different from copy & pasting.
If you want to experiment with something similar by yourself and you don't have the patience to wait for Stable Diffusion to crunch through thousands of images on your laptop or in a Colab notebook, here's how you can parallelize processing relatively easily on AWS Batch or Kubernetes: https://outerbounds.com/blog/parallelizing-stable-diffusion-...
Hate to say it but when i see stuff like this it only reminds me of what we could have achieved if this ingenuity had been applied in another domain.
Can't help feeling that this accidentally harms creative types and risks swamping us with visual junk.
The technical achievement is astounding, but no one would seriously claim that crafting an image via a short prompt is creative except in the most cursory way.
I'm probably missing some life-changing use case, but aping art in random styles can't be it.
People said this about cameras. About digital cameras. About digital photo editing software. The next generation will normalize these tools and find incredible ways to be creative within their new cutting edge medium.
The post-art world is here! Just think about how history books will remember this period! The styles that will be born of necessity, of the need to break down art and find what makes it tick.
"People said this about cameras. About digital cameras. About digital photo editing software."
Also about desktop publishing.
Remember all the printers (i.e. people working in the printing industry operating printing machines) that were put out of a job when you could just buy an (electronic) printer for your home computer and print whatever you wanted yourself?
People were wringing their hands about that too back then... now we take it for granted that we can instantly print whatever we want whenever we want, without having to pay an expensive professional to do it for us (something most people couldn't afford).
Has it resulted in more junk being printed? Absolutely. But it also let people print all sorts of fantastic not to mention useful things that would almost never have seen the light of day without cheap and easy access to home printers.
The xerox copier was similarly revolutionary... as was the printing press itself, which put a lot of scribes out of business.
Photoshop put a lot of airbrush artists out of business, and who does copy and paste with physical glue and paper anymore?
As with photography, printers, copiers and photoshop, artists who embrace this technology will be able to use it to enhance their creativity and speed up their creative process.
There'll be a lot more competition, a lot more junk but also a lot more fantastic art that we can't even dream of yet.
> [...] and who does copy and paste with physical glue and paper anymore?
When I was finishing high school in the early 2000s, we still had teachers who made worksheets that way.
I remember one history teacher in particular. She used a photocopier to get sections from books, cut and pasted them together, and then used the photocopier again to make the final sheets to distribute to the students.
I was very surprised at the time, but also admired the ingenuity. The process is much more physical than using a PC.
Yep, I see this as a start, and I'm very curious to see the ways in which it'll get used with a human in the loop, and also the ways human artists will be pushed to create art that's out of distribution for these models.
Even then, few artists got famous on technical skill alone, surely fewer than those who got famous primarily for their message regardless of their skill. And this is before getting into the endless pit of defining what art is.
Besides, having an AI do the legwork is not much different from Veronese handing off large swaths of his paintings to his novices while focusing on the two or three major parts.
I think we agree then - if new technologies allow an artist to more rapidly explore, iterate, and refine a particular message, then those artists should still have something to create beyond what is possible with these images.
> As long as “invention and feeling constitute essential qualities in a work of Art,” the writer argued, “Photography can never assume a higher rank than engraving.”
Ha, dissing engraving at the same time as photography.
Though I wonder if the writer meant engraving of a design someone else already produced, or any engraving work at all.
Both reactions always happen. With basically anything new, people will select some points via happenstance or bias, draw one of a few basic trend lines [1], and give a hot take. Because they generally think only about first-order effects and don't imagine other things that could happen, the hot takes are often of the utopia/dystopia variety.
These hot takes generally tell you more about the opiner (or the audience they're playing to) than the reality to come. It turns out it's hard to model an entire universe using 3 pounds of meat.
True, but then we've essentially had limitless image generation capabilities since we've had the tools to make marks. I guess this is faster, and in other ways it offers promising new opportunities for people who can't / don't want to learn to create stuff directly.
Others are interpreting my original comment as "this is not art", but I'm not really trying to make that argument. Art is entirely subjective and I don't presume to define what is or isn't art.
I guess my point is more specifically "what itch does this scratch"?
It's really cool, and that may well be the answer tbh.
> people who can't / don't want to learn to create stuff directly.
That’s 99% of the people. I mean even to learn prompt engineering will probably be too much for majority of those 99% people, but it’s a huge step forward in user friendliness, compared to, say, photoshop.
> "what itch does this scratch?"
How many people post pictures on social media? Many of those pictures are not personal, they show something pretty, cool, or interesting in some way. All of those people can potentially use image generators to achieve the same effect.
> only reminds me of what we could have achieved if this ingenuity had been applied in another domain.
I hate arguments like this. Even ignoring how dismissive it is of the achievement at hand, why would you assume ingenuity is transferable like that? Someone who makes a breakthrough in physics is by no means likely to have made an equivalently groundbreaking advance in biology if they had decided to study that field instead.
I think this phrase actually means "I don't want to seem like a Luddite, but now that AI is disrupting something that I personally care about, I'm no longer enthusiastic about progress"
Aside from the fact that I explicitly praised the achievement, my point actually relies on said appreciation.
I guess my musing was hypothetical but I was careless in communicating that. I get that we can't centrally plan innovation or human effort - and I certainly wouldn't want to live in a society where this was the case.
I am a filmmaker and most films are essentially crafted this way. Beyond hiring and securing resources, directors essentially create by communicating ideas in short prompts because there isn’t enough time to do anything more.
I could absolutely see an AI model doing the job of an entire film crew. I have issues with this, but only with respect to the longer-term aggregate effects on culture in the broad sense. I cannot honestly believe that much would be lost from the perspective of one project or another.
I'm of the opposite opinion. AI assisted art is simply the natural next chapter for "art" as a whole. It will finally kickstart the public discourse about what being an artist means in the perspective of artistic vision vs execution.
Most artists spend their lives not refining their brush stroke, but rather their eyes. The way I see it, the impact of curation and artistic direction will matter more and more in the future.
i find myself mentally unable to comprehend people who believe that the drawing part of drawing is monotony.
discovering that this mindset not only exists but is widespread has been equally as disturbing as any ai advancement.
I would look at it as more of "allowing normal people access to unbefore-dreamed-of levels of draftsmanship" vs some comment on capital-A Art. It allows non-artists to express themselves visually.
I'm biased: I've been working on an image generation app. But the beta users I've had so far will generate fifty or a hundred images in a day. That isn't a use case traditional artists support.
> reminds me of what we could have achieved if this ingenuity had been applied in another domain
It will; I think the reason we're seeing diffusion models applied to image generation first, is that it's a task that meshes well with the models. But also in general I think people will still be guided by the principle "use the right tool for the job" - this is just another tool. I doubt that the set of paths toward realization for any given needed creative imagery collapses to just "use a model"
I don't know why everyone assumes that ML researchers have some big map of the future where they can make decisions like "yeah, let's choose this branch over here, where AI gets good at generating art first, rather than that other one where it cures cancer a decade earlier." The breakthroughs come where they come, and no one knows where some model architecture will have an application in the future.
Bosch is marvelous. Mucha and Monet are good. Michelangelo tries to be more like his sculptures. Not sure about Walt Disney and Roy Lichtenstein, and the Rembrandt version is especially ugly. Maybe it would be worth getting rid of "van Rijn" in that sample.
Overall: the impression improves with the artist's popularity. I think if we trained the model only on a well-tagged, filtered dataset the results might be much better, but we'd effectively end up with a five-year-old Prisma app.
It is sort of funny/interesting -- I only tried a few, but famous anime or manga artists (try "Junji Ito" or "Hayao Miyazaki") seem to have at least one picture that is clearly the result of the algorithm picking up on their fans' art.
It's really interesting how good some of the results are, but the styles seem to be completely mixed up.
Just looking up Claude Monet, there isn't anything impressionist. Leonardo da Vinci, on the other hand, gives results that rather look like British impressionist paintings.
I've seen paintings in the style of Donato Giancola that really looked like his style, but in the examples on this site none of the results do. Maybe there need to be some adjustments to the prompts?
Interesting, how with billions of nodes and supposed "intelligence", the network hasn't been able to deduce a simple concept of symmetry in human faces. All of the eyes and all of the lips in all of the pictures are asymmetrical, which easily gives AI generated images away.
"Facial symmetry === beauty" is not that old of a scientific concept, relatively new if you compared to how long humanity has existed before someone started to really study it.
And even so, too symmetrical faces will look just as un-human as a face that is too asymmetric. You need a face that is just the right amount of symmetrical in order for it to actually look good.
I think you make it sound simpler than it is.
It's also not a model that is trained to make as realistic people as possible, it's trained on a lot of different things, so obviously it won't excel at making realistic people. But one can easily imagine that some future models will be heavily trained on making realistic people rather than semi-realistic everything, like Stable Diffusion is trained to do today.
Many of the artists in this list specifically try *not* to have this symmetry present in the faces.
It's what makes many styles of art separate from just taking a photo or simply going outside.
The system used here is actually astoundingly good at reproducing many artists' styles precisely because it's not going for symmetry.
Midjourney is way better than this, though it too can produce some weird results... but at its best it's absolutely indistinguishable from photographs or art made by humans.
It feels like they really only track local continuity without any meaningful knowledge. Hands are wonky and wrong in ways that don't look like any drawing I've ever seen. Horses with five or six legs.
Humans also have a hilariously hard time drawing bicycles, but at least we pretty much always nail the number of appendages.
Interesting that it seems to have no concept of the filmmakers they tried to include here - Tim Burton and Walt Disney didn't produce anything recognizable and look to me like the default stuff you get without providing a style.
Honestly I don't think it's very good at emulating style. It picks up some things but often misses the heart of the matter.
Did some spot checking with some of my favorite artists. Rockwell's paintings are all about storytelling, which is clearly not present in this work. The Frazetta emulation doesn't look like Frazetta's work at all. The HR Giger emulation is a joke. David Finch at least gets a penciler's style but misses the use of solid blacks and dynamic posing. Frank Miller doesn't look like Miller's work at all. Etc, etc, etc.
This list goes on and on. Personally while I understand using the 'in the style of' as a way to change the image results, I think in many many cases the results just don't look like the art of that artist.
Truly impressive, though it gives me some great fear for all the people whose career is art.
This will likely push the Pareto ratio to a new extreme: instead of 20/80, maybe even 1/99.
> The people making fliers have been replaced by AI prompting overnight.
I haven't seen good examples of that yet but I'm curious how push-button you can make this. Flyers, web design and UI design require the copy, layout, information hierarchy, colours, illustrations, branding etc. to be cohesive so it's a different problem space with way more constraints compared to generating a single image.
If getting the final design requires a lot of rounds of prompting and tweaks, busy people are going to outsource this still (in the hope the prompting and feedback needed to the person doing the work will be less).
That group will never be selling their own work as contemporary fine art but want that prestige, and the one way they had to make table scraps with that skillset is now gone.
A different person is doing their own flyer art with AI and adding words around it themselves, as evidenced by my last month's worth of fliers that have reached me. Promotion companies have always been up on trendy tools for differentiation.
I used to draw the flyers when my band played shows. People still went to the shows. If I had iterated a few times with one of these models to draw the thing that I wanted to draw, rather than deluding myself, and spent 5 minutes polishing it in Photoshop, the product would have been orders of magnitude better imo.
Art is rarely about the product. Some color on a canvas has been traded for millions, while anyone could just have dropped a bucket of paint to create a similar result.
After checking some 19th century artists I see it failed hard. All of the responses look the same, there isn't enough data in the training set to differentiate actual styles beyond "vaguely realistic".