It's interesting to see that Latent Diffusion + LAION-400M does much better with text; it's actually a challenge sometimes to get it to not write the parts of the prompt meant to guide the style onto the resulting image.
GPT-3 really rocked the HN world a few years ago, but the general public never heard of it. Prediction: people love images and this could be the moment AI generation goes mainstream.
We discussed that yesterday among my friends but nobody could come up with a good use case.
Let’s take art. What makes art valuable, in a sense, is that artists bear an opportunity cost in making it. It’s a sacrifice of something else in their life.
AI art gets inflated instantly. Everyone can get a beautiful painting at the click of a button. Not valuable.
- Generate unique pictures for projects (icons, buttons, etc), or for presentations. Especially for short, not-so-serious presentations, I'm scouring google images quite a bit for a fitting picture. DALL-E might greatly reduce time spent on that.
- "Get a poster of your writing prompt" -> I think some of the t-shirt/poster companies will definetely incorporate DALL-E, and let users choose from a set of designs based on their writing prompt.
- NFTs maybe? And just playing around with it for fun or views on social media. I also think somewhere there must be a fun little game one can make based on it. There was a pretty awesome rpg_generator for GPT3, maybe some kind of puzzle game based on DALL-E?
Problem with the first two is that, to my understanding, there's zero guarantee that DALL-E won't generate copyrighted material that exists in some form or other in the training data set. And you can't really check whether the generated image contains it.
A custom shirt printing company would offer the generator. It's up to the customer to decide whether the output can be legally sold (or, perhaps, they're not selling it).
Anything you wish can be printed today, any logo of any brand. Only selling it as a product is restricted.
Whatever it covers, it never stopped anybody I know from ordering whatever shirts they wanted, even one with like 100 logos of all possible brands on it (yeah, we really tried to trigger some reaction - couldn't), and nobody ever had any problem with us ordering it or wearing it (or any other t-shirt we """illegally""" made).
And I could murder someone in the forest and nobody would have a problem with it, because those who would care would not know. That shirt company would almost certainly be sued if interested parties knew, as has happened before [1], just as if someone saw an obvious derivative work coming from an AI.
For a real world example, I've had real company logos appear on my GAN-generated images. I used labels that included terms specific to a single company's rendering technology/pipeline, and the vast majority of images that had those labels had the company logo on them. They were somewhat distorted, but it was very obvious which company it was.
For me, as someone who has to use stock photography often, this doesn’t make much sense. What I really value is an AI system that sorts pictures fast. I don’t need exactly the depiction I’m looking for, but I want to discover a look and style that fits well with the project. This is not what Dall•e is capable of providing. The first stock-photography Dall•e service has a big chance of turning into the Microsoft clip art of photography more than a disruptor of the market.
From an artistic point of view, the value is zero. Dall•e pictures will never have any Aura to them. And for an artist to use them artistically Dall•e is “too easy”, as the artist won’t have produced or worked on the model itself.
All in all, Dall•e is the most exciting, technologically advanced, and beautifully useless AI project I’ve ever seen.
Today my personal biggest use of AI is for generating photos of faces as profile pictures for tabletop rpg character sheets. I don't need them to be perfect, I just need quick, easy, cheap content that's good enough.
Tomorrow I might use Dall-E to create illustrations for my blog posts, or depict environments for my tabletop campaigns.
They might be "too easy", but they'll be cheap and quick and good enough for the purpose.
You’re thinking about it from the perspective of a professional artist. But put yourself in the shoes of someone who’s not an artist and would like free art. The implications for artists who want to get paid might be bad, but even as an occasional artist myself, I have to admit most of the results in the article would be hard to distinguish as computer generated, if shown on their own in a different context. I don’t agree that AI pictures will never have any Aura - in some sense the opposite is true, some amount of Aura may be guaranteed because the inputs have it. The Aura is only lost when you assign the ‘machine generated’ narrative, when you know the story and already have opinions about whether it’s good.
People who want cheap but unique abstract art for their house, small business owners who need ad clip art but don’t have the budget for a designer, even artists using DallE queries to generate launchpad ideas, I can imagine many reasons DallE would get used. But also don’t forget to imagine the future, this is one of the first and it’s bound to be improved until you can’t distinguish it from “real” art.
There’s a valid point in your comment about how artists currently look at AI, and a valid question about what AI produced art actually means, so I’m throwing you an upvote because I don’t think you deserved to get downvoted for sharing your opinion. Maybe more importantly, AI generated art could challenge and rekindle the age old debate about what art actually is, and bring a new perspective to the table.
Thank you. I actually basically said that I'm in awe at what Dall•e is capable of; it's just that we've seen a lot of these watershed moments for AI creativity, and to be fair, it's always the people who don't really work with art who call them that.
You know what's really impressive, that just got out? Adobe's Premiere Pro Auto Color feature based on Adobe Sensei. Ask a colorist or a filmmaker how it works for them. Is that something that was possible because of fringe research like Dall•E? Honestly, no.
What you say about supposed watershed moments is true, but I don't think it leads to the conclusion you want it to.
As someone who designs and makes fine furniture (not the way I make my living, but I do know the business), I know that there will likely always be some demand for custom, hand made pieces, because people assign value both to getting exactly what they want, and to the fact that something is human and hand made. When people who value those things have enough money, they'll spring for that value. But I'm well aware that the bespoke and hand made will never be more than a 3rd decimal place rounding error in the ocean of factory made furniture that fills our homes and workspaces. Images can go the same way, and will, if the price differential is right. Most images in the "image marketplace" are not art, after all, but rather anonymous commodity visuals augmenting the presentation of an idea, or product.
My partner is a hobbyist painter and often looks for hours on Google image search for a source image to paint off of. He was excited to get access to Dall-e because it would give him a vast new source of stock images he could do style transfer on, giving him a better source for his art. It is very common in art classes to mash something up in Photoshop to use as reference. This tool is far from useless.
think of dall•e as stepping stones to create the finished product.
this isn't supposed to make designers useless but rather make them more creative.
dall•e can be used as inspiration to combine multiple images & create their own. it can be used to steal color combinations that are aesthetically pleasing to the eye.
if you think dall•e is useless, you haven't seen the images shared.
the realistic ones are hidden by openai but they can generate a lot of stuff that we won't know until an open-source version comes along.
Oh I can probably come up with many use cases. Business models? I'm not sure.
I could use it for...inspiration, stock images for a demo site, actual images for a live site, t-shirt designs, coffee mugs, and those drop shipping products, podcast or album artwork, etc.
Yes, maybe design starts to blend, yet lots of web designers used Bootstrap and Material Design so that sites started to look the same anyway. Maybe it'll just be things looking the same except with more artistic images.
plenty of business models. anything that requires designers can use dall•e for a monthly cost. designers cost $1k-10k a month (assuming you're in the us) & dall•e can greatly reduce that to $100 a month through scale.
i believe dall•e is even more useful than gpt-3 because, as of now, the only real business model that worked for gpt-3 was copywriting.
dall•e will have branches under images like t-shirts, replacing designers or providing designers with more inspiration, stock photos, nfts & so much more...
Sorry, what I meant by “use case” is of course business model, and you caught that.
I’m not convinced that people would pay for that at all.
It could be a helpful tool for a designer, but if you’re saying someone would pay to get something done at the press of a button, I think that’s a very short-term business that will be outmatched by a free open-source alternative almost immediately.
Just look at how fast web designers are getting outcompeted by free frontend frameworks like Bootstrap or icon libs like Font Awesome.
Perhaps I’m just describing the natural way of business and commodities; most goods and services follow this pattern.
- in real estate, generate visions of how an empty apartment / house would look with different styles of furniture (there are already some products for that, but they need more work)
- replace the whole stock photo industry for professional illustrations
- memes. Huge potential there
- accessible image manipulation. For instance the typical "remove that person", make me smile, add something cool in the background, etc.
- not completely sure, but I think it could be used as a texture generator for games and FX? For instance it seems to be able to make really nice grass, so I guess making metal / dirt / foliage textures should be possible
- would also need some investigation, but it is probably possible to make renders of how a specific piece of clothing would look on you at the beach. There are already some products that try to do that, but they are pretty limited and don't let you change the context either. Details and sizing might be tough, though
Wall paper and waiting area decor. Logo and branding design. Stock photos. Children's books illustrations. Comic book illustration. Ikea assembly instructions. Need more?
It seems like a wonderful replacement for Shutterstock or clip art libraries for all manner of articles and presentations. You just need another NN to filter out images too similar to a copyrighted image much like YouTube's content ID system. The images are much better matched to a theme than using Google image search and they do not need to be licensed.
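A rough sketch of what such a filter could look like, using perceptual hashing as a stand-in for a learned similarity model (the imagehash and Pillow libraries here are my choice of illustration, not anything OpenAI has announced):

    # Minimal content-ID-style filter: flag generated images that are
    # perceptually close to any image in a copyrighted reference corpus.
    from PIL import Image
    import imagehash

    HAMMING_THRESHOLD = 8  # illustrative cutoff; would need tuning

    def is_too_similar(generated_path, reference_hashes):
        gen_hash = imagehash.phash(Image.open(generated_path))
        # ImageHash subtraction returns the Hamming distance
        return any(gen_hash - ref < HAMMING_THRESHOLD for ref in reference_hashes)

    # Precomputed once over the reference corpus:
    # reference_hashes = [imagehash.phash(Image.open(p)) for p in reference_paths]

A real system would more likely compare learned embeddings (e.g. CLIP) against a large index, since perceptual hashes miss style-level similarity.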
If I wanted to throw $10M at a piece of art, I'd rather spend it training DALL-E 2, and have an enormous OLED "canvas" that displays pictures generated by the model.
What is the content of the training set like, that DALL·E 2 has all this info? Are there people out there spending thousands of hours just tagging or writing descriptions of training images?
And does DALL·E just receive the image as-is with its description, or does it get more info (e.g. "this part of the image is the dog")?
I don't rename it, I release a new updated dataset. However, each dataset is a superset of the previous one, so you can always reconstruct the old one and thus 'Danbooru2018', 'Danbooru2019', 'Danbooru2020', 'Danbooru2021' etc have exact well-defined meanings that never change, and if you don't happen to have a copy of Danbooru2018 sitting around, you can just download Danbooru2021, unpack the Danbooru2018 metadata from the archive tarball & delete the post-Danbooru2018 images, and you now have a bit-for-bit copy of Danbooru2018.
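To make the reconstruction concrete, here's a sketch of the idea in Python; the file names and metadata fields are hypothetical stand-ins, not the actual Danbooru layout:

    # Reconstruct an old snapshot from a superset release: keep only the
    # images whose IDs appear in the old snapshot's metadata.
    import json, os

    with open("danbooru2018-metadata.json") as f:  # unpacked from the 2021 tarball
        old_ids = {post["id"] for post in json.load(f)}

    for fname in os.listdir("danbooru2021/images"):
        post_id = int(os.path.splitext(fname)[0])
        if post_id not in old_ids:  # post-2018 image: delete it
            os.remove(os.path.join("danbooru2021/images", fname))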
Also, if you want to just discuss the general concept of boorus (there are many beyond Danbooru), it'd probably be better to invoke Safebooru https://safebooru.org/ which is what it sounds like.
+ Flickr, a huge AI-friendly database of human-captioned images
And the DALL-E architecture can also handle masking, where you initialize only part of the latent space with noise and initialize other parts with a starting image. The video on their website shows examples of that, replacing a pet on a chair.
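The masking idea itself is simple; here's a minimal numpy sketch of the initialization step (the encoded latent and the model are stand-ins, not the actual DALL-E 2 API):

    # Keep the encoded source image where mask == 0, start from noise where
    # mask == 1, then hand the result to the diffusion model to denoise.
    import numpy as np

    def masked_init(image_latent, mask):
        noise = np.random.randn(*image_latent.shape)
        return mask * noise + (1 - mask) * image_latent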
> + Flickr, a huge AI-friendly database of human-captioned images
You're right, I forgot to mention them. Their metadata is great, and most importantly photos have their licenses tagged and many of them are CC0 (including mine).
Are content and content tags that great though? I don't content tag my own photos there, and when I've tried to comparison shop cameras all the popular images are over edited /r/shittyhdr art…
Metadata seems really interesting though. You'd think a visual search AI would want to know the white balance EXIF tags on a photo, so it knows if a yellow object is actually yellow or just under a streetlight.
It’s fascinating how many people feel that this is a specifically powerful breakthrough toward AGI, I suppose because it seems creative. Don’t get me wrong, this is extremely cool! But art and composition are “pattern languages” that seem well-suited to our current ML techniques given sufficient investment in training. To me this feels more like progress toward the Star Trek computer, which we can interact with in basically natural language, and which can do sophisticated knowledge operations for us (from natural language instructions), but which still doesn’t “think” in a meaningful way.
Now, when Dall-E 5 starts to argue with you about which art style looks best or will be most marketable…
That’s kinda the stance I’d like to take. The difficulty with that argument is not the one posed by others - “no, no, you don’t understand how clever AI is” - rather, the difficulty I have is with the counter-argument “how do you know that humans aren’t doing the same kind of thing?”
Maybe humans are different, maybe not, but I currently don’t have strong arguments to show they’re not and that means we have to take this type of AI seriously.
For example: “but you just trained a big pattern matcher on a gazillion images and all the AI does is recombine them” - ok, but how do we know human artists aren’t essentially the same? Great writers read a lot. If you hadn’t seen lots of pop art, you’d struggle to do pop-art-style paintings… even if a baby had perfect motor control, it’d struggle to paint a Victorian rabbit tea party because it would have no examples of a Victorian aesthetic.
I think humans actually are doing the same thing. The world is procedurally generated, and so are our creations :)
We also clearly act in pursuit of reward functions. You could go so far as to say that is the definition of “normal” behavior for humans.
Learning about how ML works is actually humbling to me, specifically because it reveals how much our skills reduce to pattern recognition.
Humans seem much better at learning than machines, ie. we can “figure out” the Victorian aesthetic after a mere handful of examples. What’s not clear is how much we learn this blank-slate as children versus how much encoded / genetic knowledge and memory we have (to what degree is the brain pre-trained?). But I assume we’ll close that gap.
So what differences remain to label something as truly intelligent?
We have a strong Theory of Mind (ability to reason about what others are thinking and feeling). I think that for us to recognize a machine as genuinely sapient we would expect it to have theory of mind about itself and us. Many animals seem to have this characteristic.
What seems unique to humans (so far) is that we have intrinsic motivation as well, and our behaviors are the combination of external and internal motivations. This is where true creativity happens - when one person makes something totally new, not directly derived from prior work, be that a new musical composition or a mathematical theorem. As a species, we seek out novelty, experimentation, and “progress.”
Now, in our various AI and ML techniques we can and do simulate this with the injection of randomness, and in “genetic programming” we go to lengths to try and emulate that evolutionary model. So maybe there will come a point where the AI is generalized and then it starts having internal motivations that aren’t transparent to us.
I don’t know where the line is, but theory of mind and intrinsic motivation seem important, so perhaps there’s something there.
Side note: one of the new Star Trek iterations, "Star Trek: Discovery", has a story arc where a data set collected over millennia is integrated into the ship's computer, which eventually causes the ship's computer to become sentient.
That's true, I guess I didn't consider those as being the computer. Most of those examples were creations using the ship's computer. I don't remember the ship computer itself becoming sentient or human. That story arc was usually either Data (TNG), 7 of 9 (Voyager), or Odo (DS9).
I think you may be underestimating the potential of Dall-E. While it is true that pattern recognition is a key part of what it does, the fact that it can generate images from textual descriptions is a significant step forward. It's not just about art or composition; it's about understanding the meaning of the text and then creating something that reflects that meaning. In that sense, it is quite similar to how a human would approach the task. And oh yeah, gpt generated all the sentences before this one.
I don't think a lot of these critics realize how capable gpt-3 is of convincingly outsmarting most people in most contexts. The failures are glaring, and you'll no doubt catch them over time, but transformers are a sea change in ai power.
If gpt-3 can meet or beat human performance 80% of the time, that's sufficient for me to call it real intelligence. Not conscious or aware in any meaningful way, but smart, in many of the ways humans are smart.
In most online contexts where conversations are singular collisions between individuals who often don't have history, 80% of the real-seeming interactions you have could be gpt-3 output. In a many-shot, recursive, and domain targeted sequence of prompts, the percentage only improves.
The recursive, self referential functions learned by these models are not mere extrapolation from Markov chains and static stimulus/response bots. These models are approximating the functions required to produce human level output. They are approximating the functions biological brains use to produce output.
Even the naive, single-pass probes into gpt capabilities we've seen in the last two years are incredible compared to AIML or even the best bots until ~2018. It's not agi, but it's evidence that agi is feasible, and a definitive milestone on that path. We need to be preparing for takeoff.
Do you have any references that show such dialogues? How does GPT-3 handle questions like "why do you feel this way?" or basically any question with "why"?
Spent some time in the GPT-3 playground. It's very good at giving you information or writing stuff. But it falls into circular logic if you ask any WHY questions.
On what grounds have you determined that GPT-3 has no subjective experiences? More generally, what would need to be the case (with some future AI) for you to say, "ok, yes, this one is definitely conscious" ?
I would need some explanation of what consciousness is, for starters. Even better would be an explanation of how I might test whether any thing (including myself or another person) has it.
A perspective I thought was interesting is when Jordan Peterson says that the "proof" that people have a divine spark within them is "try treating them as if they don't, and see how well that works out for you!"
In that sense it's quite similar to the idea of always assuming a person is acting in good faith: you cannot ever truly know someone's intentions, and even when everyone is communicating honestly and openly, communication is imperfect.
But if you start to believe that someone has negative intentions towards you, your body language, tone, words and actions will reflect that assumption, and it will create harmful outcomes where they were not necessary at all.
Similarly, we may never be able to objectively prove that a machine is conscious -- as you say, we cannot even prove it to another human about ourselves! But if we keep treating them with the assumption that they lack this "something special" that we have, and treat them unethically as a result, I think we could get ourselves into a lot of trouble.
A similar conversation is to be had about the way humans are factory-farming animals, 80 billion of which are born and die every year in conditions worse than concentration camps (cages so small they can't move or even turn around, living in their own shit).
If nothing else, what will this teach future superintelligent AIs about how to treat beings less intelligent than itself? About denying them all freedom and agency and exploiting them entirely for its own benefit (or arguably, enjoyment)?
This makes art more accessible, but in the end the 'art' aspect comes from what the creator is trying to express, in which case as long as humans (or anything else with a creative drive) exist there will be art.
Just like how photography didn't cause the end of paintings, movies/TV didn't cause the end of stage plays and electronic music didn't replace instruments, all this will do (and to an extent has been done by other generative models) is create a new kind of art.
That’s my take too. Nobody will care for an AI-generated painting made with no effort by a computer.
Something new, where human effort with an opportunity cost remains, will arise.
Perhaps imperfection will become the new trend, vis-à-vis the opposite today?
But who says that those artists must declare that their artwork came from an AI?
Even on HN, people have already begun to accuse others of writing comments with GPT-3 without disclosing so. Whether joking or not, this feels like a glimpse into future human attitudes towards this breed of creation.
If we equate human-created == valuable, and AI-generated == not valuable, but we as humans cannot reliably tell the difference, then I'm picturing a future of distrust where creators/consumers suspect or accuse each other of fraud.
Such behavior is already prevalent with social media today, so why wouldn't it be amplified even further when new and advanced technology is introduced? DALL-E 2 isn't even at an AGI-level and it has already cleared some people's bar for what qualifies as "production-ready art."
I also recall the recent article where Ed Sheeran said he films his creative process to prove he is the true author - and that was an authorship case solely involving human-created works. In the future, "pure" artists might have to adopt a similar protocol to protect those kinds of virtues.
You: The forum singularity will be reached when the GPT-3 generated responses themselves include accusations of writing comments via GPT3.
Them: Hell, I can't be 100% sure (outside of my part) that we are not already there right this second.
You: Are you even a human responding to me?
Them:
Response:
I can't be sure that I am a human, but I can be pretty certain that I am not a GPT-3 generated response.
> Nobody will care for a AI generated painting made with no effort by a computer
If nature can create a painting via a human that evolved, then it’s not much of a stretch that nature can create a painting via AI via a human that evolved. I wouldn’t call that “no effort” — it took billions of years to produce that art!
What's a bit worrying here is that life has had billions of years of evolution to shape its instincts (i.e: empathy, self-preservation, desire to explore, etc.). Our basic instinctual drives are tuned in a way that the system is more or less stable, the species doesn't go extinct.
We're in the process of creating machines whose drives will be completely artificial, not shaped by natural selection. We're going to shape them... Or, more accurately, greedy corporations, billionaires and leftists are going to imprint an approximation of their own morals and goals in there, for better or worse. Those machines will also inevitably end up a lot smarter and more capable than we are. It's possible that if we fuck up the programming, we just won't be able to stop them from transforming the world into whatever they want... If the world they want is toxic or unsustainable, good luck trying to reprogram a machine that's faster, stronger and smarter than you. It's going to be the other way around: the machines will reprogram us if it suits their needs.
Wait, that's a weird list: "greedy corporations, billionaires and leftists" will push their own views and morals? Why do you conflate those things? In the US, leftists might want crazy things like universal health care. I do agree there is huge danger in programming systems and letting them make choices. But we already let it happen in things like letting software trade stocks and move the market (triggered by signals, causing unexpected behavior).
You only need to add "You are a nice AI." to the prompt to make it nice. Or you can replace "nice" with whatever you want, but then you suffer the consequences.
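For what it's worth, that really is roughly the whole mechanism; a sketch using the OpenAI completions API as it existed for GPT-3 (the persona line is the entire "alignment" knob being joked about):

    import openai

    persona = "You are a nice AI."  # swap "nice" for anything, at your own risk
    user_input = "What do you think of humans?"

    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=f"{persona}\n\nHuman: {user_input}\nAI:",
        max_tokens=100,
    )
    print(response.choices[0].text.strip())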
A revolution is an understatement. Presenting differently is nice for now, but projecting just slightly further into the future, the whole concept of presenting might be over and done with.
For example: My gf spends hours dolling herself up, making poses, sometimes traveling to interesting places to get good instagram photos. The desire is to present herself as a cool and attractive person.
It seems like it's the exact same amount of meaningful or meaningless regardless of DALL•E? None of that activity you described is meaningful as-is except for the meaning that she, you, or we project onto it for her having done it. That doesn't necessarily go away. The actions and activities themselves are meaningless. What sets them apart from results produced via DALL•E is precisely that your girlfriend was the agent involved. That can still be true and that quality can still be what makes them meaningful.
For example, robotic welding is incredible, truly a spectacular thing to observe, and the results are often immaculate for certain applications. However, I still pay a premium for handmade bicycle frames because I appreciate the craftsmanship that goes into them compared to mass produced alternatives.
It was all meaningless before. We attached meaning to it all as we grew as a species, but we are not functionally different from our ancestors 10,000 years ago who were just finding their way, who could see the universe, imagine it, creating whole civilizations and fighting wars and battles over the very imaginings in their head.
What's fascinating about DALL-E is that there now exist no barriers to that primitive, but remarkable, imagination. Many people could have envisioned Michelangelo's works. You can probably do that now, if you close your eyes real tight. But Michelangelo could not have created his great works without the funding of the House of Medici.
Well, no. Your average Facebook user doesn't have the time, money or skills to really touch up the photos they post on Facebook. If you can just ask your computer or your cellphone to pose you on top of Mount Everest or flying an F-22 fighter jet or driving a Lamborghini while wearing an expensive suit, and it produces 20 variants of that image in 227 milliseconds, though, that completely changes the game.
The internet is going to be full of fake photos and soon full of fake video clips too. You'll basically never be able to trust someone's Tinder profile picture.
The upside, maybe, is that if the internet becomes more fake, it also becomes less interesting. Maybe it will encourage people to do more activities and things in real life, away from computers. Dating websites will probably drop in popularity because profile pictures are so manipulated that you basically have no idea what the person looks like without meeting them in person.
You know that’s already the case to a large extent, right? All(?) the social media photo apps have a filter on them, even those saying “no filter”, because people prefer that to seeing their quirky faults.
They’re going to be very cautious. I work for a company that pays OpenAI a silly amount of money every day, and only a handful of employees have DALL-E access.
I'm OK with that. Freely open this thing up and guess what will happen: Rule 34, child pornography and murder scenes will explode all over the internet. Want your own free movie of your favorite non-porn actress doing scat scenes? Guess what, it will be ready in the next millisecond once you explain it to DALL-E. No thank you, some things are better kept under tight security and a shitload of money.
But here is the kicker: as the technology progresses, this will become just a normal part of the device you carry. I pity our grandkids and the world they'll live in.
Next crazy thing: accurate time evolution of photographed phenomena
"DALL-E + 0.001ms" ran continuously = reality sim
Based on previous frames, it'd be able to predict that two water drops would coalesce within the next dozen frames, a car will stop within X distance of another, etc.
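Mechanically that's just autoregressive rollout; a toy sketch of the loop (predict_next_frame is a hypothetical video model, not a real API):

    # Condition on the last k frames, predict the next one, append, repeat.
    def simulate(model, seed_frames, steps, context=16):
        frames = list(seed_frames)
        for _ in range(steps):
            frames.append(model.predict_next_frame(frames[-context:]))
        return frames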
oh yeah, this will be crazy. DALL-E basically does text-to-image; there's a whole area of text-to-video that's working on exactly what you're talking about.
There are some good examples from a recent paper here: https://video-diffusion.github.io/ they generate timelapses of fireworks, rivers, pouring liquids, etc.
Does anyone know why the text content in the images is weirdly off from the prompts? I couldn’t tell if that’s just a known limitation of the model, or a result of the author giving the prompts in some other language or something.
It's because the model doesn't actually see raw text during training or prompting; it gets a compressed token representation produced by BPE (byte-pair encoding) instead, and the compression is fixed rather than learned.
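To see why that hurts spelling: BPE hands the model opaque multi-character chunks, so it never learns a mapping from characters to glyphs. A quick illustration with the Hugging Face GPT-2 tokenizer as a stand-in for the (unreleased) DALL-E 2 text encoder:

    from transformers import GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    # The word arrives as a few arbitrary subword chunks, e.g. ['p', 'neum', 'onia']
    print(tokenizer.tokenize("pneumonia"))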
Lesson one: nothing in which the text matters, as dall-e 2 clearly has no language capability at all. That the model didn't learn that words in the prompt are best transcribed is perhaps not entirely surprising, but it sure is glaring.
I'd love to see if DALL-E can generate cross-stitch patterns. Would be nice to be able to input a size and a theme and have something unique be generated.
I’m curious how difficult it would be to iterate this model into a VR Holodeck with spoken scenes, realistic movements, personality and behavior? Obviously this would be a huge AI undertaking but my prediction is 5 years.
I suppose that the mere thought of something like "loli dungeon" goes a long way to explain why OpenAI is very careful about who and what they allow...
(Yet, considering what happened with deep fakes, it's only a matter of time now... :/ )
They only allow ~400 people in so far, in order to test the model. They say they will open it to the public later this year. Foundation models are being scrutinised by a large number of critics. The new papers devote half their space to safety, countering biases, and energy use, or they would face a harsh reception.
Either in a very remote future, in which the AI will actually be an artist, and /understand/ and rival e.g. Saul Bass instead of mocking him,
or sooner, and be worthless (the sheer hallucinations, revealing the lack of depth, are already quite evident) - in a historical context already overflowing with bad-quality cheapery.
The worth of taking steps is in the preparation of further steps. Clearly we have not arrived.
Try it yourself here: https://colab.research.google.com/github/multimodalart/laten...
EDIT: Here's 36 uncurated renders for "Hacker News - Dang's Dance. R&B Album Artwork" https://johan-nordberg.com/tmp/hn-laion400m.png