You can run Stable Diffusion on an iPhone 11 and it completes in a minute or two. Running on a CPU generally takes around 5 minutes. My near-top-of-the-line MacBook runs a batch of four in around 30 seconds on Metal, and I'm sure a mid-range GPU is much faster, considering how unoptimized Metal is with Torch. And yes, go take a look around Reddit and 4chan: the vast majority of those images aren't from Dreambooth/MJ/remote models.
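For reference, here's roughly all it takes locally with Hugging Face diffusers. This is a sketch, not anyone's exact setup; the model ID, prompt, and device choice are placeholders:

    # Local Stable Diffusion sketch; use "cuda" or "cpu" (with float32)
    # instead of "mps" if you're not on Apple Silicon.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("mps")  # Metal on Apple Silicon

    # A batch of four, like the MacBook timing above.
    images = pipe(["a watercolor fox, detailed"] * 4).images
    for i, img in enumerate(images):
        img.save(f"out_{i}.png")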
That's not even taking into account the local LoRAs and scripting that become possible instead of some company's untweakable crap. The open source scene around this is healthy and has pushed past DALL-E, and there's no real roadblock to open source LLMs except, of course, the training cost. Even so, people are getting $200k+ models in their hands for free from various training runs and donated compute, then applying LoRAs and fine-tuning them until they're comparable to the closed-off remote models.
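Layering a LoRA onto a local base model is only a few more lines, assuming a recent diffusers release with load_lora_weights; the LoRA file name below is a placeholder:

    # Apply a community LoRA on top of a local base model.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("mps")
    pipe.load_lora_weights("./loras", weight_name="community_style.safetensors")
    image = pipe("portrait photo, soft window light").images[0]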
Any "cryptographic" scheme with the generations of these will just catch the lazy. The lazy already include the confabulated sources in their papers, and don't try to normalize the Error Level Analysis in generated images (probably the quickest way to determine whether an image is generated), so I don't think it's actually a net benefit. It's a cat and mouse game, and will push the mice further into the walls.
You can't possibly say that generative images like these are "poor quality":
https://www.reddit.com/r/StableDiffusion/comments/131lpks/my...
https://i.imgur.com/3iDf43z.png
Again, poor quality or slow on conventional hardware.
The first of your examples was generated on a desktop with a 2080 Ti, and even then the hands are glaringly uncanny. We don't know how long it took, but I suspect the hands came out that way because it's too slow to generate a dozen attempts in hopes that one gets them right.
The other one I could see being done on any laptop in a few minutes, but it's more primitive, just a monochrome sketch. I'll skip over the obvious issues, e.g. the shape of the glasses.
For both examples you don't need any specialized tools or watermarking to notice this stuff.
Maybe you see what I mean about why indie homegrown AI is not such a big deal ;) Sure, there are people who will invest in hardware, but those people are not, and for now won't be, mainstream enough to matter. Especially if it's licensed; most people don't like to violate laws. Most people will just use ChatGPT or DALL-E.
I don't see what your point is. The regulatory capture going on right now with the attempt to license things is ludicrous, akin to licensing matrix multiplication. It won't stick. Stable Diffusion and approximated functions (neural networks) are not something magical, despite the fear they want to attach to them.
Commercial AI has all of the issues you mentioned, and then some. Midjourney is just a bunch of LoRAs layered on top plus scripting to generate the images, and because of that, Midjourney images have a specific "feel" they can't seem to get rid of. It's nothing out of reach for someone sufficiently motivated to reproduce.
DALL-E is laughable now, and it's only been a year; it has certainly been surpassed by open source and outside competitors. I'm not sure what your motivation is for discounting open source. People are already running LLM inference on their phones.
My point: 99% of people will use ChatGPT etc. because homebrew alternatives are either bad (easy to detect with the naked eye) or slow. Microsoft will probably also make sure no competitor can offer good-enough AI by pushing for regulation. So if those big platforms are required to watermark/detect their own AI output, that's good enough. The remaining 1% of crazy people don't count.
That Midjourney et al. are also detectable to the naked eye.
Why do you need to watermark them, again? The error level analysis is off the charts with generative images; they light up like a Christmas tree. Just because you, uninformed legislators, and journalists don't know how to check the ELA of an image doesn't mean they're undetectable. And the cheaters already include the bogus sources spat out by ChatGPT. The cryptographic qualities will be lost as soon as an editor gets their hands on the image, automated editing or not. It's a cat-and-mouse game.
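For anyone who hasn't tried it, ELA is a few lines with Pillow: re-save the image as a JPEG at a known quality, then amplify the per-pixel difference, and regions with an inconsistent compression history stand out. A minimal sketch; file names are placeholders:

    # Minimal Error Level Analysis sketch with Pillow.
    from PIL import Image, ImageChops, ImageEnhance

    def ela(path, quality=90, scale=20):
        original = Image.open(path).convert("RGB")
        original.save("_resaved.jpg", "JPEG", quality=quality)
        resaved = Image.open("_resaved.jpg")
        diff = ImageChops.difference(original, resaved)
        # Amplify the residual so compression artifacts are visible.
        return ImageEnhance.Brightness(diff).enhance(scale)

    ela("suspect.jpg").save("suspect_ela.png")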
And also, I find it telling that you think someone without high-end hardware will pay $20/mo to OpenAI. For $20/mo ($240 a year) you can buy a mid-range video card and write it off. For an extra $10/mo you can depreciate the cost of a high-end laptop, if you're a professional, and you're not locked into OpenAI. You're also assuming that 1. hardware doesn't get better and 2. techniques don't improve to run them on limited hardware.
> That Midjourney et al. are also detectable to the naked eye.
Again: too slow, requiring outrageous hardware, or obviously noticeable. So far, no examples to the contrary.
Don't forget, the topic is using special measures to detect what's undetectable with the naked eye. When you can simply see the screwed-up hands in a photo, that's not even necessary.
> Why do you need to watermark them, again?
Why do you think I need to watermark them again?
> a mid-range video card
and a PC to put it in, a space to put the PC in, etc. With a laptop, we're back to waiting an hour to see a result.
> hardware doesn't get better and 2. techniques don't improve to run them on limited hardware
We can revisit this if consumer hardware gets good enough...
>With a laptop, we're back to waiting an hour to see a result.
Any laptop from the last five years with decent memory can run Stable Diffusion on the CPU in around 12 minutes. My MacBook Pro runs a batch of four on Metal in around 30 seconds.
>We can revisit this if consumer hardware gets good enough...
I mean, I just showed you a quantized LLaMA running on a Pixel 5 and a Pixel 6. And with all of this hype, I wouldn't discount most of the next generation of hardware shipping ML coprocessors, the way MacBooks, iPhones, and Pixels already do.
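Running a quantized model locally is about this much code with llama-cpp-python; the 4-bit model path below is a placeholder, and you'd quantize the weights beforehand with llama.cpp's tooling:

    # Local inference against a 4-bit quantized LLaMA-family model.
    from llama_cpp import Llama

    llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")
    out = llm("Q: Name three uses of a local LLM. A:", max_tokens=64)
    print(out["choices"][0]["text"])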