> the exception rather than the rule [...] . Girl with a Pearl Earring is so famous that there are countless copies of it on the internet
I think that's part of the point, though.
Even if it's working like some kind of feature detection - "this is what 'nose' looks like [nose1,nose2,...,nose9999]" and "this is what 'girl with a pearl earring vermeer' looks like [vermeer1,vermeer2,...,]"
Clearly it is storing copies of those images internally, and constant reinforcement from finding noses and Vermeer paintings in the training data pushes the priority of those features up until it effectively stores them.
We can't trust these models, as they exist right now, to reliably generate new works, rather than spit out something that already exists.
Technically speaking, there can be only so many such reproducible works in the model due to the information-theoretic constraint. But yes, I agree that this is a close enough reproduction of Vermeer's work, and while this one is indeed in the public domain, you are right that other copyrighted works may be present in the model verbatim, extractable with the right, currently unknown, prompt.
The photos of the painting on the internet aren’t original works either. This system is learning from everyone taking pictures. If you paid a very good artist to paint that painting, they’d work from a picture. No artist emerges fully formed without being shaped by what they’ve seen.
There’s a difference between an artist and a machine (for now), but it’s not unreasonable to assume that some internalisation of others’ work is fair.
Not reliably, no. But at least for this size of model, its capacity to memorize seems to be limited to works that are so popular that 'everyone' knows what they look like. And it also seems to generate them only when asked. So in any situation where a human is reviewing the output, it doesn't seem like there's a big risk of plagiarizing something by accident. Larger models may differ.
I feel like this is one of those occasions where perfect is the enemy of the good. Yes, there might be some very, very famous artworks stored inside it. But unless you think Stability AI has completely blown past the current Hutter Prize winner, there is no way this thing is storing many things. The things it might be storing are too public to really be that big of a deal. It will spit out a Vermeer, but your submission to DeviantArt in 2007 is definitely safe.
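The compression argument above can be made concrete with a back-of-envelope calculation. The parameter count and dataset size below are rough public figures for Stable Diffusion v1 and its LAION training set, used here only as illustrative assumptions:

```python
# Back-of-envelope: how much weight budget exists per training image?
# Assumed figures (approximate, public): ~890M U-Net parameters,
# fp16 weights, ~2.3B training images from LAION.
params = 890e6
bytes_per_param = 2          # fp16
model_bytes = params * bytes_per_param

training_images = 2.3e9

bytes_per_image = model_bytes / training_images
print(f"~{bytes_per_image:.2f} bytes of weight budget per training image")

# Well under one byte per image: verbatim storage of most of the
# training set is information-theoretically impossible, so only a
# small number of heavily duplicated works can be memorized.
```

Under these assumptions the budget works out to well under a byte per image, which is why memorization is expected only for works that appear thousands of times in the training data.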
> there is no way this thing is storing many things.
I didn't claim that it was storing all of the images.
It is, however, quite clearly storing at least some of the original training images. Where exactly the line falls, I don't know.
To go back to the original point of this thread:
> The thing is, people have wild misconceptions about what this technology does. Many clearly think it's a photocopier. As if StableDiffusion is hiding copies of all these works inside.
I don't think it's a wild misconception, or even an unreasonable concern, to worry that it could output some of its training images. As an end-user of the technology, you have no idea.