Eh... The mp3 decoder can generate copyrighted music if you feed it the right inputs...
Likewise, in this work they prime the pump by using exact training prompts of highly duplicated training images. And then you have to generate 500 images from that prompt to find 10 duplications. You've really gotta want to find the duplicates, which indicates that these are going to be extremely rare in practice, and even more rare once the training data is hardened against the attack by deduplication.
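For a sense of what that duplicate-finding step involves: you'd generate the 500 images and then compare them against the training image with some near-duplicate detector. A rough sketch, using a difference hash (dHash) over small grayscale thumbnails — the paper's actual pipeline is different, and everything here (function names, the bit threshold) is just illustrative:

```python
# Sketch of near-duplicate detection via difference hashing (dHash).
# Assumes images are already downscaled to tiny grayscale grids
# (hash_size rows by hash_size+1 columns). Illustrative only.

def dhash(pixels, hash_size=8):
    """Fingerprint a 2D grid of grayscale values: each bit records
    whether a pixel is brighter than its right-hand neighbour."""
    bits = 0
    for row in pixels:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits

def hamming(a, b):
    """Count of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def near_duplicates(hashes, threshold=4):
    """Return index pairs whose hashes differ in at most `threshold` bits."""
    return [(i, j)
            for i in range(len(hashes))
            for j in range(i + 1, len(hashes))
            if hamming(hashes[i], hashes[j]) <= threshold]

# Two identical gradients and one reversed gradient:
img_a = [[7 * x + 13 * y for x in range(9)] for y in range(8)]
img_b = [[7 * x + 13 * y for x in range(9)] for y in range(8)]
img_c = [[200 - 7 * x - 13 * y for x in range(9)] for y in range(8)]
hashes = [dhash(img_a), dhash(img_b), dhash(img_c)]
print(near_duplicates(hashes))  # only (0, 1) is a near-duplicate pair
```

The same Hamming-distance-on-hashes idea is also how you'd deduplicate the training set itself to harden it against this attack.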
> Eh... The mp3 decoder can generate copyrighted music if you feed it the right inputs...
Good analogy! An MP3 decoder takes an input and produces an output. If the output is copyrighted material, it's well understood that the input is simply a transformed version of that same copyrighted material and is similarly copyrighted.
The SD model is very much analogous. The prompt causes the algorithm to extract some output from the model. If the output is copyrighted material, then the model must similarly carry a transformed version of that same copyrighted material and is therefore also subject to copyright.
Right?
By the way, I posit this, but I highly doubt this is actually how the courts will rule. I think they'll find the model itself is fine, that the training is subject to a fair use defense, but that the outputs may be subject to copyright if there's substantial similarity to an existing work in the training set.