If you use my copyrighted code to train your brain and reproduce exact copies of...

lelag · on Aug 31, 2022

Indeed. One difference I see between GPT-like language models and these image generation models is that the language models seem to actually hold full copies of a lot of samples from the training dataset (hence their ability to recite existing content), whereas the image generation models clearly do not: Stable Diffusion is 4Gb or so, yet it can draw anything.

That's the amazing part: the training dataset contains 5B images, yet it distilled all of them into a mere 4Gb of data and can produce an infinite amount of content from that. It really feels like it learned how to draw, in the same sense that a human learns, rather than simply reproducing exact copies of what it has already seen.
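To put that ratio in perspective, here is a rough back-of-the-envelope calculation. It assumes exactly 4 gigabytes of weights (reading "4Gb" as gigabytes, decimal) and exactly 5 billion training images; both are the rounded figures from the comment, not measured checkpoint or dataset sizes. The 512×512 RGB comparison is just the size of one uncompressed 8-bit image at that resolution.

```python
# Back-of-the-envelope: how much weight storage is there per training image?
# Assumption: rounded figures from the comment, not exact checkpoint/dataset sizes.
WEIGHTS_BYTES = 4 * 10**9      # ~4 GB of weights (decimal gigabytes)
TRAINING_IMAGES = 5 * 10**9    # ~5 billion training images

bytes_per_image = WEIGHTS_BYTES / TRAINING_IMAGES
bits_per_image = bytes_per_image * 8

# For comparison: one uncompressed 512x512 RGB image, 1 byte per channel.
RAW_512_RGB_BYTES = 512 * 512 * 3

print(f"{bytes_per_image:.1f} bytes of weights per training image")
print(f"{bits_per_image:.1f} bits of weights per training image")
print(f"{RAW_512_RGB_BYTES} bytes for one raw 512x512 RGB image")
```

At under one byte per image on average, the weights cannot be storing a copy of every training image. The average alone does not rule out memorization of a small number of heavily repeated images, though.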

earth_walker · on Aug 31, 2022

I see what you mean about GPT returning copies. But is there really a difference if your model just happens to 'calculate' an exact copy instead?

For example, if you enter something simple and high-profile, what it returns looks pretty close to an existing work...

Try, for example, typing in "Banksy", "Brad Pitt" or "Starbucks".

So if you type in "Photo of a Coffee", "Impressionist painting of a fruit bowl" or "Blue canvas with one red line in the middle", how do you guarantee that the image you get back isn't actually a copy of someone's work?

lurquer · on Aug 31, 2022

Copyright - at least for images - is going to become moot shortly.

Copyright exists to allow an artist time to reap the benefits of his labor. Without this time, they say, no rational person would invest his labor into making art. Dubious… but, whatever. The point is, if you remove the ‘labor’ from the mix, there’s no need for copyright.

If I can produce spectacular images with zero individual labor, there will be little reason for me to copy someone else’s work.

anonAndOn · on Aug 31, 2022

> there will be little reason for me to copy someone else’s work

Can I interest you in some recently discovered da Vinci paintings? 10% off when you buy 2 or more.

kixiQu · on Aug 31, 2022

Haven't we seen a lot of examples of these things vomiting up Starry Night on demand? Isn't that a clear case of memorization?

scarmig · on Aug 31, 2022

Midjourney did this one:

https://i0.wp.com/www.technollama.co.uk/wp-content/uploads/2...

Not really a copy. I'm sure you could cajole it into something very similar to the actual piece, but at that point it's more the model + the extensive prompt instead of the model itself.

majou · on Aug 31, 2022

32Gb (4GB)

thunderbird120 · on Aug 31, 2022

Generative models attempt to model their training data. Essentially, they try to be a model of the underlying data distribution from which all samples in the training data were drawn. A model of that distribution which cannot reproduce every sample from the original training data, given the right prompt/query/seed, is an incomplete model by definition. If it can't reproduce all samples drawn from the original distribution, then it clearly does not model the same distribution those samples were drawn from.

That said, this is very, very different from just copying at a conceptual level. This is going to end up being an interesting legal question going forward. I'm curious to see how it turns out.