> When training a model you are deriving a function that takes some input and produces an output. The issue with copyright and licensing here is that a copy is made and reproduced numerous times when training.
How's that any different from what happens inside a human's brain when learning?
> The model is not walking around a museum where it is an authorized viewing.
The training data could well be from an online museum. And the idea that viewing something public has to be "authorized" is very insidious.
> The further issue is that it may output material that competes with the original.
It is different from a human brain in that it is not a human brain. It is a statistical function that produces some optimized outputs for some inputs.
I have made no mention of things being authorized in public. In the US you are allowed to take a photo of anything you want in public. These models are not being trained on datasets collected wholly in public, though; it is very insidious to suggest that they are.
The internet is not "the public". It is a series of digital properties that define terms for interacting with them. Now, a lot of material is publicly accessible online, but that does not mean that it is not still governed by copyright. For example, my code on GitHub is publicly accessible, but that doesn't mean you can disregard the license.
If you use this copyrighted material to produce a product for commercial gain, you will likely face a fair use test in court. If you use it for a non-commercial cause with public benefit, you could probably pass that fair use test. Open source will do very well because of this.
The model is not a human though, and very often these are not "public" works that it is trained on.
> It is a statistical function that produces some optimized outputs for some inputs.
So is a human mind.
> In the US you are allowed to take a photo of anything you want in public. These models are not being trained on datasets collected wholly in public, though; it is very insidious to suggest that they are.
How so? What non-public training data are they using, and why does it matter?
> The internet is not "the public". It is a series of digital properties that define terms for interacting with them. Now, a lot of material is publicly accessible online, but that does not mean that it is not still governed by copyright. For example, my code on GitHub is publicly accessible, but that doesn't mean you can disregard the license.
It does mean you can read the code and learn from it without concern for the license (morally, if not legally).
>> When training a model you are deriving a function that takes some input and produces an output. The issue with copyright and licensing here is that a copy is made and reproduced numerous times when training.
> How's that any different from what happens inside a human's brain when learning?
I don't know, nor does anyone else. So let me ask you - how is that the same as what happens inside a human's brain when learning?
We don't know the details. But it's pretty implausible that the process of learning wouldn't involve the brain having some representation of the thing it's learning, or wouldn't involve repeatedly "copying" that representation. Every way we know of processing data works like that. (OK, there are theoretical notions of reversible computation, but it's more complex and less effective than the regular kind, so it seems very unlikely the brain would operate that way.)
And a human who has learned to perform a task has certainly "derived a function that takes some input and produces an output".
> But it's pretty implausible that the process of learning wouldn't involve the brain having some representation of the thing it's learning, or wouldn't involve repeatedly "copying" that representation.
I think you can easily make a stronger statement:
We do know that art students spend many hours literally tracing other images in order to learn to draw. We do know that repetition is how the brain improves over time.
Based on that, it seems pretty clear to me that the other commenters here would agree (regardless of what the brain does internally) that, at a minimum, art students are violating copyright many, many times in order to learn.
> The further issue is that it may output material that competes with the original.
So might a human student.