
It isn't copyright infringement to train an AI on information that you obtained legally.

Example: If I put my book on mybook.com, and you download it (legally), you can read it, learn from it, and produce works in a similar style, all without my consent, and copyright offers no tools to restrict that.

The only tool copyright offers here is control over distribution.



A.I.s are subject to copyright (unlike a human being like yourself), so an A.I. that has been trained on an artist's work infringes the artist's copyright, because the A.I. is a derivative copyrightable work.

Since copyright doesn't govern your brain's "wetware", the comparison to human behavior is irrelevant.


Two issues with that. First, training an AI has thus far been regarded in courts as non-creative, and thus the AI model/weights are not copyrightable. Second, the model does not sufficiently represent the original work, so it's not a derivative work.


I think you are confusing the issue of whether the A.I. output is creative with whether training the A.I. itself is creative enough for the A.I. software to be copyrightable.

The law has a very broad definition of creativity when a human is involved; even a human taking a picture of the Mona Lisa is considered creative.

Since humans train the A.I., and it takes creativity to design the training scheme, the A.I. would appear to me to clearly be a creative work and copyrightable.

You brought up a separate issue: whether the model infringes the original work, and whether it's okay because the model doesn't resemble the original.

I don't think that's quite the right legal standard. What's relevant is whether the copying into the model constitutes fair use.

One of the considerations of fair use is whether the use is transformative. But that's only one consideration a court will look at in determining whether the A.I. company can succeed in a fair use defense. I don't believe this is settled law.

But if there's any broad point I want to make, it's that the law does not ever consider a machine or piece of software "like a human" or say "If it's okay for a human to do it, it's okay for a piece of software to do it."


I am not confusing that issue. I am stating outright that it is not sufficiently clear that a human is responsible for the neural net weights. The gradient descent may be considered automatic.
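To make "automatic" concrete, here is a minimal sketch of gradient descent on a toy two-parameter linear model. Everything in it (the `train` function, the data, the learning rate and step count) is made up for illustration and is not how any production model is trained. The point is only that a human picks the data and the hyperparameters, while the final weight values come out of a mechanical, deterministic loop.

```python
# Toy gradient descent: fit y = w*x + b by minimizing mean squared error.
# All names and values here are hypothetical, chosen only for illustration.

def train(xs, ys, lr=0.05, steps=1000):
    w, b = 0.0, 0.0  # initial weights; no human edits them after this point
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Mechanical update rule: step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = train(xs, ys)
print(w, b)  # the learned weights land close to the slope and intercept of the data
```

Running `train` twice on the same inputs yields identical weights, which is the sense in which the process operates without further intervention once it has been set up.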

Section 313.2 of the Compendium of U.S. Copyright Office Practices states, "the Office will not register works produced by a machine or mere mechanical process that operates randomly or automatically without any creative input or intervention from a human author."

Moreover, the Copyright Act's definition of a computer program is "a set of statements or instructions to be used directly or indirectly in a computer in order to bring about a certain result". Neural net weights are clearly neither statements nor instructions, so it's unlikely the weights would gain copyright as a computer program.

It's unclear whether the SSO (structure, sequence and organization) doctrine[1] applies to neural net weights, as it can be difficult to determine in which order weights are used. Even if those weights were to gain copyright protection, that protection might not be enforceable against similar sets of weights.

Regarding whether resembling the original work is the legal standard for being considered an infringing work: the test applied by a court is called Substantial Similarity[2], and it's extremely unlikely that neural network weights would be substantially similar to training data when considering the idea-expression divide.

Copying into the model... are you suggesting that if I have an image that's represented by an array in memory, and I copy that array from one memory location to another in my PC, that act of copying is the infringing act?

Fair use may even be a bit of a non sequitur in this case. If I look at copyrighted art, I don't have a fair use case, because I haven't incorporated the art into anything, though the act of viewing has changed my brain. Training a neural net on training data may not even be considered incorporation.

On your last broad point... okay? I suppose.

[1] https://en.wikipedia.org/wiki/Structure,_sequence_and_organi...

[2] https://en.wikipedia.org/wiki/Substantial_similarity


Some of your points are interesting, but let's back up.

You might want to read this article:

https://www.law.cornell.edu/wex/fixed_in_a_tangible_medium_o...

Your brain is immune from copyright lawsuits as being a derivative work because your brain is not a tangible medium of expression.

Your thoughts are not derivative works under the copyright act because your thoughts, seeing as they are not contained in a tangible medium of expression, are not works at all.

Computers are clearly tangible mediums of expression.

We can argue whether a particular piece of software is copyrightable or whether a dataset is protected under the copyright act's definition of compilations.

But it should not be up for debate whether whatever is happening in your brain when you learn something is in any way relevant in terms of copyright law.

I would also add that the idea-expression dichotomy is irrelevant.

If the dataset was merely ideas like "A boy goes to Wizard school" then you might win on that ground. But if you feed it the complete works of Harry Potter, no court is going to claim that the data the machine created based on Harry Potter is an "idea".



