Hacker News | nnvvhh's comments

Current US copyright law is not clearly in a place to view model training as infringement. Courts have a long history of permissiveness in the face of copyright challenges to new tech (e.g. the image search engine cases, Google v. Oracle and smartphones, Sony v. Universal and VCRs) and I predict it will happen again with AI. The cat is out of the bag and judges know that finding training to be infringement of each training example will have a negative impact on a new product category. If training was more obviously infringement then that permissiveness would be harder to sell, but in my opinion it's really difficult to argue that a "copy" of an example has been made during training (aside from the copy made to process the example).


I'm not so sure about this. I'm not saying courts in the US will rule one way or the other, I'm just saying it's certainly not a foregone conclusion that training is fair use. Even if it is, the companies might not have done their due diligence.

Lots of data they trained on is available for purchase (e.g. artists often sell prints or reproduction rights, the books in books3 are widely available, etc). It's my understanding that companies like Stability and OpenAI did not attempt to determine if the data they trained on was available for purchase and then buy a legally purchased copy for training. That might cause them to run afoul of fair use doctrine in the US (not sure of other jurisdictions).

See these excerpts describing fair use for copying library materials [1] (many of these collections are being released by groups referring to themselves as libraries):

> Copying a complete work from the library collection is prohibited unless the work is not available at a “fair price.” This is generally the case when the work is out of print and used copies are not available at a reasonable price. If a work, located within the library’s collection, is available at a reasonable price, the library may reproduce one article or other contribution to a copyrighted collection or periodical issue, or a small part of any other copyrighted work, for example, a chapter from a book. This right to copy does not apply if the library is aware that the copying of a work (available at a fair price) is systematic. For example, if 30 different members of one class are requesting a copy of the same article, the library has reason to believe that the instructor is trying to avoid seeking permission for 30 copies.

> The copying, whether performed by the library or whether unsupervised by the library patron, cannot be for a commercial advantage. This means that the library (or a copying service hired by the library) cannot profit from the copying. In addition, the copying for the patron must be done for purposes of private study, scholarship, or research.

[1]: https://fairuse.stanford.edu/overview/academic-and-education...


The availability of the copyrighted works is not determinative. Fair use in the US takes (at minimum) four factors into account, listed in the federal copyright statute: https://www.law.cornell.edu/uscode/text/17/107.

That quote from Stanford's library is not discussing fair use doctrine in general, but rather is stating what is permitted in those specific circumstances. There are plenty of instances of fair use where the underlying work used was available at a fair price. That's the whole point of fair use law: some use of a work that is facially infringement escapes liability because the particular use is considered fair.


Another aspect of this arrangement: you don't pay federal income tax on money you receive as a loan in the US. The money does not count as income because of the matching obligation to pay it back.


Said differently, it doesn't count as income because it's not income.

I borrowed just shy of a million dollars to buy my house. That money sure wasn't income...


Is there any country where a loan is counted as income?


Nothing you mentioned has anything to do with communism AS AN IDEOLOGY, though. So why discuss it? It could have been any group based on the logic of your comment.


Yonatan rightly categorizes Nazism as an ideology centered around race. Unfortunately, he then veers off to finger-pointing at modern-day political discourse as a means to justify "punching Nazis". My previous comment was aimed at the crowd that thinks "punching Nazis" is a zero-sum game of villain and hero. It has more nuance, and should be understood in a previous historical context.


I wrote a law school paper discussing the potential liability stemming from Copilot: https://nickvh.com/blog/archive/2022/02/copilot/copilot.html


IANAL or a law student.

Good paper, though I disagree with many points.

However, two things: you claim that Copilot might have infringed during training because of the GPL, but the GPL's clause only kicks in on distribution. If GitHub had trained Copilot and then did nothing else, that clause would not kick in.

Second, your Stack Overflow example of copying is not good because SO has an explicit license for all material given. You must agree to let SO and others have the material under that license, which is CC-BY-SA 4.0.

Don't get me wrong; I hate Copilot. But those two things you said were wrong.


Thanks for engaging with the paper.

I'm too lazy to check the GPL comment (I'll assume I made a mistake). But as far as I can tell my only reference to Stack Overflow was not about liability based on copying from SO. I was making a comment about a common industry practice.


I think some countries use a different tradeoff than the US for their income tax. Instead of spending a ton of government and taxpayer time and effort to accurately assess what each taxpayer owes, the government simply generates an estimate based on what it knows about each taxpayer and uses that. They accept the reduced accuracy but it is offset by the reduced effort in determining everyone's bills. That is to say, you don't need to move away from an income tax to address the problem.



Copyright exists upon creation; registration is required to sue. They don't need to register them all.


Registration is not required to sue, but definitely makes it easier.


You're mistaken about who has the burden to prove actual copying. The person alleging infringement has to show that the defendant copied the work.


Maybe in the courts, but you can get your YouTube channel shut down or even be permanently kicked off your ISP (and the internet, unless you have more than one option) based on nothing but unsubstantiated accusations that you've violated copyright.


True, but then it doesn't matter if you produced any content or not.


It can, depending on the nature of the content and how creators distribute it.

Anyone can find themselves forced to defend their innocence after being hit with DMCA notices, but content creators are especially vulnerable because they depend on their works being publicly available. They have a lot more at stake (including the ability to eat and pay rent) if they are hit with accusations.

They are the ones who'll be hit by poorly implemented algorithms (like Google's notoriously terrible Content ID or the many even worse imitations created by companies without the talent/money/data Google has at its disposal) or can be targeted specifically because their creative works are perceived as a threat to some company.

The DMCA is often abused to silence criticism, to hide unpleasant information about a company or individual from the public, or to attack competitors. It costs very little to fire off accusations that have real-world consequences and as far as I know, no major company has ever been held accountable for doing so inappropriately, even when it's been brought to the attention of courts.


The standard in a US civil trial is preponderance of the evidence, though, so all you have to do is convince the judge that it is more likely than not that the defendant copied the work.


Sure, but where the burden lies still matters. The plaintiff needing to prove copying is a lot more defendant-friendly than the defendant needing to disprove copying.


Congress has a lot of room within a grant of power (e.g. "to promote the progress...") because of the necessary and proper clause. In short, the tight relationship you're demanding is not required.


I've read that clause and at first sight it would seem that it too is a restriction but clearly it was not interpreted that way historically or when it is being cited. How is there so much consensus in the interpretation?


The simple answer is that the Supreme Court of the United States said as much. Lower courts are bound to follow SCOTUS' holdings. Of course, precedent can always be overturned (i.e. if SCOTUS changes its mind), but the N&P Clause's interpretation is unlikely to be changed.


How would this lead to an antitrust claim?

