I will throw in a random story here about ChatGPT 4.0. I'm not commenting on this article directly, just a somewhat related anecdote. I was using ChatGPT to help me write some Android OpenGL rendering code. OpenGL can be very esoteric and I haven't touched it for at least 10 years.
Everything was going great and I had a working example, so I decided to look online for some example code to verify I was doing things correctly, and not making any glaring mistakes. It was then that I found an exact line-by-line copy of what ChatGPT had given me. This was before it had the ability to google things, and the code predated OpenAI. It had even brought across spelling errors in the variable names; the only thing it changed was that it translated the comments from Spanish to English.
I had always been under the impression that ChatGPT just learned from sources, and then gave you a new result based roughly on its sources. I think some of the confounding variables here were, 1. this was a very specific use case and not many examples existed, and 2. all OpenGL code looks similar, to a point.
The worst part was, there was no license provided for the code or the repo, so it was not legal for me to take the code wholesale like that. I am now much more cautious about asking ChatGPT for code; I only have it give me direction now, and no longer use 'sample code' that it produces.
> I had always been under the impression that ChatGPT just learned from sources, and then gave you a new result based roughly on its sources. I think some of the confounding variables here were, 1.
If I’ve understood the transformer paper correctly, these things probabilistically guess words based on what they’ve been trained on. The probabilities are weighted dynamically by the prompt, by what they’ve already generated, and by what they “think” they might generate for the next few tokens (they look ahead somewhat), with another set of probability-weight adjustments applied to all of that by a statistical guess at which tokens or words are most significant or important.
None of that would prevent them from spitting out exactly what they’ve seen in training data. Keeping them from doing that a lot requires introducing “noise” to all the statistics stuff above, and maybe a gate after generation that tries to check if what’s been generated is too similar to training data and forces another run (maybe with more noise) if it is, similar to how they prevent them from saying racist stuff or whatever.
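To make the "statistics stuff" concrete, here is a minimal sketch of a temperature-based sampling step (toy vocabulary and invented numbers, not any particular model's internals):

```python
# Minimal sketch of temperature-based next-token sampling.
# Names and numbers are illustrative, not any real model's internals.
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 0.8) -> int:
    """Pick a token index from raw scores over the vocabulary."""
    # Higher temperature flattens the distribution (more "noise");
    # temperature near zero almost always picks the single most likely token.
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# Toy scores the model might assign to a tiny vocabulary given the context so far.
vocab = ["the", "quick", "brown", "fox"]
logits = np.array([2.0, 0.5, 0.1, 1.2])
print(vocab[sample_next_token(logits)])
```

With the temperature near zero you get the most probable continuation every time, which is exactly when verbatim chunks of training data are most likely to fall out.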
You have understood correctly. What LLMs are, at least in their current state, is not fundamentally different from a simple Markov chain generator.
Technically speaking, it is, of course, far more complex. There is some incredible vector math and token rerouting going on; but in terms of how you get output from input, it's still "how often have I seen x in relation to y" at the core level.
They do not learn, they do not think, they do not reason. They are probability engines. If anyone tells you their LLM is not, it has just been painted over in snake oil to appear otherwise.
For all that I agree there's a big pile of snake oil in this area, I disagree with you overall.
Having played with Markov models since I was a kid*, I can say LLMs are really not just that.
All that stuff you acknowledge but then gloss over is the actual learning, which tells it which previous tokens are relevant and how much attention to pay to them. This learning creates a world model which is functionally (barely, but functionally) performing something approximating reasoning.
Statistics and probability are the mechanism it uses to do this, but that alone doesn't make it a Markov chain; those are a very specific thing that's a lot more limited.
For example: consider a context window of 128k tokens where each token has 64k possible discrete values. If implemented as a Markov chain, this would need a transition matrix of (2^16)^(2^17) by (2^16) entries (unless I've gotten one of the numbers backwards, but you get the idea regardless). This is too many, and because it is too many you have to create a function to approximate those transitions. But even then, that only works as a Markov chain if it's a deterministic function, and the actual behaviour is not deterministic due to the temperature setting not (usually) being zero.
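As a back-of-the-envelope check of those figures (my arithmetic, using the numbers above):

```python
# Back-of-the-envelope size of the transition table a literal Markov chain
# would need for the figures above: 64k possible token values, 128k-token context.
import math

vocab = 2 ** 16    # possible discrete token values
context = 2 ** 17  # context window length in tokens

rows_log2 = context * math.log2(vocab)  # one row per possible full context
entries_log10 = (rows_log2 + math.log2(vocab)) * math.log10(2)
print(f"rows    ~ 2^{rows_log2:.0f}")       # 2^2097152
print(f"entries ~ 10^{entries_log10:.0f}")  # a number with over 600,000 digits
```

Either way the point stands: that table cannot be stored, so whatever the network is doing internally, it is not a literal lookup over past transitions.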
* The Commodore 64 user guide, aged 6 or so, so I didn't really understand it at the time, but that's what it was
Off the top of my head and 34 years later, sadly not.
Best I can do is describe this code/listing:
It created a few lists of words (IIRC these lists were adjectives, nouns, verbs?), and in the main loop it kept track of which list it had just taken a word from in order to decide what to pick next — e.g. if it had just picked an adjective then was allowed to pick either another adjective or go onto a noun, if it had just picked a noun then it could end the sentence or it could go on to a verb — and it would pick a random word from whichever list.
This either fit entirely on one screen, or very close to that — I was little, I would have made a syntax error if it had been much longer.
(Perhaps it will turn out that my family's user manual wasn't even the official one, though I do remember it being a thick blue thing, which matches the pictures I've seen online.)
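From that description, a rough modern reconstruction of the listing would look something like this (purely illustrative; the original BASIC is long gone):

```python
# Rough reconstruction of the word-list sentence generator described above.
# The actual C64 BASIC listing is lost, so the word lists and probabilities
# here are made up; only the structure matters.
import random

adjectives = ["big", "red", "quiet", "shiny"]
nouns = ["dog", "house", "robot", "river"]
verbs = ["runs", "sleeps", "glows", "sings"]

def make_sentence() -> str:
    words = []
    # Zero or more adjectives; after each one, either add another or move on...
    while random.random() < 0.5:
        words.append(random.choice(adjectives))
    # ...then a noun, after which the sentence may end or continue with a verb.
    words.append(random.choice(nouns))
    if random.random() < 0.7:
        words.append(random.choice(verbs))
    return " ".join(words) + "."

print(make_sentence())
```

Trivial, but the structure ("what I pick next depends only on which list I just picked from") is exactly a first-order Markov chain, which is why I count it as playing with Markov models.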
You can't write software code without the ability to think. You can't tactfully respond to emotional expression without the ability to think. If that is snake oil then we are all walking, talking, snake oil.
The way we do it is not the same way they do it. They literally only predict the next probable tokens. The way they do it is amazing, and the fact that they can do it as well as they do is amazing, but human thinking is a lot more nuanced than just predicting.
The fact that AI seems so reasoned is not because it is doing reasoning, but because there is a phenomenal amount of reasoning inherently embedded in its training data.
AI actually thinks in the same way that the figures on a movie screen actually move. It's a trick, and the difference may be pedantic, but it's very important in order to have a real discussion about the ramifications of it.
As far as I know we don't know how we do it. We have very little clue how our higher level behaviours emerge. So you can't claim we don't do it the same way.
Of course I can, humans learn faster from far less data and don't hallucinate to the same extent. What they do is very likely similar to a part of what we do, but they're missing critical components and my feeling is not all of them (empathy and creativity for example) are even possible to replicate outside of a human experience.
You are extrapolating from a result to the implementation and making a judgment call that it's thus not the same. That is not valid to do. You can come up with countless examples of the same underlying technical principle being used but the results being dramatically better now. Lithography, for example.
You could also look at a koala and make the argument they function totally differently from us since they almost can't learn anything and are extremely stupid.
You can clearly see behavioural patterns in people and in their parents.
For example, the boy who brushes his teeth the same way his father does.
I'm really lost on what you think your brain is doing. Have you never thought through things but acted differently? Like procrastination? Spouting out something and thinking afterwards, "ah man, I should have just done x instead of y"?
If there are 8 blue beads and 2 red beads in a jar, and I ask the computer to draw a bead out of the jar and it's a blue one that it has drawn, did it really think about giving me the bead?
They’re not responding “tactfully”, you’re projecting emotion to a bunch of words written coldly.
It’s like writing a program that has a number of fixed strings like “I feel sad” or “I’m depressed” and when it sees those it outputs “I’m sorry to hear that. I’m here for you and love you”. The words may be comforting and come at the right time, but there’s no feeling or thought put into them.
Humans can measure feelings, computers can't. Therefore I can say whether ChatGPT has enough feeling, but it can never do the inverse to me.
That feels simplistic, but we're dealing with fundamentally human concepts. I see absolutely no reason to work under the assumption that computer programs are somehow in the same domain as human thought, which is what a lot of people (you) are saying.
The goal should not be to demonstrate ChatGPT and Humans are different, because to me that is obvious and should be the starting point. Rather we should do the inverse, show that ChatGPT is indistinguishable from a person, as measured by Humans. And then, maybe, we can consider granting this computer program human rights like the right to use copyrightable media in a transformative way.
Ah, but that is really hard to do. So the AI tech bros don't do it, and instead work in the opposite direction.
They learned already at the beginning. It's called training.
It's the same thing we humans do, just a lot faster and focused on the content we give it.
'Think'? What is thinking? Recalling what you learned?
Wiki says: "Their most paradigmatic forms are judging, reasoning, concept formation, problem solving, and deliberation. But other mental processes, like considering an idea, memory, or imagination, are also often included"
Talk to an LLM, it will reflect these concepts very well.
'Reason'? Even people don't reason. I had plenty of discussions with people who do not act logically. And there have been plenty of good examples of LLMs learning to reason. Look at Grok 2 and just wait for GPT-5.
You put the achievement of LLMs down as nothing, despite the fact that it could mean a lot more. How big is the chance that we are also just probability engines?
We as humans are more individual than an LLM and we do have more mechanisms in our brains, like time components, emotional interactions, social systems.
And not even your mention of "Markov chain" is correct: an LLM architecture is not how a Markov chain works, otherwise we wouldn't have the scaling issues we have with LLMs...
I remember similar news about ML services that generate music: they are able to reproduce melodies and lyrics from copyrighted songs (if you find a way around filters on song or artist titles) and even producer tags in hip-hop tracks.
All this latest ML growth is built on massive copyright violations.
I wanted to note that you cannot compare learning by a human and "ML learning", which is basically calculating coefficients from copyrighted materials. Don't those coefficients fall under the definition of a "derivative work", by the way?
ML models are not "learning" in the same way humans do, and while they use the misleading word "learning", it has a completely different meaning; also, ML models are not humans and therefore they are not subjects of laws; the engineers who perform the calculations are.
So comparing calculation of ML model parameters to a person studying art is incorrect; you should compare engineers performing calculations with data from copyrighted material to a person studying art. It is immediately obvious that these cases are not equivalent. And those engineers are not learning anything in the process, so they cannot use the analogy as an excuse.
The fact that those services can reproduce copyrighted content proves that it was used during training. And was it legally obtained? Do you think services like Udio bought millions of CDs? Or did they get the training material somewhere else? You cannot legally download content from streaming services, for example.
There's no difference between producing and copying. Let's take a real world example: music samples. There's an entire clearinghouse process for music sampling, more or less forged after the 80s/90s when sampling blew up. Record companies and artists were like, "hey, that's my song", courts agreed, and a market was created.
This is pretty analogous to what's happening now, which is code samples. Developers are like, "hey that's my code". But that's where we're diverging, and this is probably because big companies aren't involved. People were sampling Atlantic Records' stuff. People aren't sampling Microsoft's stuff, they're sampling random GitHub OSS project guy's stuff.
But to your point, you're basically arguing that it's fine as long as no one listens to "Bitter Sweet Symphony". Most people think it's not the end user (listener) who's infringing copyright, but the party doing the copying (The Verve). Even if we accept your principle here, you're putting way too heavy a burden on people who use services like Copilot. Am I supposed to check that everything I autocomplete is properly licensed? You more or less said "shut these services down" in so many words.
That isn't true, strictly speaking. The right to reproduce work is covered by copyright law, irrespective of whether the reproduction is commercial or not.
So what? Most uses when we're talking about code or artwork are going to involve someone taking the generated result and publishing it somewhere.
> But if I use it, in a commercial manner, then it becomes a copyright violation.
No, that's incorrect. Commercial use has nothing to do with it. Any act of distribution, regardless of whether or not it's for commercial or personal use, regardless of whether you charge $100,000 or $0, falls under copyright law.
You can usually dismiss anyone who talks about the law in such absolute terms, especially when it comes to copyright.
The U.S. Copyright Office has some guidelines about fair use, and non-commercial as well as personal use are listed as considerations that courts take into account when judging whether an unlicensed copy constitutes an infringement:
You're technically correct, but this ignores some unfortunate realities of DMCA for the smaller fry.
The bigger fry are and will keep fighting to change the ideas of copyright, now that it's inconvenient for them after spending decades strengthening it.
I pretty much said that, yes. But the argument of "personal use" ends when you post on the internet. Which is what most "big fries" are doing as some endpoint.
But those are more "medium fries" anyway. The "big fries" aren't gonna take any risk whatsoever unless they are going the edgy parody route (someone like Adult Swim). There's little upside to Bungie or Microsoft or Laika posting Mickey Mouse to begin with.
In summary, you probably can and Disney won't care, but it is technically not allowed outside of fair use constraints. It's not legal in the same realm that it's illegal to ride a bike in a swimming pool in Baldwin Park, CA. Key point: don't make money, don't be stupid, and don't be unlikeable.
Even then, most platforms control the content and they probably won't defend you and would take it down. That's not a legal matter so much as a platform policy.
>people aren't getting sued for drawing Mickey Mouse for personal use because Disney would rather go after the big guys.
It's even simpler than that. If a lawyer costs $10k to run a small claims case, and they have a shaky chance (fair use) or a low payout, it's not profitable to go after you.
That's why other factors need to come into play, like potential brand damage, scaring off imitators, or simply pissing off the wrong lawyer somehow.
Yes and no. Strictly speaking, you need a license to allow you to use copyrighted material, even if it's non-commercial (obvious example: if you post, say, Jasmine making out with Hitler on some non-monetized account, Disney can take that down. Or try. It really comes down to whether the platform wants to argue fair use or parody or whatnot. Most won't defend you, though).
But enforcement-wise, Disney won't bother going after every potential copyright violation. They will focus on the biggest money makers or the biggest potential for brand damage. So it's not a worry for most people who will just draw some Mickey Mouse for a friend, or even a private client, as long as they aren't stupid.
Not at all: unless a license is provided, the code is fully protected under copyright and you have _no_ rights to copy it or use it in _any_ way you want (unless falling under "fair use" clauses for the jurisdiction you're in/the author is in).
No license[0] is the default fallback when nothing is provided. Realistically, it's "use at your own risk", because someone who doesn't license may not even be aware of what others do with it (or you fall back to whatever rules of the platform you posted on).
If there's only one way to do it, or a developer familiar with the field would independently come up with the same way of doing it, then the "copyrightability" of the result comes into question.
This doesn't stop you getting yourself a legal headache though, of course.
Do you want to live in fear as a software developer because you did the same thing as others? Even if the problem has basically only a limited number of ways of doing it?
Do you want a certain amount of code to be copyrighted and patented?
I personally don't. I see an advantage in limited patents for complex magical algorithms where someone was really sitting there and solving a hard problem, to reap the benefits for a short period of time, but otherwise no.
I do not want to check every code block for some (c) or patent.
This has happened quite a few times with me as well, both with ChatGPT and Phind (Phind in particular is often basically Stack Overflow with a few variable names changed).
>I think some of the confounding variables here were, 1. this was a very specific use case and not many examples existed, and 2. all OpenGL code looks similar, to a point.
Yeah, that's why I wouldn't trust AI right now with much except the most basic rendering boilerplate. I'd be so brazen as to wager that 90% of the most valuable rendering recipes are proprietary code within some real-time rendering studio. Of the remainder, half is in some textbook and may or may not even be available to scrape online.
LLMs still need a training set, and I'm not convinced the information even exists to be scraped on the public internet for this kind of stuff (if years of googling have taught me anything).
An amended version of the complaint had taken issue with GitHub’s duplication detection filter, which allows users to “detect and suppress” Copilot suggestions matching public code on GitHub.
The developers argued that turning off this filter would let users “receive identical code” and cited a study showing how AI models can “memorise” and reproduce parts of their training data, potentially including copyrighted code.
However, Judge Tigar found these arguments unconvincing. He determined that the code allegedly copied by GitHub was not sufficiently similar to the developers’ original work. The judge also noted that the cited study itself mentions that GitHub Copilot “rarely emits memorised code in benign situations.”
I think this is the key point: reproduction is the issue, not training. And as noted in the study[1] reproduction doesn't usually happen unless you go to extra lengths to make it.
> reproduction doesn't usually happen unless you go to extra lengths to make it.
And who is to say that people who want to copy your code without adhering to your license terms or pay won't go to extra lengths? or am I missing something here?
> And who is to say that people who want to copy your code without adhering to your license terms or pay won't go to extra lengths? or am I missing something here?
It seems like they could just download your code from GitHub and violate your license like that, so... unclear why they'd bother doing it via Copilot.
It's unclear to me that that is the case. If I prompt an image generation model specifically to generate an unlicensed picture of Batman, I do not suspect that a judge is going to be sympathetic to the argument that the t-shirt I was selling was made by DALL-E so I should be free of any copyright infringement claims against me.
Why would specifically prompting the LLM to generate similarly copyrighted code be any different? That's what's being discussed here - that people will "go the extra lengths" to intentionally copy your code.
The "It wasn't me, it was the tool, even though I directed the tool quite specifically to do the thing that happened" is not a new defense in the legal systems of the world. We are already generally are able to deal with these without significant issue.
Although legally there may be a problem, in practice this seems to be a bit of a "so what?" scenario. Our hypothetical dev is using a tool that writes a function. It writes the function. The dev was going to get a function that did what he wanted one way or another so it isn't clear that it makes much difference if the function happens to be a clone of a copyrighted function.
If it were my GPLed code being copied, it doesn't seem to make any difference whether I press copyright claims or not. It won't help me compete against this dev's company. They'd use Copilot to rewrite the code differently. The speed of improvement in code-writing tools doesn't give me much hope that my code is uniquely clever.
Wouldn’t the person themselves be in violation at that point, and the owner of the code could go after them? (I know this wouldn’t be super practical, but it seems to match what would have happened without an LLM in between.)
This is the primary impediment to my large org at $work using llm-based coding tools: potential accidentally duplicated source code and the legal implications (including the risk of a viral copyleft license easter egg).
This is legal for a person to do, right? Why should it not be legal for an LLM to do?
AFAIK, but IANAL, I can go look at a solution in a GPLed library and then code that solution in my proprietary code base. As in, "oh, I see, they used a hash map for that and a list of this, and locked at this point. I'll code up something similar." As long as you don't "copy the code" you're fine.
Am I wrong?
Is it just a matter of scale? (hey there LLM, re-write all of OpenOffice into ClosedOffice).
> This is legal for a person to do, right? Why should it not be legal for an LLM to do?
Because humans are human. We are awarded extra super duper rights that inanimate things, such as a computer program, are not awarded.
I think we should therefore work to demonstrate this is not the case, which is very hard to do. No idea why we would work under the absolutely insane, never-before-seen assumption that computer programs deserve human rights. I don't know where this line of reasoning started, or why, but at its core it's so unbelievably preposterous (and against the very wellbeing of mankind).
It’s trained on publicly available code, so what would be the point of that? If you’re looking to specifically infringe the copyright of code available on the open web, using an LLM code completion engine is just about the most roundabout and unreliable way to achieve that.
Isn't this all about turning community-developed GPL-licensed code into your own (LLM-regurgitated) code that you can then make proprietary, saving a lot of money and not giving back anything to the original community?
That's not generally how the legal system works. If you've disabled the duplication filter, and then generated substantial amounts of duplicated GPL code and then violated the GPL license, a judge is not going to say "Well, fair enough, you used the loophole of copying code through an LLM, after all." They're going to treat it as a willful copyright violation, the same as if you'd just copied and pasted it.
And again, I have to ask: why? Why would you think that putting tons of time and effort into tricking the model into repeating memorized code is going to be a better investment than just using it normally to implement the functionality you want?
It is, but "AI" is considered such an "important" technology at the moment that no judge is going to want to be the one that "destroys innovation" by enforcing copyright law. If the perception of the technology changes in the political world in an unfavorable manner, these cases would go the other way or (if there's precedent) they'll pass laws overturning the precedents.
Personally, I would not rely on blogs like "developer-tech.com" for unbiased information about "AI" litigation.
I would read the order and draw my own conclusions.^1 (Note the plaintiffs are attempting to file an interlocutory appeal re: the dismissal of the DMCA claims.)
No doubt I am in the minority but I am more interested in the contract claims, which have survived dismissal, than in the DMCA ones. If plaintiffs can stop the "training" of "AI" through contract, then in theory DMCA violations can be avoided.^2
2 For example, license terms which specifically prohibit using the licensed source code to train language models.
Is there a "fair use" defense to contract liability.^3
> Is there a "fair use" defense to contract liability.
NAL, but in common law jurisdictions and maybe others there can be various implicit terms to any contract, like fair dealing.
Also if you can literally claim fair use, unless you signed a contract waiving that right (if that's even possible), it doesn't matter.
Heck, most software licensing in the USA is purporting to grant you rights that you already have from the Copyright Act. That's right, in US law you own the authorized copy you receive. The license claiming that you don't is questionable at best. To be fair the courts somehow have managed to become divided on this, but the plain language of the law itself is crystal clear, explicitly granting the right to make additional copies as necessary for execution.
Also that hardly matters when the fake license can still be "enforced" via lawfare. Most everyone is going to choose to pay up rather than fight a protracted legal battle against Microsoft.
IANAL but I think for the specific dismissed claims in this specific case, reproduction is the issue, and it doesn't indicate anything about training.
I think it would be extremely hard to make claims against GitHub for training AI with code on GitHub, assuming GH has the typical "right to use data to improve service" clause that usually shows up on free-service EULAs.
> I think this is the key point: reproduction is the issue, not training. And as noted in the study[1] reproduction doesn't usually happen unless you go to extra lengths to make it.
But Microsoft is selling a service capable of such reproduction. They're selling access to an archive containing copyrighted code.
To me it's the equivalent of selling someone a DVD set of pirated movies. The DVD set doesn't "reproduce" the copyrighted material unless you "prompt" it to (by looking through the set to find the movie, and then putting it in your DVD player), but it was already there to begin with.
Strongly disagree with your analogy here. Lots of services are capable of doing things that are against the law but in general it's the actual breaking of the law that is prosecuted.
The closest thing to what you are suggesting is the Napster ruling, where a critical part was that the whole service was only about copyright infringement. In the Github case most people are using it to write original code which is not a copyright violation so there is substantial non-infringing use.
But what I think doesn't matter. The judge disagreed with that interpretation too.
And if you then write a program that is remarkably similar to the one you read, that's copyright infringement. As another reply noted--but without anywhere near enough verbosity--this is not without risk, and people who intend to work on similar systems often try to use a strategy where they burn one engineer by having them read the original code, have them document it carefully with a lawyer to remove all expressive aspects, and then have a separate engineer develop it from the clean documents.
Copyright doesn't protect general concepts, methods, or common knowledge. So you could write a program that is remarkably similar to another one and not infringe copyright. Just like you can write a book with the same plot as another without infringing copyright.
Plus given that most programming languages have a finite grammar and a limited number of ways to express general concepts, the individual bits of code that make up most programs are probably not sufficiently original to be copyrightable in themselves.
But the result is that you can't assume that this is the case: you have to actually look on a case-by-case basis to decide whether the chatbot you are using -- one which has no understanding of copyright as nuanced as either of us -- merely learned something general purpose and applied it in a way which did not lead to infringement, whether the code it generated is technically infringing but is fair use, or whether what it developed isn't allowed.
A lot of people seem to want to believe that the output of the chatbot is somehow inherently clean in all cases, and they cite this idea that a human can read code and learn from it... but a human can -- even without realizing it!! -- infringe on copyrights, and so such an analogy doesn't absolve the chatbot. If we then continue to assume that the chatbot's output is clean, then we are ascribing it a superhuman ability to launder copyright.
> strategy where they burn one engineer by having them read the original code, have them document it carefully with a lawyer to remove all expressive aspects, and then have a separate engineer develop it from the clean documents.
Interesting. What kinds of situations is that strategy used for?
(I'm familiar with cleanroom, which I understand means that you start with un-tainted engineers, who've credibly never been exposed to the proprietary IP, and they work only from unencumbered public documentation and from running the system as an opaque box. Then there's also validation, like with parallel systems and fuzzing. But I haven't thought through in what situations this might not work, so it might require the tainted documenting approach.)
This is the full or classic version of clean room reverse engineering. Using unencumbered public documentation is relatively new, that kind of detailed documentation wasn't widely available. Car manufacturers still protect their service manuals with an agreement that basically says they can't be used for this but I think a lot of service centers stopped making people sign them.
The classic tech story that used this technique is the IBM BIOS and the resulting spread of "IBM PC-compatible" machines. There is a little bit about it on the Wikipedia page (https://en.wikipedia.org/wiki/IBM_PC%E2%80%93compatible). Random factoid: the TV series "Halt and Catch Fire" has a depiction of doing this IBM clone reverse engineering and did a pretty good job at it.
That sounds like a question of degree for the jury — the evaluation of whether or not the facts presented warrant a claim of sufficiently infringing similarity. In this case the judge felt the plaintiffs weren't even close to demonstrating infringement, so the question never appeared in front of a jury.
If we're moving the question to one of degree then it's up to Microsoft and others to monitor their output because even if a model is not trained on copyrighted material, you can still accidentally infringe. Even if you never listened to music near or by Lady Gaga, that does not mean you can use your own original inspiration to accidentally write songs that are too similar to Lady Gaga. In other words, like the Ed Sheeran case.
Does that have any legal basis? It sounds a lot like what Google did for their Java engine, which essentially rewrote the entire engine with the same APIs, while referencing the original source code. Didn't the courts decide it was fine?
Saying that it “launders” only makes sense under the position you are claiming. So, it might be fine as a conclusion/claim, which I guess is how you’re using it, but it wouldn’t be good to use as part of an argument leading to your conclusion.
(I didn’t phrase that well…)
I generally don’t consider “learn” to apply only to entities which have the rights of a person, and of which ownership would amount to slavery.
It is a common saying “You can’t teach an old dog new tricks.”. It is widely understood that, in contrast, one can often teach a young dog new tricks. The dog, in this case, learns the trick. We do not generally consider training an animal to do a task to be slavery. Well, some vegans might? But it is far from a typical view of the word “slavery”.
So, am I saying that these language models are as rights-having and mind-having as a dog? No, much less so. Still, I have no objection to the word “learn” being used in this way.
Some people say “it’s OK to ingest copyrighted material automatically at scale, since it’s for learning purposes”. They use two kinds of arguments for this.
Argument A:
A1. It’s a basic human right to be able to learn from things you see. You browse the Internet, you read some source code, you learn. Doesn’t matter what’s the license, you are free to do this.
A2. It’s called “machine learning”, so the machine does the same.
A3. Machine learning can use any content its operators can get a hold of.
This is obviously wrong, because the machine is being assigned human rights. We can argue about what exactly the prerequisites are for something to be granted human rights—it’s maybe not a specific physiology (some might say certain smart non-humanoid animals deserve it), but it’s pretty certainly sentience and consciousness. Meanwhile, the whole reason AI tech is big is that there is supposed to be no sentient being who would understand (and therefore deserve any right to be treated well and get rewarded). If you take that away and grant AI human rights, then there is no point in this tech.
So, either the machine has human-level sentience and is being forced to work (which humans famously tend to consider “slavery”), born and killed on demand, etc., or the machine is not learning in the sense under consideration because it’s an unthinking tool for its human operator.
Which brings us to argument B:
B1. It’s a basic human right to be able to learn from things you see. You browse the Internet, you read source code, you learn. Doesn’t matter what’s the license, you are free to do this.
B2. If you use a computer or [insert technology] to learn, that’s OK.
B3. An LLM is just another instance of that technology. You use LLM and you learn.
This is wrong for slightly more subtle reasons, but on the bright side there are multiple of them.
First, it’s not clear that someone learns while using Copilot to produce a work for them. If I asked Copilot to write me a Fibonacci number generator, have I learned how to write it? If I ask Midjourney to draw me the 2055 Los Angeles skyline in the style of Picasso, did I learn how to draw?
Second, and this is a crucial fallacy, making a computer famously does not require ingesting all of the copyrighted material you can subsequently access through that computer. Said computer can exist just fine without it; the LLMs, however, cannot.
The inputs (knowledge and parts) required to produce the computer you’re using were largely obtained through ordinary ways (patents licensed, hardware paid for), whereas the inputs required to produce an LLM have been, some would say, effectively stolen.
I think one difference is that you are seeing things as defaulting to “not allowed to use the work for whatever purpose, and the only reason it is ok for people to look at it to learn from it, is because they have a human right to do so, which overrides the default”, while I would view things as “by default you can do what you want with the media, provided that it doesn’t go against a particular law (such as rules against distributing copies of it or substantial portions of it, etc.)” .
So, I think linking things to “it is a basic human right” is a mistake.
The argument is not “it is a human right that this can be done, therefore it is allowed.” The argument is “this does not violate any of the rules that could potentially forbid it.” .
> one difference is that you are seeing things as defaulting to “not allowed to use the work for whatever purpose, and the only reason it is ok for people to look at it to learn from it, is because they have a human right to do so, which overrides the default”
I think in law the default is “can use as allowed by the owner”. If the owner doesn’t specify, then the default is something like “can’t distribute”.
This is thanks to the idea of property, and more specifically intellectual property, which is responsible for a lot of innovation (including computing and LLMs themselves).
If you think some sort of intellectual property communism—you make stuff, but you don’t get to own it, and you get what you are given—is best, then fair enough, that’s your opinion.
While I don’t think a full intellectual-property-communism (as you phrase it) would be best, I do think something a bit closer to it than we currently have would likely be better. (Mostly reducing copyright lengths a decent bit, closer to what they were in the early years of the US.) I think I agree that if implemented correctly, it can be a net benefit in promoting innovation/production-of-good-things. (I also think the existence of trademark law and patent laws is good, though they may also have some flaws.)
Hm, in terms of defaults, my understanding is that, “by default you can do what you like with whatever data, but because copyright laws create copyrights, you are forbidden from distributing copies of a work which is under copyright, or distributing (or publicly performing) things which are substantially based on such a work, unless you are doing so in accordance with permission from the copyright holder”. So, because the law only restricts the distribution/public-performance of copies of the work, or of portions of the work, or of derivative works that are substantially based on the work, copyright doesn’t let the copyright owner dictate what can be done with the work, outside of how the permission they may grant to distribute or perform things based on the work can include conditions. My impression is that if you aren’t distributing or performing the work or a derivative work, then copyright doesn’t restrict what you can do (outside of those things) with the work. Furthermore, my impression is that “derivative work” does not encompass everything that is in any way based on the work, but only things satisfying certain conditions about, like, substantial similarity, and whether it also competes with the original work (but I think that last bit is an established and repeated precedent, rather than a law?).
Though, I’m not very well versed in law, and I don’t know how this fits in with a license to use a piece of software! I suspect that software is a special case, and that if it were not special-cased, software licenses wouldn’t legally need to be agreed to in order to be allowed to run the software? But that’s just a guess, and if I’m wrong about that then it would suggest that I’m wrong about the other thing?
As a side note: I think that property is a much more natural concept than intellectual property. The way I see it, IP was created by states, but property more generally makes sense outside of states (I don’t say that it predated them because I don’t know; I’m far from a historian.).
> My impression is that if you aren’t distributing or performing the work or a derivative work, then copyright doesn’t restrict what you can do (outside of those things) with the work
LLM operators like ClosedAI are distributing derivative works at scale commercially.
Only in a sense of “derivative work” which is rather broad, and which I don’t think copyright law restricts (though this still needs to be settled by the courts). To be a copyright violation, it doesn’t suffice that the one work had a causal influence on the other work.
There is a test that I think is called a “three pronged test” with the 3 prongs being (iirc) something like:
1) substantial similarity: is the allegedly infringing work substantially similar to the work which it is allegedly infringing
2) Was there an actual causal influence by the work that was alleged infringed on, on the work that allegedly infringed?
3) Could the allegedly infringing work act as a substitute (economically) for the work allegedly being infringed on?
The third prong seems satisfied. The first one does not. The second one also seems satisfied but I’m less confident that I’m remembering the idea correctly (though I could be wrong about the three of them as a whole).
> And if you seriously say that this tool is learning how to program, ask yourself if that tool’s operator is effectively a slave owner.
This doesn't follow. I don't see why knowledge and intelligence necessarily entail that it has a desire for autonomy, which is why slavery is really abhorrent.
You can ask yourself how much desire for autonomy would indentured servants have after a few generations. Us humans can get used to almost everything. Presumably, simply being used to abuse and not desiring freedom just because you never had (or could even imagine) it doesn’t make being abused or lacking freedom “good”.
> I don't see why knowledge and intelligence necessarily entail that it has a desire for autonomy
I’d replace that with “knowledge, intelligence and human-like sentience”. Someone proposed to grant the tool the right humans normally have. (Humans can learn from reading any stuff under any license, so why not the tool.) Well, you’d think human-like sentience/consciousness are required for those rights, and human-like sentience/consciousness would desire the appropriate degree of autonomy.
> Us humans can get used to almost everything. Presumably, simply being used to abuse and not desiring freedom just because you never had (or could even imagine) it doesn’t make being abused or lacking freedom “good”.
I don't think this is plausible. You can see your slaver has freedoms you don't, and no doubt you would desire to be free of your shackles like they are, so imagining it wouldn't be difficult at all.
> Someone proposed to grant the tool the right humans normally have. (Humans can learn from reading any stuff under any license, so why not the tool.) Well, you’d think human-like sentience/consciousness are required for those rights
I don't see why sentience would be required for some entity or tool to have the right to learn and synthesize new things like humans do. Copyright is a legal fiction that serves a purpose, and we can grant these rights under any circumstances we like, as long as we think it's a good idea.
If you're arguing that LLMs cannot imagine this "freedom", then I'd say that then an LLM and a Human are fundamentally different. Therefore, LLMs should not be granted human rights.
I think this is a matter of having your cake and eating it. You can't say LLMs should have some human rights (particularly the ones that generate revenue), but not others, like a right to freedom.
> I don't see why sentience would be required for some entity or tool to have the right to learn and synthesize new things like humans do
On the contrary, I don't see why sentience should not be required.
These laws, for all they have existed, only apply to humans. Dogs cannot use them. A plant cannot use them. It is therefore reasonable to say you must be a human to use these rights. In my mind, what is unreasonable is claiming a computer program should be granted these rights. You'd have to justify why that should be the case, what good that can do for humanity as whole.
Turns out that's very hard, so AI people don't do it. They just give up. Instead they start out at an assumption that puts their ideology in a favorable position - that being that computer programs should be awarded human rights.
But that assumption, you'll find, is not actually foolproof. If you ask around, a lot of everyday people will consider it preposterous. They might call you insane. So, to me, you must justify that in tangible terms.
> You can't say LLMs should have some human rights (particularly the ones that generate revenue), but not others, like a right to freedom.
There is no evidence that this is the case. These rights are not necessarily all or nothing. They are all or nothing for humans because humans have a bundle of properties that entail these rights, but artificial intelligences may have only a subset of those properties, and so logically may only get a subset of those rights.
> On the contrary, I don't see why sentience should not be required.
Sentience is the ability to feel. All that's needed for learning is the ability to perceive and have thoughts. Maybe there's some deep, intrinsic connection between the two, but this is not known at this time, and therefore I see no reason to connect the two.
> In my mind, what is unreasonable is claiming a computer program should be granted these rights.
There's a long history of human abuse of "lower animals" because we assumed they were dumb and non-sentient. Turns out that this is not the case. We should not be so open-minded that our brains fall out, but we should also be very wary of repeating our old mistakes.
> We should not be so open-minded that our brains fall out, but we should also be very wary of repeating our old mistakes
Precisely, which is why it makes absolutely no sense to me to say that AI can't be granted a right to freedom.
I mean, what are you even arguing here? Do you not understand that this statement is in support of my position, not against?
> Sentience is the ability to feel. All that's needed for learning is the ability to perceive and have thoughts.
Highly debatable. You just made this up. These aren't the definition of anything. Once again, you need to bring something tangible to the table or people will call you crazy.
> therefore I see no reason to connect the two
Once again, this is your problem here. You're starting off, beginning, with an assumption that favors your stance. You can't do that, especially when said assumption has never, not even once, been true for all of human history.
Au contraire, I see no reason NOT to connect the two and you certainly haven't given any reasons why. These rights have always, only, applied to humans. I say we retain that status quo until someone gives something to show otherwise.
> artificial intelligences may have only a subset of those properties
In order to split these qualities you need to understand what they are and define them well from first principles. Long story short, if you have solved the hard problem of consciousness we are eagerly awaiting your world-shattering paper.
To me a claim that an LLM is sufficiently like a human when it ingests data, but suddenly merely a tool when its rights start being concerned, is mental gymnastics unsupported by requisite levels of philosophical inquiry.
> There's a long history of human abuse of "lower animals" because we assumed they were dumb and non-sentient. Turns out that this is not the case
If you apply that logic to LLMs, you have bigger issues than granting them a single right that only puts their operators in the clear when it concerns copyright laundering.
Cool, so slavery where slaves do not see the slavers (let us call it “proper segregation”) is OK?
> I don't see why sentience would be required for some entity or tool to have the right to learn and synthesize new things like humans do
If sentience is not required for a “right” to learn, then I have nothing else to say to you. There is nothing there that is even learning. Learning is a concept that presumes an entity with volition, aspiration, consciousness.
> Cool, so slavery where slaves do not see the slavers (let us call it “proper segregation”) is OK?
Sorry, you cannot erase the desire for autonomy even with "proper segregation".
> If sentience is not required for a “right” to learn, then I have nothing else to say to you. There is nothing there that is even learning. Learning is a concept that presumes an entity with volition, aspiration, consciousness.
Learning does not presume any such thing, and I also don't think you understand the meaning of sentience.
> Sorry, you cannot erase the desire for autonomy even with "proper segregation".
Good, then we are on the same page with respect to abuse when LLMs are concerned, if we are to consider them sentient (as a prerequisite to be learning).
> Learning does not presume any such thing, and I also don't think you understand the meaning of sentience.
If we could train the desire for autonomy out of humans, it wouldn't make human slavery any less abhorrent, even if they volunteered for the process and/or were well compensated.
It absolutely would make it less abhorrent. Maybe you think it would still be abhorrent, but this is debatable. People literally do consent to slavery-like roles in places like the BDSM community, and some people might find it distasteful but not illegal or morally abhorrent, because these people still have the autonomy to opt-out at any point.
I also doubt training out the desire for autonomy is possible. Explore-exploit is fundamental to any kind of decision making, such as food foraging. That inclination goes deeper than higher brain functions.
I don't see why volition requires consciousness. People are very fond of thinking human qualities are irreducible and make far too many simplifying assumptions than are warranted.
And even still, these words are used in many, sometimes mutually exclusive, meanings (“learn” as in “machine learning” is a far cry from “learn” as in “live and learn”). I wonder how the courts could even properly consider all implications if these words don’t have precise legal definitions all the way down to what it means being a human.
LLM is not “anyone”, because LLM is a thing but “anyone” refers to people. If you consider LLMs people, then you should ask yourself whether they are suffering abuse from being treated the way they are by their operators.
Ok. So anyone "can" use a computer to do the same thing then. With the added part of "using a computer" it is now directly comparable and it is allowed.
> And if you seriously say that this tool is learning how to program
The tool is used by a person. The person is the one who takes the action, not the computer. So the point stands.
> Can you “use a computer” to watch a pirated film? Sure. Is it legal? Nah.
In many circumstances you can't mass distribute completely identical, non-transformative, non-fair-use copies of large portions of other people's copyrighted works, if that's what you meant.
But there are many exceptions to that rule where you are allowed to use or distribute other people's works. And just like a human being is allowed to use other people's copyrighted works in those many exceptions, a human is also allowed to use a computer to take advantage of those legal exceptions.
The only point here is that when you brought up that this uses a computer in your first post, that's not really a relevant detail.
A person can use those exceptions that allow them to use other people's copyrighted works, and they can do that with or without a computer and it is legal in those exceptions either way.
> If watching that pirated film helps you learn something, does that make it legal?
> If the film was pirated not by you but by some for-profit company that charges you for watching it, does that make it legal?
It depends on many factors. Yes there are many cases where yes it is legal to use other people's works.
Edit:
Evidence that I am right: you are right now commenting on a thread where a judge threw out all the copyright claims.
> In most circumstances you can't mass distribute completely identical, non-transformative, non-fair-use copies of large portions of other people's copyrighted works
That law was defined long before there was a capability to launder authorship at scale in the way being discussed. The law does not account for this novel capability.
The law is intended to protect IP, which promotes innovation and creativity by creating relevant incentives. If that was the intention of the law, and it is not interpreted in that way, it ought to be revised for it to continue to serve those objectives.
> Evidence that I am right: you are right now commenting on a thread where a judge threw out all the copyright claims.
This only shows that you read the headline. It does not show that you (or the judge) are correct about the core issue.
I suppose it depends on the country. I heard the US is a somewhat unusual culture where concerns usually encoded in legislation in other countries instead battle it out in courts.
Learning without making money does not shield you from copyright violation, otherwise public school teachers would just start saving money by copying whole texts. We don’t live in a society where it’s okay for an 8 year old to say “but I’m just trying to learn, I’m not a business!”
And making new music which is a synthesis of your life experience with copyrighted music does not mean copyright violation, regardless if you’re making money or compensating all the authors who’ve inspired you.
> We don’t live in a society where it’s okay for an 8 year old to say “but I’m just trying to learn, I’m not a business!”
I mean we absolutely do if you're an 8 year old. Except in the most NIMBY HOA-driven areas of the culture nobody expects a kid setting up a lemonade stand to get a business license or submit to health code inspections.
I'm basically an LLM. I trained in a similar way as the LLM, off of books and open source code, guessing what comes next, making errors, adjusting my brain. I make money the same way as the LLM, off of inference calls.
Maybe you don't truly understand how an LLM works, or is trained, or how inference works. Or how humans work. Money goes out during training, money comes in during inference.
In this example a person is looking at code they can't legally copy, learning from it, and re-implementing the same functionality. Someone's definitely making money off of that. That person, that person's employer, clients and vendors, lots of people.
People get upset about AI because 1) the scale is much bigger because no human can read and generally remember all the code on GitHub while a sufficiently large model can, 2) it's a lot easier to prompt an AI into giving you a passable MVP than it is to code one from scratch, ESPECIALLY as a junior or even mid level, 3) there are unlikeable billionaires making money now where there weren't before.
> 1) the scale is much bigger because no human can read and generally remember all the code on GitHub while a sufficiently large model can,
Semi-true, but it often hits an uncanny valley of either leaving enough comments in to know it was copied, or missing enough context that I'd rather the thing give me actual permalinks to whatever it thinks is relevant (i.e. like a search engine, but better).
> 2) it's a lot easier to prompt an AI into giving you a passable MVP than it is to code one from scratch, ESPECIALLY as a junior or even mid level
how do we define 'passable'? I've already run into a few cases where a jr/mid is doing 'passable' MVPs that, again, hit that 'uncanny valley' where subtle stuff is broken in an important way but it's hard to detect.
> 3) there are unlikeable billionaires making money now where there weren't before.
IDK 'Eyeball scans' are a bit much for me.
That said, this completely hand-waves over the knock-on effects.
All of the 'hype' generated about this, all the resulting Gartner reports, every organization latching on to the concept the same way I was once asked if I had any thoughts on how to integrate 'blockchain' into an enterprise that literally had no reason to, short of attracting investment.
The problem this time, is they have something 'closer' to a product.
And we are seeing the result of that product more and more.
- Layoffs
- People having to deal with impacts of layoffs in their org and surprise surprise, the AI tools didn't replace the lost heads well enough.
- Consumers dealing with the pain of these tools being applied. For example, my insurance provider cut their online chat staff in favor of an 'AI bot' that couldn't help me connect to one of the few humans left when all I was trying to do was add my wife to my auto policy. Or my bank that randomly decided one morning that spending <$5 for eggs and bacon was enough to hard-lock my card without even a text prompt -and- invalidate my password so that I had to call in to their line. (The eggs and bacon were purchased at my work's cafeteria, which I had previously purchased from.)
- People sick of AI 'spam' that shows that uncanny valley. E.g., for ungodly reasons I sometimes see clickbait about cars and take it. And then they get very obvious facts completely wrong that any human being actually writing the article would have at least checked Wikipedia for first, like how many years the car was produced...
- People sick of getting work from colleagues/superiors where it's obvious an AI generated it and they didn't take the time to make sure it was right. Or maybe they did and it was below their skill, because again, that uncanny valley is a good bullshit generator. I've seen plenty of 'procedure documents' and 'technical requirements' that were obviously AI generated, yet actually catching the -subtlety- of the (still very important!) errors was difficult due to it's capabilities. The problem of course, is it's now someone -else's- problem to make sense of it, and frankly it's a proof of brandolini's law.
I'm not exactly sure, but I think the underlying philosophy here is that code is a lot more like math than like music, and you can't copyright math.
So to have any copyright protection at all for code, the Office had to carve a narrow trail where the standard for copying is higher, because there are plenty of circumstances where there is only one right (or most optimal) algorithm, and there's no protection for the algorithm itself.
And anyone can also look at that source available code, write their own version, distribute it, be sued for copyright infringement, and lose in court, because their version is too similar to the original.
It seems obvious how they're comparable, in the same way that you can compare a parrot talking to human speech.
Black-box both systems and there's enough similarity to make a layperson go "Huh. Those look remarkably similar," even if the mathematicians among us know the underlying mechanisms, inputs, and outputs are quite different.
The significant step here is that there is no human doing what you describe: no human in the loop looking at source code and learning from it.
You have an autonomous system that's ingesting copyrighted material, doing math on it, storing it, and producing outputs on user requests. There's no learning or analogy to humans, the court is ruling that this particular math is enough to wash away bit color. The ruling was based on the outputs and the reasonable intent of the people who created it and what they are trying to accomplish, not how it works internally.
It's not the first such math either: if you take copyrighted data and AND every byte with 0x00, that certainly washes the bits too.
> You have an autonomous system that's ingesting copyrighted material, doing math on it, storing it, and producing outputs on user requests
People are also autonomous systems that ingest copyrighted material, do "math" on it, store it, and produce outputs on user requests.
The real difference is the scale at which a computer can ingest copyrighted material is MUCH greater than what a person can do. Does that make it illegal? Maybe, maybe not.
Am I in a bad sci-fi novel? People aren't machines! How is this such a difficult concept? LLMs have as much thought as quicksort. I swear to god humans will anthropomorphize everything except ourselves. Do y'all's salaries depend on this or something?
There is no rule that says "If a human can do something, a computer program instructed by a human can do the same thing." Hell that rule doesn't even exist for humans acting as stand-ins. I can't send someone I hire out of the country and have them use my passport. It's why you can watch a movie in a theater but an autonomous system working on your behalf, a camera, can't.
Github made a tool; it's as alive as a hammer. It "learns" as much as your programmable padlock. Whether or not the human employees of Github are allowed to use copyrighted material to make that tool, and whether the human employees of Github are performing a copyrighted work when users make use of the tool, is the legal question.
Normally, if it is legal for a human to do something, I would assume that human could legally use a computer to help do that thing. Are there cases where this isn't true?
The idea that the LLM violates copyright by reading/viewing a work is the same idea that you violate the copyright by reading or viewing the work. Perhaps you're creating an organically encoded copy of the work within your brain.
No copies are being made, and definitely no copies are being sold.
No, not really. You mistake what the purpose of copyright is.
If I used a chatbot to sell the entire text of Harry Potter, all at once, that would still be illegal even though it's through a chatbot.
What's legal, of course, is creating transformative content, learning from other content, and mostly creating entirely new works even if you learned/trained from other content about how to do that. Or even if there are some similarities, or even if there were verbatim "copies" of full sentences like "he opened the door" that were "taken" from the original works!
Copyright law in the USA has never disallowed you entirely from ever using other people's works, in all circumstances. There are many exceptions.
> Copyright law in the USA has never disallowed you entirely from ever using other people's works, in all circumstances. There are many exceptions.
Sure, and the question is: "does using an AI chatbot like Copilot fall under one of those exceptions?" My position -- as well as the position of many here -- is that it shouldn't. You may disagree, and that's fine, but that doesn't make you fundamentally correct either.
> If I used a chatbot to sell the entire text of Harry Potter, all at once, that would still be illegal even though it's through a chatbot.
Right, which is why you sell access to the chatbot with a knowing wink.
> You mistake what the purpose of copyright is.
At one point it was to ensure individual creators could eke out a living when threatened by capital. I frankly have no clue what the current legal theory surrounding it is.
It would still be illegal to ask the chatbot to recreate the text of Harry Potter.
Now, if you were to ask it to generate a similar story based on Harry Potter, that would be fine. Especially since that's basically what JK Rowling did after watching Star Wars.
Harry Potter is a clone of Star Wars? I don't really see it, any more than any story that follows the Hero's Journey. I remember being a kid and reading Eragon though, and that really was very similar.
The law doesn't work this way. Deliberately circumventing copyright via something like Copilot will have different consequences, even if the eventual outcome is that Copilot is allowed to train on open source code that has restrictive licenses.
> The law doesn't work this way. Deliberately circumventing copyright via something like Copilot will have different consequences, even if the eventual outcome is that Copilot is allowed to train on open source code that has restrictive licenses.
Copilot is a deliberate circumvention of copyright. It might be legal but that doesn't change the clear intent here: charging people without having to do the work you're charging for.
The comments seem to misunderstand copyright. Copyright protects a literal work product from unauthorized duplication and nothing else. Even then there are numerous exceptions like fair use and personal backups.
Copyright does not restrict reading a book or watching a movie. Copyright also does not restrict access to a work. It only restricts duplication without express authorization. As for computer data, the restricted duplication typically refers to dedicated storage, such as storage on disk as opposed to storage in CPU cache.
When Viacom sued YouTube for $1.6 billion they were trying to halt the public from accessing their content on YouTube. They only sued YouTube, not YouTube users, and only because YouTube stored Viacom IP without permission.
> When Viacom sued YouTube for $1.6 billion they were trying to halt the public from accessing their content on YouTube. They only sued YouTube, not YouTube users, and only because YouTube stored Viacom IP without permission.
Now do these steps for OpenAI instead of YouTube. Only OpenAI doesn't let users upload content, and instead scraped the content for themselves.
From the article it sounds like the plaintiffs were alleging that ChatGPT is effectively doing unauthorized duplication when it serves results that are extremely similar or identical to the plaintiff's code. They aren't just alleging that reading their code = infringement like you seem to imply.
The judge argues that copilot “rarely emits memorised code in benign situations”, but what happens when it does? It is bound to happen some day, and when it does would I be breaching copyright by publishing the code copilot wrote?
Just a few weeks ago a very similar suit for stable diffusion had its motion to dismiss copyright infringement claims denied.
https://arstechnica.com/tech-policy/2024/08/artists-claim-bi...
> The judge argues that copilot “rarely emits memorised code in benign situations”, but what happens when it does? It is bound to happen some day, and when it does would I be breaching copyright if i, unknowingly, published the code copilot wrote?
That's irrelevant to the case being made against GitHub, which is why it is addressed in the decision.
> Just a few weeks ago a very similar suit for stable diffusion had its motion to dismiss copyright infringement claims denied.
The case against Midjourney, SAI, and RunwayML is based on a very different legal theory -- it is a simple direct copyright violation case ("they copied our work onto their servers and used it to train models") whereas the Copilot case (the copyright part of it) is a DMCA case claiming that Copilot removes copyright management information.
It's not really surprising that the Copilot case was easier to dispose of without trial; it was a big stretch, but it had the advantage for the plaintiffs that, were it allowed to go forward, it wouldn't admit a fair use defense the way a traditional direct copyright violation case does.
They aren't really "similar" except that both are lawsuits against AI service/model providers that rest some subset of their claims on some part of Title 17 of the US Code.
I am not a lawyer, but I explore these questions by imagining an existing situation with a human. If your friend gave you code to publish and it turned out he gave you someone else's code that he had memorized, would you be breaching copyright? The answer in that case is plainly yes, and I think it would be no different with an LLM.
Substituting a human for a computer changes some aspects of the situation (e.g., the LLM cannot hold copyright over the works it creates), but it's useful because it leaves the real human's actions unchanged. However, for more complex questions that interact with things like work-for-hire contract law, you may need to take a more sophisticated approach.
You'll get a second system that searches your code against an index of copyrighted code. If it finds, say, a >70% match against some unique code, the output gets flagged for rewrite. This could be automated in Copilot by simply regenerating with a different seed.
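A minimal sketch of what such a flagging pass could look like, purely for illustration (the 70% threshold, the token-shingling scheme, and the known_corpus index are assumptions on my part, not anything Copilot actually ships):

    # Illustrative sketch only: token-shingle overlap against an index of known code.
    def shingles(code: str, k: int = 8) -> set:
        toks = code.split()
        return {tuple(toks[i:i + k]) for i in range(max(len(toks) - k + 1, 1))}

    def similarity(candidate: str, reference: str) -> float:
        # Fraction of the candidate's shingles that also appear in the reference.
        a, b = shingles(candidate), shingles(reference)
        return len(a & b) / len(a) if a else 0.0

    def needs_regeneration(candidate: str, known_corpus: list, threshold: float = 0.7) -> bool:
        # Flag the generation for a retry (e.g. with a different seed)
        # if it overlaps too heavily with any indexed work.
        return any(similarity(candidate, ref) >= threshold for ref in known_corpus)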
In some languages there are few ways (or one way) to do things, so everyone writes the same-looking for loops, etc. And sometimes in the same order, with the same filenames, etc. by convention - especially in the case of heavy framework usage, where most people's code is mostly identical percentage-wise. The flagging system would have to be able to identify framework usage separately from IP, and so on.
Beyond that, it seems like you'd need a highly expressive language for this to work well. You can effectively scan for plagiarism in English because it's so varied that it really is an outlier to see several lines of text that are identical to each other from different sources, but maybe it's not that strange to see entirely identical files, or at least very similar code, in totally distinct, say, React or Ruby-on-Rails projects.
I think of code methodologies as more like construction techniques. Maybe some pieces and parts are patentable, and some can even be productized as tools, but a lot of it is just convention and technique.
The same thing that happens if you write a song which happens to have the same pattern of four notes as another song: absolutely nothing, because that would be an insane standard to hold copyright to and would lead to nothing ever being produced without a tidal wave of infringement suits.
Interesting. The parts that survived are the contract claims and the open-source license claims.
Contract is understandable - it supersedes almost everything else. If the law says I can do X but the contract says I can't, then I almost certainly can't.
It's nice to see open-source licenses being treated as having somewhat similar solidness as a contract.
The FSF's argument for their copyleft was always based on exactly the same foundations as typical copyright licenses. If Alice can say that you must pay her $500 to do X with her copyrighted thing, then logically Bob can say that you must obey our rules to do X with his copyrighted thing.
This invites courts to pick, smash copyright (which would suit the FSF fine) or enforce their rules just the same (also fine). It makes it really difficult for a court, even one motivated to do so, to thread the needle and find a way to say Alice gets her way but Bob does not.
Structuring your arguments so that it's difficult for motivated courts to thread this needle is a good strategy when it's available. If you're lucky a judge will do it for you, as in Carlill v Carbolic Smoke Ball Co (the foundation of contract law) or indeed Bostock v. Clayton County: hey, says Gorsuch, the difference between this gay man and this straight woman isn't that they're attracted to men, that's the same; the actual difference is that one of them is a man, but that's sex discrimination, so this is a sex discrimination case!
If you have access to the Copilot weights, you should consider leaking them. We shared our code with you because we wanted it to be free, not sold back to us at $10/month.
Fwiw, I've never paid for Copilot. I was automatically given free access for open source contributions. My largest public repo had maybe 100 stars. I've made minor commits to larger repos.
I don't know what the threshold is, but I'm fine with the trade-off I received.
Then you should be happy to know that there are multiple open-source coding models with freely available weights out there already! Some of them are as good as, or possibly better than, Copilot.
That should satisfy anyone who actually cares about this, as opposed to only being interested in making snappy gotcha one-liners.
Could still be monumental if it creates the case law to be referenced in the future. Lawsuits are a lot easier to start when you know you're going to win because a previous case was extremely similar. Which is to say, this could have a major impact on the industry even if the punishment (this time, for doing it while it was uncharted territory) was a slap on the wrist.
Read the decision, the breach of contract claim simply survived dismissal, but the surrounding discussion makes it clear it doesn't have much of a prayer (which is the same as the "open-source license violation", the OP article is trash).
I was lucky to learn early-on that publishing important things to the web meant relinquishing control of not just the IP, but my own agency and fate. The cost far exceeded the benefits of generosity, be it contributions to FOSS, public blogging or documentation, or even just writing.
Time is the only fixed resource, and mine is proprietary, exclusive, and for sale to the highest bidder.
Thankfully, others are more altruistic. I have benefited from many developers freely sharing their ideas in forums, code on GitHub, and learnings on blogs.
Sure Google has stolen it to build an empire that most are complicit with.
Sure OpenAI has stolen it to build products most are supportive of.
Sure evil benefits from good, but that doesn't mean we should neglect to help others just to spite them.
My hope is that there's a middle ground, a way to keep our good deeds for the benefit of other good people, not for the benefit of large corporations that want to leech off of our work.
The law usually lags behind technological advancement, and I'm hoping we're just seeing this in action right now, and that over time, better legal protections will be put into place.
there's "sharing ideas on forums" and then there's giving all of your source code, public and private, to Microsoft to host, instead of just setting your git remote to user@yourownhost:/path/to/reponame and setting up SSH keys
I appreciate the viewpoint of the GP and it's telling that it is downvoted when it is not spam, it is not abusive, and it is fully in-line with the stated and implicit etiquette of this site. It's just unpopular, so people are down-voting it.
FOSS is kind of culty and it's very apparent in the reaction to opinions like the OP's. If you don't believe what he said about giving up your agency and your fate when you give away your code online, look into what happened to fommil[0]
Coding for one hour while in a good mood and with a clear head is significantly more productive than coding for eight hours while tired and bored.
> mine is proprietary, exclusive, and for sale to the highest bidder.
What’s your problem with GitHub, then? All the people who worked on it did exactly that. Seems quite rich to be complaining of the behaviour of others when by your admission you’re for sale to whoever pays.
It is a fixed resource for people; everyone dies. Git was fine until it became a way to launder intellectual labor to benefit MSFT shareholders instead of the people who did the work.
> It is a fixed resource for people; everyone dies.
What has dying got to do with it? Everyone dies at different ages and points in time. That’s the opposite of fixed. Again, time isn’t fixed. Even if you sell your time by the hour, not every hour is equally productive.
> Git was fine
You’re once more conflating git the tool and GitHub the service. They’re separate things and in your own link they recommend alternatives to GitHub which use git.
> it became a way to launder intellectual labor to benefit MSFT shareholders instead of the people who did the work.
Again, how is that different from what you claim to do? You explicitly said you sell your time to the highest bidder, meaning you’d have done the same if you had been paid for it and thus have no moral high ground. Or you wouldn’t in fact have done it because you have scruples, in which case you don’t really sell your time to the highest bidder but take other factors into account.
You can’t have it both ways. Personally I hope you do the latter.
It's not like I'm unhappy about people who are altruistic in different ways – indeed I am grateful!
But any internet contributor who thinks their free work isn't being exploited in the most perverse and senseless ways is fooling themselves. You're better off volunteering at a rehab clinic or old folks home.
> I was lucky to learn early-on that publishing important things to the web meant relinquishing control of not just the IP, but my own agency and fate.
Not only is that not true, it's contradicted by the very page you link.
That page has a list of links to resources you can use to self-host git repositories you want to publish, so you don't have to give up control of anything.
(Although, against GitHub as I am, even I am unable to fathom how publishing things on GitHub could possibly mean relinquishing control of your fate.)
Oh the link was just a 'get off git' page. There's plenty of other ways to 'go around' this consolidation – which is just another way to launder your work to benefit shareholders.
If you believe so, then please share what you know. Elucidate me. Your comment didn’t advance the conversation in the slightest. For all anyone knows, you’re the confused one (either about GitHub’s history, git’s history, or understanding the comment).
But you refuse to explain how? If you don’t explain your reasoning, how can anyone evaluate if you’re wrong or not? Have you considered that you may be wrong (as could I)?
> Trolling doesn't work on me bud.
Ah, so anyone who disagrees with you and asks you to clarify your unexplained conclusions is a troll now. Got it.
> Give up now.
You bet I will. It's now abundantly clear that conversing with you isn't productive.
The purpose of Copyright is to promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.
"Sciences" refers not only to fields of modern scientific inquiry but rather to all knowledge
The hacker ethic is a philosophy and set of moral values within hacker culture. Practitioners believe that sharing information and data with others is an ethical imperative
I think you'll find that many people who consider themselves "hackers" disagree on a lot of this stuff. There's no general one-size-fits-all opinion or "ethic" here.
Where do we draw the line between a computer converting a png to a jpg, resulting in the image looking different, and an artist making a fair-use protected transformative derivative piece of art inspired by it?
The line we draw is "did a human creatively and intentionally produce the output" or "did a computer".
It doesn't matter that we've built a compression algorithm no one actually understands (which is what LLMs effectively are, a lossy compression algorithm, compressing their input into a model), the bar for copyrightable creativity was, and still is, human creativity. Which, by definition, a computer does not have, unless a human infuses it into the computer specifically (by for example using the computer to produce a specific copyrighted work).
I'd argue the exact technology or how it works does not matter at all. The only thing that matters is if it's performed by a human, because making derivative works is a right only granted to humans.
Therefore, by definition, an AI cannot create a derivative work, because it's not human.
I see no reason to grant computer programs human rights. I think a lot of people would have a hard time articulating some reasons to do that. So they don't, and talk about the technology instead. I think that doesn't matter. If you can't tell me, and convince humanity, why a computer program should be granted human rights then I don't think we can even get to a point where the technology itself matters.
Compression schemes take an item, reduce it in size, and return the same item (lossy or lossless) when asked.
LLMs and other generative AI are not designed for that, nor are they particularly good at it. What they are good at is returning new results using the 'learning' they have developed via training.
If I want to send someone a sample of code, I'll use zip. If I want to generate code from a prompt, I'll use an LLM.
I'm pretty sure that's not where the line is drawn, legally. Humans look at the produced image and use their judgement to decide if it is substantially similar to the original. Whether or not a computer was used as part of the transformation is not relevant.
> The U.S. Copyright Office has taken the position that "in order to be entitled to copyright registration, a work must be the product of human authorship. Works produced by mechanical processes or random selection without any contribution by a human author are not registrable."
If a human draws a fractal, that is art. If a computer produces a fractal, that is math, and math is not copyrightable.
Copyright also allows for independent derivation, so if you produce an image or sentence that is identical to another, but can somehow prove you did not know about the supposed original, you're in the clear for copyright.
It is impossible to know if something is a copyright violation without also knowing all sorts of things, including the intent of the author, and if the author knew about the supposed original work.
> "in order to be entitled to copyright registration
Did you read this part at all?
Do you know what copyright registration is?
Hint: whether you get copyright registration has nothing to do with whether what you did was legal.
You seem to know enough of the buzzwords that this distinction should be obvious to you. Which makes me confused as to why you would misinterpret such a clear and important difference, unless it was to intentionally mislead people who aren't aware of the law.
You could even go read the first sentence of that wikipedia article to understand what it was about.
"The threshold of originality is a concept in copyright law that is used to assess whether a particular work can be copyrighted. It is used to distinguish works that are sufficiently original to warrant copyright protection from those that are not"
Notice. This is not the same as if it infringes on someone else's work.
Why did you misunderstand that article so seriously?
And not a good one. Digital artists' art is output by a computer too. And I "creatively and intentionally produce output" when I type prompts into Stable Diffusion.
> The line we draw is "did a human creatively and intentionally produce the output" or "did a computer".
Nope! Thats not true at all.
If a human wrote out the Harry Potter books word for word, that wouldn't be protected.
Instead, the line is drawn at if the new works is covered under fair use.
And a human is perfectly able to use a computer to do that in all sorts of circumstances. The human or computer being involved is completely irrelevant here.
> human creativity
Nope. Human creativity only matters for protecting works. It has nothing to do with whether you can create them with a computer.
It is perfectly possible for it to be completely legal to use a computer to produce a piece of work, without infringing on anything, and yet the newly created work isn't protected in the future from other people copying it.
I'm fairly certain that's often a deciding factor in legal questions?
If your originally produced content, without ever having seen the other thing, is substantially similar to something that already exists, you are much more likely to be in trouble (in the sense that someone will take issue, not in the sense that you can be convicted) than if you copied something existing, and transformed it into something unrecognizable.
You're the second person who understood the opposite of what I said. I'm not a native speaker, so can you tell me why what I said was ambiguous or contrary to my intended meaning?
> They provide information on that site about when the Sun rises and sets and so on... but they also provide it under a disclaimer saying that this information is not suitable for use in court. If you need to know when the Sun rose or set for use in a court case, then you need an expert witness - because you don't actually just need the bits that say when the Sun rose. You need those bits to be Coloured with the Colour that allows them to be admissible in court, and the USNO doesn't provide that....It's a question of where the numbers came from.
That's just saying that your bits have to be authenticated/verified to be accepted as accurate.
Which makes sense and is entirely different than "your bits are illegal and your other identical bits are legal."
> your bits are illegal and your other identical bits are legal
That happens all the time though. If I rip a copy of a movie for backup purposes, that rip is legal. If I upload a torrent of it, the exact same bits on my disk are now illegal distribution of a copyrighted work.
If I am the artist who owns the copyright of the work, my bits can legally be redistributed.
The intent and legal status of the bits matters in a ton of cases.
It would be the same as taking a photograph of a copyrighted work. You own the copyright of the photograph. But you cannot sell it without permission, or you violate the copyright of the original rights holder.
Or maybe it would be the same as photocopying a book, where laws restrict the proportion of the work that can be reproduced without permission.
Or maybe it will be its own thing, where courts and government decide existing laws are insufficient and we need new laws.
In Prodigy's defense, the samples were very creatively transformed so that the final result does not resemble the originals. It is more like cutting small patches from paintings to make a new painting (rather than drawing a similar painting), and it is not what ML models do today.
They definitely don't sound the same; very similar, yes, but far from the same. Besides, was it proven in court or otherwise by some legal entity that these songs aren't considered copyright violations? Just because it has not been litigated doesn't prove it isn't a copyright violation.
Anyway I'm definitely not a copyright expert but I just found this argument extremely weak.
That is not how copyright works. Music in the US and many similar legal systems has a compulsory license provision that allows for anyone to produce and distribute covers of music as long as all licensing requirements are met. With the long history of covers in music, how much enforcement there is around the meeting the licensing requirements bit varies pretty wildly. If you are not complying with the licensing terms, however, and the rights holder comes after you, no amount of having copied the song by hand will protect you from copyright claims.
Similarly, I can't draw a batman cartoon with pencil and paper and avoid copyright claims when I try to sell the episodes.
Please do not go around infringing on copyright and thinking it's OK because you recreated whatever it was by hand.
> Marvin Gaye’s Estate won a lawsuit against Robin Thicke and Pharrell Williams for the hit song “Blurred Lines,” which had a similar feel to one of his songs
Which refutes your assertion of no potential copyright violation.
Since we are discussing copyright law and not physical laws, it doesn't matter if a machine intentionally created new work. The machine does not get copyright. The operator or owner of the machine might.
As a photographer I'm hoping this goes in a direction where any photographers who look at my work owe me a percentage of their future profits, since they've trained their wetware model on my IP.
Patents sort of work that way, except that even people who didn't look at your work owe you their future profits.
I think I'm hoping for a result that anyone can train any model on any content, regardless of that content's copyright status. Mostly because I want AI assistant tools to be as effective as possible, to be able to access the same information I can access. But however it turns out there will probably be some unintended consequences.
Just so you know that will also mean companies like Disney will now have a new source of revenue. Hunting down randos who made pictures that look like they were made by someone who saw little mermaid once.
Reproduction. The training claims were always tenuous under the law. If I save a copy of your code, I probably haven’t done anything wrong. If I make a slot machine that sometimes randomly sends someone else your code, I get in trouble when it does send a copy out if I don’t have permission.
Your question begs the answer. An AI cannot learn, legally speaking. It is not a legally recognized actor. The person building or operating it is who is involved here. Much like legally the photographer is involved in copyright law rather than the camera.
Once framed correctly from a legal perspective, you have a person creating a tool using copyrighted material. Is this legal? For images, probably. However, selling or renting the tool or images generated using it is an open question. You can legally photograph a copyrighted image using a camera. But you cannot sell the photograph without permission from the original rights holder, because that would violate their copyright. And things are different for copyrighted text, such as a book (and computer source code?). You can only legally photocopy a portion of a book as fair use. Copying an entire book without permission is a copyright violation.
You are misusing the word "learning" (like misusing the word "piracy" for copyright violation). An ML model is not a human and is unable to learn anything. Also, an ML model is not a subject of law.
So your sentence should read: "where do we draw the line between engineers of VC startups calculating model parameters by processing copyrighted content, and humans learning from codebases". Then the difference becomes obvious.
If you have actually learned from the code, you learned about code structures, and yes, they are attributed/named/noted.
Everything from "Gang of Four" patterns to applied mathematical algorithms like a Fast-Fourier-Transform have attribution and history.
Where I find myself frustrated with the general argument you put forth is that it alleges that pattern-extraction is the extent and essence of human learning.
I do not think that the current LLMs-are-AI trend has grasped either the essence of intelligence or that of learning.
I recognize that one cannot paint an entire field of study with broad strokes, but there is a certain amount of in-industry Kool-Aid consumption that, while perhaps rewarded by more gullible portions of the market, is poisoning the public well of goodwill.
This can only lead to a very harsh backlash, which we already observe undermining the deeply-funded attempts at foisting this stuff upon the world at large as "AI".
Computers are not human beings and never will be.
The fact stands that CoPilot has no notion of code outside of its training data and is merely a pattern-extraction machine.
You can dismiss this claim, but you cannot disprove it.
Now you just have to legally draw that line. Legally, a company is a person too. Lately in Malaysia, we've been redefining a lot of laws to cover "natural persons" (aka humans) because what would happen is that companies would steal money and other unethical things, and the company would be blamed for such actions instead of the humans running them.
A lot of people dislike LLMs and generative AI (fairly) and are reflexively trying to reach for tools in our legal framework, claiming it's obviously already illegal. I don't think this is going to work. Generative AI is quite obviously novel to anyone who isn't in denial - and claiming existing copyright laws are going to cover it seems like a lost cause.
We need new laws. Especially regarding deepfakes, it's shocking how many people think revenge porn laws and such are going to be enough here. Rather than just focusing on the data usage, we need more fundamental laws and rights, like the right to control representations of ourselves, like Japan has, where producing images or voice/video in your likeness is prosecutable straight out. Likewise we need laws that explicitly target data use for training that is separate to copyright.
The way LLMs are trained is obviously too similar to how humans learn, and the transformation and then output produce works that are novel based on that "learning", just like humans do. This is so fundamentally different to what copyright laws were made to cover, I find it infuriating how many people handwave these arguments away. Only in perfect 1-to-1 regurgitation does it even feel close to something copyright would be able to cover.
I'm one of the "dislikers" although the neural network stuff is itself an amazing tool in my opinion. I like to fall back on a much easier argument (IANAL and this is not legal advice), can these code generating things generate code without reading (training on) encumbered code?
Humans can learn syntax and basic programs then independent of any "similar code", humans can produce new algorithms that solve specific problems. Now sure, similar code can be searched for on the internet but the code is "attributed" and will likely contain a license. If the human copies it too closely, attribution and licensing rights come into play. The LLMs apparently just bail on attribution.
The way LLMs are trained is that they are fed an absurd amount of code, humans cannot train this way because the volumes of code to be read are too great.
The consequence of all the abuse of the intent of open source licenses has just been that I no longer write any open source code. I have far fewer issues with a code generator trained on GPL code that produces GPL code, with the LLM being under the GPL as well. It's the commercial licensing and paying for it that seems to breach the intent of these licenses to me.
I guess Microsoft has gotten what it wanted and has got to the extinguish stage of its plan for open source finally and all it needed was a chatbot.
Just to have a separate story, I started licensing my stuff as public domain after the AI stuff happened, so I can encourage the development of AI models without any restrictions.
While I believe attribution (a requirement in MIT and GPL) doesn't make sense for AI because of its nature, the arguments just made me say "fuck it" and remove the attribution legal requirement altogether. It's still a good thing to attribute when it makes sense, though right now legally it's just too crazy for me to want to participate in this copyright regime.
I honestly just don't see how all this will work legally, in the future.
I don't know anything an LLM (or "AI") can do that a human couldn't, with enough time. If it can get a human in trouble, it should get the operators of the AI in trouble too. Likewise, if a human can do it, I don't see why an AI is any different.
If a textbook is a megabyte (2^20 bytes) and might take a week to read and grok, then it would take 2^20 weeks to read one terabyte (2^40 bytes), which is about 20 thousand years.
ChatGPT-3 was trained on 570GB of text data, according to reports. So if you have 10,000 years, yeah, sure, a human could read it all. But memorize and recall?
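The arithmetic roughly checks out; here's a quick back-of-the-envelope version, assuming the same one-megabyte-per-week reading rate and the reported 570 GB corpus:

    # Back-of-the-envelope: how long would a human need to read GPT-3's reported corpus?
    mb_per_week = 1                 # one textbook-sized megabyte per week, as above
    corpus_mb = 570 * 1024          # 570 GB expressed in MB
    weeks = corpus_mb / mb_per_week
    print(round(weeks / 52))        # ~11,000 years, the same order as the figure above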
Well, I'm thinking more specialized. I mean, how often do people come up with the same riff or melody and end up in court? You don't need an AI to either purposely or accidentally skirt copyright.
The entity with agency claiming copyright of the code being written (machines can't claim copyright) is responsible for ensuring that the code that they are writing is free of license encumbrance.
This is not any different than a person copying a code snippet from Stack Overflow that is under the GPL and used on Stack Overflow as part of fair use for educational purposes.
You, the person, writing the code are responsible for making sure that your code is yours.
> This is not any different than a person copying a code snippet from Stack Overflow that is under the GPL and used on Stack Overflow as part of fair use for educational purposes.
Code snippets and answers on Stack Overflow also have their own license[1] and the terms[2] specifically outline that they're not responsible if you go posting things you aren't permitted to (s.8).
Where it differs is that the various chatbots are removing this attribution. Even the permissive licenses require attribution.
I've no doubt OpenAI's terms are such that the end user is ultimately responsible, but do you not think that creates a problematic situation wherein they can effectively obscure and violate license terms?
Copyright refers to copying. No matter how complex the scenario you create, ultimately if the output is a copy of what someone else holds copyright on, you are liable. Or at least I would be as some random developer. Is the argument here that OpenAI is free to do the same and because they made a complicated enough system of smoke and mirrors they shouldn't be held responsible?
It is legal and not a copyright violation to post GPL licensed code on Stack Overflow under the CC-BY-SA 4.0 license for purposes of education (without license attribution).
It is a GPL license / copyright violation to take the GPL licensed code that was posted to Stack Overflow (without license attribution) and use it in your code under the presumption that it was licensed under CC-BY-SA 4.0.
There is no real difference in terms of copyright violation between copying GPL licensed code from Stack Overflow (which can be there) or code from Copilot - in either case, you, the human doing the copy and paste, are the one doing the infringing and are responsible for ensuring that the provenance of the code that you are copying is free from any licensing encumbrances.
It is not a copyright violation for a radio (a machine) to play a song that it received from the airwaves. It is a copyright violation for you, the human, to take that radio and have it be a performance in the park where people can dance to the music on the radio played loudly.
Machines with no agency cannot infringe copyright. If I took a photo of a page of a book with my iPhone, and the iPhone did image to text on it, that's not the iPhone's fault. And I would possibly be within my rights to take that photo. I would be infringing if I then published that image or the text that the iPhone generated.
I believe / have the understanding that copyright can only be violated by an entity with agency - and some entity with agency is the one that ends up publishing or redistributing the work.
To that end, it doesn't matter if code was written by Copilot, copying from StackOverflow, or a random person on Fiver (who may or may not have used Copilot). If I publish it, I am the person with agency that infringed copyright.
If we say that "ahh, but you used Copilot - that was an infringement" ... ok, so I copied some unattributed code from Stack Overflow that I believed was CC-BY-SA. Is Stack Overflow responsible for my accidental infringement? If the answer is "no, you - as the person pasting the code into your work - should always be checking the copyright provenance of unknown code you're pasting in" then I believe that same answer should be applicable to all the other situations too.
> The court required Fantec to pay a contractual penalty in the amount of € 5,100 based on the prior settlement agreement. In addition, the court awarded the plaintiff’s expenses in enforcing the GPLv2. (This award is standard under German law and is based on Section 97a (1), 31, 69c no. 3 and 4 of the German Copyright Act which awards costs for a justified warning by a party which is so cautioned.) The court affirmed the culpability of Fantec’s violation by classifying the violation as negligent: the seller of firmware may not rely on suppliers' statements about compliance. The distributor of GPLv2 software must carry out the assessment or commission experts to make the assessment even if they incurred additional costs.
> The court decided that FANTEC acted negligently: they would have had to ensure to distribute the software under the conditions of the GPLv2. The court made explicit that it is insufficient for FANTEC to rely on the assurance of license compliance of their suppliers. FANTEC itself is required to ascertain that no rights of third parties are violated.
It is the responsibility of the distributor to comply with the license.
In this light, it doesn't matter what Copilot "claims" about the license of code - the programmer copying the code is responsible for verifying its copyright status and is at fault if they publish that code.
> Machines with no agency cannot infringe copyright. If I took a photo of a page of a book with my iPhone, and the iPhone did image to text on it, that's not the iPhone's fault. And I would possibly be within my rights to take that photo. I would be infringing if I then published that image or the text that the iPhone generated.
> I believe / have the understanding that copyright can only be violated by an entity with agency - and some entity with agency is the one that ends up publishing or redistributing the work.
The big problem with this argument is that the machine is not publishing things, OpenAI the company is. They have created the entire circumstances around which this copying can happen.
Let's consider the Napster case. If the argument is "software can't violate copyright" then what was the RIAA's problem with a mass-scale copying and sharing of their music? Why was Napster able to be sued into nonexistence? They only created the software, after all.
There's precedent here that creators of software can be held liable for the copyright abuses that software leads to or permits.
> It is the responsibility of the distributor to comply with the license.
By all measures, OpenAI is the distributor of the code here. After all, their software is outputting licensed code.
> The copyright law of the United States (title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material.
> Under certain conditions specified in the law, libraries and archives are authorized to furnish a photocopy or other reproduction. One of these specific conditions is that the photocopy or reproduction is not to be “used for any purpose other than private study, scholarship, or research.” If a user makes a request for, or later uses, a photocopy or reproduction for purposes in excess of “fair use,” that user may be liable for copyright infringement.
> This institution reserves the right to refuse to accept a copying order if, in its judgment, fulfillment of the order would involve violation of copyright law.
The machine is not at fault for reproducing an exact copy of copyrighted materials. It is perfectly within fair use of copyright if it is used for private study, scholarship, or research.
If that person goes beyond that and uses the reproduction for other purposes, then that person is liable for infringement - not the machine.
To be comparable, Xerox would need to have been the sole holder of all photocopiers everywhere, and charge fees for use. Not to mention you're also then in the physical world.
This is why Napster is a far better comparable. It's all software, via the Internet, and was at scales no photocopiers could compete with. Only it goes a step worse than Napster. In Napster's case, they simply built software and services primarily aimed at facilitating P2P file sharing. In OpenAI's case, they themselves are responsible for creating the copies of infringing materials. They performed the scraping, and they perform the distribution.
Sure, but you've missed a step: the act of an LLM/AI spitting out a block of copyright-encumbered code to you is itself copyright infringement, for which OpenAI (et al.) should be liable. You can then commit further copyright infringement by copying that code into your project and distributing it.
It's similar for Stack Overflow: they require contributors to only post code that they have a legal right to post, but nothing actually stops them from including copyright-encumbered code in an answer that they don't have the rights to. The copyright holder would be within their rights to take legal action against the person who posted it, and/or send a DMCA takedown notice to Stack Overflow. And, again, you can commit further copyright infringement by copying that code into your project and distributing it.
> it's only purpose is to permit the user to infringe others copyright
I'm not a fan of Copilot, but this is an absurd take. Direct comparisons to Napster etc. make very little sense.
> without the ability to produce infringing works it is nothing
I don't really agree. While Copilot certainly trains on a lot of data it doesn't have the legal right to redistribute, its output is often (nearly always, according to this judge's opinion) not similar enough to any specific copyrighted work to be considered infringement.
I do think, when the output is similar enough to a specific copyrighted work, there should be consequences.
Depends on whether you have a service relationship with a third party and they are providing a service or you rolled your own. If, for example, I paid a third party company for consultants to write some code for me but they provided source code they didn't have the right to, I think I should be able to hold them accountable for that. Whether it's a person or some automated process doesn't change that IMO.
I expect a court case would be used to determine what a normal person could expect, what was represented by the consultant company, and what exactly I requested, to determine how much fault each party has.
Everybody whose rights were infringed. The GPL often technically makes that "everybody else" by granting what were otherwise exclusive rights (to make and distribute copies) to everybody and then taking them away from infringers.
So e.g. Company X makes a GPL'd program to do A, but Company Y just copy pastes it into famous product P and acts as though they made it and obviously doesn't give out source. As a random person who doesn't even own P, the argument would be that technically the GPL says you should be able to get source code for the program from Y, even though you didn't buy their product P - you were harmed by their refusal to do what the GPL requires, so you can sue them.
Now, suing is probably not a good idea in this case, a court is likely to either insist you aren't really injured or that they can't help you, or both, but I think it could work at least in theory.
> the argument would be that technically the GPL says you should be able to get source code for the program from Y, even though you didn't buy their product P - you were harmed by their refusal to do what the GPL requires, so you can sue them.
I don't think that's the case. Whenever I see discussion about GPL violations, the copyright holder is the one who has to go after the violator (and often getting them to do something is difficult, because legal challenges can be expensive); the consensus seems to be that users who receive the software in a manner not compliant with the GPL don't have standing to sue. I'm not sure if a user has ever tried, though, so not sure if this has been tested in court.
SFC versus Vizio is exactly what you say doesn't exist. The SFC deliberately brought a case where they don't own the copyright, and says they are harmed and here's why the court should find for them.
I assume that's because even if they aren't the original rights holder, they still have standing because they (as the public) were given rights under the GPL implicitly by anyone using the GPL, for the GPL covered items. Since they were denied those rights, they were harmed, and thus have standing.
At least that would be my "layman that finds court cases and discussions of them interesting so consumes that as entertainment fairly regularly" best guess.
I mean if you are held accountable for using copyrighted code by the owner, you should then in turn be able to hold the consultant accountable for being the source of it, and the blame and responsibility may be shifted in part or in whole.
I don't think it's all that different than if I'm an employer and my employee does something illegal. An investigation can be made as to whether the acted on their own acted based on what the directions or prevailing understanding was at the company. That may change who is responsible in part or in whole and what steps need to be taken to provide recompense to those impacted by that illegal activity.
Every situation will be complex and unique in its own way. That's what courts are for, determining the unique aspects of a case and making specific ruling based on the law and the situation, as the judge (in determining what is acceptable and willing to be seen) and jury (to determine whether someone needs to be held accountable and to a degree how) see fit.
That's for someone with legal experience and knowledge to answer.
I can only offer my opinion and more questions. For example; if you're a punk rock band and hire an artist to create promo material, and they draw a "vulgar Mickey Mouse" without your knowledge, who is in trouble? Seems you should just work backwards until you get to a human or org and have them tried in court on a case by case basis. Maybe that's a bad idea for reasons others can explain, its just my current opinion.
I’ve heard corporate types call open source projects “security risks” and “commie nonsense” but it doesn’t stop them from trying to acquire the work for free to profit off of it. It’s greedy and duplicitous. It’s capture.
As a lawyer who has worked in the federal judiciary, it's understandable that someone outside the legal profession would have some of these views ... but they're actually pretty off base.
> judges are often paid multiples less than leading lawyers
This part is true.
> and they are not desireable jobs and so many were attracted to the power/low competence ones.
No, no, and no. Judgeships are some of the most prestigious and desirable jobs in the entire legal profession. You have to be literally nominated by the president of the United States and confirmed by the Senate. Then you enjoy constitutionally protected life tenure. It's common to see elite lawyers, making millions as partners for large firms, leave their jobs to become federal judges. (Note that I'm talking about federal judges.)
> And also add in that with a single judge, a party/the state only needs to influence a single person.
I guess, in theory? But, in practice, this really doesn't happen. This is partially because judges highly value their independence. And their decisions are appealable, so this kind of corruption would be easily detected, or at least reversed, making it both risky and not very useful.
> The public often don't have access to all documents, transcripts and mostly what is published is the Judge's version of what the parties submissions are (i.e. one person writing history)
Nope. This is generally all public, unless there are specific confidential materials that need to be redacted. But this is unusual and disfavored.
> When a judge dismisses a case and classifies it as not refileable, always raise an eyebrow.
Eh. It's much more complicated than this. There are situations where this might raise an eyebrow -- like if the claim was recently filed and there's reason to think that it could be amended in a way that would rehabilitate it. But if it's obviously fundamentally doomed, or if it would unfairly prejudice other parties to allow it to be refiled, or for various other reasons, this can be totally legitimate.
>judges are often paid multiples less than leading lawyers and they are not desireable jobs
The first part of this is true, but the second part is laughable. Federal judgeships are highly prized and nearly impossible to get. There are only on the order of 900 Article III judges in the country and they serve for life.
>The public often don't have access to all documents, transcripts and mostly what is published is the Judge's version of what the parties submissions are
Completely false. While there can be redactions for sensitive information like trade secrets, in general everything is public record. And in particular, there is a strong presumption rooted in the first amendment and the right to a public trial to unredact anything that the court relied upon in reaching its decision.
>When a judge dismisses a case and classifies it as not refileable, always raise an eyebrow.
You get a chance to refile when the judge thinks that you could plausibly plead more facts that, when taken as true, establish your claim. When the claims fail as a matter of law, leave to amend would be futile since you can’t plead around that.
Finally, the great IP washing machine hums and can dissolve the whole structure. Bring forth your disassembly, to generate a draft, to re-generate clean source code. Cooperate-communism! It is done!
I don't think this proves you can just launder away copyright - nor do I think even we want that at this point.
First off: the claims dismissed have to do with 17 USC 1202, the part of the DMCA that deals with copyright management information. It's a bit of a plaintiff meme[0] to add a CMI claim onto a copyright infringement lawsuit. Obviously, if you infringe copyright, you're also not going to preserve the CMI. And if an AI were to regurgitate output, it doesn't even know that it did so, so it can't preserve CMI even if it wanted to.
Problem is, the AI doesn't regurgitate consistently enough to make a legal claim of CMI removal. The model does generate legally distinct outputs sometimes. You need to point to specific generations and connect the dots from the model to the output in a way that legally implicates GitHub, OpenAI, and/or Microsoft in ways that would not be disclaimed by, say, 17 USC 512 safe harbor. This is distinct from the training-time infringement claims which are still live, wouldn't rely on secondary liability, can't be disclaimed by honoring DMCA takedowns, and which I think are the stronger claim.
Let's step out of the realm of legality. Why do we want to get rid of copyright? For me, it's because copyright centralizes control over creativity. It tells other artists what they can do and forces them into larger and larger hierarchies. The problem is that AI models do the same thing. Using an AI model doesn't make you an artist[1], but it does move that artistic control further towards large creative industry. This is why you have a lot of publisher and big media CEOs that are strangely bullish on AI, a bunch of artists that ordinarily post shit for free are angry about it, and the FOSS people who hate software copyright were the first to sue.
In other words, AI is breaking copyright in order to replace it with more of the thing we hate about copyright.
[0] Or at least Richard Liebowitz liked to do it before he got disbarred.
[1] In the same way that commissioning an art piece does not itself make you an artist
Thank you for this; you really hit the nail on the head as to why this is so gross. If building, training, and operating an LLM were something well within anyone's resources, I'd probably have much less of a problem with my open source code being copyright-laundered through products like Copilot.
But that's just not where we are right now, and the result of that feels awful.
Open source models exist and can be run locally. It doesn't matter if training from scratch is impractical for the average person, if we already have free (as in freedom) models we can build off of.
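To make "run locally" concrete, here's a rough sketch of loading downloadable weights with the Hugging Face transformers library. The model name is a placeholder for whatever open-weights model you actually have the rights to use, not a recommendation:

    # Rough sketch: load open weights and generate text entirely on your own
    # machine. "some-open-weights-model" is a placeholder, not a real model ID.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "some-open-weights-model"  # placeholder
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "Explain the difference between a VBO and a VAO in OpenGL."
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=200)
    print(tok.decode(out[0], skip_special_tokens=True))

Whether the license attached to those weights actually grants the four freedoms is a separate question, which is where the next comment picks up.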
I have several "open" models sitting around for experimentation and occasional use, but I don't think downloadable weights solves the underlying issue.
First off, none of the good models are FOSS in the sense we normally expect - i.e. the four freedoms. At the least onerous end of the scale, Stable Diffusion models under the OpenRAIL license have a morality clause[0] and technical protections[1] to enforce that clause. LLaMA's licensing is only open for entities below a certain MAU, and Stable Diffusion 3 recently switched away from OpenRAIL to a LLaMA-like "free as in beer" license. Not only is this non-free, it's getting increasingly proprietary as the entities paying for AI training start demanding a return on investment, and the easiest way to get that is to just demand licensing fees.
The reason why AI companies - not the artists or programmers they stole from - are in a position to demand these licensing terms at all is because they're the ones controlling the capital necessary to train models. If training from scratch at the frontier was still viable for FOSS tinkerers, we wouldn't have to worry about OpenAI reading all our GPT queries or Stability finding ways to put asterisks on their openness branding. FOSS software development is something you can do as a hobby, so if a project screws something up, you can build a new one. That's not how AI works. If Stability screws up, you still have to obey Stability's rules unless you train a new foundation model, and that's very expensive.
You see how this is very "copyright-like" even though it's legally contravening the letter and spirit of copyright law? Barriers to entry drive industrial consolidation and ratchet us towards a capitalist, privately-owned command economy. If I could train models from scratch, I'd make a decent model trained purely on public domain datasets[2], slather Grokfast[3] on it, and build in some UI to selectively fine-tune on licensed or custom data.
[0] To be clear, I don't have much against morality clauses personally, that's why I've used OpenRAIL models. But I still think adding morality clauses to otherwise FOSS licensing is a bad idea. At the very least, in order to put moral values into a legal contract, we have to agree as a community as to what moral values should be enforced by copyright. Furthermore, copyright and contracts are a bad tool to enforce morals.
[1] e.g. the Stable Diffusion safety filter
[2] Believe me, I tried
[3] An algorithm that increases the speed of grokking (generalization) by taking the FFT of gradients and amplifying the slow ones.
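For anyone curious about [3], here's a minimal, hypothetical sketch of that filtering idea in PyTorch, using an exponential moving average as the low-pass filter instead of an explicit FFT; the function name, variable names, and hyperparameters are mine, not from the Grokfast paper:

    # Hypothetical sketch of the Grokfast idea: keep a low-pass-filtered
    # (slow) copy of each parameter's gradient and add an amplified version
    # of it back to the gradient before the optimizer step.
    import torch

    def amplify_slow_gradients(model, ema_grads, alpha=0.98, lamb=2.0):
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            g = p.grad.detach()
            if name not in ema_grads:
                ema_grads[name] = g.clone()
            else:
                # exponential moving average = low-pass filter of the gradient
                ema_grads[name].mul_(alpha).add_(g, alpha=1 - alpha)
            # boost the slow (low-frequency) component
            p.grad.add_(ema_grads[name], alpha=lamb)

    # usage inside a training loop (sketch):
    #   ema_grads = {}
    #   loss.backward()
    #   amplify_slow_gradients(model, ema_grads)
    #   optimizer.step(); optimizer.zero_grad()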
This makes me think we need models to deliberately try to make code that's equivalent to copyrighted code, but sufficiently changed that it's not infringing.
The end state would be to make the rewriting powerful enough that trying to claim infringement would also hit manually created code.
Alternately, generate code that is optimized for some task by some metric, and show that because the code is best by this criterion, it doesn't show creativity.
Another possibility here is for the LLM vendor to log the code generation tasks typically asked for and then salt the model with vetted, correct, non-infringing code for those questions.
While I think that would probably get around the copyright infringement issues, it still bothers me.
I don't like the idea that a corporation can hoover up countless open source code and contributions (including mine), and then use that to make money selling code generation assistance to other people, even if the output of that code generation would be different enough to any specific copyrighted block of code such that it couldn't result in a copyright infringement claim.
It's not even clear that we can stop this from happening; it's certainly possible that a "GPLv4" that had a provision against using covered code for LLM training would be legally (if not just practically) unenforceable.
To me, the ickiest part is centralization. While models and training tools will probably (maybe?) get cheaper over time, building and operating something like Copilot requires a lot of money and resources. Do we really want these capabilities to be locked up inside big, well-capitalized corporations? For me, the answer to that is a resounding hell no.
That said, the presence or absence of these individuals in specific threads is curious. The "all forms of copyright are unethical" people are all over the Disney threads. Surely anyone with such a keen interest in copyright law and policy would wish to comment on this thread as well, the topics being so similar?
For crying out loud, this submission is an hour old. People have barely had time to even know it exists.
Do you think people are just seething to have arguments and set up alarms to be woken up in the middle of the night whenever there's a conversation involving copyright on HN?
Or maybe, just maybe, it's different people, and HN is not an amorphous hive mind where everyone thinks the same thing all the time.
It boggles the mind how there’s always someone on HN complaining about HN’s hypocrisy. Clearly not everyone on this site shares opinions on everything, else you wouldn’t be making the criticism.