I understand the sentiment, but I think it is misguided and therefore counterproductive.
First, LLMs learn patterns; they don't just copy and paste. If they generate verbatim copies of any non-trivial part that would be subject to copyright, then yes, that would be infringement. Yet, can anyone give practical examples of this? And if so, how do they differ from a software engineer who copies and pastes code?
Second, if the code is hosted anywhere else, there is no guarantee that Copilot (or another model) won't learn from it. The only way to make sure no one and nothing will learn from open-source code is to make it as closed as possible.
Third, for me, the crucial part of open-source code is maintenance. GitHub is there and works well both as a platform for creation (I consider GitHub the most productive social network) and as an archive. "No GitHub" (even as a mirror) means the code is likely to end up in places less likely to engage collaborators and less likely to last long.
> If they generate verbatim copies of any non-trivial part that would be subject to copyright, then yes, that would be infringement. Yet, can anyone give practical examples of this?
It's always the same two examples, and I would not classify that as "many", especially since that fast inverse square root function has appeared on GitHub and other sites countless times under all sorts of different licences (which is wrong, but Copilot doesn't seem to do better or worse than humans in this regard).
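For context, the example usually cited is the Quake III Arena fast inverse square root, reproduced below roughly as it circulates online (id Software released the original under the GPL-2.0, which is exactly why relicensed copies of it are everywhere):

```c
/* The famous Quake III fast inverse square root, approximately as it
 * circulates online. The bit trick produces a good initial guess for
 * 1/sqrt(x) via the magic constant, then refines it with one Newton
 * iteration. */
float Q_rsqrt(float number)
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = *(long *)&y;                       /* reinterpret float bits as an integer */
    i  = 0x5f3759df - (i >> 1);             /* the magic constant */
    y  = *(float *)&i;
    y  = y * (threehalfs - (x2 * y * y));   /* one iteration of Newton's method */
    return y;
}
```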
That codeium.com piece is just asking leading questions, or the AI equivalent of that.
It is mutatis mutandis the same, but is that a problem? I'm sure many would say so; I'm not convinced.
Ultimately, if his code is out there, a Google search could bring up a snippet without the license visible, and I might copy-paste that. The crux is that the same code might be presented without context.
Copilot is just a tool, and the person responsible for its safe usage is the human behind it.
In my worldview, if I copy a picture off Google image search, ultimately I am morally the one who infringed copyright, not Google.
> In my worldview, if I copy a picture off Google image search, ultimately I am morally the one who infringed copyright, not Google.
I have an idea why, but... why exactly? What about a web scraper (that I made, similar to Google's) that downloads images? What if it downloads images at random rather than intentionally fetching a specific one?
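To make that hypothetical concrete, here is a minimal sketch of such a scraper in C using libcurl. Everything specific to it (the URL list, the output filename) is a made-up placeholder for illustration, not a real endpoint:

```c
/* Hypothetical scraper: fetches one image from a randomly chosen URL.
 * The point of the thought experiment: the program, not its operator,
 * "chooses" what gets downloaded. URLs below are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <curl/curl.h>

static size_t write_file(char *ptr, size_t size, size_t nmemb, void *stream)
{
    return fwrite(ptr, size, nmemb, (FILE *)stream);
}

int main(void)
{
    const char *urls[] = {
        "https://example.com/a.jpg",
        "https://example.com/b.jpg",
        "https://example.com/c.jpg",
    };
    srand((unsigned)time(NULL));
    const char *url = urls[rand() % (sizeof urls / sizeof urls[0])];

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    FILE *out = fopen("download.jpg", "wb");
    if (!out) { curl_easy_cleanup(curl); return 1; }

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_file);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);

    CURLcode res = curl_easy_perform(curl);  /* the download itself */
    fclose(out);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}
```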
Valid points. But I don't want my code to be used by big corporations and monopolies to train closed-source LLMs that they're going to sell. Shouldn't I get a say in that?
For example, the GPL controls what kinds of projects can use my source code. Maybe there could be an addendum to the GPL that requires all LLMs trained on the source code to be open source as well. Sure, that won't guarantee that Copilot-like bots are never trained on my code. But it would give me a legal framework to stop big corporations from profiting off such bots without open-sourcing them too.
A genuine question: if the LLM came from a purely non-profit company that gave its AI away for free, would you mind your code being used? Would you, in fact, be proud that it had made a useful contribution? Assuming the outcome does not affect your income.
You look at the problem from principles, while I look at the outcomes.
As to the third point: well, it is up to the author, and I respect that (regardless of whether I would do the same). People have the right not to share their code at all, to share it as a copyrighted piece of software, or to share it with any other limitations. Though all limitations (and copyleft is a limitation) affect its usage.