
And GitHub’s EULA gives it the right to train Copilot on public code you host on GitHub.



The issue, though, is not the code I personally upload to my own public repositories, but the code that someone else uploads to GitHub by cloning my repository hosted somewhere other than GitHub.

Personally, I have eschewed any use of GitHub since the MS acquisition and only ever use it where a client mandates it (so not for my own code). If you clone my code from elsewhere into a GitHub repo, that's just rude and contrary to my every intent and wish.

I think it's time to add a "No GitHub" clause as an optional add-on to the various open-source licenses.


So then the person who uploaded your code to GitHub has committed a copyright violation, and I'm sure GitHub would honor a request to remove your code from the model training corpus, since it was illegally uploaded to GitHub.


It's not necessarily a copyright violation if the license permits copying. Under a permissive license, you are expressly permitted to copy the code and distribute copies without the explicit blessing of the copyright holders, provided you comply with whatever conditions the license mandates. Most popular licenses do not prohibit training AI models on the code. Maybe people should start including such a clause.


Many popular licenses include a prohibition on being used to create proprietary software. GitHub Copilot is proprietary.


That's great, but GP's argument was

> Copilot does not 'steal' or reproduce our code - it simply LEARNS from it as a human coder would learn from it.

Not "the terms of use you agreed to allow them to do it". That's a different argument, with a different amount of merit, in my opinion.


Agreed. I was just saying that in the current environment GitHub has that license and nobody else does. So if the courts one day decide that, because machines learn differently from humans, copyright holders may add a license exception disallowing machine training, GitHub will benefit from it. It's kind of ironic. What's best for society is to not have any such law enacted and to continue allowing open-source models to progress alongside proprietary ones (which also keeps competitive dynamics on the proprietary side more level).


They could just train a model on GPL code that can only be used on GPL code.

For MIT-licensed code that's currently impossible, because of the requirement to credit the authors.



