I agree that copyleft is more about "giving forward", and I think that's a confusion a lot of people make. Reading through the thread, I get the impression that some think that as soon as one "distributes" the licensed material, the original authors should get a copy. I'm extrapolating, of course, but I suspect some people would agree with that statement outright.
GPL, for instance, merely states that distributed sources or patches "based on" the program should be "conveyed" under the same terms. In other words, anyone who gets their hands on it will do so under the same license.
If anything, I would be worried that GitHub trained its model on publicly available but not clearly licensed code, because then it would have no license to "use" that code in any way[0]. The GPL grants such a right, so there is no problem there. It would be even more worrying if the unclearly licensed code came from a private repository, but I seem to remember reading that private repositories were not included in the training data.
However, would you consider a black-box program whose output can consistently reproduce verbatim, or at the very least slightly modified, copies of GPL code to be transformative? The problem lies not in how the code is distributed but in how transformative the distributed code is. And this applies not only to any program besides AI-powered software, it applies to humans as well[1].
Given how unpredictable the output of an AI is, one should not be allowed to train it on GPL code if one cannot reliably guarantee that it will not produce infringing code.
[0]: https://docs.github.com/en/site-policy/github-terms/github-t... (https://archive.ph/susi0#4-license-grant-to-us)
[1]: One such example is how Microsoft employees allegedly kept themselves from reading the refterm source code, cf. https://github.com/microsoft/terminal/issues/10462#issuecomm...