Could a human also accidentally spit out the exact code while having it just learned and not memorized in good faith?
I'd guess the likelihood decreases as the code length increases, but it also increases with the constraints you impose, such as code style, code uniformity, etc.
> Could a human also accidentally spit out the exact code while having it just learned and not memorized in good faith?
That's just copying with extra steps.
The way to do it legally is to have one person read the code and write up a document describing what the code does functionally. Then a second person implements the software from those notes alone.
That's the method Compaq used to re-implement the original PC BIOS from IBM.
Indeed. Case closed. If an AI produces verbatim code owned by somebody else and you cannot prove that the AI wasn't trained on that code, we should treat the case in exactly the same way as we would when humans are involved.
Except that with AI we can more easily (in principle) provide provable provenance of the training set, and (again in principle) reproduce the model and prove whether it could have created the copyrighted work without having had access to that work in its training set.
> Typically, a clean-room design is done by having someone examine the system to be reimplemented and having this person write a specification. This specification is then reviewed by a lawyer to ensure that no copyrighted material is included. The specification is then implemented by a team with no connection to the original examiners.
Theoretically, maybe, but then they would have to prove in court that they did so without any knowledge of the infringed code. You can't make that claim for an AI that was trained on the infringed code.
Yes, that's why any serious effort in producing software compatible with GPL-ed software requires the team writing code not to look at the original code at all. Usually a person (or small team) reads the original software and produces a spec, then another team implements the spec. This reduces the chance of accidentally copying GPL-ed code.
> Has a human ever memorised verbatim the whole of github?
No, and humans who have read copyrighted code are often prevented from working on clean-room implementations of similar projects for exactly this reason: so they don't accidentally include something they learned from the original code.
Developers who worked on Windows internals are barred from contributing to WINE or ReactOS for this exact reason.
Hasn't all of this been played out exhaustively in music copyright disputes? With the difference that the parody exception that protects e.g. the entire The Rutles catalogue won't get you far in code...