As others have pointed out, it means the model contains copyrighted material. So I guess that’s totally illegal. Like if I ripped a Windows ISO, zipped it up and shared it with half the world. You know what would happen to me don’t you ?
Not the same thing at all. The data isn't just sitting there in a store inside the model that you can query. No-one would be able to look at the raw data and find any copyrighted material, even if all it was trained on was copyrighted code (which I agree is an issue).
What is not accurate? They are still not storing any material internally, even if the patterns they have learned can cause them to output copyrighted material verbatim. People need to break out of the mental model that an LLM is just a bunch of pointers fetching data from an internal data store.