As others have pointed out, it means the model contains copyrighted material. So...

Kiro · on May 8, 2023

Not the same thing at all. The data isn't just sitting there in a store inside the model that you can query. No one would be able to look at the raw data and find any copyrighted material, even if all it was trained on was copyrighted code (which I agree is an issue).

ChatGTP · on May 9, 2023

There are a lot of misconceptions here, but LLMs and Stable Diffusion have spat out copyrighted material verbatim.

So that’s not accurate.

Kiro · on May 9, 2023

What is not accurate? They are still not storing any material internally, even if the patterns they have learned can cause them to output copyrighted material verbatim. People need to break out of the mental model that an LLM is just a bunch of pointers fetching data from an internal data store.

ChatGTP · on May 10, 2023

Have a read through the other comments on this thread; you'll see some good examples.