Neither does Copilot.

eptcyka · on May 8, 2023

But it does though. There have been many times where this was the case.

Kiro · on May 8, 2023

It only happens if you bait it really hard and push it into a corner. That's not representative at all. I use Copilot to write highly niche code that's based on my own repo. It's simply amazing at understanding the context and suggesting things I was about to write anyway. Nothing it produces is just copy-pasted character by character. Not even close.

bamboozled · on May 8, 2023

As others have pointed out, it means the model contains copyrighted material. So I guess that’s totally illegal. Like if I ripped a Windows ISO, zipped it up and shared it with half the world. You know what would happen to me, don’t you?

Kiro · on May 8, 2023

Not the same thing at all. The data isn't just sitting there in a store inside the model that you can query. No one would be able to look at the raw weights and find any copyrighted material, even if all it was trained on was copyrighted code (which I agree is an issue).

ChatGTP · on May 9, 2023

There are a lot of misconceptions here, but LLMs and Stable Diffusion have spat out copyrighted material verbatim.

So that’s not accurate.

Kiro · on May 9, 2023

What is not accurate? They are still not storing any material internally, even if the patterns they have learned can cause them to output copyrighted material verbatim. People need to break out of the mental model that an LLM is just a bunch of pointers fetching data from an internal data store.

ChatGTP · on May 10, 2023

Have a read through the other comments on this thread; you'll see some good examples.