
Neither does Copilot.



But it does, though. There have been plenty of documented cases where it did exactly that.


It only happens if you bait it really hard and push it into a corner. That's not representative at all. I use Copilot to write highly niche code based on my own repo. It's simply amazing at understanding the context and suggesting things I was about to write anyway. Nothing it produces is copy-pasted character for character. Not even close.


As others have pointed out, it means the model contains copyrighted material. So I guess that’s totally illegal, like if I ripped a Windows ISO, zipped it up, and shared it with half the world. You know what would happen to me, don’t you?


Not the same thing at all. The data isn't just sitting in a store inside the model that you can query. No one could look at the raw weights and find any copyrighted material, even if all it was trained on was copyrighted code (which I agree is an issue).


There are a lot of misconceptions here, but LLMs and Stable Diffusion have spat out copyrighted material verbatim.

So that’s not accurate.


What is not accurate? They are still not storing any material internally, even if the patterns they have learned can cause them to output copyrighted material verbatim. People need to break out of the mental model that an LLM is just a bunch of pointers fetching data from an internal data store.
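To make that concrete, here's a minimal sketch in Python. It uses a character-level Markov chain rather than a real transformer, and the FizzBuzz string is just a hypothetical stand-in for copyrighted code, but it shows the mechanism: the model keeps only learned next-character statistics, yet sampling from them can reproduce the training text verbatim.

    import random
    from collections import defaultdict

    # Hypothetical stand-in for copyrighted training data.
    text = "def fizzbuzz(n):\n    return 'FizzBuzz' if n % 15 == 0 else str(n)"
    order = 4  # characters of context

    # "Training": record which character follows each 4-character context.
    # The original text is discarded; only these statistics remain.
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])

    # "Inference": sample from the learned statistics. This tiny corpus has
    # no repeated 4-character context, so every step is deterministic and
    # the output matches the training text character for character.
    out = text[:order]
    while len(out) < len(text):
        out += random.choice(model[out[-order:]])

    print(out == text)  # True: verbatim output, yet no document store was queried

There's no row in a database being fetched there; the regurgitation falls out of the learned statistics. Real LLMs are vastly more complicated, but the distinction between storing documents and learning patterns that can regenerate them is exactly what's being argued about.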


Have a read through the other comments on this thread; you'll see some good examples.



