
You're saying it isn't plagiarism if one merely swaps a couple of things in the expression of the content?



The definition of plagiarism is "the practice of taking someone else's work or ideas and passing them off as one's own". ChatGPT might draw on thousands or millions of different works or ideas before composing its own sentences, so I don't think it meets that definition.


In general no, but there is a problem in that ChatGPT may end up "regurgitating" large chunks of source material anyway, even if mechanistically that's not what it's trying to do. Similarly, it was recently reported that Stable Diffusion has effectively memorized some entire images it was trained on, and can reproduce them as output.

I don't think the word-by-word statistical mechanism of ChatGPT would stand up as a copyright defense in court. It's the output that counts, not the means of getting there. It'd be like me copying a copyrighted work word for word and then claiming "well, your honor, I was only using it for inspiration, and I was using my full creative abilities to write what I did, so you can't blame me if it's a word-for-word copy".

I think OpenAI (or any company with the resources to train such a model in the first place) could fairly easily self-police and check that what they generate isn't an exact (or near-exact) copy of something in the training data. It's a bit like Shazam recognizing a song from a short snippet: generate some kind of "hash code" for each generated sentence (or whatever level of granularity makes sense) and compare it against a database of hash codes built from the source material the model was trained on.
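A minimal sketch of that kind of fingerprinting, hashing overlapping n-word windows and flagging output that overlaps heavily with a corpus index. All names, the window size, and the threshold are made up for illustration; production systems would use something more robust (e.g. winnowing or MinHash) and a disk-backed index:

```python
import hashlib

def ngram_hashes(text, n=4):
    """Hash every n-word window of the normalized text."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + n]).encode()).hexdigest()
        for i in range(max(len(words) - n + 1, 1))
    }

def build_index(corpus_docs, n=4):
    """Offline step: fingerprint every document the model was trained on."""
    index = set()
    for doc in corpus_docs:
        index |= ngram_hashes(doc, n)
    return index

def looks_copied(generated, index, n=4, threshold=0.5):
    """Flag generated text whose n-gram hashes mostly appear in the corpus."""
    hashes = ngram_hashes(generated, n)
    overlap = len(hashes & index) / len(hashes)
    return overlap >= threshold
```

Exact and near-exact copies light up immediately, while paraphrases that share no long word sequences pass through, which matches the "almost exact copy" standard suggested above.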




