
How is suing OpenAI different from suing an army of gifted individuals who can speed-read and who read most NYT articles and create responses based on them?

Assuming there is no plagiarism, of course.




Because larger LLMs can reproduce text verbatim, whereas speed readers probably can't:

https://nitter.net/maksym_andr/status/1740776900786626608


So what this tells me is that if you make a carefully constructed prompt, change the temperature to zero (not an option normally available), and know the exact article title and perhaps the beginning of the first paragraph (which is pretty difficult unless you already basically have access to the full article), you can potentially get verbatim articles back out.
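
To be concrete, the extraction attempt described above would look roughly like this through the API, where temperature is exposed. This is only a sketch using the openai Python SDK; the model name and prompt text are placeholders, not the actual prompt from the linked thread:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Placeholder prompt: the attacker already knows the headline and the
    # opening words of the article and asks the model to continue them.
    response = client.chat.completions.create(
        model="gpt-4",    # placeholder model name
        temperature=0,    # deterministic, most-likely-token decoding
        messages=[{
            "role": "user",
            "content": "Continue this article word for word: "
                       "<known headline and first sentence>",
        }],
    )
    print(response.choices[0].message.content)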

Great, so all those paywalls that use verbatim headlines and opening text as a teaser are now broken! Well, at least for all the old articles the LLM was trained on, I guess. A very simple countermeasure is to use AI to paraphrase the headline and teaser (paradoxically) until the person pays for the privilege of reading the human-authored text.

I'm sorry, but this is a very weak argument. For example, I don't even believe normal users have access to the GPT-4 system prompt unless they use the API directly (and possibly not even then, I'd have to check).


One is real, the other is made up (and there is plagiarism, of course; there are many pages of examples of it).


The temperature setting controls the randomness of the LLM's output.

We paraphrase all the time to avoid plagiarism, and that's just a somewhat randomized retelling of the same idea.

If you set the temperature to 0 in an LLM, it's basically in "decompress/rote mode". I don't think this is qualitatively the same as "copying", possibly more akin to "memorization". I haven't seen very many demonstrations of verbatim-copy output that weren't done with a temperature at or near 0.
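
For what it's worth, here is a minimal sketch (in Python, using numpy, with made-up logits) of what the temperature knob does during sampling. It is not how any particular provider implements decoding, but it shows why temperature 0 is deterministic "rote" decoding while higher values add randomness:

    import numpy as np

    def sample_next_token(logits, temperature):
        # temperature == 0 degenerates to greedy decoding: the most likely
        # token is always chosen, so repeated runs give identical output.
        if temperature == 0:
            return int(np.argmax(logits))
        scaled = logits / temperature          # higher T flattens the distribution
        probs = np.exp(scaled - scaled.max())  # numerically stable softmax
        probs /= probs.sum()
        return int(np.random.choice(len(logits), p=probs))

    logits = np.array([2.0, 1.0, 0.5])   # toy scores for three candidate tokens
    print(sample_next_token(logits, 0))    # deterministic: always index 0
    print(sample_next_token(logits, 1.0))  # random: usually 0, sometimes 1 or 2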


Also, you can't avoid plagiarism just by paraphrasing, because paraphrasing without attribution is still a form of plagiarism. The key here is whether you cite the source, which the model doesn't.

https://www.scribbr.com/frequently-asked-questions/is-paraph...


That's coming; some LLMs are already starting to do that.


How is the fact that there is a flag to disable plagiarism relevant to the issue that there is plagiarism?


Because enabling "plagiarism mode" is a conscious action that a human takes; it does not default to "plagiarize" any more than a machine that has simply stored a verbatim copy of an article is "plagiarizing" when asked to print it out. Plus, citations are showing up in LLMs now.



