
How is suing OpenAI different from suing an army of gifted individuals who can speed-read and who read most NYT articles and create responses based on them?

Assuming there is no plagiarism, of course.




Because larger LLMs can reproduce text verbatim, whereas speed readers probably can't:

https://nitter.net/maksym_andr/status/1740776900786626608


So what this tells me is that if you make a carefully constructed prompt, change the temperature to zero (not an option normally available), and know the exact article title and perhaps the beginning of the first paragraph (which is pretty difficult unless you already basically have access to the full article), you can potentially get verbatim articles back out.
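
To be concrete, the extraction attempt described above would look roughly like this through the API, where temperature is exposed. This is only a sketch using the openai Python SDK; the model name and prompt text are placeholders, not the actual prompt from the linked thread:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Placeholder prompt: the attacker already knows the headline and the
    # opening words of the article and asks the model to continue them.
    response = client.chat.completions.create(
        model="gpt-4",    # placeholder model name
        temperature=0,    # deterministic, most-likely-token decoding
        messages=[{
            "role": "user",
            "content": "Continue this article word for word: "
                       "<known headline and first sentence>",
        }],
    )
    print(response.choices[0].message.content)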

Great, so all those paywalls that use verbatim headlines and opening text as a teaser are now broken! Well, at least for all the old articles the LLM was trained on, I guess. A very simple countermeasure is to use AI to paraphrase the headline and teaser (paradoxically) until the person pays for the privilege of reading the human-authored text.

I'm sorry, but this is a very weak argument. For example, I don't even believe normal users have access to the GPT-4 system prompt unless they use the API directly (and possibly not even then, I'd have to check).


One is real, the other is made up (and there is plagiarism, of course; there are many pages of examples of it).


The temperature setting controls the randomness of the LLM's output.

We paraphrase all the time to avoid plagiarism, and that's just a somewhat randomized retelling of the same idea.

If you set the temperature to 0 in an LLM, it's basically in "decompress/rote mode". I don't think this is qualitatively the same as "copying", possibly more akin to "memorization". I haven't seen very many demonstrations of verbatim-copy output that weren't done with a temperature at or near 0.
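
For what it's worth, here is a minimal sketch (in Python, using numpy, with made-up logits) of what the temperature knob does during sampling. It is not how any particular provider implements decoding, but it shows why temperature 0 is deterministic "rote" decoding while higher values add randomness:

    import numpy as np

    def sample_next_token(logits, temperature):
        # temperature == 0 degenerates to greedy decoding: the most likely
        # token is always chosen, so repeated runs give identical output.
        if temperature == 0:
            return int(np.argmax(logits))
        scaled = logits / temperature          # higher T flattens the distribution
        probs = np.exp(scaled - scaled.max())  # numerically stable softmax
        probs /= probs.sum()
        return int(np.random.choice(len(logits), p=probs))

    logits = np.array([2.0, 1.0, 0.5])   # toy scores for three candidate tokens
    print(sample_next_token(logits, 0))    # deterministic: always index 0
    print(sample_next_token(logits, 1.0))  # random: usually 0, sometimes 1 or 2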


Also, you can't avoid plagiarism just by paraphrasing, because paraphrasing without attribution is still a form of plagiarism. The key here is whether you cite the source, which the model doesn't.

https://www.scribbr.com/frequently-asked-questions/is-paraph...


That's coming; some LLMs are already starting to do that.


How is the fact that there is a flag to disable plagiarism relevant to the issue that there is plagiarism?


Because enabling "plagiarism mode" is a conscious action that a human takes; it does not default to "plagiarize" any more than a machine that has simply stored a verbatim copy of an article is "plagiarizing" when asked to print it out. Plus, citations are showing up in LLMs now.



