Okay, so the scale at which they sell their service is a good argument that this is different from a human learning.

On the other hand, we also have the scale at which they learn, which makes every individual source line of code they learn from pretty unimportant. Learning at this scale is a statistical process, and in most cases individual source snippets diminish in the aggregation of millions of others.

Or to put it the other way round: the actual value lies in the effort of collecting the samples, training the models, creating the software required for the whole process, and turning everything into a good product that can be sold. Again, in my mind, the importance of each individual source repo is too small at this scale to care about its license.



The idea that individual source snippets diminish in aggregation at this scale is undercut by the fact that OpenAI and MSFT are both selling enterprise-flavoured versions of GPT, and the one thing they promise is that enterprise data will not be used to further train GPT.

That is a fear for companies because the individual source snippets, and the knowledge "learned" from them, are seen as a competitive advantage of which the sources are an integral part - and I think this is a fair point on their side. But then the exact same argument should apply in favour of paying the artists, writers, coders etc. whose work has been used to train these models.

So it sounds like they are trying to have their cake and eat it too.


Hmm. You sure this is the same thing? I would say it’s more about confidentiality than about value.

Because what companies want to hide are usually secrets that are available to (nearly) no one outside the company. It's about preventing accidental disclosure.

What AIs are trained on, on the other hand, is publicly available data.

To be clear: whatever leaked accidentally would of course have value. But the concern there is a single important fact becoming public when it shouldn't, versus the billions of pieces from which the trained AI emerges.


It's really not different in scale. Imagine for a moment how much storage space it would take to hold the sensory data that any two-year-old has experienced. That would absolutely dwarf the text-based world that even the largest LLMs have experienced.
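A crude back-of-envelope supports this (every number below is an assumption, order-of-magnitude only): raw visual input alone over two years of waking life comes out far larger than a multi-trillion-token text corpus.

  # all figures are assumptions: ~12 waking hours/day for 2 years,
  # ~1 megapixel effective vision at 3 bytes/pixel and 30 "frames"/s,
  # and an LLM corpus of ~10 trillion tokens at ~4 bytes/token
  seconds_awake = 2 * 365 * 12 * 3600          # ~3.15e7 s
  visual_bytes_per_sec = 1_000_000 * 3 * 30    # ~90 MB/s raw
  toddler_bytes = seconds_awake * visual_bytes_per_sec
  llm_bytes = 10e12 * 4

  print(f"toddler visual data: ~{toddler_bytes / 1e15:.1f} PB")  # ~2.8 PB
  print(f"LLM training text:   ~{llm_bytes / 1e12:.0f} TB")      # ~40 TB
  print(f"ratio: ~{toddler_bytes / llm_bytes:.0f}x")             # ~70x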



