Guaranteed value is a licensing payment that compensates the publisher for allowing OpenAI to access its backlog of data, while variable value is contingent on display success, a metric based on the number of users engaging with linked or displayed content.
...
“The PPP program is more about scraping than training,” said one executive. “OpenAI has presumably already ingested and trained on these publishers’ archival data, but it needs access to contemporary content to answer contemporary queries.”
This also makes sense if they're trying to get into the search space.
Whether training a model on text constitutes copyright infringement is an unresolved legal question. The closest precedent would be search engines using automated processes to build an index and links, which is generally not seen as infringing (in the US).
No, they have not done that. Presumably they believe that the model training was fair use, and no court has said otherwise yet.
It will take years for that stuff to settle out in court, and by that time none of that will matter, and the winners of the AI race will be those who didn't wait for this question to be settled.
It's not just the big companies you have to think about, lol.
Sure you can sue OpenAI.
But will you be able to sue every single AI startup that happens to be working on open-source AI tech that was all trained this way? Absolutely not. It's simply not feasible. The cat is out of the bag.
My point stands. That's like one guy. That's not "an entire industry gets shut down by the government".
That was my point. Sure, they might go after like one guy or one company. They aren't going to take out half of the tech startups in all of the US though. They also aren't going to confiscate everyone's gamer PCs.
I also think it's funny that you literally posted a Wikipedia page where the page itself contains the "illegal" numbers.
So that proves my entire point. Your best example is apparently one where I can access the "illegal" information on a literal public Wikipedia page!