
I still don't really get how compensation is supposed to work just based on the math. Models are trained on billions of works and have a lifetime of around a year; AI companies (e.g. Anthropic) have revenue in the low billions of dollars a year.

Even if you took all of that -- leave nothing for salaries, hardware, utilities, to say nothing of profit -- and applied it to the works in the training data, it would be approximately $1 each.

What is that good for? It would have a massive administrative cost and the authors would still get effectively nothing.
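The arithmetic above can be sketched in a few lines; the figures are the comment's own rough assumptions ("low billions" of revenue, "billions of works"), not real data:

```python
# Back-of-envelope: divide all AI revenue across all training works.
annual_revenue = 2e9       # assumed: "low billions of dollars a year"
works_in_training = 2e9    # assumed: "billions of works"

payout_per_work = annual_revenue / works_in_training
print(f"${payout_per_work:.2f} per work")  # → $1.00 per work
```

Even doubling or halving either assumption keeps the per-work payout in the same order of magnitude: around a dollar.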





I think you’re overestimating the number of authors, and forgetting there are several AI companies. A revenue-sharing agreement with 10% going to creators isn’t unrealistic.

Google’s revenue was $300 billion with $100 billion in profit last year. The AI industry may never reach that size, but $1 per person on the planet is only $8 billion; drop that to the ~70% of people who are online and you’re down to $5.6 billion.

That’s assuming you’re counting books and individual Facebook posts in any language equally. More realistically, there are only about 12k professional journalists in the US, but they create a disproportionate amount of value for AI companies.
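The per-person estimate in that comment works out as follows (the population, online-fraction, and dollar figures are the commenter's round numbers):

```python
# Back-of-envelope: a hypothetical $1-per-online-person revenue pool.
world_population = 8e9     # ~8 billion people
online_fraction = 0.70     # assumed: ~70% of people are online
per_person_dollars = 1.0   # assumed: $1 attributed per person

pool = world_population * online_fraction * per_person_dollars
print(f"${pool / 1e9:.1f} billion")  # → $5.6 billion
```

So the claim is that even a modest per-person valuation yields a pool in the billions, well within a 10% revenue share of a Google-sized company.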


> Google’s revenue was $300 billion with $100 billion in profit last year. The AI industry may never reach that size, but $1 per person on the planet is only $8 billion; drop that to the ~70% of people who are online and you’re down to $5.6 billion.

Google is a huge conglomerate and a poor choice for making estimates, because the bulk of its revenue comes from "advertising" with no obvious way to distinguish what proportion of that ad revenue is attributable to AI. For example, how much of search ad revenue comes from being the same company that runs the ad network, or from being the default search engine in Android, iOS, and Chrome? Nowhere near all of it, or even most of it, is from AI.

"Counting books and individual Facebook posts in any language equally" is kind of the issue. The links from the AI summary things are disproportionately not to the New York Times; they're more often to Reddit and YouTube and community forums on the site of the company whose product you're asking about, and Stack Overflow and Wikipedia and random personal blogs and so on.

You might have written an entire book, and that book may be very useful and valuable to human readers who want to learn about its subject, but unless that subject is something the general population frequently asks about, its value in this context is less than that of some random Facebook post that answers a question a lot of people have.

And then the only way anybody is getting a significant amount of money is if it's plundering the little guy. Large incumbent media companies with lawyers get a disproportionate take because they're usurping the share of YouTube creators and Substack authors and forum posters who provided more in aggregate value but get squat. And I don't see any legitimacy in having it be Comcast and the Murdoch family who take the little guy's share at the cost of significant overhead and making it harder for smaller AI companies to compete with the bigger ones.


> Google is a huge conglomerate

The point of comparison was simply a large company; the current size of, say, OpenAI, when the technology is still fairly shitty, is a poor benchmark for where the industry is going. LLMs may even get superseded by something else, but whatever form AI takes, training it is going to require work from people outside the company in question.

Attribution is solvable at both a technical and a legal level. There’s a reasonable argument that a romance novelist isn’t contributing much value, but that’s not an argument that nobody should be getting anything. Presumably the best way to find the value is to let the open market settle the rough negotiations.


> LLMs may even get superseded by something else, but whatever form AI takes, training it is going to require work from people outside the company in question.

It's going to require training data, but no incremental work is actually being done; the models are trained on things that were written for an independent purpose and would have been written whether they were used as training data or not.

If something was actually written for the sole purpose of being training data, it probably wouldn't even be very good for that.

> Attribution is solvable both at a technical and legal level.

Based on how this stuff works, it's actually really hard. It's a statistical model, so the output generally isn't based on any single thing; it's based, a fraction of a percent each, on thousands of different things, and the models can't even tell you which ones.

When they cite sources, I suspect it's not even the model choosing the sources from its training data; it's a search engine providing the sources as context. Run a local LLM and see what proportion of the time you can get it to generate a URL with a path that actually loads.

> Presumably the best way to find the value is to let the open market settle the rough negotiations.

That's exactly the thing that doesn't work here because of the transaction costs. If you write a blog, are you supposed to negotiate with Google so they can pay you half a french fry for using it as training data? Neither party has any use for that; the cost of performing the negotiations is more than the value of the transaction. But the aggregate value being lost if it can't be used as a result of that is significant, because it's a tiny amount each but multiplied by a billion.
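The transaction-cost point can be made concrete with made-up but plausible numbers (the per-blog value, negotiation cost, and count below are all illustrative assumptions, not figures from anywhere):

```python
# Why micro-licensing negotiations don't happen: per-work value is tiny,
# but any negotiation costs orders of magnitude more than the work is worth.
value_per_work = 0.10       # assumed: ~"half a french fry" per blog
negotiation_cost = 50.0     # assumed: even a trivial contract costs far more
num_works = 1_000_000_000   # "multiplied by a billion"

# Individually, negotiating is never worth it...
print(value_per_work > negotiation_cost)  # → False

# ...but the aggregate value lost if the works can't be used is significant.
aggregate = value_per_work * num_works
print(f"${aggregate / 1e6:.0f} million")  # → $100 million
```

The asymmetry is the argument: no individual pair of parties will ever do the deal, yet the sum across all parties is real money, which is why the value either goes uncompensated or gets captured via blanket terms of service.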

And then what would happen in practice? Google says that in exchange for providing you with video hosting, you agree to let them use anything you upload to YouTube as training data. And then only huge conglomerates can do AI stuff because nobody else is in a position to get millions of people to agree to that term, but still none of the little guys are getting paid.

Restricting everyone but massive conglomerates from doing AI training, in order to maybe get them to transfer some money exclusively to some other massive conglomerates, is a bad trade-off. It's a bad trade-off even for the media companies, who do not benefit from stamping out competitors to Google and the incumbent social media giants that already have them by the neck in terms of access to user traffic.



