
> LLM’s may even get superseded by something else, but whatever form AI takes training it is going to require work from other people outside the company in question.

It's going to require training data, but no incremental work is actually being done; it's being trained on things that were written for an independent purpose and would still have been written whether they were used as training data or not.

If something was actually written for the sole purpose of being training data, it probably wouldn't even be very good for that.

> Attribution is solvable both at a technical and legal level.

Based on how this stuff works, it's actually really hard. It's a statistical model, so the output generally isn't based on any single thing; it's based a fraction of a percent each on thousands of different things, and the models can't even tell you which ones.

When they cite sources I suspect it's not even the model choosing the sources from training data, it's a search engine providing the sources as context. Run a local LLM and see what proportion of the time you can get it to generate a URL with a path you can actually load.
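If you want to try that experiment, here's a rough sketch of the checking half. It assumes you've already captured the model's output as a plain string; the function names (`extract_urls`, `url_loads`, `loadable_fraction`) are made up for illustration, and the URL regex is deliberately crude. It only tells you whether a URL resolves, not whether the page says what the model claims.

```python
import re
import urllib.error
import urllib.request

# Crude pattern for http(s) URLs in free text (an assumption, not a full URL grammar).
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Pull URLs out of model output, trimming trailing sentence punctuation."""
    return [u.rstrip(".,;:") for u in URL_PATTERN.findall(text)]


def url_loads(url, timeout=5.0):
    """Return True if a HEAD request gets a 2xx/3xx response.

    Some servers reject HEAD, so this can undercount; treat it as a rough check.
    """
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        return False


def loadable_fraction(text, loader=url_loads):
    """Fraction of URLs in `text` that load, or None if there are no URLs.

    `loader` is injectable so the logic can be tried without network access.
    """
    urls = extract_urls(text)
    if not urls:
        return None
    return sum(1 for u in urls if loader(u)) / len(urls)
```

Feed it the raw completions from a model running without any search or retrieval attached, and compare against the same prompts with retrieval turned on.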

> Presumably the best solution for finding value is let the open market decide the rough negotiations.

That's exactly the thing that doesn't work here, because of the transaction costs. If you write a blog, are you supposed to negotiate with Google so they can pay you half a French fry for using it as training data? Neither party has any use for that; the cost of performing the negotiation is more than the value of the transaction. But the aggregate value lost if the material can't be used as a result is significant, because it's a tiny amount each but multiplied by a billion.
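To put rough numbers on it, here's a back-of-the-envelope sketch. Every figure is a made-up assumption for illustration (half a cent of value per work, $50 of time and overhead per negotiation, a billion works), not a measurement; `negotiation_summary` is just a hypothetical helper name.

```python
def negotiation_summary(value_per_work, cost_per_negotiation, num_works):
    """Compare per-deal economics with the aggregate value at stake (all in dollars)."""
    net_per_deal = value_per_work - cost_per_negotiation
    return {
        # A deal is only worth negotiating if it nets more than it costs to make.
        "deal_worth_making": net_per_deal > 0,
        "net_per_deal": net_per_deal,
        # Value available if the works can simply be used, with no negotiation.
        "aggregate_value": value_per_work * num_works,
        # What it would cost to negotiate every one of those deals individually.
        "aggregate_negotiation_cost": cost_per_negotiation * num_works,
    }


# Hypothetical inputs: $0.005 per work, $50 per negotiation, one billion works.
summary = negotiation_summary(0.005, 50.0, 1_000_000_000)

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Under those assumptions no individual deal is worth making, yet the total value at stake runs to millions of dollars, and negotiating all of it one by one would cost orders of magnitude more than it's worth.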

And then what would happen in practice? Google says that in exchange for providing you with video hosting, you agree to let them use anything you upload to YouTube as training data. And then only huge conglomerates can do AI stuff because nobody else is in a position to get millions of people to agree to that term, but still none of the little guys are getting paid.

Restricting everyone but massive conglomerates from doing AI training, in order to get them to maybe transfer some money exclusively to some other massive conglomerates, is a bad trade-off. It's even a bad trade-off for the media companies, who do not benefit from stamping out competitors to Google and the incumbent social media giants that already have them by the neck in terms of access to user traffic.




