
ChatGPT lists clickable sources for a lot of nontrivial queries. Those sites don't even need to pay OpenAI for the traffic (yet). If you ask "what's happening in the world today", you might get 20 links. How is this worse, exactly?


How many people click the links? What happens to LLMs if people don’t provide training data anymore because nobody visits their sites?


Cloudflare publishes a "crawl-to-refer" ratio, which can be used to estimate the traffic from LLMs:

https://radar.cloudflare.com/ai-insights#crawl-to-refer-rati...
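For anyone who wants to play with the arithmetic, here is a rough sketch. It assumes the ratio means "pages crawled by a platform's bots per visit that platform refers back", which is my reading of the metric, not a quote of Cloudflare's definition. The function names and all numbers are made up for illustration.

```python
def crawl_to_refer_ratio(crawl_requests: int, referred_visits: int) -> float:
    """Crawled pages per referred visit (assumed meaning of the metric)."""
    if referred_visits <= 0:
        raise ValueError("referred_visits must be positive")
    return crawl_requests / referred_visits


def estimated_referrals(crawl_requests: int, ratio: float) -> float:
    """Estimate referred visits from a crawl count and a known ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return crawl_requests / ratio


# Hypothetical numbers, not Cloudflare data.
ratio = crawl_to_refer_ratio(crawl_requests=50_000, referred_visits=25)
print(f"{ratio:.0f} crawls per referred visit")

# If a bot crawled 10,000 of your pages at that ratio, expect about this many visits back:
print(estimated_referrals(10_000, ratio))
```

The higher the ratio, the less traffic a site gets back for each page that was crawled.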


They will either pay for it to be generated or get good enough at producing synthetic data that actually improves LLM quality.


So either even higher costs, or hoping that a big problem of LLMs gets solved somehow.

Given how much data they need, that will be pretty expensive, I mean really, really expensive. How many people can write good training data, and how much can each produce per day?

Doesn’t sound sustainable.



