Hacker News

An LLM can be trained to find relevant knowledge online. It doesn’t have to be trained on the entirety of all existing knowledge.



> An LLM can be trained to find relevant knowledge online.

Why do you think ChatGPT lost its Web Search plugin recently? Copyright lawsuits. You can't even put copyrighted content in the prompt, because that would make the model makers liable.


But this doesn’t make sense: how is using ChatGPT to find information different from using a search engine, especially if ChatGPT clearly lists its sources?


Good luck when everything is paywalled.


Given the already huge cost of training, and the evident lack of concern the LLM folks have for copyright, why wouldn't the AI groups buy subscriptions to scrape the paywalled content?

They might need to put in some effort to appear human, but that should only throttle the rate, not stop their scraping altogether.


It's more difficult to scrape paywalled content, no?

Clearly, places like Reddit have wised up to this and are starting to charge for API usage, so while scraping is not impossible, you can see the limitations already being put in place. Twitter is another example.

It seems like all this data is now considered gold, and people lock up gold.





