
Well, you could say the same thing about the answers that Google displays on its pages instead of search results! If you don't want these crawlers to index your content, I am pretty sure you can disable it via robots.txt, just as you can for Google.



The training data sets for these models respect robots.txt inconsistently. Also, I believe most of these models are not continuously crawling websites to update their data like a search engine does. That means if you're crawled once, you may not be crawled again, and you'll still be in the datasets.

I'd also argue that Google directing traffic to your website is a good alignment of incentives. ChatGPT spitting out answers derived from your work with nothing given back to you in return is not.


I bet that fully half the time, I read the Google answer, click on nothing, and go on my way.


That's still better than 0%


The idea that a robots.txt will save you is laughable.


Agreed. At best, you can set "Disallow: /" and hope they're polite enough to listen.

I can't seem to find anything on OpenAI's crawler agent, so I'm skeptical they're considering robots.txt at all.
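For anyone unfamiliar with what a blanket disallow looks like from the crawler's side, here is a minimal sketch using Python's standard urllib.robotparser. The agent name "ExampleBot" and the example.com URL are made up for illustration, and this only shows what a polite crawler would conclude; nothing here forces a crawler to run the check.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is a hypothetical crawler name.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/page"))
```

The check is entirely voluntary: a crawler that never calls something like can_fetch is unaffected by the file.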


Even if they abide, this is capitalism. Somebody who wants an edge won't. Or OpenAI or Google will get desperate and stop abiding.


True. Robots.txt is already a very weak thing. I disallow all access using robots.txt, but there are many crawlers that ignore it, and I have to maintain an explicit blocklist for them.
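A blocklist like that could be as simple as matching the User-Agent header against known offenders. This is only a rough sketch, not the setup described above: the agent names are invented, and real deployments often do this in the web server or firewall config instead.

```python
# Hypothetical substrings of crawler user agents that ignored robots.txt.
BLOCKED_AGENT_SUBSTRINGS = ("badbot", "scraperx")


def is_blocked(user_agent: str, blocklist=BLOCKED_AGENT_SUBSTRINGS) -> bool:
    """Return True if the user agent contains any blocklisted substring."""
    ua = user_agent.lower()  # match case-insensitively
    return any(token in ua for token in blocklist)


if __name__ == "__main__":
    print(is_blocked("Mozilla/5.0 (compatible; BadBot/2.1)"))
    print(is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"))
```

Since the User-Agent header is set by the client, a crawler can spoof it, so this only catches crawlers that identify themselves honestly.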


It's the lack of attribution that really hurts, though I think it's fairly shady of Google to steal the ad revenue from smaller sites.


You can ask ChatGPT to cite its sources.


You can ask it to, but it will just make up sources. The connection between its knowledge and the original sources is not represented in the model.

(This is an active area of research, though, and a version of GPT that could cite its sources is something people widely agree would be valuable.)


You can ask for the source of information.


They still link to the source though. Even when they show a snippet.



