Well, you could say the same thing about the answers Google displays on its own pages instead of search results! If you don't want these crawlers to index your content, I'm pretty sure you can disallow them via robots.txt, just like you can with Google.
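For example, a minimal robots.txt that opts out of the major AI crawlers that publish their user-agent tokens (GPTBot, CCBot, Google-Extended) would look something like this, assuming they actually honor it:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```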
The crawlers that build training datasets respect robots.txt inconsistently. Also, I believe most of these models aren't continuously re-crawling websites to update their data the way a search engine does, so if you're crawled once you may never be crawled again, and your content will still be in the datasets.
I'd also argue that Google directing traffic to your website is a good alignment of incentives. ChatGPT spitting out answers derived from your work with nothing given back to you in return is not.
True. Robots.txt is already a very weak mechanism. I disallow all access in my robots.txt, but plenty of crawlers ignore it, so I have to maintain an explicit blocklist for them as well.
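The blocklist itself doesn't have to be fancy; a rough sketch of the idea in nginx, keyed on the User-Agent header (the bot names here are just examples, not my actual list, and anything that spoofs its user agent slips through anyway):

```
# http context: flag known scraper user agents (example names only)
map $http_user_agent $blocked_bot {
    default      0;
    ~*GPTBot     1;
    ~*CCBot      1;
    ~*Bytespider 1;
}

server {
    listen 80;
    server_name example.com;

    # Refuse flagged bots outright
    if ($blocked_bot) {
        return 403;
    }

    # ... normal site config ...
}
```

User-agent blocking only catches the bots that identify themselves honestly; the worst offenders end up needing IP-range blocks on top of this.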