Well, you could say the same thing about the answers Google displays on its own pages instead of search results! If you don't want these crawlers to index your content, I'm pretty sure you can disallow them via robots.txt, just like you can with Google.
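For example, a minimal robots.txt that opts out of the major AI crawlers that publish their user-agent tokens (GPTBot, CCBot, Google-Extended) would look something like this, assuming they actually honor it:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```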
The crawlers that build training datasets respect robots.txt inconsistently. Also, I believe most of these models aren't continuously re-crawling websites to update their data the way a search engine does, so if you're crawled once you may never be crawled again, and your content will still be in the datasets.
I'd also argue that Google directing traffic to your website is a good alignment of incentives. ChatGPT spitting out answers derived from your work with nothing given back to you in return is not.
True. Robots.txt is already a very weak mechanism. I disallow all access in my robots.txt, but plenty of crawlers ignore it, so I have to maintain an explicit blocklist for them as well.
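The blocklist itself doesn't have to be fancy; a rough sketch of the idea in nginx, keyed on the User-Agent header (the bot names here are just examples, not my actual list, and anything that spoofs its user agent slips through anyway):

```
# http context: flag known scraper user agents (example names only)
map $http_user_agent $blocked_bot {
    default      0;
    ~*GPTBot     1;
    ~*CCBot      1;
    ~*Bytespider 1;
}

server {
    listen 80;
    server_name example.com;

    # Refuse flagged bots outright
    if ($blocked_bot) {
        return 403;
    }

    # ... normal site config ...
}
```

User-agent blocking only catches the bots that identify themselves honestly; the worst offenders end up needing IP-range blocks on top of this.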