The Google Books project also faced a copyright lawsuit, which was eventually decided in favor of Google.
After contacting major publishers about possibly licensing their books, [former head of the Google Books project] bought physical books in bulk from distributors and retailers, according to court documents. He then hired outside organizations to disassemble the books, scan them and create digital copies that could be used to train the company’s A.I. technologies.
Judge Alsup ruled that this approach was fair use under the law. But he also found the company’s previous approach — downloading and storing books from shadow libraries like Library Genesis and Pirate Library Mirror — was illegal.
That wasn't done as a play for venture capital. The Google Books project began before ebooks were widespread; in the 2000s, Google spent money on all kinds of projects that had no real strategy for monetization. I remember Google Books being a valuable resource because it digitized books that were out of print, back when Google actually cared about making information widely available.
That's not really the point, though, is it? Now Anthropic can afford to buy books and get them scanned. They likely didn't have the money or time to do that before.
And even if they didn't use the illegally obtained works to train any of the models they released, of course they used them to train unreleased prototypes and to make progress at improving their models and training methods.
By engaging in illegal activity, they advanced their business faster and more cheaply than they otherwise could have. With this settlement, other new AI companies will see on the record that they could face penalties if they do this, and will have to go the slower, more expensive route -- if they can even afford to do so.
It might not make it impossible, but it makes the moat around the current incumbents just that much wider.
Didn't Google have a long-standing project to do just that?
https://en.wikipedia.org/wiki/Google_Books