As far as I'm aware training does not currently constitute "piracy". It's fine t...

gnomewascool · on June 1, 2023

I think the point here is about how the training data was procured, i.e. in violation of copyright law ("piracy"), rather than a claim that the training itself is piracy.

The suspicion[0] is that OpenAI trained their models on a large text dump that includes libgen (as part of the so-called "books2" dataset).

If a person downloads a book from Library Genesis, they're a pirate; if OpenAI does it, so are they.

[0] https://twitter.com/theshawwn/status/1320282152689336320