Part of that back-and-forth is the claim that "this specific text was copied a lot all over the internet, which makes it show up more in the output", and that means it's not a useful guide to cases where a single copy was added to The Pile and not removed before training the model.
(Or worse, cases where Google already had a copy through Google Books and didn't stop to think "might training on this blow up in our faces like that thing with the Street View WiFi scanning?")