Privacy questions aside (important as they are), it matters a great deal to know exactly what a model was trained on: if Wikipedia was in the training set, you can't use questions drawn from Wikipedia to test it, since that would be cheating. Test data must be as "unseen" as the questions on a good exam.
The Google BERT paper (Devlin et al., 2018) also references it: https://aclanthology.org/N19-1423/
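To make the "unseen test data" point concrete, here is a rough sketch of how you might flag test questions that overlap with training text. It assumes you actually have the training documents as plain strings, uses naive whitespace tokenization, and treats any shared n-gram as a sign of leakage. The function names (`ngrams`, `find_contaminated`) and the default `n=8` are made up for illustration, not taken from any particular paper or library.

```python
def ngrams(text, n=8):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_contaminated(test_items, training_docs, n=8):
    """Return the test items that share at least one n-gram with the training docs.

    Hypothetical helper: a real check would need proper tokenization and
    an index that scales beyond what fits in memory.
    """
    train_ngrams = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc, n)
    # Keep every test item whose n-grams intersect the training n-grams.
    return [item for item in test_items if ngrams(item, n) & train_ngrams]


if __name__ == "__main__":
    training_docs = ["The quick brown fox jumps over the lazy dog near the river bank"]
    test_items = [
        "Which animal? the quick brown fox jumps over the lazy dog",
        "What is the capital of France and why is it famous today",
    ]
    for item in find_contaminated(test_items, training_docs, n=5):
        print("possibly seen in training:", item)
```

An exact n-gram match like this only catches verbatim overlap; paraphrased questions would slip through, so treat an empty result as "nothing obvious" rather than proof the test set is clean.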