I only skimmed the paper, but isn't it odd that researchers from Google would publish a paper that's basically about how Wikipedia hasn't properly adapted to its content being used as training data?
I haven't seen anything yet on how much recent Wikipedia content is generated by systems like ChatGPT, but it appears to be an issue:
> "Wikipedians like Knipel imagine that ChatGPT could be used on Wikipedia as a tool without removing the role of humanity. For them, the initial text that’s generated from the chatbot is useful as a starting place or a skeletal outline. Then, the human verifies that this information is supported by reliable sources and fleshes it out with improvements. This way, Wikipedia itself does not become machine-written. Humans remain the project’s special sauce."
However, wouldn't a GPT trained specifically on Wikipedia be quite useful? It would be the largest user-editable corpus of training material for a language model that could then be asked about its contents.
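If someone wanted to try that, a minimal sketch might look like the following, assuming the Hugging Face datasets/transformers stack; the base model (gpt2), the snapshot name, and the training settings are illustrative placeholders, not a recommendation:

```python
# Sketch: fine-tune a small causal LM on an English Wikipedia snapshot.
# Assumes the Hugging Face datasets/transformers stack; model, snapshot,
# and hyperparameters are illustrative placeholders only.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Pre-processed English Wikipedia dump hosted on the Hub (1% slice for the sketch).
wiki = load_dataset("wikipedia", "20220301.en", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = wiki.map(tokenize, batched=True, remove_columns=wiki.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="wikipedia-gpt2",
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    # mlm=False -> plain causal language modeling with shifted labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```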
Beyond the obvious use of answering questions about the material, it could perhaps also be a tool for checking whether, and how, a cited source actually supports the article that cites it. Abuse detection could be another application.
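For the citation-checking idea, here is a rough sketch using a generic sentence-embedding model; the model name, the threshold, and the example claim and passages are arbitrary choices for illustration:

```python
# Sketch: score how well a cited source supports the sentence that cites it.
# Uses the sentence-transformers package; the model name and the 0.5 threshold
# are arbitrary choices for illustration, not anything Wikipedia actually uses.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def citation_support_score(article_sentence: str, source_passages: list[str]) -> float:
    """Best cosine similarity between the claim and any passage of the cited source."""
    claim_emb = model.encode(article_sentence, convert_to_tensor=True)
    passage_embs = model.encode(source_passages, convert_to_tensor=True)
    return util.cos_sim(claim_emb, passage_embs).max().item()

# Made-up example claim and source passages.
claim = "The bridge was completed in 1932 after six years of construction."
passages = [
    "Construction began in 1926 and the bridge opened to traffic in 1932.",
    "The ferry service across the strait was discontinued soon afterwards.",
]

score = citation_support_score(claim, passages)
print(f"support score: {score:.2f}")
if score < 0.5:  # threshold would need calibration against human judgments
    print("flag for human review: the citation may not support the claim")
```

A real tool would also need to extract passages from the cited source and keep a human in the loop, but the core "does this source say what the article claims" check is cheap to prototype.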
I couldn't work out exactly what Wikipedia's goal is from https://en.wikipedia.org/wiki/Wikipedia, but "better search" doesn't seem to run counter to it.
Claims that GPT produces "better search" are groundless until GPT demonstrably produces better search and the resulting product has been in use long enough for unintended consequences to surface. Gonna be a wait.
Oh boy, the best place for subtle errors: a public wiki, proofread by someone who is likely not a subject matter expert! I don't see any way this can go poorly. And obviously search hasn't been solved for decades.