
Kind of just skimmed the paper, but isn't it weird that researchers from Google would publish a paper that's basically about how Wikipedia hasn't properly adapted to its contents being used for training data sets?


Wikipedia is interested in being an online encyclopaedia. They are not obliged to accommodate the needs of AI researchers unless they want to.


I haven't seen anything yet on how much recent Wikipedia content is generated by systems like ChatGPT, but it does appear to be an issue:

> "Wikipedians like Knipel imagine that ChatGPT could be used on Wikipedia as a tool without removing the role of humanity. For them, the initial text that’s generated from the chatbot is useful as a starting place or a skeletal outline. Then, the human verifies that this information is supported by reliable sources and fleshes it out with improvements. This way, Wikipedia itself does not become machine-written. Humans remain the project’s special sauce."

https://slate.com/technology/2023/01/chatgpt-wikipedia-artic...

This might create a kind of structural-organizational conformity in all articles, which doesn't sound so great.


I've been using it for proposal copywriting in the same manner.


Yeah, my thoughts exactly. I see it as these AI researchers pointing out how to attack Wikipedia, so Wikipedia is being indirectly forced to respond.

Feels wrong to me.


Can you elaborate?

I see this paper as explicitly giving Wikipedia useful information and the ability to make decisions.

I think keeping these things transparent is good for Google.


Training AI isn't a core goal of Wikipedia so the notion that this information is useful to them seems questionable.


However, wouldn't a GPT trained specifically on Wikipedia be quite useful? Wikipedia would be the biggest user-editable body of training material for a language model that can be asked about its contents.
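For concreteness, a minimal sketch of pulling that training material, assuming the Hugging Face datasets library and its hosted wikimedia/wikipedia dump (the snapshot name here is my assumption, not something from the paper):

    # Sketch: stream English Wikipedia text as training material.
    # Assumes the Hugging Face "datasets" library; the "20231101.en"
    # snapshot of "wikimedia/wikipedia" is an assumed example.
    from datasets import load_dataset

    wiki = load_dataset("wikimedia/wikipedia", "20231101.en",
                        split="train", streaming=True)

    # Each record carries "title" and "text"; peek at a few articles.
    for article in wiki.take(3):
        print(article["title"], len(article["text"]))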

In addition to the obvious use of answering questions based on the material, perhaps it could be a tool for checking whether, and how, a cited source actually relates to the article that cites it. Abuse detection could be another application.
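To make the citation idea concrete, here is a toy sketch that scores how closely a cited source's text matches the passage citing it, using TF-IDF cosine similarity from scikit-learn as a crude stand-in for a language model; the function name citation_relatedness is mine, and a real system would need a tuned threshold:

    # Toy sketch: does a cited source relate to the passage citing it?
    # TF-IDF cosine similarity stands in for a language model here.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def citation_relatedness(passage, source_text):
        """Return a 0..1 similarity between a passage and its cited source."""
        vectors = TfidfVectorizer(stop_words="english").fit_transform(
            [passage, source_text])
        return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

    passage = "The bridge opened in 1932 and carries eight lanes of traffic."
    source = "Opened to traffic in 1932, the bridge has eight vehicle lanes."
    print(citation_relatedness(passage, source))  # high score: likely related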

I couldn't find an exact statement of Wikipedia's goal at https://en.wikipedia.org/wiki/Wikipedia but it doesn't seem that a "better search" would run counter to those goals.


Claims that GPT produces "better search" are groundless until GPT demonstrably produces "better search" and the resulting product has been observed for unintended consequences. Gonna be a wait.


Oh boy, the best place for subtle errors: a public wiki, being proofread by someone who is likely not a subject matter expert! I don’t see any way this can go poorly. And obviously search hasn’t been solved for decades.


The paper is also telling them how to poison their data against training, so if they really wanted to avoid being used in that way, they could.

It also tells them how to solve that issue, which is what Google would prefer. But they are letting Wikipedia make that choice.



