Wikipedia would be a shambles without bots

bane · on July 25, 2012

The bots do make mistakes, however, if they encounter a new circumstance their programming cannot account for. ClueBot NG, the anti-vandalism bot, has a small rate of false positives - edits it mistakes for vandalism, but which are in fact legitimate.

Since Wikipedia closely tracks edits, however, mistakes can be repaired almost as quickly as they happened, administrators say.

I think fairly consistent commentary over the years demonstrates that this is patently false. Deletionists are capricious and arbitrary, reversals are beauracratic, complex and lengthy (if even possible) It dissapoints me to think about how much legitimate human knowledge has been wiped from WP, the authors discouraged and WP made poorer for it.

gwern · on July 25, 2012

> Deletionists are capricious and arbitrary, reversals are beauracratic, complex and lengthy (if even possible)

The article was clearly talking about repairing bot edits; since bots are almost* universally restricted to editor capabilities, all their actions are undoable. I have myself occasionally reverted bot edits - but very rarely, since generally bots are very good at what they do, and as the article says, they have made a huge difference in vandalism fighting.

(On the other hand, the article omits one of the major instances of bot abuse: the use of scripts/bots by a few editors to ram through removal of all the {{spoiler}} templates despite the absence of consensus to deprecate their use.)

* I remember a few proposals for admin-bots, but I don't remember whether they passed the Bot people or whether they were to do irreversible actions like deletion.

bane · on July 25, 2012

To be clear, in this context, I'm only talking about delete bots who kill new content seconds after it's created. A phenomenon so frustrating and common that there are probably tens of thousands of examples on open forums of this (including HN) and no doubt is a huge part of the massive reduction in new editors and new user engagement at WP (as tracked in several Foundation reports over the years -- but somehow always ignore this issue).

_delirium · on July 25, 2012

There aren't any bots that auto-delete articles. There are a handful of bots with administrator privileges, but they can only do things that humans have specified, with significant delay. For example, there was one (no longer active) that deleted empty categories that had been flagged by a human for deletion as empty, and not challenged for N days. There was also one (haven't followed up on whether it's still active) that deleted images that were tagged as copyright violations for more than N days.

bane · on July 25, 2012

They appear to be doing a marvelous job of failing the turing test then.

yichi · on July 25, 2012

> It dissapoints me to think about how much legitimate human knowledge has been wiped from WP

Wikipedia has never been about collecting sum of human knowledge, as it claims. It's about collecting sum of human knowledge that it can verify.

Wikipedia is designed from the beginning to be a tertiary source, and it's not an accident. This is why if you try to put any original content in it, it will usually get reverted, not because the content is wrong or even unhelpful, but the fact that nobody can verify it. (I'm not saying whether this is a good or a bad thing, it's just the way Wikipedia works) If you want to contribute original knowledge, don't use Wikipedia as a vehicle for doing so, you are better off writing to a blog/magazine/academic journal.

user49598 · on July 25, 2012

It dissapoints me to think about how much legitimate human knowledge has been wiped from WP

Then it must absolutely enthrall you to think about how much human knowledge is made available by wikipedia. You can be just as persistent as a "delitionist", theres nothing stopping you. It doesn't mean the system is broken just because some people are passionate about it.

bane · on July 25, 2012

It does enthrall me. But as time goes on, and the deletionist issue persists and becomes ingrained and is continuously ignored by WP and the Foundation, I can't help but think how much wasted potential is out there.

user49598 · on July 25, 2012

Did you downvote my comment because you don't agree with it or because you think it's non-constructive?

danielweber · on July 25, 2012

I didn't downvote you, but "if you don't like someone else's edit, edit it yourself" comes down to who can afford to spend more of their life babysitting Wikipedia.

I thought that line of reasoning was completed years ago. Yeah, the people who are more willing to waste their life on Wikipedia policy pages will win. Now, what are the bad effects of that?

user49598 · on July 25, 2012

Talk pages. If you have a problem with an article bring it up on the talk page. I don't see how else you could expect a free, user-edited, online encyclopedia to function.

For what it is, wikipedia is incredibly effective. If you don't care about wikipedia policy, how can you care about the exact content of wikipedia?

ars · on July 25, 2012

It used to be I would check the color of the talk tab, and if there was text there I'd go read it.

Then someone had the idea of putting a template with the "importance" of the page in the talk page. So now every page has a talk page with text, and I never check them.

It used to be I would ask questions in the talk page and get an answer within hours, now I'm lucky to get an answer that year! (Not exaggerating.)

The amount of content that has been lost by vandals changing text to garbage, and then someone comes and removes the garbage - but doesn't revert the edit, is staggering. I think it's time to auto-lock virtually all old pages, require a second opinion on every edit.

user49598 · on July 26, 2012

Vandalism is a bummer, but it's certainly not a reason to loose faith in wikipedia. "Staggering" loss of content is an enormous exaggeration, especially since the articles are versioned and the versions are easily comparable.

ars · on July 26, 2012

Unless you go searching in the history you'll never even know about all the missing content.

And yes, it is staggering, I am not exaggerating given that I've personally restored quite a number of pages - some of them have sat completely gutted of content for a year.

lmm · on July 26, 2012

There's no easy way to get, or even become aware of the existence of, an articles deleted as "not notable" or similar.

derleth · on July 25, 2012

> Deletionists are capricious and arbitrary

This is rather a broad brush, and a gross oversimplification of a complex debate that's been going on for a long time.

fghh45sdfhr3 · on July 25, 2012

Any sufficiently complex debate is indistinguishable arbitrary action.

mjn · on July 25, 2012

Reading some about it can help understand the issues, though. I'm honestly surprised that an online editable general-subject encyclopedia works as well as it does, and I think some of that is due to having pragmatically come up with approaches that sort-of-work over the years. I don't think it's reasonable to have a very strong opinion on it without some knowledge of the problems various policies were intended to solve, and the pros/cons of different approaches.

A curious thing with Wikipedia is that lots of people think they know the obviously right thing it should do, but many of these "obviously right" things are very different from each other. For example, co-founder Larry Sanger split and founded Citizendium because he thought Wikipedia was far too permissive in letting "unencyclopedic" crap into the encyclopedia, which is the exact opposite complaint of the people who are worried about "deletionism". That is also probably the most common complaint about Wikipedia from academics and in the mainstream media. The mainstream media gets particularly inflamed if something incorrect is found in Wikipedia, like the "Seigenthaler incident", and demands that Wikipedia should institute stricter edit controls.

One area of particular interest to me is where to draw the line on science articles. With no policies on inclusion at all, there would by now be thousands of physics articles on concepts that are not recognized by the physics literature, created by the same fringe-science people who post prolifically on Usenet, and cited to their own websites as a source. They tried to do so on Wikipedia in 2003-04, and that was the impetus for some of the policies such as "no original research" and "must be verifiable in reliable sources". (Fringe physics theories that are well-known and where third-party documentation exists, such as Time Cube, can of course still be covered.)

I wrote a bit more on that verifiability/notability history last year, in case anyone is interested: http://www.kmjn.org/notes/wikipedia_notability_verifiability...

derleth · on July 25, 2012

> Fringe physics theories that are well-known and where third-party documentation exists, such as Time Cube, can of course still be covered.

But only as fringe physics, which is where another great big bunch of ill-will towards Wikipedia is from: Even if your pet theory gets an article, it isn't going to be treated the same way as Quantum Electrodynamics. Most of the citations will be to the reliable sources, which are by the people who think you're a crackpot. This causes a lot of pain to the people who insist 'NPOV' means 'Treat my nuttery like it was a real Grown-Up Person science'.

espinchi · on July 25, 2012

Just some complementary information, extracted from http://en.wikipedia.org/wiki/Wikipedia:Bots (very interesting for those that didn't know much about the role of bots in the wikipedia).

Over 60 million edit operations have been performed by bots on the English Wikipedia.

Some bot examples are: Yobot, that categorizes individuals in categories by birth date, profession and other criteria, SineBot, that signs comments left on talk pages, MiszaBot, that archives talk pages, Xqbot, that solves double redirects, ...

Some bots, like RussBot (http://en.wikipedia.org/wiki/User:RussBot) have big red Emergency bot shutoff button.

gaius · on July 25, 2012

one called rambot created about 30,000 articles - at a rate of thousands per day - on individual towns in the US

Umm, no, someone wrote a script and ran it. This article massively overestimates what a "bot" is.

Sharlin · on July 25, 2012

That's not really a meaningful distinction especially in an article meant for a popular audience. Besides:

A bot (derived from 'robot') is an automated or semi-automated tool that carries out repetitive and mundane tasks in order to maintain the 4,009,598 articles of the English Wikipedia.

-- http://en.wikipedia.org/wiki/Wikipedia:Bots

Internet bots, also known as web robots, WWW robots or simply bots, are software applications that run automated tasks over the Internet. Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human alone.

-- http://en.wikipedia.org/wiki/Internet_bot

_delirium · on July 25, 2012

I agree they're all bot-ish, but in the context of Wikipedia, I think it's useful to distinguish. Possibly not for the audience of the original article, but they do pretty different things.

Heuristic reactive bots are an interesting component of the human/machine hybrid "Wikipedia immune system" that keeps most encyclopedia articles non-vandalized most of the time, despite that seeming implausible at first (the most surprising thing about Wikipedia, if you've ever run any other wiki, is that it doesn't get totally full of spam and garbage within hours).

Others are more like content-import scripts, running once; Rambot falls into that category. And still others are closer to external implementations of functionality that MediaWiki is missing internally. For example, MediaWiki lacks a "rename category" function, so there are helper bots that will "rename" categories by mass-removing every article in the category and mass-adding it to the new category.

pavel_lishin · on July 25, 2012

I wish they'd cited some examples of the mistakes. In general, this article doesn't offer much that the submission title didn't cover.

TazeTSchnitzel · on July 25, 2012

Yeah, I expected it to be slightly more in-depth and to maybe mention several particular bots and profile them, perhaps.