
While the topic is intriguing, I dislike the use of public services for this type of research. For instance, adding substances to a water reservoir to study their effects without permission or supervision is unacceptable. Similarly, conducting such research without Wikipedia's permission or supervision should not be accepted.


Someone tried something similar but with higher risk: inserting security backdoors into the Linux kernel. They were caught and (AFAIK) their entire school was permabanned from sending pull requests.



This was also my thought. Search for "hypocrite commits"; here's a link to an LWN article: https://lwn.net/Articles/853717/. They did ban their whole school.


The department said they'd report their findings to the community[1]. I wonder if they ever did?

1 = https://cse.umn.edu/cs/statement-cse-linux-kernel-research-a...



Thanks for pointing that out. That second link seems to close things out.


Isn't "community" ambiguous here? Perhaps they meant their own community but didn't say specifically which one.


I'm of quite the opposite opinion. Within reason (importantly), I believe any public service that is also managed by an anonymous, decentralized community ought to be under constant test, by anyone. What's the alternative, really?

Imagine if it was taboo to independently test the integrity of bitcoin for example.

The sibling mentioned the linux kernel case. I admit that one felt wrong. It was a legitimate waste of contributor time and energy, with the potential to open real security holes.

I don't pretend to have reconciled why one seems right to me and the other wrong.


> Imagine if it was taboo to independently test the integrity of bitcoin for example.

> The sibling mentioned the linux kernel case. I admit that one felt wrong.

> I don't pretend to have reconciled why one seems right to me and the other wrong.

The "how" is what matters here, not just the "what". "Testing the integrity of Bitcoin" by breaking the hash on your own machine (and publishing the results, or not) is one thing. "Testing" it by sending transactions that might drain someone else's wallet is quite another. Similarly with Linux, hacking it on your own machine and publishing the result is one thing. Introducing a potential security hole on others' machines is another. Similarly with water: messing with your own drinking water is one thing. Messing with someone else's water is quite another.


> Similarly with Linux, hacking it on your own machine and publishing the result is one thing. Introducing a potential security hole on others' machines is another.

Playing devil's advocate for a moment: how else do you test the robustness of the human process that is meant to prevent bad actors? Don't you need someone to attempt to introduce a security hole to know that you are robust to this kind of attack?


You do it with buy-in, e.g. permission from some of the maintainers, so they are aware. If you do not get permission, you do nothing. It's similar to penetration testing.


Interestingly, while I 100% agree with you regarding the parent's question about security holes, I'm actually not sure how an experiment like the one on Wikipedia could be performed even with proper buy-in from all the owning entities (the Wikimedia Foundation?). Is it even possible in principle to test this ethically without risking misleading the users (the public)? If not, does that mean it's better if nobody researches it at all? The best I can think of is making edits that are as harmless as possible, but their very inconsequentiality would make them inherently less likely to be removed. Any thoughts?


The usual answer is a chain of trust. However, that might go against Wikipedia's principles. There is an "importance scale" for articles; for anything considered C-class or more important, editing would become similar to a pull request, or the page would carry a warning that it contains unverified info.

It's a hard problem: storage that is fully editable by anyone, while maintaining integrity.


This seems really easy to test ethically.

You sift through the edit log to find edits correcting factual errors.

Then you find the edit where the error was introduced.

You can probably let an LLM do the first pass to identify likely candidates. With maybe 20 hours of work you could probably identify hundreds of factual errors. (Number is drawn from a hat.)
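As a rough sketch of that first pass (assuming Python and the public MediaWiki revisions API; the keyword filter is just a crude stand-in for the LLM pass, and the article title is a placeholder):

    # Rough sketch: pull revision summaries for one article via the public
    # MediaWiki API and flag edits whose summaries look like factual
    # corrections. The keyword filter stands in for an LLM first pass.
    import requests

    API = "https://en.wikipedia.org/w/api.php"
    HINTS = ("fix", "correct", "incorrect", "wrong", "error", "revert")

    def candidate_corrections(title, limit=500):
        params = {
            "action": "query",
            "format": "json",
            "prop": "revisions",
            "titles": title,
            "rvlimit": limit,
            "rvprop": "ids|timestamp|user|comment",
        }
        data = requests.get(API, params=params).json()
        page = next(iter(data["query"]["pages"].values()))
        for rev in page.get("revisions", []):
            if any(h in rev.get("comment", "").lower() for h in HINTS):
                # rev["parentid"] is the revision that still contained the
                # suspected error; inspecting the diff between parentid and
                # revid (by a human or an LLM) would confirm it.
                yield rev["revid"], rev["parentid"], rev["comment"]

    # Hypothetical usage:
    for revid, parentid, comment in candidate_corrections("Example article"):
        print(revid, parentid, comment)

Finding where the error was originally introduced would still require walking back through earlier revisions, but this narrows the haystack considerably.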


How do you find the factual errors that weren't corrected to figure out what the correction rate was?


Excellent point. That's more difficult, but I think the ethical way to do it would be to recruit subject matter experts to fact-check articles across a variety of disciplines. Bonus: you can then contribute corrections.

In general what I'm saying is, this is a fertile ground for natural experiments. We don't need to manufacture factual errors in Wikipedia. They occur naturally.


I mean, you're asking for a retrospective study as opposed to a randomized controlled trial. It's useful and a great idea, but it's not an equivalent way of getting data of equal quality.


But is the goal to conduct a randomized controlled trial, or to measure the correction rate within the bounds of ethics? You go to war with the army you have.


Well, the goal is to measure the correction rate within the bounds of ethics, but the question is how accurate the result would be without an RCT. Intuitively I would hope it's accurate, but how would you know without actually running the experiment? How do you know there aren't confounding factors greatly skewing the result?


If you'll grant that we're also able to replicate the study many times, we're left with errors that are not caught by Wikipedians or independent teams of experts. At that point I think we're looking at errors that have been written into history - the kind of error that originates in primary sources and can't be identified through fact checking. We could maybe estimate the size of that set by identifying widely-accepted misconceptions that were later overturned, but then we're back to my first suggestion and your objection to it.

But more importantly, we probably won't catch that sort of error by introducing fabrications either. Fabrications might replicate a class of error we're interested in, but if we just throw them onto Wikipedia, they won't be longstanding misunderstandings that are immune to fact checking (at least not without giving them a lot of time to develop into a citogenesis event, but that's exactly the kind of externality we're trying to avoid).

(Of course, "how many times do we need to replicate it?" remains unanswered. I think maybe after we have several replications and have data on false negatives by our teams of experts, we could come up with an estimate.)
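A naive back-of-the-envelope version of that estimate, with every number invented purely for illustration, might look like this:

    # All numbers hypothetical. Combine corrections mined from the edit log
    # with errors found by independent expert review of the same sample of
    # articles, inflating the expert count by their estimated false
    # negative rate (taken from disagreement between replicated teams).
    corrected_in_edit_log = 120        # already fixed by Wikipedians
    found_by_experts = 45              # still-uncorrected errors experts spotted
    expert_false_negative_rate = 0.2   # from cross-team replication

    estimated_uncorrected = found_by_experts / (1 - expert_false_negative_rate)
    estimated_total = corrected_in_edit_log + estimated_uncorrected
    print(f"correction rate ~ {corrected_in_edit_log / estimated_total:.0%}")  # ~68%

Errors "written into history" would still escape both terms of the estimate, as noted above.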


> Playing devil's advocate for a moment: how else do you test the robustness of the human process that is meant to prevent bad actors? Don't you need someone to attempt to introduce a security hole to know that you are robust to this kind of attack?

How do you test that the White House perimeters are secure, or that the president is adequately protected by the Secret Service?


I think the key difference is supervision: is there another party keeping an eye on what is tested and how? And maybe ensuring no permanent damage is done at the end.


That's frankly one of the first thoughts that came to my mind.

I've asked the author about ethical review and processes on the Fediverse.

That said, both Wikipedia and the Linux kernel (mentioned in another response to this subthread) should anticipate and defend against research-based and purely malicious attacks alike.


If it's a mature product, you should be able to pick it up and rattle it without it breaking. If it's still maturing, then maybe the odd shock here and there will prepare it for maturity?


It's true that the system must be tolerant to these sorts of faults, but that doesn't mean we have a right to stress it. The margin for error is not infinite, and by consuming some of it we increase the likelihood of errors going undetected for longer.

Sometimes it will be worth it anyway, and I don't have an opinion about this Wikipedia example, but I think it's pretty uncontroversial that the Linux example was out of line.


I think one would have to weigh the pros and cons of this kind of research. In particular, the main cons (IMO) are:

* users are misled about facts
* trust is lost in Wikipedia
* other users/organizations use this as a blueprint to insert false information

Harm 3 seems to be the most serious, but I suspect it has happened/will happen irrespective of this research. Compared to the water reservoir example, these harms seem quite small. I would have liked to see a section discussing this in the blog post, but perhaps that's included in the original paper.


Everything was reverted within 48 hours. Your arguments might all apply in theory, but given the scope, size, and practical handling here, I wonder what your opinion is on how they apply in this particular case.


I didn't make it very clear, but I agree that this specific example isn't problematic. The false claims weren't meant to be any sort of targeted disinformation, and as you mention, everything was reverted within 48 hours.



