I wonder what this says about AI in the future: you can imagine that as AI becomes more powerful, it will start to make inferences that aren't explicit and discover connections that aren't obvious. Some of those connections will be wrong and others right, but whether or not harm comes from an inferred fact, it will be hard to make a system aware of it, let alone foolproof, since even human beings can't manage that yet.
And how would this be tested in court? If a human being and a machine can both "read between the lines", does the human have the right to publish a blog article about his investigation while the machine does not? If the conclusions are 100% fact and not slander/libel, what does this mean for free speech?
It seems to me some kind of global blacklist of these protected individuals would be needed, one that courts could update, but then we'd have to guard against abuse of it, both by hackers and by governments using it beyond its intended purpose (e.g. "let's add Paul Manafort to the blacklist").
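Just to make the idea concrete, here's a toy sketch of what such a court-maintained suppression registry might look like on the search-engine side. Every name, field, and value in it is made up for illustration; nothing like this is claimed to exist at Google or in any court system:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuppressionOrder:
    """A hypothetical court order shielding a protected name."""
    protected_name: str   # name the court has ordered shielded
    case_id: str          # identifier of the issuing order (invented)
    jurisdiction: str     # where the order applies

# Hypothetical registry: courts update it, search engines poll it.
SUPPRESSION_LIST = [
    SuppressionOrder("John Doe", "2019-ONSC-0001", "CA-ON"),
]

def filter_results(query: str, results: list[str]) -> list[str]:
    """Drop all results when a query matches a suppressed name.

    A real system would need fuzzy name matching, jurisdiction checks,
    and audit logging, precisely to guard against the abuse worried
    about above (quietly adding names no order ever covered).
    """
    for order in SUPPRESSION_LIST:
        if order.protected_name.lower() in query.lower():
            return []  # crude blanket suppression, just to show the shape
    return results

print(filter_results("John Doe abuse case", ["news.example/story"]))  # []
print(filter_results("formula off road event", ["sports.example"]))   # unchanged
```

The audit-logging point is the crux: without a public record of who added which name under which order, the registry itself becomes the attack surface.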
> I wonder what this says about AI in the future: you can imagine that as AI becomes more powerful, it will start to make inferences that aren't explicit and discover connections that aren't obvious.
That's why we should invest more in AI interpretability. I've heard some famous researchers say it isn't important, but for exactly these reasons it is.
> A Google search for the name of an Ottawa-based RCMP officer convicted of confining, starving and abusing his son links to coverage of that court case. The officer’s identity is protected by a judge’s order designed to shield his son from publicity.

> The officer’s name had never been reported by the Citizen or any other media outlet. The abused boy, now 15, was never identified in any article published online. Yet a search of the boy’s name produces results that link to coverage of the case.
That’s just horrendous, and in a world of social media, inescapable for the boy.
Well, criminal acts that result in punishment are by definition matters of public record, and familial data is pretty easy to figure out even if you can't do it in two seconds via Google.
Since you can't keep the former secret, it seems hard to imagine successfully keeping the latter secret.
Also, the stigma here attaches to the victimizer, who rather deserves it. Nobody is going to deny the boy employment or decline to date him because he was abused.
Could it be that some people who do know the victim's identity are running searches with a pattern like "John Doe nightclub abuse ontario" and then clicking through to the reports about the case, which Google then learns to surface for plain "John Doe" searches in the future?
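A toy sketch of how that feedback loop could work, purely a guess at the mechanics: associations are learned from click counts, and no claim is made that this resembles Google's actual signals. All the URLs and queries are invented:

```python
from collections import defaultdict

# click_counts[term][url] = clicks on url from queries containing term.
click_counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

def record_click(query: str, url: str) -> None:
    """Someone in the know searches the full pattern and clicks the story;
    every term in their query, including the bare name, gets credited."""
    for term in query.lower().split():
        click_counts[term][url] += 1

def ranked_results(query: str) -> list[str]:
    """Later, a plain name query surfaces the same article, because the
    name terms have quietly accumulated clicks on it."""
    scores: defaultdict[str, int] = defaultdict(int)
    for term in query.lower().split():
        for url, n in click_counts[term].items():
            scores[url] += n
    return sorted(scores, key=scores.get, reverse=True)

record_click("john doe nightclub abuse ontario", "citizen.example/case-story")
record_click("john doe nightclub abuse ontario", "citizen.example/case-story")
print(ranked_results("john doe"))  # ['citizen.example/case-story']
```

Nobody ever had to publish the name next to the story; the people who already knew it did the linking with their clicks.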
This. Chatter on the public web gets scraped, and a connection is inferred by some algorithm somewhere.
However, what's an inside joke (or just sarcasm) to the in-group might not be a joke to Google.
A link to a news story to the tune of "boy steals car and crashes it into river" is obviously a joke when introduced as "look, it's $driver's son" in the context of $driver's performance at a formula off-road event. Sarcasm on the internet is hard for people to read and harder for machines. While the overall share of outright sarcasm and jokes is probably pretty small, there are probably plenty of other noisy signals in there that reduce accuracy as well.
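Continuing the toy model above (again, only a guess at the mechanics), one blunt way an engine could damp that kind of noise is a confidence threshold, so a handful of joke links never surfaces. The numbers here are made up:

```python
# Hypothetical accumulated click counts per inferred association.
observed_clicks = {
    "citizen.example/case-story": 4200,  # steady interest from people in the know
    "news.example/car-in-river": 3,      # one forum's "$driver's son" joke
}

MIN_CLICKS = 50  # invented threshold for trusting an association

trusted = {url: n for url, n in observed_clicks.items() if n >= MIN_CLICKS}
print(trusted)  # the joke is filtered out; the persistent signal survives
```

Which also shows why thresholds don't help the boy in the article: a genuinely harmful inferred association is exactly the kind of steady signal that clears the bar.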
When you start introducing data from other sources (IP addresses, geolocation, usage patterns), it gets very easy to spot correlations.
I'd assume Google is very good at making these sorts of connections out to a few degrees of separation.
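For what it's worth, even the crudest cross-signal join exposes correlations. A minimal sketch, with every account name, field, and value invented for illustration:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """One observed interaction; all fields are hypothetical signal types."""
    account: str
    ip: str
    geo: str   # coarse geolocation, e.g. city
    hour: int  # hour-of-day usage pattern

def correlated_accounts(events: list[Event]) -> dict[tuple[str, str], set[str]]:
    """Group accounts that share an (ip, geo) pair: the simplest possible
    cross-signal join, yet enough to link members of one household."""
    groups: dict[tuple[str, str], set[str]] = defaultdict(set)
    for e in events:
        groups[(e.ip, e.geo)].add(e.account)
    return {k: v for k, v in groups.items() if len(v) > 1}

events = [
    Event("parent_account", "203.0.113.7", "Ottawa", 21),
    Event("teen_account",   "203.0.113.7", "Ottawa", 16),  # same household
    Event("unrelated",      "198.51.100.9", "Toronto", 9),
]
print(correlated_accounts(events))
# {('203.0.113.7', 'Ottawa'): {'parent_account', 'teen_account'}}
```

And that's one pair of signals; add usage patterns and a few degrees of separation and the graph fills in fast.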
Makes sense, especially if someone divulges it in the comments, like "that happened to my friend's son John Doe - read about the terrible event here!"
What method did they use to determine this was a genuine phenomenon? There are so many "weasel words", as Wikipedia calls them, in this article. "Computer experts." Yeah, which computer experts? Are they just generic experts in computers, not practitioners of any actual field within computer science? They found 6 results, but obviously those results aren't made available for us, or anyone else, to scrutinize and fact-check.