Google is linking court-protected names to online coverage

cromwellian · on Sept 21, 2017

I wonder what this says about AI in the future, since you can imagine that as AI becomes more powerful, it starts to make inferences that aren't explicit, and discovers connections that aren't obvious. In some cases, those connections will be wrong, in others, they will be right, but whether or not harm comes from inferred facts, it'll be hard to make a system aware of it and foolproof, as even human beings can't yet do this.

And how would this be tested in court? If a human being and a machine can both "read between the lines", a human has the right to publish a blog article about his investigation, but a machine cannot? If the conclusions are 100% fact and not slander/libel, what does this mean for free speech?

It seems to me some kind of global blacklist would be needed for these protected individuals, that courts could update, but then we'd have to be on guard about abuse of these, both from hackers, and also governments (e.g. let's add Paul Manafort to the blacklist) using them beyond the intended purpose.

solomatov · on Sept 21, 2017

> I wonder what this says about AI in the future, since you can imagine that as AI becomes more powerful, it starts to make inferences that aren't explicit, and discovers connections that aren't obvious.

That's why we should invest more in AI interpretability. I heard some famous researchers saying that it's not important, but for these reasons it actually is.

QAPereo · on Sept 21, 2017

I thought this might be minor, but oh god, no:

> A Google search for the name of an Ottawa-based RCMP officer convicted of confining, starving and abusing his son links to coverage of that court case. The officer’s identity is protected by a judge’s order designed to shield his son from publicity.

> The officer’s name had never been reported by the Citizen or any other media outlet. The abused boy, now 15, was never identified in any article published online. Yet a search of the boy’s name produces results that link to coverage of the case.

That’s just horrendous, and in a world of social media, inescapable for the boy.

michaelmrose · on Sept 21, 2017

Well, criminal acts that result in punishment are by definition matters of public record, and familial data is pretty easy to figure out even if you can't figure it out in 2 seconds via Google.

Since you can't keep the former secret, it seems hard to imagine successfully keeping the latter secret.

Also, the stigma attaches here to the victimizer, who rather deserves it. Nobody is going to deny the son employment or decline to date him based on his having been abused.

calbear81 · on Sept 21, 2017

Could it be that some people who do know the identity of the victims are running searches with some type of pattern like "John Doe nightclub abuse ontario" and then clicking on the link to the reports about the case, which Google then treats as relevant results for "John Doe" in future searches?
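A minimal sketch of that click-feedback idea, purely as an illustration: nothing here reflects how Google actually ranks results. The log format, the `build_click_index` and `urls_for_name` helpers, and the example URLs are all hypothetical.

```python
from collections import defaultdict

def build_click_index(click_log):
    """Count, for each query term, how often each URL was clicked."""
    term_to_urls = defaultdict(lambda: defaultdict(int))
    for query, clicked_url in click_log:
        for term in query.lower().split():
            term_to_urls[term][clicked_url] += 1
    return term_to_urls

def urls_for_name(term_to_urls, name):
    """Return URLs clicked for every term of the name, most-clicked first."""
    terms = name.lower().split()
    if not terms:
        return []
    # Keep only URLs that were clicked for all terms in the name.
    common = set(term_to_urls.get(terms[0], {}))
    for term in terms[1:]:
        common &= set(term_to_urls.get(term, {}))
    score = {url: min(term_to_urls[t][url] for t in terms) for url in common}
    return sorted(score, key=lambda url: (-score[url], url))

# Hypothetical log of (query, clicked URL) pairs.
click_log = [
    ("john doe nightclub abuse ontario", "https://news.example/case-report"),
    ("john doe ontario court", "https://news.example/case-report"),
    ("ontario weather", "https://weather.example/ontario"),
]
index = build_click_index(click_log)
result = urls_for_name(index, "John Doe")
```

In this toy version the article never contains the name; the association comes entirely from what searchers typed and then clicked.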

jlgaddis · on Sept 21, 2017

It could be. That's what two people quoted in TFA suggested.

cwmma · on Sept 21, 2017

Or it could be that pages that mention the people link to the articles.

dsfyu404ed · on Sept 21, 2017

This. Chatter on the public web gets scraped and a connection is inferred by some algorithm somewhere.

However, what's an inside joke (or just sarcasm) to the in-group might not be a joke to Google.

A link to a news story to the tune of "boy steals car and crashes it into river" is obviously a joke when introduced as "look it's $driver's son" in the context of $driver's performance at a Formula Off Road event. Sarcasm on the internet is hard for people to read and harder for machines. While the overall percentage of straight-up sarcasm and jokes is probably pretty small, there are probably a lot of other noisy signals in there as well that reduce accuracy.

When you start introducing data from other sources (IP addresses, geolocation, usage patterns) it gets very easy to spot correlations.

I'd assume Google is very good at making these sorts of connections out to a few degrees of separation.

newman8r · on Sept 21, 2017

I think you may be onto something, I just posted the same thing and then saw your comment

calbear81 · on Sept 21, 2017

Makes sense, especially if someone divulges this in the comments like "that happened to my friend's son John Doe - read about the terrible event here!"

newman8r · on Sept 21, 2017

they claim it wasn't from meta. I'm curious if other sites/forums/social media accounts had connected the name to the story

for example, someone might post on facebook "hey look at this crazy article {link}, isn't this {name} from school?"

and google used this data to show the result
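A rough sketch of that kind of inference, assuming a crawler has already collected third-party posts as plain text. The `cooccurrence_counts` helper, the sample posts, and the URLs are made up for illustration; this is not a description of Google's pipeline.

```python
import re
from collections import Counter

# Naive URL matcher; good enough for a toy example.
URL_PATTERN = re.compile(r"https?://\S+")

def cooccurrence_counts(posts, names):
    """Count how often each name appears in the same post as each URL."""
    counts = Counter()
    for text in posts:
        urls = set(URL_PATTERN.findall(text))
        lowered = text.lower()
        for name in names:
            if name.lower() in lowered:
                for url in urls:
                    counts[(name, url)] += 1
    return counts

# Hypothetical scraped posts.
posts = [
    "hey look at this crazy article https://news.example/case-report isn't this John Doe from school?",
    "John Doe was in my class, see https://news.example/case-report",
    "nice weather today https://weather.example/ontario",
]
counts = cooccurrence_counts(posts, ["John Doe"])
```

A name and a link that keep showing up together end up with a high count, even though the linked article itself never prints the name. A joke or sarcastic post would be counted exactly the same way.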

jlgaddis · on Sept 21, 2017

> I'm curious if other sites/forums/social media accounts had connected the name to the story

That's one of the theories mentioned in TFA.

newman8r · on Sept 22, 2017

you're right. I stopped reading at 'algorithms 101' the first time. reading comprehension test failed.

dsfyu404ed · on Sept 21, 2017

I expect they use metadata to put some sort of confidence/quality rating on the link though.

The primary source might not be metadata, but once the connection is made between the two pieces of content, metadata is probably used to evaluate it.

c3534l · on Sept 21, 2017

What method did they use to determine this was a genuine phenomenon? There are so many "weasel words," as Wikipedia calls them, in this article. "Computer experts." Yeah, which computer experts? They're just generic experts in computers, not in any actual field within computer science? They found 6 results, but obviously those results aren't made available for us, or anyone else, to scrutinize and fact-check.

jlgaddis · on Sept 21, 2017

There are a couple of "computer science professors" named in TFA, along with lawyers and a Google employee or two.