Melbourne professor quits after government pressure about reporting data breach (theguardian.com)
244 points by nbgl on March 8, 2020 | 36 comments



This is horrible but not surprising. The government was told beforehand it was a bad idea, and within a few months ended up with egg on its face. Instead of remedying the situation, they shot the messenger.

Dr Teague was also part of the team that found flaws in the Swiss e-voting system used in Australian state elections. Nothing was done about it and she was written off; the attack was deemed impractical as it required a corrupt official.

She's a national treasure and a regular source of embarrassment for the technologically illiterate bureaucrats responsible for such poor decisions.


> she was written off, the attack was deemed impractical as it required a corrupt official

I think that hit a bit too close to home for most of the government.


Vanessa Teague:

> I can't believe @healthgovau is still saying "The dataset does not contain the personal information of patients." We have shown many of the patients' records can be easily and confidently identified from a few points of medical or childbirth info.

https://twitter.com/VTeagueAus/status/1236402085974798336


> "The dataset does not contain the personal information of patients."

As far as I can tell, 'personal information' is potentially the only thing this data set contains. Further, the information is so personal that the Australian government hoped that it would be infeasible to cross-reference it with other data and use it to identify the persons involved.


She might be referring to something like the SLK-581 statistical linkage key.

I did some work with it a few years ago, and you can easily generate the key.


> The breach so shocked the government, the then attorney general, George Brandis, quickly announced plans to criminalise the act of re-identifying previously de-identified data, although ultimately the legislation never passed before the 2019 election.

If Australia makes it illegal to re-identify information, what about information that has been re-identified outside Australia then distributed into Australia?


If you're going to start using logic and reason with this issue then that government will simply outlaw those as well. This government has already set a precedent of having overridden the basic limits of mathematics before. See also: anything to do with encryption.



Former prime minister. His mathematics quote was worded a bit strangely, but I think he was basically saying https://www.xkcd.com/538/


No, he was literally saying that he didn't care about reality.

He was a lawyer, so would be quite aware of the meaning of the words he was uttering.


I doubt that he imagined he could write a law that would force encryption algorithms to yield to ASIO. It was all about using a big stick to force people to help break encryption, by inserting back doors etc.

Sure he was a lawyer and banker and probably never should have been involved with things like encryption and the NBN, but that's politics for you.


> If Australia makes it illegal to re-identify information, what about information that has been re-identified outside Australia then distributed into Australia?

The letter sent to the university [0] claims that re-identifying information is actually illegal, according to the department's understanding. (Never mind that they also admit that particular law is completely irrelevant to the work of the researcher.)

[0] https://www.righttoknow.org.au/request/correspondence_on_re_...


Fascinating read, thank you. Another kicker in there seems to be at the bottom of page 2, start of page 3. They basically assert that, now that the data has been taken down, no harm can be done anymore, and, to top it off, suggest that presenting the findings at a GovData conference serves "no public interest" anymore.


You don't have to go that far.

Anyone wanting to abuse the information (i.e. a criminal) would not really care about committing a crime by re-identifying it, so this law would only prevent people who want to help from doing so.

In the end there are always intelligent criminals (or foreign countries acting against your country in their own interest), so you will always be able to buy deanonymized data on the black market. What's more, people doing it for that reason can use leaked/stolen datasets for deanonymization instead of just public data, making it potentially much easier.


Consumer data brokers are a legal (non-criminal) business and they would definitely re-identify the information if it weren't illegal.


This sounds so much like a ‘stick my head in the sand’ policy that it’s unreal. Who the hell would stop doing it because it’s illegal...


When organizations claim to have "anonymized" a data set, what exactly does that mean?

I.e., do they mean that nobody they talked to could think of a way to recover the identity of even one individual in the set with 100% certainty? Or is there some information-theoretical or legal standard of anonymization they're claiming to have met?


> When organizations claim to have "anonymized" a data set, what exactly does that mean?

For "organizations" in general? It means approximately nothing, or if you're feeling particularly generous, it means "we probably remembered to drop the column containing your social security number before publishing this data... this time". You're asking exact specifics of a vague and broad category.

There are some legal standards, information theory, and non-legal organization standards that might be met in some cases - involving adding noise or removing data / making it sparse. https://en.wikipedia.org/wiki/Data_re-identification goes into all the ways that it can go wrong despite the best of intentions. My basic take on this all is: data "always" gets more identifying, not less. Two datasets that were successfully anonymized individually can still be correlated to de-anonymize some or all of the data when combined. Even organizations applying information theory with the best of intentions and proper diligence will eventually make a mistake.


In this particular example, they produced a random number which used the real id [0] as a seed, then mixed the result with the original id. It was not enough, and, as Teague et al note:

> Indeed, encryption was not necessary – a randomly chosen unique number for each person would have worked.

Scroll down from here: https://www.oaic.gov.au/privacy/privacy-decisions/investigat...

[0] The data had ids for providers (e.g. doctors) as well as patients.
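The fix the quote describes - a randomly chosen unique number per person, with no mathematical relation to the real id - is trivial to implement. A minimal sketch (function and field names are hypothetical, not from the report):

```python
import secrets

def pseudonymize(real_ids):
    """Map each real id to a fresh random token. The token carries no
    information about the id; only the mapping table can link them, so
    keeping or destroying that table controls reversibility."""
    mapping = {}
    used = set()
    for rid in real_ids:
        token = secrets.token_hex(16)      # 128 random bits, unrelated to rid
        while token in used:               # vanishingly unlikely collision
            token = secrets.token_hex(16)
        used.add(token)
        mapping[rid] = token
    return mapping

mapping = pseudonymize(["patient-001", "patient-002"])
```

Unlike the seeded scheme the department used, there is nothing here for an attacker to invert - though the quasi-identifiers in the rest of the record are still a problem.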


There are some mathematical definitions [1], but the fundamental problem is that with enough cross-referencing between databases it's hard to say anything for sure [2]. You never know what data other people might publish in the future.

I'm not aware of any legal definitions, but given the thorniness of reidentification I would assume they're insufficient.

[1] https://en.wikipedia.org/wiki/K-anonymity

[2] https://www.wired.com/2007/12/why-anonymous-data-sometimes-i...
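The k-anonymity idea from [1] is easy to check mechanically: find the smallest group of rows sharing the same quasi-identifier values. A minimal sketch with made-up data:

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return the k of a table: the size of the smallest group of rows
    that share the same quasi-identifier values. k = 1 means at least
    one person is uniquely identifiable from those columns alone."""
    groups = Counter(tuple(row[c] for c in quasi_identifiers) for row in rows)
    return min(groups.values())

rows = [
    {"postcode": "3000", "year_of_birth": 1980, "condition": "flu"},
    {"postcode": "3000", "year_of_birth": 1980, "condition": "asthma"},
    {"postcode": "3052", "year_of_birth": 1975, "condition": "cancer"},
]
k = k_anonymity(rows, ["postcode", "year_of_birth"])  # → 1: the last row is unique
```

Note this only measures the columns you thought to list - a future external dataset can turn a column you treated as harmless into a quasi-identifier, which is exactly the cross-referencing problem in [2].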


In healthcare there's usually an ethics panel that will look at the data and look for ways to reduce re-identification.

The common example is the one-legged child with cancer from a remote town. You can remove the PII columns and it's still pretty easy to find that person.


One way around that is to drop all cases below a certain occurrence threshold, i.e. if there aren't at least 1000 people in the same town with the same condition, they aren't getting into the dataset.

(The downside is that rare diseases might fall through the cracks.)
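That threshold rule is a one-line filter in practice. A sketch with made-up records (column names hypothetical):

```python
from collections import Counter

def suppress_rare(rows, keys, threshold):
    """Drop every row whose combination of `keys` values occurs fewer
    than `threshold` times, so small, identifiable groups (the one-legged
    child from a remote town) never enter the released dataset."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [r for r in rows
            if counts[tuple(r[k] for k in keys)] >= threshold]

rows = [{"town": "Melbourne", "condition": "flu"}] * 1200 \
     + [{"town": "Nhill", "condition": "rare-cancer"}]
released = suppress_rare(rows, ["town", "condition"], threshold=1000)
# the single Nhill record is suppressed; the 1200 flu records survive
```

As the parent notes, the cost is that rare diseases vanish from the published data entirely.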


How long before she gets raided and her copy of the dataset and research gets taken away?

... all the while the government forgets that it's all available on the internet.


This is the worst kind of personal data leak. Government cannot keep any data safe. The only way is to not collect the information. The reaction of the government is predictable and poor.

Now it has hit Australia, but it could have been any other country, since data collection seems to be en vogue. Probably gives the impression of control, the usual.


There need to be Australian Standards developed that everyone must comply with to anonymise personal data.


There was a guideline: "Process for Publishing Sensitive Unit Record Level Public Data as Open Data"

and now a standard: "Privacy (Australian Government Agencies – Governance) APP Code 2017"

See https://www.oaic.gov.au/privacy/privacy-decisions/investigat...


If the university followed through on that last paragraph why did she resign?


To anyone coming to the comments, the title is misleading. The health department is pressuring her "to stop her speaking out about the Medicare and PBS history of over 2.5 million Australians being re-identifiable online due to a government bungle."


I'm not sure what is misleading about it. She has actually resigned from the University of Melbourne: https://twitter.com/VTeagueAus/status/1233241830994481152


The title changed since I posted this. The original title implied the professor leaked the offending data.


The title is just horrible.


The best way to complain about a title is to suggest a better one. Better means: more accurate and neutral, preferably using representative language from the article. When someone suggests a better title, we're happy to change it.

Edit: I've taken a crack at fixing it now.


Yeah, I agree. I had copied it verbatim from the article.


Pardon my ignorance, but it seems like there should be standard ways of irrevocably anonymizing data, and reversible ones given a private key.

Off the top of my head, only the latter is necessary: throwing away the random key makes it equivalent to the former (or run the plaintext through SHA-3 20 times in feedback instead). Say, 100 rounds of AES-256 in feedback. Fixed integer-only fields could be XORed with a private key of the length of the field (OTP).

Any other ideas, please add a comment.
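One simpler construction than repeated hashing or cipher feedback (offered as an alternative, not what the parent proposed): a keyed MAC. A plain hash of a low-entropy id (a Medicare number, say) can be brute-forced by hashing every candidate, but HMAC with a secret key cannot, and destroying the key makes the pseudonymization irrevocable. A sketch, not a vetted design:

```python
import hmac, hashlib, secrets

key = secrets.token_bytes(32)  # destroy this and the pseudonyms become unlinkable

def pseudonym(record_id: str) -> str:
    """Deterministic keyed pseudonym: the same id always maps to the
    same token (so records still link up within the dataset), but
    without the key nobody can test guesses against the output."""
    return hmac.new(key, record_id.encode(), hashlib.sha256).hexdigest()

p1 = pseudonym("medicare-1234")  # example ids are made up
p2 = pseudonym("medicare-1234")
p3 = pseudonym("medicare-5678")
```

Determinism is the trade-off: it preserves linkage across records, which is often wanted, but also means the output is still a unique per-person handle that other columns can re-identify.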


Yes, turning data into a bunch of (ideally) random bits using encryption is an effective way of anonymizing.


Just because the primary key is gone doesn't mean other data can't be cross referenced. Birth dates with external sources, addresses with public registers, etc.



