This is horrible but not surprising: the government was told beforehand it was a bad idea, and within a few months it ended up with egg on its face. Instead of remedying the situation, they shoot the messenger.
Dr Teague was also part of the team that found flaws in the Swiss e-voting system used in Australian state elections. Nothing was done about it and she was written off; the attack was deemed impractical because it required a corrupt official.
She's a national treasure and a regular source of embarrassment for the technologically illiterate bureaucrats responsible for such poor decisions.
> I can't believe @healthgovau is still saying "The dataset does not contain the personal information of patients."
We have shown many of the patients' records can be easily and confidently identified from a few points of medical or childbirth info.
> "The dataset does not contain the personal information of patients."
As far as I can tell, 'personal information' is potentially the only thing this data set contains. Further, the information is so personal that the Australian government hoped that it would be infeasible to cross-reference it with other data and use it to identify the persons involved.
> The breach so shocked the government, the then attorney general, George Brandis, quickly announced plans to criminalise the act of re-identifying previously de-identified data, although ultimately the legislation never passed before the 2019 election.
If Australia makes it illegal to re-identify information, what about information that has been re-identified outside Australia then distributed into Australia?
If you're going to start using logic and reason with this issue then that government will simply outlaw those as well. This government has already set a precedent of having overridden the basic limits of mathematics before. See also: anything to do with encryption.
I doubt that he imagined that he could write a law that would force the encryption algorithms to yield to the ASIO. It was all about using a big stick to force people to help break encryption, by inserting back doors etc.
Sure he was a lawyer and banker and probably never should have been involved with things like encryption and the NBN, but that's politics for you.
> If Australia makes it illegal to re-identify information, what about information that has been re-identified outside Australia then distributed into Australia?
The letter sent to the university [0] claims that re-identifying information is actually illegal, according to the department's understanding. (Never mind that they also admit that particular law is completely irrelevant to the researcher's work.)
Fascinating read, thank you. Another kicker seems to be at the bottom of page 2 and the start of page 3. They basically assert that now that the data has been taken down, no harm can be done anymore, and, to top it off, suggest that the findings not be presented at a GovData conference because there subsequently is "no public interest" anymore.
Anyone wanting to abuse the information (i.e. a criminal) would hardly balk at committing one more crime by re-identifying it, so this law would only stop the people who want to help from doing so.
In the end there are always intelligent criminals (or foreign countries acting against your country in their own interest), so you will always be able to buy de-anonymized data on the black market. What's more, people doing it for those reasons can combine leaked or stolen datasets with the public data, making de-anonymization potentially much easier.
When organizations claim to have "anonymized" a data set, what exactly does that mean?
I.e., do they mean that nobody they talked to could think of a way to recover the identity of even one individual in the set with 100% certainty? Or is there some information-theoretical or legal standard of anonymization they're claiming to have met?
> When organizations claim to have "anonymized" a data set, what exactly does that mean?
For "organizations" in general? It means approximately nothing, or if you're feeling particularly generous, it means "we probably remembered to drop the column containing your social security number before publishing this data... this time". You're asking exact specifics of a vague and broad category.
There are some legal standards, information theory, and non-legal organizational standards that might be met in some cases, involving adding noise or removing data / making it sparse. https://en.wikipedia.org/wiki/Data_re-identification goes into all the ways it can go wrong despite the best of intentions. My basic take on this is: data "always" gets more identifying, not less. Two datasets that were successfully anonymized individually can still be correlated to de-anonymize some or all of the data when combined. Even organizations applying information theory with the best of intentions and proper diligence will eventually make a mistake.
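To make the "two datasets combined" point concrete, here's a minimal sketch of a linkage attack. All names, towns, and fields are invented; the only real claim is the mechanism: two datasets that each look harmless can be joined on shared quasi-identifiers (here, birth year plus postcode) to put names back on "anonymous" records.

```python
# Hypothetical data: a medical extract with direct identifiers removed,
# and a public register that still carries names.
medical = [
    {"birth_year": 1962, "postcode": "3000", "condition": "diabetes"},
    {"birth_year": 1989, "postcode": "2600", "condition": "asthma"},
]

public_register = [
    {"name": "Alice Smith", "birth_year": 1962, "postcode": "3000"},
    {"name": "Bob Jones", "birth_year": 1989, "postcode": "2600"},
]

def link(records, register):
    """Join on the quasi-identifiers both datasets share."""
    index = {(r["birth_year"], r["postcode"]): r["name"] for r in register}
    return [
        {**rec, "name": index[(rec["birth_year"], rec["postcode"])]}
        for rec in records
        if (rec["birth_year"], rec["postcode"]) in index
    ]

for row in link(medical, public_register):
    print(row)  # each "anonymous" medical record now carries a name
```

With real datasets the quasi-identifiers are rarely unique on their own, but a handful of them together (birth date, postcode, sex, dates of procedures) usually are.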
In this particular example, they produced a random number which used the real ID [0] as a seed, then mixed the result with the original ID. It was not enough, and, as Teague et al. note:
> Indeed, encryption was not necessary – a randomly chosen unique number for each person would have worked.
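A toy illustration of why deriving the pseudonym from the real ID is weak: any deterministic function of the ID can simply be brute-forced over the ID space. The scheme below is my own invention for illustration, not the department's actual algorithm.

```python
import random

def pseudonymize(real_id: int) -> int:
    """Hypothetical weak scheme: PRNG seeded with the real ID,
    output mixed (XORed) with the original ID."""
    rng = random.Random(real_id)          # real ID used as the seed
    return rng.getrandbits(32) ^ real_id  # "mixed" with the original ID

published = pseudonymize(123456789)

# An attacker who knows (or guesses) the method just tries every
# plausible ID until the output matches the published pseudonym:
recovered = next(i for i in range(123456000, 123457000)
                 if pseudonymize(i) == published)
print(recovered)
```

A truly random unique number per person has no such relationship to the real ID, which is exactly Teague et al.'s point: the mapping should carry zero information, not be merely obfuscated.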
There are some mathematical definitions [1], but the fundamental problem is that with enough cross-referencing between databases it's hard to say anything for sure [2]. You never know what data other people might publish in the future.
I'm not aware of any legal definitions, but given the thorniness of reidentification I would assume they're insufficient.
One way around that is to drop all cases below a certain occurrence threshold, i.e. if there aren't at least 1000 people in the same town with the same condition, they don't get into the dataset.
(The downside is that rare diseases might fall through the cracks.)
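That thresholding idea can be sketched in a few lines. The threshold, field names, and records below are invented; the point is just the mechanism: count each (town, condition) combination and publish only rows whose combination is common enough.

```python
from collections import Counter

def filter_rare(records, threshold=1000):
    """Keep only rows whose (town, condition) pair occurs at least
    `threshold` times in the dataset."""
    counts = Counter((r["town"], r["condition"]) for r in records)
    return [r for r in records
            if counts[(r["town"], r["condition"])] >= threshold]

records = [
    {"town": "Ballarat", "condition": "flu"},
    {"town": "Ballarat", "condition": "flu"},
    {"town": "Ballarat", "condition": "rare-disease"},  # unique, dropped
]
print(filter_rare(records, threshold=2))  # only the two flu rows survive
```

This is essentially the k-anonymity idea: every published row is indistinguishable from at least k-1 others on the quasi-identifying fields, at the cost (as the comment above notes) of losing the rare cases entirely.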
This is the worst kind of personal data leak. Government cannot keep any data safe. The only way is to not collect the information. The reaction of the government is predictable and poor.
Now it has hit Australia, but it could have been any other country, since data collection seems to be en vogue. It probably gives the impression of control, the usual.
To anyone coming to the comments, the title is misleading. The health department is pressuring her "to stop her speaking out about the Medicare and PBS history of over 2.5 million Australians being re-identifiable online due to a government bungle."
The best way to complain about a title is to suggest a better one. Better means: more accurate and neutral, preferably using representative language from the article. When someone suggests a better title, we're happy to change it.
Pardon my ignorance, but it seems like there should be standard ways of irrevocably anonymizing data and reversible means given a private key.
Off the top of my head, only the latter is necessary: throwing away the random key makes it equivalent to the former (or run the plaintext through SHA-3 20 times in feedback instead). Say 100 rounds of AES-256 in feedback. Fixed integer-only fields could be XORed with a private key the length of the field (OTP).
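The two modes the comment describes can be sketched with a keyed MAC standing in for the AES/SHA-3 constructions (the key name and scheme here are my own, not a standard): while the key exists, the data holder can re-link records; destroy the key and the pseudonyms become computationally irreversible.

```python
import hmac, hashlib, secrets

# Destroy this key and the mapping below becomes irreversible;
# keep it and the holder can re-derive any person's pseudonym.
key = secrets.token_bytes(32)

def pseudonym(real_id: str) -> str:
    """Keyed pseudonym: stable for a given key, infeasible to
    invert without it (unlike an unkeyed hash of a small ID space,
    which can be brute-forced)."""
    return hmac.new(key, real_id.encode(), hashlib.sha256).hexdigest()

p = pseudonym("patient-42")
print(p == pseudonym("patient-42"))  # stable under the same key
```

Note this only protects the identifier column. It does nothing about the remaining fields being cross-referenced, which is the harder problem.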
Just because the primary key is gone doesn't mean other data can't be cross referenced. Birth dates with external sources, addresses with public registers, etc.