K-anonymity (wikipedia.org)
136 points by dedalus on Aug 11, 2018 | 11 comments



A friend is writing a good series of blogposts about formal privacy definitions. The first one was about k-anonymity: https://desfontain.es/privacy/k-anonymity.html

Follow-ups:

- k-map: https://desfontain.es/privacy/k-map.html

- l-diversity: https://desfontain.es/privacy/l-diversity.html

- δ-presence: https://desfontain.es/privacy/delta-presence.html

- differential privacy: https://desfontain.es/privacy/differential-privacy-awesomene...


We recently gave a workshop on k-anonymity (plus l-diversity and t-closeness) and differential privacy. You can find the IPython notebooks and slides here:

https://github.com/KIProtect/data-privacy-for-data-scientist...

In the workshop we implement the "Mondrian algorithm" to produce a k-anonymous dataset. We then look at the problems with this approach (i.e. missing diversity in the sensitive attribute) and try to fix them using l-diversity (which is also not optimal) and finally t-closeness. The third notebook includes an implementation of a differentially private "randomized response" scheme, showing how it changes the data and how we can take the added noise into account when working with the randomized data.

I think it's important to keep in mind that k-anonymity and differential privacy are not algorithms but mathematical privacy definitions. To implement them, you need a suitable method like the "Mondrian" algorithm or a randomized response scheme.
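To make the "definition, not algorithm" point concrete, here is a minimal sketch of what checking the definitions looks like, independent of how the table was produced. The function names, column names and sample rows are made up for illustration and are not taken from the workshop notebooks; the l-diversity check uses the simple "distinct values" variant.

```python
from collections import Counter, defaultdict

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in >= k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every quasi-identifier group has >= l distinct sensitive values."""
    values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        values[key].add(row[sensitive])
    return all(len(v) >= l for v in values.values())

# Hypothetical, already-generalized table (illustrative data only).
rows = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "cold"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
]

print(is_k_anonymous(rows, ["zip", "age"], k=2))
print(is_l_diverse(rows, ["zip", "age"], "diagnosis", l=2))
```

The second group is 2-anonymous but everyone in it has the same diagnosis, which is the kind of missing diversity mentioned above.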

If you have any questions or suggestions for improvements, please open an issue or PR on GitHub!


A recent, widely used application is the leaked-password check in https://haveibeenpwned.com, worked on in part by the Cloudflare team:

https://www.troyhunt.com/ive-just-launched-pwned-passwords-v...

https://blog.cloudflare.com/validating-leaked-passwords-with...
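For anyone who doesn't want to click through: as I understand the scheme, the client hashes the password with SHA-1, sends only the first five hex characters of the hash to the range endpoint, gets back every known suffix with that prefix, and does the final comparison locally. A rough sketch of the client side follows. The helper names are mine, and the response body is a made-up sample in the "SUFFIX:COUNT" line format rather than real API output, so nothing here touches the network.

```python
import hashlib

def split_hash(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest: 5 + 35 chars."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def count_in_range(suffix, response_body):
    """Look for our suffix among 'SUFFIX:COUNT' lines; 0 means not found."""
    for line in response_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0

prefix, suffix = split_hash("password")

# In real use the body would come from a GET to
# https://api.pwnedpasswords.com/range/<prefix>.
# This sample body and its counts are invented for illustration.
sample_body = f"{'A' * 35}:3\r\n{suffix}:42\r\n"

print(prefix, count_in_range(suffix, sample_body))
```

The server only ever sees the five-character prefix, which is shared by many different hashes; that is where the k-anonymity comes from.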


Okta's PassProtect Chrome extension also uses k-anonymity.

https://www.okta.com/blog/2018/05/add-passprotect-to-your-we...


Looks like that's just a wrapper around the Have I Been Pwned API, of which there are many examples. 1Password built it into their password manager, which is probably one of the best integrations of it.


If you’re curious about the matter, there’s an alternative technique called differential privacy that adds randomization to the data or to query results, and provides a mathematical guarantee bounding how much any one individual's record can affect the output. k-anonymity is subject to e.g. linkage attacks of the kind that differential privacy seeks to eliminate.


Bonus: one differentially private mechanism, randomized response, is very simple. "When you're about to insert some data, flip a coin. If it's heads, insert it as-is. If it's tails, insert a random value."
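A small sketch of that coin-flip scheme for a yes/no attribute, assuming fair coins and a uniformly random replacement bit. Under those assumptions a report matches the truth with probability 0.75, which works out to ε = ln 3. The function names and the simulated 30% population are invented for illustration. The second function shows one way to correct for the noise when estimating an aggregate.

```python
import random

def randomized_response(truth, rng):
    """Heads: report the true bit. Tails: report a uniformly random bit."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5

def estimate_true_rate(reports):
    """Invert the noise: P(report is True) = 0.5 * p + 0.25, so p = 2 * observed - 0.5."""
    observed = sum(reports) / len(reports)
    return 2 * observed - 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated population in which 30% of the true answers are "yes".
true_values = [i < 30_000 for i in range(100_000)]
reports = [randomized_response(v, rng) for v in true_values]

estimate = estimate_true_rate(reports)
print(f"estimated rate: {estimate:.3f} (true rate: 0.300)")
```

No single report can be trusted, but the population-level rate is still recoverable, at the cost of a noisier estimate than the raw data would give.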


But by adding random data, differential privacy destroys the ability to apply the usual statistical analysis techniques to the dataset.


I brought k-anonymity and l-diversity to Node.js a couple of weeks ago.

https://www.npmjs.com/package/node-mondrian


ARX [1, 2] is open source software that (among other features) supports most of the methods mentioned in this thread. Full disclosure: I'm one of the developers of ARX.

[1] Website: http://arx.deidentifier.org

[2] Source: https://github.com/arx-deidentifier/arx


We wrote a review of anonymisation methods here: https://hazy.com/files/Hazy_Anonymisation_Review_110718.pdf



