We recently gave a workshop on k-anonymity (plus l-diversity and t-closeness) and differential privacy; you can find the IPython notebooks and slides here:
In the workshop we implement the "Mondrian" algorithm to produce a k-anonymous dataset. We then look at the problems of this approach (e.g., a lack of diversity in the sensitive attribute) and try to fix them using l-diversity (which is also not optimal) and finally t-closeness. The third notebook includes an implementation of a differentially private "randomized response" scheme, showing how it changes the data and how we can account for the added noise when working with the randomized data.
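To give a flavor of the randomized response part: below is a minimal sketch (not the workshop's actual code) of a binary randomized response scheme with a fair coin, plus the debiasing step that recovers an estimate of the true rate from the noisy answers. The function names and parameters are my own illustration.

```python
import random

def randomized_response(truth: bool, p: float = 0.5) -> bool:
    """With probability p report the true answer; otherwise report a fair coin flip."""
    if random.random() < p:
        return truth
    return random.random() < 0.5

def estimate_true_rate(responses, p: float = 0.5) -> float:
    """Debias: E[observed] = p * true_rate + (1 - p) * 0.5, solved for true_rate."""
    observed = sum(responses) / len(responses)
    return (observed - (1 - p) * 0.5) / p

random.seed(0)
true_rate = 0.3
data = [random.random() < true_rate for _ in range(100_000)]
noisy = [randomized_response(x) for x in data]
print(round(estimate_true_rate(noisy), 2))  # close to 0.3
```

Each individual answer is plausibly deniable, yet the aggregate estimate stays usable because the noise is known and can be inverted.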
I think it's important to keep in mind that k-anonymity and differential privacy are not algorithms but mathematical privacy definitions. To satisfy them, you need a suitable method such as the "Mondrian" algorithm or a randomized response scheme.
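The definition side is easy to check mechanically, though: a table is k-anonymous if every combination of quasi-identifier values occurs at least k times. Here's a small sketch of that check (toy data and column names are made up for illustration):

```python
import pandas as pd

# Hypothetical toy dataset; "age_band" and "zip_prefix" are the quasi-identifiers.
df = pd.DataFrame({
    "age_band":   ["20-30", "20-30", "20-30", "30-40", "30-40"],
    "zip_prefix": ["101",   "101",   "101",   "102",   "102"],
    "disease":    ["flu",   "cold",  "flu",   "cold",  "flu"],
})

def is_k_anonymous(df, quasi_identifiers, k):
    """True if every quasi-identifier group contains at least k rows."""
    return bool((df.groupby(quasi_identifiers).size() >= k).all())

print(is_k_anonymous(df, ["age_band", "zip_prefix"], 2))  # True
print(is_k_anonymous(df, ["age_band", "zip_prefix"], 3))  # False (one group has only 2 rows)
```

An algorithm like Mondrian is then just one strategy for generalizing the quasi-identifiers until this check passes while losing as little information as possible.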
If you have any questions or suggestions for improvements, please open an issue or PR on GitHub!
https://github.com/KIProtect/data-privacy-for-data-scientist...