
> Data anonymized with Amnesia are statistically guaranteed that they cannot be linked to the original data.

It looks like (from other text on their site) they use variants on k-anonymity. This can prevent re-linking attacks back to the original data, but we've also known for a decade that this isn't especially strong. For example, two independent k-anonymous releases can uniquely identify everyone in the dataset[0].

[0]: https://dl.acm.org/doi/10.1145/1401890.1401926
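To make that concrete, here's a toy sketch in the spirit of [0] (Python, invented data and names, not from the paper): each release is 2-anonymous on its own, but an attacker who knows their target appears in both can intersect the candidate sensitive values.

    # Each release is 2-anonymous over its own quasi-identifiers,
    # yet intersecting them singles out the victim. All data invented.
    def candidates(release, matches):
        """Sensitive values consistent with what the attacker knows."""
        return {row[-1] for row in release if matches(row)}

    # Release A generalizes ZIP to a prefix; ages stay exact.
    release_a = [
        ("130**", 28, "cancer"),
        ("130**", 28, "flu"),
    ]
    # Release B (say, a second hospital) keeps ZIP exact, buckets age.
    release_b = [
        ("13053", "20-29", "cancer"),
        ("13053", "20-29", "hepatitis"),
    ]

    # Attacker knows the victim: ZIP 13053, age 28, present in both.
    from_a = candidates(release_a, lambda r: r[0] == "130**" and r[1] == 28)
    from_b = candidates(release_b, lambda r: r[0] == "13053" and r[1] == "20-29")
    print(from_a & from_b)  # {'cancer'} despite 2-anonymity everywhere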




However, that statistical guarantee also requires your quasi-identifiers to be picked correctly, i.e. it only holds if you select every variable the attacker could possibly know about a subject. I think that's the hard part here; it's not something I'd recommend doing without a lot of research and experience, especially for high-dimensional data.
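A minimal sketch of that failure mode (made-up table and column names): the same rows are 2-anonymous under one choice of quasi-identifier columns, and drop to k=1 the moment the attacker knows one more attribute than you planned for.

    from collections import Counter

    rows = [
        {"zip": "130**", "age": "20-29", "sex": "F", "dx": "flu"},
        {"zip": "130**", "age": "20-29", "sex": "M", "dx": "cancer"},
        {"zip": "148**", "age": "30-39", "sex": "F", "dx": "flu"},
        {"zip": "148**", "age": "30-39", "sex": "F", "dx": "flu"},
    ]

    def k_of(rows, qi):
        """k = size of the smallest equivalence class over the QI columns."""
        classes = Counter(tuple(r[c] for c in qi) for r in rows)
        return min(classes.values())

    print(k_of(rows, ["zip", "age"]))         # 2 -- looks fine
    print(k_of(rows, ["zip", "age", "sex"]))  # 1 -- attacker who knows sex wins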


Right. Even if you assume the worst-case scenario, there isn't a standard risk metric or threshold to meet.

I feel like differential privacy is the strongest definition we have, but it is also lacking from a practical standpoint. What does it mean to have N nats/bits of information gain from seeing the result of a query? How does this translate to my risk of a PII leak?
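For what it's worth, the mechanics are simple even if the interpretation isn't. A stdlib-only sketch of the Laplace mechanism for a sensitivity-1 count query (my own toy code, not any particular library): epsilon is exactly the worst-case log-likelihood-ratio bound between any two neighboring databases, in nats, and it adds up across queries.

    import random

    def laplace_count(true_count, eps):
        # The difference of two Exp(eps) draws is Laplace(0, 1/eps);
        # scale 1/eps on a sensitivity-1 query gives eps-DP.
        noise = random.expovariate(eps) - random.expovariate(eps)
        return true_count + noise

    eps = 0.1  # ~0.1 nats (~0.14 bits) of worst-case leakage per query
    print(laplace_count(42, eps))
    # After q queries the bound degrades to q*eps -- and translating that
    # into "probability my PII leaks" is exactly the open practical question.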



