
> Data anonymized with Amnesia are statistically guaranteed that they cannot be linked to the original data.

It looks like (from other text on their site) they use variants on k-anonymity. This can prevent re-linking attacks back to the original data, but we've also known for a decade that this isn't especially strong. For example, two independent k-anonymous releases can uniquely identify everyone in the dataset[0].

[0]: https://dl.acm.org/doi/10.1145/1401890.1401926
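To make that concrete, here's a toy sketch in the spirit of [0] (Python, invented data and names, not from the paper): each release is 2-anonymous on its own, but an attacker who knows their target appears in both can intersect the candidate sensitive values.

    # Each release is 2-anonymous over its own quasi-identifiers,
    # yet intersecting them singles out the victim. All data invented.
    def candidates(release, matches):
        """Sensitive values consistent with what the attacker knows."""
        return {row[-1] for row in release if matches(row)}

    # Release A generalizes ZIP to a prefix; ages stay exact.
    release_a = [
        ("130**", 28, "cancer"),
        ("130**", 28, "flu"),
    ]
    # Release B (say, a second hospital) keeps ZIP exact, buckets age.
    release_b = [
        ("13053", "20-29", "cancer"),
        ("13053", "20-29", "hepatitis"),
    ]

    # Attacker knows the victim: ZIP 13053, age 28, present in both.
    from_a = candidates(release_a, lambda r: r[0] == "130**" and r[1] == 28)
    from_b = candidates(release_b, lambda r: r[0] == "13053" and r[1] == "20-29")
    print(from_a & from_b)  # {'cancer'} despite 2-anonymity everywhere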




However, that statistical guarantee also requires your quasi-identifiers to be picked correctly, i.e. it only holds if you select every variable the attacker could possibly know about a subject. I think that's the hard part here; it's not something I'd recommend doing without a lot of research and experience, especially for high-dimensional data.
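A minimal sketch of that failure mode (made-up table and column names): the same rows are 2-anonymous under one choice of quasi-identifier columns, and drop to k=1 the moment the attacker knows one more attribute than you planned for.

    from collections import Counter

    rows = [
        {"zip": "130**", "age": "20-29", "sex": "F", "dx": "flu"},
        {"zip": "130**", "age": "20-29", "sex": "M", "dx": "cancer"},
        {"zip": "148**", "age": "30-39", "sex": "F", "dx": "flu"},
        {"zip": "148**", "age": "30-39", "sex": "F", "dx": "flu"},
    ]

    def k_of(rows, qi):
        """k = size of the smallest equivalence class over the QI columns."""
        classes = Counter(tuple(r[c] for c in qi) for r in rows)
        return min(classes.values())

    print(k_of(rows, ["zip", "age"]))         # 2 -- looks fine
    print(k_of(rows, ["zip", "age", "sex"]))  # 1 -- attacker who knows sex wins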


Right. Even if you assume the worst-case scenario, there isn't a standard risk metric or threshold to meet.

I feel like differential privacy is the strongest definition we have, but it is also lacking from a practical standpoint. What does it mean to have N nats/bits of information gain from seeing the result of a query? How does this translate to my risk of a PII leak?
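For what it's worth, the mechanics are simple even if the interpretation isn't. A stdlib-only sketch of the Laplace mechanism for a sensitivity-1 count query (my own toy code, not any particular library): epsilon is exactly the worst-case log-likelihood-ratio bound between any two neighboring databases, in nats, and it adds up across queries.

    import random

    def laplace_count(true_count, eps):
        # The difference of two Exp(eps) draws is Laplace(0, 1/eps);
        # scale 1/eps on a sensitivity-1 query gives eps-DP.
        noise = random.expovariate(eps) - random.expovariate(eps)
        return true_count + noise

    eps = 0.1  # ~0.1 nats (~0.14 bits) of worst-case leakage per query
    print(laplace_count(42, eps))
    # After q queries the bound degrades to q*eps -- and translating that
    # into "probability my PII leaks" is exactly the open practical question.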



