
> with the only privacy guarantees being that the data is encrypted during transport, and a "promise" that they will run internal audits to make sure private data isn't released from their servers.

There's much more than that, including: privacy and security review before a study launches, a data minimization requirement, a sandboxed data analysis environment with strict access controls, and IRB oversight for academic studies.

> IMO this seems to provide worse privacy than even Google and Micro$oft's telemetry, which at least use differential privacy to make sure that each individual's privacy is somewhat protected (the data you send is randomised so even if the aggregator is compromised by a malicious third party (e.g. NSA) individuals have some degree of plausible deniability).

The vast majority of Google and Microsoft telemetry does not involve local differential privacy. Google, in fact, has almost entirely removed local differential privacy (RAPPOR) from Chrome telemetry [1].

We've been examining the feasibility of local differential privacy for Rally. The challenge for us—and why local differential privacy has limited deployment—is that the level of noise makes answering most (often all) research questions impossible.
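
To make that noise concrete: with classic randomized response (the simplest local DP mechanism, and the building block RAPPOR is based on), each client reports its true answer only with probability e^ε/(e^ε+1) and the analyst has to debias the aggregate afterwards. Here's a rough back-of-the-envelope sketch, not anything from Rally's codebase, with purely illustrative parameters:

    import math
    import random

    def randomized_response(true_bit: bool, epsilon: float) -> bool:
        """Report the true bit with probability p = e^eps / (e^eps + 1), else flip it."""
        p = math.exp(epsilon) / (math.exp(epsilon) + 1)
        return true_bit if random.random() < p else (not true_bit)

    def debiased_estimate(reports, epsilon: float) -> float:
        """Recover an unbiased estimate of the true proportion from the noisy reports."""
        p = math.exp(epsilon) / (math.exp(epsilon) + 1)
        observed = sum(reports) / len(reports)
        return (observed - (1 - p)) / (2 * p - 1)

    # Illustrative numbers only: 5,000 participants, 10% true rate, eps = 1.
    n, true_rate, eps = 5_000, 0.10, 1.0
    truth = [random.random() < true_rate for _ in range(n)]
    reports = [randomized_response(b, eps) for b in truth]
    print(debiased_estimate(reports, eps))  # typically off by 1-2 percentage points on a 10% signal

The standard error of the debiased estimate scales roughly like 1/((2p-1)·√n), so for small cohorts or rare behaviors the noise swamps the signal, which is exactly the problem for fine-grained research questions.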

[1] https://bugs.chromium.org/p/chromium/issues/detail?id=101690...




Have you thought about using central/global differential privacy (which tends to have much less noise) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?

E.g. from the FAQ: "We do intend to release aggregated data sets in the public good to foster an open web. When we do this, we will remove your personal information and try to disclose it in a way that minimizes the risk of you being re-identified."

It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix Prize, the AOL search dataset, Victoria's public transport data release, etc. for case studies of how informal attempts at anonymization can fail users.
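
To put a number on the "much less noise" point: in the central model a counting query has sensitivity 1 (adding or removing one person changes the count by at most 1), so Laplace(1/ε) noise is enough no matter how many participants there are. A rough sketch, with a hypothetical query and ε that Rally has not committed to:

    import numpy as np

    def dp_count(true_count: int, epsilon: float) -> float:
        """Central DP via the Laplace mechanism: a counting query has
        sensitivity 1, so Laplace(1/epsilon) noise suffices."""
        return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

    # Hypothetical aggregate: participants who visited a flagged news site during the study.
    print(dp_count(true_count=4_217, epsilon=1.0))  # noise typically within a few counts either way

The noise here is a constant handful of counts rather than something that grows with √n the way per-client randomization does, which is why the central model can answer questions the local model can't.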


> Have you thought about using central/global differential privacy (which tends to have much less noise) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?

Yes. Central differential privacy is a very promising direction for datasets that result from studies on Rally.

> It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix competition, AOL search dataset, Public Transportation in Victoria, etc. case studies of how non-formal attempts at anonymization can fail users.

I've done a little re-identification research, and my faculty neighbor at Princeton CITP wrote the seminal Netflix paper, so we take this quite seriously.


Interesting. I can see that RAPPOR seems to be deprecated in favor of something called UKM (URL-keyed metrics), but not why the change was made. Is there somewhere I can read more about it?


I am not aware of any public announcement or explanation. Which is... probably intentional, since Google is removing a headline privacy feature from Chrome.


How did you learn about it? By studying the code?


Our team looked closely at the Google, Microsoft, and Apple local differential privacy implementations when building Rally. It helped that we have friends who worked on RAPPOR.


Did you end up using differential privacy in Rally? What was the thinking behind that decision?



