These seem to be cases where Google might be collecting less data in one area, but can easily supplement or cross-reference data from another area, so they don't actually lose any information.
For example, they mention RAPPOR for Chrome, but with almost all websites having Google Analytics installed, they clearly have full data on what the user is doing regardless.
For Google Maps, it may be used to show how busy a restaurant is, but they still collect location history, query history, route history, etc. from users, so it's trivial for the data to still be used or reconstructed.
Even mentioning the use of differential privacy for a single Gmail feature seems pointless to me when they obviously not only have everyone's full email history, but also develop countless algorithms to scrape content from emails (e.g. purchase history from common retailers).
Perhaps I'm too cynical, but at least to me it seems to be a very common pattern. I'd personally bet that differential privacy techniques that actually give users notable information-theoretic anonymity are very rarely used by Google in general. A few usage examples of differential privacy are good and better than none, but for a company of their size I don't think it (yet) amounts to a meaningful statement.
You seem to be missing what differential privacy is. It's not about the collection of data, it's about the _use_ of that data. It's no secret that Google has an incredible amount of logging data, but the ways we can use it are very limited. Folks seem to be under the impression that we can willy-nilly go ahead and build products that harvest everything about you and connect the dots across organizations. That's so funny, because it'd make things so much easier sometimes. :P
Instead, we have very strict privacy rules and experts who review the designs for any use of this data. If I so much as want to train an ML model over real data, I have to have an approved privacy review that shows how privacy is maintained.
Where I use differential privacy algorithms in my line of work is to do ad-hoc analysis over suggestions placed in front of users. I have dimensions to aggregate across, but I want to ensure that no one bucket can deanonymize a user. k-anonymity used to be the thing (e.g. if a bucket has <50 people in it, that's too few), but even a large bucket can deanonymize users, which is where differential privacy comes in. I sincerely don't care who the users are; I just want to know how our features get used to try and save them more time.
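To make the distinction concrete, here's a toy sketch in Python (my own illustration, not our actual tooling): k-anonymity suppresses small buckets outright, while differential privacy adds calibrated noise to every bucket, so no single user's presence changes the output much.

```python
from collections import Counter

import numpy as np

def k_anonymous_counts(events, k=50):
    # k-anonymity: suppress any bucket with fewer than k users.
    counts = Counter(events)
    return {bucket: n for bucket, n in counts.items() if n >= k}

def dp_counts(events, epsilon=1.0):
    # Differential privacy: add Laplace noise scaled to the query's
    # sensitivity (1 for a count). The guarantee holds for every bucket;
    # a real system also has to decide privately which buckets to publish.
    counts = Counter(events)
    scale = 1.0 / epsilon
    return {bucket: n + np.random.laplace(0.0, scale)
            for bucket, n in counts.items()}

# One event per (user, feature) pair, bucketed by feature.
events = ["accepted_suggestion"] * 120 + ["dismissed_suggestion"] * 3
print(k_anonymous_counts(events))  # small bucket suppressed entirely
print(dp_counts(events))           # both buckets survive, with noise
```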
Do I have access to the underlying logs? Yes. Can I use that to make decisions? No. I can however use the anonymized data to make decisions, and even store that longer than the underlying data exists (most logs exist for <14d).
Differential privacy also makes it possible to train models like Smart Compose by ensuring that the tokens they train over are diffuse enough not to point back to any one person.
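The usual recipe for that is DP-SGD: clip each example's gradient so no single user's data can dominate an update, then add noise before applying it. A minimal NumPy sketch of one such step for a toy linear model (illustrative only; the function, data, and hyperparameters are made up, and a real training stack is far more involved):

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.1, rng=None):
    # One DP-SGD step for squared loss: clip each per-example gradient
    # to L2 norm `clip`, sum, add Gaussian noise, then average and step.
    rng = rng or np.random.default_rng()
    grad_sum = np.zeros_like(w)
    for xi, yi in zip(X, y):
        g = (w @ xi - yi) * xi                             # per-example gradient
        g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # bound its influence
        grad_sum += g
    noise = rng.normal(0.0, noise_mult * clip, size=w.shape)
    return w - lr * (grad_sum + noise) / len(X)

# Toy usage: recover a known weight vector from noisy data.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=256)
w = np.zeros(4)
for _ in range(300):
    w = dp_sgd_step(w, X, y, rng=rng)
print(w)  # close to [1, -2, 0.5, 0], despite clipping and noise
```

The noise multiplier, together with the number of steps and the sampling rate, determines the overall epsilon via composition accounting, which I've omitted here.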
> I'd personally bet that differential privacy techniques that actually give users notable information-theoretic anonymity are very rarely used by Google in general.
For existing things, sure. They did their best, but this is new research that has only recently been reified into practice. As those older systems are retired, their replacements use differential privacy techniques.
I appreciate the quality response. A lot of the focus here seems to be 'prevent other consumers from finding things out about our users', which is good and important. I usually think about it more from Google's perspective: they have the data, and perhaps they're not using it for X right now, but they have the potential to, and that potential is what creates the significant power imbalance and centralization I'm often concerned about.
Obviously Google employees cannot go around reading and using all of my personal communications for whatever they want, but the mere fact that Google has all of them is, to me, too much power in the hands of a single actor, even if that power is generally not abused.
With that said, differential privacy is still a great technology, so it's still great that they're open-sourcing and encouraging things like this. But I'll likely remain concerned about the centralization of the world's data at the same time.
> Even mentioning the use of differential privacy for a single Gmail feature seems pointless to me when they obviously not only have everyone's full email history, but also develop countless algorithms to scrape content from emails (e.g. purchase history from common retailers).
That's an unsubstantiated claim, and I don't think that it can be in any way reconciled with Google's own privacy policies.
Google is going to bring a differential privacy framework to Chrome before deprecating third-party cookies, which means it's going to incorporate DP into its ads products. Maybe this makes you feel better.
- Gboard
- Smart Compose
- Chrome
- Maps
https://blog.google/technology/safety-security/privacy-every...
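To ground what the Chrome entry means mechanically: Chrome's RAPPOR is built on randomized response, where each client flips its answer with some probability before reporting, and the server unbiases the aggregate. A toy sketch of just that core idea (illustrative only; the real protocol adds Bloom filters and multiple layers of randomization):

```python
import math
import random

def randomize(bit, epsilon=math.log(3)):
    # Client side: report the true bit with probability p, flip otherwise.
    # p = e^eps / (1 + e^eps) gives epsilon-local-DP for one binary report,
    # so any single report is plausibly deniable.
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p else 1 - bit

def estimate_rate(reports, epsilon=math.log(3)):
    # Server side: unbias the observed mean to recover the true rate.
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# 100k clients, 30% of whom actually have the sensitive attribute.
truth = [1 if random.random() < 0.3 else 0 for _ in range(100_000)]
reports = [randomize(b) for b in truth]
print(estimate_rate(reports))  # ~0.30, without trusting any single report
```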