No, it does not; it would be irresponsible to do that on private data. There's a very clear line between data posted publicly and data held privately, especially in terms of copyright. I doubt users will ever be opted in by default for something as sensitive as e-mail and docs.
One exception to that is scanning for CSAM and Terrorism and DMCA. With DMCA, it's automated based on file hash, and you still maintain access to your files; you are just limited from sharing them. Ads in Gmail aren't based on email content but on other online activity while logged in.
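The hash-based restriction described here (owner keeps access, sharing is blocked) could look roughly like the minimal Python sketch below. It only illustrates the mechanism as the comment describes it and is not Google's implementation: the choice of SHA-256, the `FLAGGED_DIGESTS` set, and the `flag`, `can_open`, and `can_share` helpers are all hypothetical.

```python
import hashlib

# Hypothetical set of SHA-256 digests for files that received a copyright
# complaint. A real service may use different hashes or fuzzy matching.
FLAGGED_DIGESTS: set = set()


def digest(data: bytes) -> str:
    """Exact-content fingerprint: changing any byte yields a different digest."""
    return hashlib.sha256(data).hexdigest()


def flag(data: bytes) -> None:
    """Record a file's digest after a complaint (hypothetical workflow)."""
    FLAGGED_DIGESTS.add(digest(data))


def can_open(data: bytes) -> bool:
    """The owner keeps access to the file regardless of any flag."""
    return True


def can_share(data: bytes) -> bool:
    """Sharing is refused only when the exact bytes match a flagged digest."""
    return digest(data) not in FLAGGED_DIGESTS
```

For example, after `flag(b"example file contents")`, `can_share` returns `False` for those exact bytes but `True` for `b"example file contents, edited"`, while `can_open` stays `True`. Exact hashing like this only catches byte-identical copies.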
I think the other exception to that is Smart Compose. AI models do use email content as training data, but the output of those models is strictly for local use while writing emails. I imagine it's also siloed per user.
EDIT: Not a Google employee; I apologize if my assertions seem too strong.
"We have always maintained that you control your data and we process it according to the agreement(s) we have with you. Furthermore, we will not and cannot look at it without a legitimate need to support your use of the service -- and even then it is only with your permission. Here are some of the additional measures we take to ensure your privacy: (reference: GCP Terms).
In addition to these commitments, for AI/ML development, we don’t use data that you provide us to train our own models without your permission. And if you want to work together to develop a solution using any of our AI/ML products, by default our teams will work only with data that you have provided and that has identifying information removed. We work with your raw data only with your consent and where the model development process requires it."
You mentioned it would be "irresponsible to do that on private data", but it seems Google was doing that until at least 2017. Or am I incorrect about that?
We have no evidence either way. And when someone is weirdly in favor of a multi-billion-dollar corporation over the rights of individuals, calling them at least an agent of that group seems like appropriate verbiage.
>No it does not, it would be irresponsible to do that on private data.
Doing irresponsible things with private data is the hot business model of the day. I'm not saying it's Google; I'm saying common expectations about "responsibility" are worse than useless.
>We have always maintained that you control your data and we process it according to the agreement(s) we have with you.
Ah the "we surveil you fair and square, get over it" clause.
Nothing in the quotes you posted precludes them from using my emails to train AI. They say things like "we process your data according to the agreements we have with you," but that isn't a denial. A denial would be "we don't store or process your data to train AI".
They're just implying they don't process your data, while actually saying "the answer to your question lies in the text of the service agreements". Evasive at best.
In the absence of a denial, and the presence of an obvious motive to do so, I have to say my guess is they do use your gdocs and gmail data to train AI.
Idk why you posted the GCP terms of service as evidence. I also don't think Google uses emails and docs, but there is definitely a higher bar with company data than with normal user data.
> I think the other exception to that is smart compose. AI models do use email content for training data, but the output of those are strictly for use while writing emails. I imagine it's also siloed per user.
So if there is an AI model for each user trained on the user's writing does that mean Google now also has the means to forge convincing emails?
Not true in general. Something I write in a diary I keep under my pillow has the exact same copyright status as something I publish on my blog (assuming no Creative Commons or similar licenses).
> One exception to that is scanning for CSAM and Terrorism and DMCA
It's enough to have one exception. We have to assume that a language model will be trained on them and that government officials will be using it soon. Just think how much it could reveal about upcoming terrorist attacks and their organizers.
There is no such thing as "scanning for DMCA". DMCA prescribes a process for reacting to complaints of copyright infringement, not a process for preemptively scanning content for material that might potentially generate such complaints.
>No it does not, it would be irresponsible to do that on private data. [...] One exception to that is [describes two exceptions] I think the other exception to that is [...]
EDIT2: https://en.wikipedia.org/wiki/Federated_learning
https://cloud.google.com/blog/products/ai-machine-learning/g...
https://support.google.com/mail/answer/6603?hl=en
https://arxiv.org/abs/1906.00080
https://ai.googleblog.com/2017/04/federated-learning-collabo...
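For readers following the federated learning link in EDIT2: the core idea is that each device trains on its own data and only sends model weights back, and the server averages those weights into a shared model. Below is a toy Python sketch of Federated Averaging (FedAvg) on a one-variable linear model. It is an assumption-laden illustration, not how Smart Compose is actually trained; `local_update`, `federated_averaging`, the learning rate, and the epoch/round counts are all hypothetical choices.

```python
from typing import List, Tuple

# Each client holds its own (x, y) pairs; the server never sees these lists.
ClientData = List[Tuple[float, float]]
Weights = Tuple[float, float]  # (w, b) for the model y = w * x + b


def local_update(weights: Weights, data: ClientData,
                 lr: float = 0.01, epochs: int = 5) -> Weights:
    """Run a few epochs of plain SGD on one client's private data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error on this example
            w -= lr * 2 * err * x   # gradient of squared error w.r.t. w
            b -= lr * 2 * err       # gradient of squared error w.r.t. b
    return w, b


def federated_averaging(clients: List[ClientData], rounds: int = 50) -> Weights:
    """FedAvg: clients train locally; the server only averages returned weights."""
    global_weights: Weights = (0.0, 0.0)
    total = sum(len(c) for c in clients)
    for _ in range(rounds):
        updates = [(local_update(global_weights, c), len(c)) for c in clients]
        # Weight each client's model by how many examples it trained on.
        w = sum(u[0] * n for u, n in updates) / total
        b = sum(u[1] * n for u, n in updates) / total
        global_weights = (w, b)
    return global_weights
```

With two clients whose private points lie on y = 2x + 1, e.g. `[[(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)], [(3.0, 7.0), (4.0, 9.0)]]`, running `federated_averaging(clients, rounds=200)` should end up very close to `(2.0, 1.0)` even though the server only ever receives weights, never the raw points. Note that the result is one shared model, so this is a different setup from a model siloed per user.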