Microsoft Word and Excel AI data scraping switched to opt-in by default

qup · 2024-11-26T13:26:25 1732627585

Is this the correct use of "opt-in?"

To me, having things "opt-in" means they're off and you can turn them on if you want.

If it's "opt-out" it's automatically on, and you can turn it off.

elAhmo · 2024-11-26T13:32:20 1732627940

Likewise, I think the title is literally the opposite of what is actually happening.

alt227 · 2024-11-26T13:33:02 1732627982

I think they mean 'Enabled by default'

mejutoco · 2024-11-26T13:38:59 1732628339

Thus opt-out would be the correct term.

jyunwai · 2024-11-26T13:30:27 1732627827

You are correct. The headline author likely meant "opted in by default" or "enabled by default."

Ukv · 2024-11-26T14:10:41 1732630241

> Microsoft's Connected Experiences feature automatically gathers data from Word and Excel files to train the company's AI models. This feature is turned on by default, meaning user-generated content is included in AI training unless manually deactivated.

Not to say that Microsoft products respect privacy, but I don't see evidence that user Word/Excel files are being used for training.

The linked services agreement has had the same language (copy/transmit/etc. "to the extent necessary to provide the services") since at least 2015[0], and "connected experiences" seems to group a wide range of integrations; some like dictation/translation probably utilise ML, but that does not mean training on user content.

[0]: https://web.archive.org/web/20150608000921/https://www.micro...

itishappy · 2024-11-26T15:10:30 1732633830

To play devil's advocate, I don't see any evidence they're NOT training on user content either. Compared to how explicitly they indicate they're not using user content for targeted advertising, this seems like a huge oversight. Given how carefully they've put together these documents, I'm doubtful it was an oversight.

cptskippy · 2024-11-26T20:59:18 1732654758

I think it's appropriate to be concerned and seek clarification. And I don't like people immediately seeking to vilify Microsoft as if they came over to their house and shot their dog in front of their kids.

Eddy_Viscosity2 · 2024-11-29T12:54:27 1732884867

> And I don't like people immediately seeking to vilify Microsoft

Microsoft has the reputation for villainy that it has because of the villainy it has done. You not liking it doesn't mean it isn't deserved.

ca_tech · 2024-11-26T14:20:23 1732630823

Agreed. This was raised within our corp the other week and we read through the privacy and security documentation as it relates to Connected Experiences. Microsoft has outlined specifically what Connected Experiences covers.[1] [2] You could argue that predictive text is a product of machine learning but there is no clause allowing for training any generalized large language models using this data. The confusion may have arisen if they read an article about Copilot. If the user had a Microsoft Copilot 365 license, then the data would be used as grounding for their personal interaction with Copilot, but still not used to train any foundational LLMs. However, even this data is still managed in compliance with Microsoft's data security and privacy agreements.

[1] https://learn.microsoft.com/en-us/microsoft-365-apps/privacy...

[2] https://learn.microsoft.com/en-us/microsoft-365-apps/privacy...

hulitu · 2024-11-27T20:52:22 1732740742

> Not to say that Microsoft products respect privacy

"Your privacy is very important for us" when you need to install an extension to have a blank start page (without ads) in Edge.

HelloUsername · 2024-11-26T14:00:03 1732629603

"In the M365 apps, we do not use customer data to train LLMs. This setting only enables features requiring internet access like co-authoring a document." @Microsoft365 https://twitter.com/Microsoft365/status/1861160874993463648

binarymax · 2024-11-26T17:24:41 1732641881

It’s absurd that Microsoft 365 uses Twitter as its official support announcement platform.

Official announcements about the outage the last couple of days were posted there.

Twitter is a hostile platform that requires an account to view. Why does MS365 continue to use it?

Smar · 2024-11-26T22:25:48 1732659948

I wonder whether Twitter or M365 is more hostile towards users...

tjqgG · 2024-11-26T13:43:33 1732628613

A word processor stealing the user's IP by default should carry massive fines in the EU. This is pure deception. 20% of annual revenue should be appropriate.

jmclnx · 2024-11-26T14:52:46 1732632766

Hopefully full pretax revenue for Microsoft and all their subsidiaries.

alt227 · 2024-11-26T13:32:29 1732627949

This seems like a security shit show.

Can we disable it by group policy across entire domains?

Surely no business would ever allow Microsoft to 'reformat, display, and distribute' confidential company documents?

Or am I missing something.

Thorrez · 2024-11-26T14:03:55 1732629835

Well, if there's some sort of cloud feature allowing you to share documents you write with others, it would make sense you would have to allow Microsoft to "reformat, display, and distribute" for the purpose of providing you that service.

However, the terms of service says "To the extent necessary to provide the Services to you and others, [...] and to improve Microsoft products and services". So they're saying they can use your content not just to provide you service, but to provide other people service and to improve all Microsoft products.

alt227 · 2024-11-26T16:06:15 1732637175

> it would make sense you would have to allow Microsoft to "reformat, display, and distribute" for the purpose of providing you that service.

That would be me sharing a specific document with a specific person. If their terms specified that they would only "reformat, display, and distribute" to people we personally give permission to then that would be fine, but it doesn't.

Thorrez · 2024-11-27T14:56:11 1732719371

>That would be me sharing a specific document with a specific person.

If you're sending it to them directly (e.g. emailing a file), then sure. But if Microsoft is hosting it on their website, then I think Microsoft would be displaying it to the person you shared it with.

>If their terms specified that they would only "reformat, display, and distribute" to people we personally give permission to then that would be fine, but it doesn't.

I think you're basically saying the same thing I said. I said it would be fine if the terms said "To the extent necessary to provide the Services to you." I think that has the same effect as what you're saying. I'm not a lawyer though.

alt227 · 2024-11-28T16:48:29 1732812509

> But if Microsoft is hosting it on their website, then I think Microsoft would be displaying it to the person you shared it with.

The point is not that Microsoft is displaying it; it is that they are displaying it to a person I have given explicit permission to. They could easily put that in their terms if they wanted to clarify the point. The fact that they don't shows they want to display it to people/machines other than the ones I give them permission to.

Thorrez · 2024-11-28T18:32:23 1732818743

I think we're agreeing. We're both saying that Microsoft should change their ToS to reduce the amount of permission they have to share users' content.

HPsquared · 2024-11-26T15:16:23 1732634183

The word 'necessary' can do a lot of heavy lifting.

mschuster91 · 2024-11-26T13:23:13 1732627393

> "To the extent necessary to provide the Services to you and others, to protect you and the Services, and to improve Microsoft products and services, you grant to Microsoft a worldwide and royalty-free intellectual property license to use Your Content, for example, to make copies of, retain, transmit, reformat, display, and distribute via communication tools Your Content on the Services," the clause reads.

Well, this does make sense in the context of Office 365, OneDrive and the Office web apps in general. (Still dodgy regarding the "worldwide" part but there's no way around that because people can and do expect to access their stuff even while on vacation)

Silently enabling the training of remote AI however? That's not covered under any reasonable interpretation of the above legalese.

genrilz · 2024-11-26T13:29:29 1732627769

IANAL, but I think the "to improve Microsoft products and services" bit does mean that they do legally get to train their AI (which is a Microsoft service) on your data. Still a bastard move though.

jagged-chisel · 2024-11-26T13:28:10 1732627690

>… intellectual property license to use Your Content

Seems clear to me. Use any way Microsoft wants. The “for example” list is not exhaustive nor limiting.

genrilz · 2024-11-26T13:35:20 1732628120

IANAL again, but I don't think they get to do literally anything with your data. The phrase used is "to the extent necessary". For instance, I don't think they could scrape their user data for trade secrets and then sell those to the highest bidder.

jagged-chisel · 2024-11-26T14:03:14 1732629794

Who defines “necessary?” Use of Your Content is Necessary to support Microsoft’s business activities, including, but not limited to, training their AI.

There are other laws protecting things like trade secrets and corporate privacy, so it would indeed be foolish for Microsoft to attempt gathering and selling trade secrets. But the wording gives them carte blanche to do anything not already illegal, including using your Most Awesome Word Template in Word’s collection of templates that they distribute to everyone.

ada1981 · 2024-11-26T13:36:59 1732628219

Why not? Isn’t that the essential ethos Microsoft was founded on?

genrilz · 2024-11-26T14:10:29 1732630229

Because they boxed themselves in with legalese. Companies would definitely switch off Microsoft services if at all possible if the company's lawyers thought their trade secrets were getting sold off. So I think the "as necessary" framing does probably prevent them from doing some things.

As I laid out in my other comment, I think training AI in particular is covered under the "improving Microsoft products or services" bit of legalese. I do wonder how companies' lawyers will respond to this though. They probably thought of that phrase as just allowing Microsoft employees access to documents to see how Word or other pieces of software were being used, or to fix crashes, etc.

cudgy · 2024-11-26T13:50:49 1732629049

I thought it was founded on Bill Gates’s mommy having strong connections to IBM that allowed little Bill to keep the rights to the source code they paid him to write. And the privileged position of having access to a computer at his school when 99.9% of the population did not.

jasonjayr · 2024-11-26T13:45:06 1732628706

"The funds from the bidder will be invested in to products in order to make a better user experience" /s

cudgy · 2024-11-26T13:53:04 1732629184

Reminds me of “this call will be used for training and quality purposes.”

orev · 2024-11-26T14:34:37 1732631677

Title as of the time of this comment:

> Microsoft Word and Excel AI data scraping slyly switched to enabled by default — the opt-out toggle is not that easy to find

—

As a tech person, keeping up with disabling and avoiding all this is becoming exhausting. I can’t imagine any regular non-tech person having any chance at avoiding it.

Is it time to just give up? At what point do you have to accept that the tsunami is here and there’s nothing you can do about it?

trod1234 · 2024-11-26T17:22:16 1732641736

Worse than exhausting, this is clearly a pattern of abuse done by purposeful intent.

Security fatigue is a well-known thing in IT. Configuration fatigue, where settings malevolently switch back on after you chose to disable them, is just as bad, resulting in vexatious experiences.

This is the problem when antitrust is not enforced, and regulation has killed all other smaller market participants. It creates dynamics (abuses) that cause societal upheaval which inevitably lead to violence.

It's really stupid, but the people making these decisions are evil people. Every reasonable person knows that actions have consequences.

greentxt · 2024-11-26T15:01:11 1732633271

>At what point do you have to accept that the tsunami is here and there’s nothing you can do about it?

Around the late 2000's, but maybe it was earlier. The best time to buy msft stock is always right now.

squigz · 2024-11-26T14:42:18 1732632138

The solution isn't to give up or attempt to avoid it - it's to make this sort of thing illegal.

rurp · 2024-11-26T16:34:27 1732638867

Yes, exactly. There's no reason for the burden to be on every single user of every product to disable this crap. The law should require companies to behave more ethically with real consequences if they do not.

robin_reala · 2024-11-26T13:38:21 1732628301

I just checked and this is turned off in my installation, but I’m not sure that’s from being EU based, or because my org has disabled it.

paravz · 2024-11-29T21:10:10 1732914610

From the article:

>To do so, users must actively opt out by finding and disabling the feature in settings. The process requires unchecking the box 'Turn on optional connected experiences' that is enabled by default.

>On a Windows PC, the steps include going to File > Options > Trust Center > Trust Center Settings > Privacy Options > Privacy Settings > Optional Connected Experiences and unchecking the box.

daft_pink · 2024-11-26T13:32:33 1732627953

Microsoft = Spyware

cheschire · 2024-11-26T13:57:17 1732629437

Most tech these days seems to fall into that classification.

There are not too many pieces of technology these days that intentionally avoid collecting your data in order to be sold to another company.

jmclnx · 2024-11-26T14:51:34 1732632694

>To do so, users must actively opt out by finding and disabling the feature in settings

Odd. So, let's say I wrote an article and it is copyrighted and on some newspaper web page. If I understand this correctly, in theory, I need to find everyone who uses this version of Word and tell them to disable this feature?

If so, it looks to me like the lawyers are going to have a great time with this and will clog the courts for centuries.

protoster · 2024-11-26T14:17:43 1732630663

The linked "Services Agreement" doesn't appear to be specific to this "Connected Experiences" thing, but is rather the basic agreement required to use any MS software. Correct me if I'm wrong here, but opting out of this won't restrict MS from having a license to all Your Content?

formerly_proven · 2024-11-26T13:37:36 1732628256

Does this circumvent Azure Information Protection policies as well? Would be fucking hilarious if it did.

Filligree · 2024-11-26T17:39:31 1732642771

> Microsoft says Word and Excel AI data scraping was not switched to enabled by default (Updated)

This seems to be a misunderstanding.

Aaargh20318 · 2024-11-26T14:00:05 1732629605

This would certainly be the cause of lots of GDPR violations, considering the kinds of information processed in Word and Excel. I know our condo's owners association keeps contact information of their members in Excel sheets, that's considered PII. It can also contain sensitive information like who is behind on their monthly contributions and by how much.

That's just the first thing I thought of. There must be tons of companies and organisations processing sensitive data in Word and Excel. What about doctor's offices and insurance companies handling medical information? What about banks, financial advisors, lawyers, etc.

trod1234 · 2024-11-26T17:26:30 1732641990

This is why they chose to put AI into Word and Excel. So they can take the PII in a derived but reconstructable form (weights).

lousken · 2024-11-26T14:38:07 1732631887

Servers are already on Debian, only client PCs left.