I did my undergrad internship on federated learning. I was tasked with implementing different federated algorithms in a simulator, so we'd have a way to compare them meaningfully. The last one to be implemented was FedMA. We didn't manage to do it. That algorithm is absolutely devilish. Every issue I solved made two more arise, and not even my supervisors could help. The basic idea of matching neurons across different networks might (and does) make sense, but the way the approximate matching costs are calculated requires two or three other math papers that I could only follow for the first lines of the abstract. I'm happy for the time I spent on that internship. I'm also happy it's over.
The general idea of how it works is surprisingly easy to grasp, though; you can find the paper here: https://arxiv.org/abs/2002.06440
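Roughly: each client's hidden layer is treated as a bag of neurons, and the server matches neurons across clients before averaging them (FedMA derives the matching cost from a Bayesian nonparametric model, which is exactly the part that needs those extra papers). Here's a toy sketch of just the match-then-average step, using plain Euclidean distance and the Hungarian algorithm instead of the paper's actual cost:

```python
# Toy sketch of the neuron-matching idea behind FedMA. NOT the paper's cost
# function (that comes from a Bayesian nonparametric model); the "cost" here
# is just Euclidean distance between neurons' incoming weight vectors.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_and_average(layer_a, layer_b):
    """layer_a, layer_b: (num_neurons, num_inputs) weight matrices for the
    same hidden layer from two different clients."""
    # Pairwise distance between every neuron in A and every neuron in B.
    cost = np.linalg.norm(layer_a[:, None, :] - layer_b[None, :, :], axis=2)
    # Permutation of B's neurons that best lines them up with A's.
    rows, cols = linear_sum_assignment(cost)
    # Average the matched pairs: this is the "global" layer the server keeps.
    return (layer_a[rows] + layer_b[cols]) / 2.0

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))                                   # client A
b = a[rng.permutation(8)] + 0.01 * rng.normal(size=(8, 16))    # permuted copy
print(match_and_average(a, b).shape)  # (8, 16)
```

The reason plain FedAvg struggles here is that nothing forces neuron 3 on one client to mean the same thing as neuron 3 on another, so averaging position-by-position mixes unrelated features; matching first is meant to fix that.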
That's the point of the privacy scheme. It would only be able to learn things common to multiple clients. Private data wouldn't make it through the noise.
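For concreteness, here's a rough sketch of the kind of aggregation step that gives you that property (DP-FedAvg-style clipping plus Gaussian noise; the constants are made up for illustration, not tuned for any real privacy budget):

```python
# Rough sketch of differentially private federated averaging: clip each
# client's update, then add Gaussian noise to the aggregate. Parameter
# values here are placeholders, not a real privacy calibration.
import numpy as np

def private_average(client_updates, clip_norm=1.0, noise_std=0.1, rng=None):
    rng = rng or np.random.default_rng()
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        # Scale each update down so no single client dominates the average.
        clipped.append(u * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Noise on the order of one client's maximum contribution: anything only
    # one client "knows" drowns in it; patterns shared by many clients survive.
    return avg + rng.normal(scale=noise_std * clip_norm, size=avg.shape)

updates = [np.random.default_rng(i).normal(size=32) for i in range(100)]
print(private_average(updates).shape)  # (32,)
```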
That is beside the point. No lawyer will allow any "learning" on private data. Even without legal counsel, companies would be stupid to do it without compensation.
How is it beside the point? Legal might initially advise against cloud storage, but if you come back to them with a solution that's encrypted on the client side, that likely changes things.
Compensation could be of the form "we get a cheaper rate from Google" or even "this is the only form in which the service is offered" or perhaps "we aren't big enough to qualify for the fully airgapped offering".
The point is, for this to work, your system has to be connected to the internet. And if that's the case, there's no material difference between hosting it on-prem and hosting it on GCP (where Google can promise you that it won't exfiltrate data). That's if you trust Google. And if you don't (and you shouldn't - it's an ad / mass surveillance company first and foremost), your only option for sensitive data is to self-host and air-gap.
https://federated.withgoogle.com/