What is Differential Privacy? (cryptographyengineering.com)
348 points by sohkamyung on June 15, 2016 | 96 comments



This is something Apple really needs to release all the details of. Even if they got the crypto exactly right, they could have picked a privacy budget / security parameters that just leak everything.

And there is every reason to be skeptical about Apple's ability to design even mildly complex crypto given iMessage's flaws. Although the break in iMessage wasn't practically exploitable, that was luck plus the fact that the only way to detect whether a mauled ciphertext decrypted required attachment messages. The cryptographic mistakes were bad. Given any way to detect decryption of mauled ciphertexts for standard messages (e.g. sequence numbers, timing, actively syncing messages between devices, delivery receipts from iMessage instead of APSD), Apple's crypto design bugs would have eliminated nearly all of the E2E security of iMessage.

Remember, this isn't a boon for user privacy. Apple is now collecting far more invasive data about users under the claim that they have protections in place. At best it preserves the status quo and does so only if Apple both picked the parameters correctly and implemented it correctly.

At this point Apple's position is best summed up as: we have drastically reduced your privacy, except not really, because of magic that we (i.e. Apple) do not fully understand.


Apple have designed/implemented several quite successful crypto and security systems too.


What have they actually designed from scratch? Most of what comes to mind (FaceTime, FileVault) is off-the-shelf stuff, or at least there were well-known designs to ape that had been subject to analysis. This, well, if they just copied what Google did, then maybe.

But at some level, my comment is pretty harsh on Apple when they have a better track record than most for privacy and encryption. But remember, this is Apple making you less secure by grabbing data and then saying they took care of the issue.

With iMessage, FaceTime, and FileVault, if it failed, you were no worse off than if you used something else. In this case, you actually are. That means there is a higher burden to getting it right than just name-dropping magic crypto pixie dust.


> With iMessage, FaceTime, and FileVault, if it failed, you were no worse off than if you used something else. In this case, you actually are.

What would someone be better off using?


Something that didn't report these statistics. An old version of iOS, for one. Does Android do this type of reporting?


I've read a couple articles and haven't seen any details about how they're going to apply DP (differential privacy).

It's important to clearly distinguish what DP can and cannot do. DP is just a technique for taking a database and outputting some statistic or fact about it. The output has some noise added to it.

The guarantee of DP is (roughly) that anyone looking at the output alone won't learn much about anyone in the database. This also holds for anything you do with that statistic.
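Concretely, the textbook way to get this guarantee for a numeric query is the Laplace mechanism: add noise scaled to how much one person can change the answer. A minimal sketch (my own toy illustration, not anything Apple has described):

    import numpy as np

    def laplace_count(db, predicate, epsilon):
        """Release a counting query with epsilon-differential privacy.
        Adding or removing one person changes a count by at most 1
        (sensitivity 1), so Laplace noise with scale 1/epsilon suffices."""
        true_count = sum(1 for row in db if predicate(row))
        return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

    # Toy "database" of app downloads; the released count is noisy enough
    # that removing any single user barely shifts the output distribution.
    downloads = ["maps", "chess", "maps", "weather", "chess", "maps"]
    print(laplace_count(downloads, lambda app: app == "maps", epsilon=0.5))

The smaller the privacy budget epsilon, the bigger the noise, which is exactly the budget/parameter trade-off being argued about above.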

Think about this carefully when thinking about what DP does and doesn't promise. Also think about the difference between "privacy" and "security".

Example of what DP does protect against: If Apple is recommending products to people based on others' download habits, and this recommendation is based on differentially private statistics, then no other user or group of users can infer anything about my downloads. In fact, even engineers at Apple, if they can only see the statistics and not the original database, cannot infer anything about my downloads.

Example of what DP does not protect against: government accessing the data. The database still has to exist on Apple's servers. The government can get to it just as easily as before via warrants or so on. DP is not cryptography.

My assessment: On one hand it is awesome that Apple is taking a lead in using differential privacy and thinking about mathematical approaches to privacy. On the other, there are many facets of privacy and right now I think people are more concerned about security of their data and privacy from the government, or else privacy from companies like Apple itself. DP doesn't address these; it only addresses the case where Apple has a bunch of data and wants the algorithms it runs not to leak much info about that data to the world at large.


> The database still has to exist on Apple's servers.

It doesn't, which is part of the reason Apple wants to do this. You can still do differential privacy without collecting all the data, you just get less accurate results. See page 232 in [1], re "The Local Model".

[1]: http://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf

Edit: the article even says this further down, when it talks about RAPPOR.


Hi Frank and thanks for responding; you're right of course.

If Apple really did implement some sort of randomized response (or a more sophisticated variant), I think that would be a real breakthrough for user privacy, since they'd be giving up control of the data.


According to the State of the Union (where they explain how they're doing differential privacy) there is an overall database, but only in aggregate. It's specifically designed so you don't know the individual answer from each person.


Even though the technique doesn't require collecting all of the data, Apple still does collect a huge well of data.

They store all iCloud sync material (backups, photos, contacts, calendars, mail, documents, etc.) without end to end encryption, and have all of the iMessage metadata.


This is so wrong. The only thing that isn't user-encrypted so far is the iPhone backup on their servers (of course it is encrypted, but Apple has the key to decrypt it as needed).

The official explanation so far is that if the user forgot their password, a user-encrypted backup would just become useless junk.

This is (officially) the sole remaining non-user-encrypted personal data on Apple's servers that authorities can obtain using a warrant.

However, after the San Bernardino FBI mess, Apple started considering also encrypting iCloud backups.

So if you think you are right, proof please...


What's wrong? AFAIK the only thing fully end-to-end encrypted is the keychain.

https://support.apple.com/en-us/HT202303

Take note of what the wording leaves out. Apple holds the decryption keys for just about everything.

> iCloud Keychain encryption keys are created on your devices, and Apple can't access those keys. Only encrypted keychain data passes through Apple's servers, and Apple can't access any of the key material that could be used to decrypt that data

This wording is used for nothing other than the keychain.

Fully consistent with the behavior where disconnecting all your devices from your account, then doing a password reset and logging in on a new device, will make all your data available to you again in plaintext. Including iCloud backups of iMessage chats.


Are you saying Notes, Safari Bookmarks, Photos, etc are encrypted on iCloud?

How come they are accessible from iCloud.com? Decrypted by the browser on the fly?


It seems so: https://support.apple.com/en-us/HT202303

However, I reckon that technically Apple could access, or hand over to the NSA/FBI, data stored on iCloud, because they actually still hold the keys for that part too (not only backups, as I thought). Only the password/credit card Keychain is now claimed to be fully user-encrypted and unrecoverable by any means by Apple.

For anything other than a warrant, they'd "just" have to breach every commitment they made in their contract, which would, as far as I know, constitute a pretty solid legal case and could only lead to a public walk of shame that could compromise the whole company's future.

If you don't trust them, don't use their cloud; I totally respect that. In the end it always comes down to some degree of trust. Even GitHub could be spying on paid private repositories under the hood if they really wanted to. But for what gain?


> For anything other than a warrant, they'd "just" have to breach every commitment they made in their contract, which would, as far as I know, constitute a pretty solid legal case and could only lead to a public walk of shame that could compromise the whole company's future.

This is something I doubt. It would be rather easy to change the software and make it sync passwords, even on an individual basis. If this came out, it would mean a big marketing problem, and could result in sales losses of maybe 10-20%.

I said "could", but to be honest I think 2-3% is more realistic. Most people don't care. They want their data to be safe in case of theft, and have a backup in case of loss. Here on HN it's a big thing, but most users don't know, don't care.


CISPA grants civil immunity for sharing information with the government.


AFAICT only storage is encrypted. They decrypt server side.


Do you suggest that Apple is blatantly lying in the article I just cited?


https://www.apple.com/privacy/approach-to-privacy/

> All your iCloud content like your photos, contacts, and reminders is encrypted when sent and, in most cases, when stored on our servers. All traffic between any email app you use and our iCloud mail servers is encrypted. And our iCloud servers support encryption in transit with other email providers that support it.

> If we use third-party vendors to store your information, we encrypt it and never give them the keys. Apple retains the encryption keys in our own data centers, so you can back up, sync, and share your iCloud data. iCloud Keychain stores your passwords and credit card information in such a way that Apple cannot read or access them.

The End.

I always find it amusing when people downvote me for telling them Apple is doing what they admit to be doing.


Misdirection.


The burden of proof should be on Apple's side. Given their secrecy, all you can do is pray or switch to open source.


Alternatively you can also RTFM before complaining about secrecy... https://www.apple.com/business/docs/iOS_Security_Guide.pdf

Of course that will never be as audit-friendly as open-source code. But don't call it a secret when you actually just didn't search for the information...


It's not only encryption that's important, but proper key management. Just saying.


> Apple still does collect a huge well of data.

I don't think Apple has ever claimed it does not collect data.


> The output has some noise added to it.

You can also add noise to the input samples, so your database doesn't contain statistically significant information about any single individual. But aggregate queries will work as the noise evens out. See http://research.google.com/pubs/pub42852.html
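The simplest version of this is classic randomized response on a yes/no attribute; a minimal sketch (toy code, not Google's RAPPOR or Apple's actual scheme):

    import random

    def randomized_response(truth: bool, p_truth: float = 0.75) -> bool:
        """Each device reports its true bit with probability p_truth,
        otherwise a fair coin flip; the server never sees the raw answer."""
        if random.random() < p_truth:
            return truth
        return random.random() < 0.5

    def estimate_rate(reports, p_truth: float = 0.75) -> float:
        """Invert the known noise on the aggregate:
        P(report yes) = p_truth * true_rate + (1 - p_truth) * 0.5."""
        observed = sum(reports) / len(reports)
        return (observed - (1 - p_truth) * 0.5) / p_truth

    # Simulate 100k users, 30% of whom have the sensitive attribute.
    reports = [randomized_response(random.random() < 0.3) for _ in range(100_000)]
    print(estimate_rate(reports))  # close to 0.30, yet no single report is trustworthy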

Edit: Just saw that the linked article actually explains this in the last section as well.


Some dissenting views on the utility of differential privacy: https://medium.com/@Practical/differential-privacy-considere...

Also, Apple is woefully low on details; theoretical privacy should be accompanied by openly published, peer-reviewed research papers. I understand they won't release the source, but would you trust Apple if they said they invented a new encryption algorithm but refused to publish an academic paper on it? I'd be interested precisely in what they're doing. Are they claiming they're doing federated learning, by gathering anonymous image data from photos, uploading it to their cloud, training DNNs on it, and then shipping the results back down to clients for local recognition? Surely they're not training on device, as this is very RAM- and CPU-intensive.


Apple backed themselves into a corner by marketing themselves as the super-privacy company in contrast to Google. The problem is that all the data collection lets you do some really useful stuff that benefits the user. So now they're spreading FUD while trying to pretend that they're not collecting the same type of data that Google does. Google has been using differential privacy for a while in different projects.


How do you explain their insistence, then, on doing object recognition in photos on-device? Is it possible that they have understood the affordances of various data collection and obfuscation techniques and will apply the appropriate ones after taking into account their desire to protect privacy?

Google indeed has RAPPOR (and other projects, I'm sure), but the cultural difference Apple claims is "we consider privacy in everything we do" instead of "we add privacy where we can."


> the cultural difference Apple claims is "we consider privacy in everything we do"

I'm pretty sure that should be interpreted as "we've determined privacy is a differentiator in the market, so as of some indeterminate time in the past, ranging from a few years ago to our inception, we consider privacy in everything we do."

Now, there's nothing wrong with that, and that's not to say they haven't been privacy conscious in the past, but let's not mistake the current stance as entirely altruistic, when there are multiple incentives at play, one of which is concern for the user.

Edit: s/months/years/, that's much more accurate.


Steve Jobs was pretty passionate about this and they are continuing on with it.

http://www.recode.net/2016/2/21/11588068/heres-what-steve-jo...


Sure. But you can replace Apple in my comment with Jobs, and it would apply equally as well, had Jobs not passed away. The article you reference points this out specifically:

> His comments arrived as Apple started to identify Google, and its ascending Android operating system, as its chief competitor. Here we see the first signs of the hardware seller deploying its privacy position as a branding and competitive tactic, a strategy that has come to the fore during its current standoff with the feds.


If Tim Cook wasn't part of a frequently and historically persecuted minority, I'd be more cynical too.


I don't see it as being cynical, just as being rational. People rarely have a single motivation for their actions, even if they may report a single motivation if asked (possibly the primary reason, or the one they feel comfortable talking about). I don't see why this would be any different for a corporation, generally being made up of many people.

Like I said, there's nothing wrong with this. We just need to be sure we don't fall into the trap of thinking we can take what is presented at face value as the whole story, just as you can't when dealing with individuals much of the time. Apple is not our trusted old friend, that will look out for our best interests. They are at best an acquaintance that we have a business relationship with. That doesn't mean they won't act in a manner we appreciate, but it does mean we should not assume they will act as a good friend.


Whoa, you just opened my eyes. I always believed big companies were fundamentally altruistic!

What a bummer!


Agreed. I find it vastly amusing to hear praise for Apple because they're applying some sort of obfuscation to the telemetry they collect ("Proprietary and totally secret, of course. Oh, what telemetry is Apple collecting? That's secret too but, trust us, Apple cares about your privacy.") and yet people are up in arms about Windows 10 telemetry and little, if anything, is ever said about the telemetry collected by Android. How does that work?


I agree, I just wanted to point out that it is slightly different, as historically Microsoft has had a "pay us and we don't care what you do" (my words) agreement with users, whereas Google has always been a free and ad-supported company. But yeah, there is a triple standard here.


But I can't forget that Google is the company that somehow managed to suggest ads on my personal phone based on browsing on my professional PC.

These devices are never on the same network; the only shared parameter is an Exchange account. As a rule I always log out of the only Google service I rarely use, so this must be some cookie/tracker dark magic.

Sadly I have no proof, but I have used Ghostery to block trackers since then. (Side note: Ghostery also claims to use DP, btw.)


Facebook recently recommended to me a "friend" who was a person that worked at another company which was a client of my previous employer. The only means of online communication I've had with this person was through my old work email. My Facebook account uses a unique email address used only for Facebooking, I've never friended anyone from my previous employer, my demographic information is all made up (except for my name), I've set privacy controls in Facebook to be as strict as possible, I run uBlock Origin/NoScript on all browsers, I clear browser history/cookies/etc. on exit... yet here is this person being recommended to me. The lengths that these companies go to to fingerprint you online are incredibly scary and creepy.


It's possible that the person being recommended has lax settings.

Connections are a two way street. Facebook can assume that if he has connections to you, then you may have a connection to them.

Nothing nefarious is necessary.


It's fairly straightforward - cookies + Google's ad network + analytics on your phone allow them to track you across devices.


OK, so maybe we are talking about the same field of science, but it seems that, from what they say, there are two very distinct approaches.

1. Collect personal data, send it to the mainframe, use it to profile and deliver custom-tailored services. When sharing, hash it so that individual records can't be extrapolated.

2. Collect personal data, use it locally, sometimes using mainframe-provided 'models'. Return hashed records to the mainframe to improve the models.

Google is clearly using 1, while Apple claims to target 2. (I don't think that's actually the case yet, because I thought Siri still stores some personal data in the cloud.)


How would the cookie from work get to the phone?


You don't need to rely on cookies anymore to identify someone online: https://en.wikipedia.org/wiki/Device_fingerprint


Sure, but that's not cross device.


I must have been connected to the same Gmail or Facebook account at some point on both devices. However, as I said, I purposely never stay logged in to Google services. Yeah, I can flush all cookies (thus annoyingly resetting all legit cookies as well).

But this is not how privacy should work, because there are a lot of people out there who don't read HN and only recently found out that there is a lot of pastry inside their computers.


Your browser (Firefox, Chrome, Safari) most likely uses Google Safe Browsing. That phones home with a super cookie every 30 minutes.


Apple is a hardware company. They're not as dependent on data as Google is.


> On the other hand, when the budget was reduced to a level that achieved meaningful privacy, the "noise-ridden" model had a tendency to kill its "patients".

Uh, the graph is just showing a 25% increase in estimated mortality risk from warfarin, nothing close to "killing patients". A complete exaggeration, since the mortality baseline is probably very low in the first place.


And, just to be clear, in their paper it is not the presence of noise that kills people; it is their treatment of noisy measurements as high-confidence measurements that kills them. When you get low-confidence reads from data you should use them as such; the authors just act on them no matter how strong (or weak) the confidence and report what happens.

It's roughly analogous to a doc saying "should I deviate from the baseline treatment?" and when the data say "dunno" the doc prescribes a totally random medicine rather than the baseline, because that is what "dunno" means.


Good luck selling that to doctors. People who are trained to cut open a patient aren't receptive to hypothetical harms and imaginary "budgets". Consider the fact that cause of death records are public. The usual perception of privacy by theory and security researchers and norms practiced in healthcare differ enormously.


The point of differential privacy is to allow for aggregate analysis, without destroying the privacy of outliers. Researchers deal with noise all the time, so is it so odd that a field of researchers believe that adding enough noise to data released with studies will allow for conclusive analysis without ruining privacy for individuals?


As someone with access to data on 50 million patients, and having studied aggregation of medical data for the last 5 years, I can assure you that it's not as easy as it sounds.

The amount of noise that "theoretically guarantees" privacy protection in terms of epsilon renders any reasonable analysis impossible. E.g., how about the CDC telling you that there are 0-2000 cases of Ebola in Massachusetts?

There are the theoretical guarantees provided by differential privacy, and then there are the actual requirements of conducting public health or biostatistical research with a certain evidence value. The gap between the noise added by the former and tolerated by the latter is enormous.
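To make that gap concrete, here is a back-of-the-envelope sketch (my own numbers, not from any real data release): for a simple count with sensitivity 1, epsilon-DP Laplace noise has standard deviation sqrt(2)/epsilon, so a strict epsilon swamps small counts.

    import math

    # Laplace noise for a counting query (sensitivity 1) has scale 1/epsilon
    # and standard deviation sqrt(2)/epsilon.
    for epsilon in (1.0, 0.1, 0.01):
        print(f"epsilon={epsilon}: noise std ~ {math.sqrt(2) / epsilon:.1f}")
    # epsilon=1.0 -> ~1.4, epsilon=0.1 -> ~14.1, epsilon=0.01 -> ~141.4
    # At a "meaningful privacy" epsilon, a true count of a handful of cases
    # comes back as "somewhere within a few hundred", hence 0-2000 style answers.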


0-2000 in a population of 3.6 million does not seem unreasonable to me.


Sure, but you are neither a doctor nor an infectious medicine specialist.

It's trained doctors and researchers who get to decide the quality of medical evidence. There is a reason why we have a detailed set of protocols and levels used for the assessment of evidence.

http://www.cebm.net/oxford-centre-evidence-based-medicine-le...


Outliers are the reason databases exist. Any "average" is readily apparent, and therefore irrelevant for serious in-depth analysis.

Adding noise and fuzzing has a long history in statistics going back to the '70s [1], and while it does work on large numbers, it almost always messes up the details, i.e. the error bars.

C.D. DP is essentially a cheap ripoff of the ideas implemented in ARGUS[2].

[1] 1977 Dalenius, see Do Not Fold, Spindle or Mutilate movement and earlier:

http://tpscongress.indiana.edu/impact-of-congress/gallery/fi...

[2] http://neon.vb.cbs.nl/casc/


> Outliers are the reason databases exist.

Disagree. Data is why databases exist.

> Any "average" is simply readily apparent, therefore irrelevant for serious in depth analysis.

I said "aggregate", not "average". There are many kinds of aggregate analysis useful (in Astrophysics, you can take many different samples from different stars and use the aggregate to compute commonalities in the sample that you would've detect with a single measurement). There is more to aggregate analysis than averaging data.

As for the rest of your points, I'm not a statistician so I can't comment. Also, I didn't downvote you (HN rules).


sorry -- please substitute "average" (the original was also in quotes) with category or factor, and you still have the bin I am talking about. You can put any label on it you like, such as "commonality", as long as you remove details, i.e. other bins.

But as you say: your "aggregate analysis" NEEDS "many different samples from different stars". Commonality is the result of your analysis based on different samples. But since they are common, you can go and sample and have the result without doing mass surveillance on every star.

ps: I am fully aware of photo stacking, but also note that stars are not humans; see the context of privacy. Please look at ARGUS or sdcMicroGUI from CRAN to get a feeling for data utility vs. reidentification risk.


> But since they are common, you can go and sample and have the result without doing mass surveillance on every star.

"Mass surveilance" reduces noise and lets you get more data in a shorter period of time (telescopes have large fields of view, but they can't make time pass faster). Stacking (which is what the technique is called in Astrophysics) is very useful in this case. Not to mention that you can also do individual analysis as well.

Actually, most interesting of all is that you can do this type of analysis on objects like neutron stars that we can't observe directly because they're too faint. Because noise in telescopes can be modelled as a Poisson process, stacking actually increases S/N in a way you can't do without making much bigger telescopes.

PS. I'm not a statistician, so I can only speak to what I know. But my whole point is that researchers do know how to deal with noisy data, regardless of whether that noise is man-made. Interestingly enough, I found out recently that the NASA pipeline actually breaks certain data sets they have released (which have papers written about them), so man-made noise is a problem regardless of whether or not it's intentional.


"Not to mention that you can also do individual analysis as well."

This is the key point to argue against in the context of people, privacy and mass surveillance.

It is the touchstone of privacy, anonymity and crowd protection.

Regarding noise suppression: yes, the more queries (available data whether raw or extracted) the more you can filter (ask a Kalman student) to reduce your error bars and margins. This is a reason why DP is overhyped. Also, if there are no differences between queries, then data is redundant. See deduplication (database) or scaling (measurement).

About the analysis pipeline: this is why the mantra "know your detector". Coincidentally, this is why releasing only recorded datasets is next to useless for people outside the given research group. You would need to capture detailed knowledge of your data taking operations and instruments, which happens rarely, if ever. Please cite a thing such as "the NASA pipeline", perhaps you mean a given mission/experiment? In any case, detector recalibration is a usual, almost daily activity...


> Please cite a thing such as "the NASA pipeline", perhaps you mean a given mission/experiment?

The specific pipeline I was referring to is the Kepler pipeline that NASA uses to take their raw pixel data and produce photon counts that everyone uses for their research (this wasn't a detector issue, it was a software bug at the final stage of the data publishing process). The point was not the pipeline issue, it was that noise is everywhere.

But as to your point, yeah okay. Maybe I shouldn't talk about statistics when that's not my field. :D


downvoter: care to elaborate on the usefulness of a database with almost identical entries with MEANingless values?


Outliers are not the only useful thing in a set of data. If you remove outliers from most data sets, it doesn't suddenly become "almost identical" -- unless your outlier rejection system is "is it equal to 1".


If you collect values of random variable Y from phones, where Y = X + N (N being normally distributed with mean 0 and var N = var X, say), then many statistics can be calculated with that.

The law of large numbers says that after gathering many values of Y, their sample mean will converge to the mean of X (since the noise has mean zero), and likewise for statistics built from such averages.

Yes?

Meanwhile each individual user will not send so many samples as to identify the true values of X with any useful accuracy.
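Roughly, yes. A quick simulation of that idea (toy parameters, nothing from a real deployment):

    import numpy as np

    rng = np.random.default_rng(0)
    n_users = 100_000
    x = rng.normal(loc=10.0, scale=2.0, size=n_users)     # true per-user values X
    y = x + rng.normal(loc=0.0, scale=2.0, size=n_users)  # reported Y = X + N, var N = var X

    print(x.mean(), y.mean())  # the two means agree closely across the population
    print(x[0], y[0])          # but any individual report can be far from the truth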


I have to admit, I'm really starting to like the direction that Apple is heading despite being previously disenchanted. I only wish that they would go ahead and put everything under a free software license, since they're in the business of selling hardware that's coincidentally bundled with their software.


> I only wish that they would go ahead and put everything under a free software license, since they're in the business of selling hardware that's coincidentally bundled with their software.

That's never going to happen. Apple sells a 'User Experience' not just hardware - having a complete and mostly closed product is an inevitable consequence of the former - and the number of Linux users that would buy a Macbook isn't a large enough part of the market for them to worry about.

With that said I've had a Macbook Pro and it was pretty much a better piece of hardware (at least as far as build quality) than any other notebook I've used.


In what way does changing the software license impede user experience? No other company would have the proverbial balls to straight up copy their software either.


Have you heard about a company called Xiaomi? They have the balls to straight up copy the industrial design of Apple, so pretty sure they would have no problem copying the software.


>No other company would have the proverbial balls to straight up copy their software either.

How can you say this looking at the hardware landscape?


To paint another viewpoint, Apple initially went all gung ho about privacy and wanted to make not collecting data a big play (and fairly so, full respect to them).

The recent WWDC obviously shows a big shift towards AI and ML applications within the company. Some things are possible on the device, but many neural nets just cannot be served from an iPhone reasonably. Hence, the move towards more data collection. I really wish they give out more information here. Until then, I'm not sure how much they are actually collecting after their realization that they do need the data to do AI well.


They talked a bit more about differential privacy in the State of the Union. Basically, they hash the data and add noise. By collecting data from a bunch of people, that noise gets averaged out. They also limit the number of samples (over a relatively short period of time) they can get from a single person, so they won't be able to identify them.


Interesting. That's a smart way to collect data while not having too much noise flood into the dataset. I need to watch this State of the Union.


State of the Union is what the WWDC keynote used to be before it started being watched by the press and the public. Much more technical detail, and information on the underlying frameworks rather than user-visible features.


https://developer.apple.com/videos/play/wwdc2016/102/

You can find all the videos from WWDC 2016 some time after the session is done. I usually check the next day. They have the videos for several previous WWDCs up as well.


At some point, regardless of adding noise, you're definitely losing your privacy. I'd be happy with an "opt-out" feature that I know worked (as far as I can see, only if it was open-source). I didn't watch WWDC, perhaps they mentioned this.


I agree opt-out is definitely something that should be deployed alongside differential privacy, but what makes you so sure that it doesn't work "at some point"? If the noise means a specific query for one user's information has a significant chance of being wrong, how does this not equate to privacy? You can add a lot more noise than you might imagine if you know the kind of analysis you'll be doing with the data; for example, a lot of statistical techniques are constructed to be mostly immune to Gaussian noise, since it's very common with some kinds of data.


The whole point of collecting the data is to predict the actions or information needs of individual users. That in itself is a privacy issue.

If a recommender system for iTunes can predict the likelihood of me appreciating movies that contain violence against women, that information could be subpoenaed when I am falsely accused of having strangled my girlfriend.

I appreciate that Apple is trying to protect our privacy where they can. But if we want them to make predictions about our behavior, we have to be aware of the fact that we are necessarily giving up some privacy.


You're misunderstanding where this is to be used. It is specifically not for things like iTunes suggestions, where it would be useless. It's for situations where they want to get aggregated metrics without collecting identifiable information. The obfuscation can be performed by the client so that they never have a database on the server with accurate (at the specific user level) data.


I don't think I am misunderstanding (although I'm not completely sure about that). My point isn't about iTunes. My point is about the purpose of data collection. If that purpose is predicting our actions, then that in itself is a privacy issue.

I understand that the database Apple wants to build does not contain accurate information about individual users. But if that database allows them to make predictions of our behavior, then there is a privacy issue. If the purpose is not prediction, then what is it?


It could be a number of things, but one possibility is identifying broad correlations between metrics. Since you can't trust the accuracy of the individual metrics, you will have a limited ability to apply the correlations to individual users, but if you use the right kind of noise, aggregated conditional probabilities may survive.

So Apple can (for example) predict that listening to band A means you are likely to like band C, and then send a list of correlations to your device so the predictions can be made there by examining your library locally. A more probable use is analytics for marketing purposes. Another is selling just these correlations and other aggregate statistics to other parties; this is actually how Mint makes money.
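A toy sketch of that shape (hypothetical names and made-up numbers): the server only ever ships population-level correlations, and the matching against your library happens on the device.

    # Aggregate, differentially private statistics computed server-side and
    # shipped identically to every device (numbers are made up).
    correlations = {
        ("Band A", "Band C"): 0.8,
        ("Band A", "Band D"): 0.3,
        ("Band B", "Band E"): 0.7,
    }

    def local_recommendations(library, correlations, threshold=0.5):
        """Runs on the device: no individual listening data leaves it."""
        suggestions = set()
        for (listened, suggested), score in correlations.items():
            if score >= threshold and listened in library and suggested not in library:
                suggestions.add(suggested)
        return suggestions

    print(local_recommendations({"Band A", "Band B"}, correlations))
    # {'Band C', 'Band E'}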


>So Apple can (for example) predict that listening to band A means you are likely to like band C

And how is that different from my iTunes example?


I used "you" incorrectly, my bad. They can predict that people who listen to band A are likely to like band C, but their data for whether you listen to band A still has a significant chance of being wrong.


Yes, the data has a significant chance of being wrong. But it is useful only insofar as it supports a prediction with a probability of being right that is greater than 0.5.

That's what makes the data useful and that's what makes it a privacy issue at the same time.


It doesn't have to support that prediction in specific instances, just in a general trend, where random noise tends to average itself out in a lot of cases. There are lots of different distributions with the same averages, the same conditional probabilities, etc. with wildly different data. If you have some mathematical proofs that say you cannot reach one of these other distributions by injecting random noise to mask individual contributions, then please write a paper on it! But to my knowledge, Cynthia Dwork's work and others' still stands. There is definitely no simple, common-sense reason that it doesn't work.

How does sending the same list of conditional probabilities for liking pairs of bands to everyone's device and then having the device pick out the ones actually pertinent to your library compromise your privacy?


I don't doubt the validity of Dwork's work. I think we're talking past each other.

What I'm saying is that if Apple keeps data on its servers that is sufficient to predict some of my actions or likes with any accuracy greater than 50%, then that is a privacy concern.

But if you're saying that the data in Apple's database does not have any predictive power on its own, then I agree that it is not a privacy concern.

In that case, my device would have to download some of Apple's data and combine it with data that resides only on my device in order to make a prediction locally on my device.

If that's how it works then I have no concerns.


I believe that's how it works.

They even limit the number of samples they get from a specific person so they can't filter out the noise for that person and get their individual response.

But, keep in mind that Apple will have records of all your iTunes rentals and purchases at least for billing purposes. However, at least in the US there's a law about keeping that data private (because of Robert Bork).

https://epic.org/privacy/vppa/


Differential Privacy is now $AAPL's licence to collect more "anonymised" personal data & benefit. Such personal data collection is not an evil anymore. I'll have some of the "PR" they're having!


> Apple initially went all gung ho about privacy and wanted to make not collecting data a big play

My impression always has been that Apple does not collect data that can lead you to be personally identified. I never got any impression that "Apple does not collect data".


They know it's one of the ways to differentiate from the other big companies, by offering privacy and marketing in such a way that it's impossible for the other big players to do the classic "me too" approach.


Who knows, maybe one day that will be the reality. They are taking a lot of steps in the right direction towards facilitating development on their platform – In my opinion it doesn't seem too far fetched to believe that one day they might open up their software too.


Fuck Apple. The only thing they are programmed to care about is the bottom line and getting more users. That programming is complicated and unlikely to change to suit the consumer's needs, or desires.


You say that despite this privacy push being a direct response to consumer sentiments. Apple knows well that as tech becomes wearables and the Internet of things, privacy concerns skyrocket. My grandparents won't buy things online. Soon my generation will be those old and anxious curmudgeons unless our concerns are eased.

Do they care about their bottom line? Of course. It's for that very reason they are investing the time now to secure the trust of generations of consumers.


This 'push' is just more unprovable, unverifiable marketing BS that is just enough for the type of audience they attract, until it's all opened under free-software licenses and the hardware is made to accept any software the user wants to run on it (like it should, if you bought it).


Agreed. Apple is only doing as much security as they reasonably need to satisfy customers. Most customers don't understand the various types of suffering (and suffering risk, I might add) caused by software, but large companies do. Large companies end up "making decisions" for users en masse. This is fine for button placement, but falls apart logically when presented in a discussion based on trusted infrastructure. You will note that liquidse's response is conflicted. Those rationalizations are the indicators of dissonance in the privacy debate.


There seems to be a huge amount of speculative commentary, which is acknowledged, but to me not in a way that shows the potential variation in implementing DP.

For example, Apple could easily download all the data, do a DP analysis of the impact of adding the data to the existing aggregate data, clean out identifiers, and add it to the database.

Key here is that Apple has all the data, then purges the identifiers from it, which is completely different from removing the identifiers before sending to Apple. _______

(Apple:) "Hi, I'm Apple, Trust Me! Don't mind the black bag, I just likely being mysterious, it's cool, right?"

(Me:) "Umm, no, no thanks!" _________

Apple needs to let go of the whole security through secrecy ploy, since it looks more and more shady.

Imagine if security modules for devices were public and the non-secure sections of the devices had to be encapsulated for EmSec and tamper-proofing. If this were the case, security literally wouldn't be an issue; either everyone is impacted, or no one is impacted.


Does it mean that Apple will randomly insert turds into my messages so that it looks like the average user?


It's time for an Open Communications initiative. Time for companies to stop owning the platform. Time for all of us to stand up for our right to communicate with who we want, when we want, without being monitored, inspected, blamed, or advertised to. Enough is enough. It's time for a change.



