For what it's worth, the most private data here is shared with analytics companies for Grindr's own analytical use. My guess is that Grindr's agreement with Apptimize and Localytics asks for the strictest possible protection of that data. If anyone at Apptimize or Localytics has access to that data, I'd be incredibly surprised.
This sort of deal isn't the same as sharing HIV status with Google or Facebook so that advertisers can target or exclude users based on that information for the purposes of advertising.
For people who think this is still wrong, I'm curious what their pragmatic alternative is. How else are app developers supposed to analyze their app performance? The open source, self-hosted pickings are slim. (I can only think of Piwik, which in my experience has a dated feature set and severe performance issues.) Not everyone can afford to perform their own product analysis. Using a third-party analytics saas is kind of the only way to go and seems like a reasonable tradeoff of security for product visibility.
As someone who has been working in security for a long time, and has seen how the sausage is made at even the biggest, most reputable companies who “take security very seriously”, the “strictest possible protection of that data” means approximately nothing. The only serious way to protect sensitive data is not to take it in the first place. Hell, not even the NSA can keep a lid on their sensitive data.
> For people who think this is still wrong, I'm curious what their pragmatic alternative is. How else are app developers supposed to analyze their app performance?
Remember, customers first, your “needs” come second. That goes double when they are placing their trust in you by allowing you to be a custodian of their data.
Not long ago, desktop software phoning home would have been a scandal. Not long before that, it was offline and couldn’t phone home. Yet, we still had software. Unfortunately, developers have taken the slippery slope all the way to outright abuse of their privileges in order to collect information that customers don’t know about or understand. This has led us to things like GDPR. It doesn’t matter if your intentions are good or your usage is benign. It isn’t yours to begin with, those aren’t your decisions to make, and developers need to learn to seriously respect that.
1) At least don't send any personal data over http. It's 2018 for fuck's sake. I can't believe there are companies out there with such a hand-wavy approach to this. Is it so hard to do https in this day and age? It's so basic wrt a security audit, my head hurts. The fact that extra data is sent over https shows that they made an active decision to partition this data into non-important/important.
2) Just don't fucking send it to a third party. Every single time you do that you yield control over the data and introduce another party to the mechanics, thus doubling the risk of disclosure, and then you cry 'breach of trust'.
> Not everyone can afford to perform their own product analysis.
Then don't do it and don't store sensitive information. You're taking on a risk, and if you don't have the money to roll your own analytics then you probably don't belong on the market. This is no longer a playground, this is the real world, especially for this kind of information. People can get killed based on Grindr leaks. It's a big boys' game, and if you don't have the backing, you shouldn't play in the first place. And this app specifically should not have any problems with funding, give me a break.
So not used for performance, but instead "A people-centered and personalized approach to app marketing and analytics". I am not sure if this is better or worse.
> My guess is that Grindr's agreement with Apptimize and Localytics asks for the strictest possible protection of that data. If anyone at Apptimize or Localytics has access to that data, I'd be incredibly surprised.
Honest question: are you in the SaaS analytics industry, or is there anyone else here who can comment on this? I am not (though I do do data work) and I would actually be surprised if the SaaS company _didn't_ have access to the data.
That would require some kind of dedicated setup so that Grindr's data was not at rest with other companies' data, which is a) super expensive, b) still no reason to expect that the SaaS company would not have access for maintenance/troubleshooting, and c) kind of defeats the purpose of using SaaS.
For startups of their sizes, it's unlikely they have strict data controls. So, probably anyone working on the product side of things, support, engineering, services, has access to their analytics data. Basically, most of the company likely has access to that data. Grindr really shouldn't be sending that data to their analytics providers.
Why would you ever default to an opt-out for that information? That's like saying "people should read the contracts" while waving about a 10,000-word EULA in 6-point type, or burying an option checklist so deeply that most users don't even know it's there.
“But the plans were on display…”
“On display? I eventually had to go down to the cellar to find them.”
“That’s the display department.”
“With a flashlight.”
“Ah, well, the lights had probably gone.”
“So had the stairs.”
“But look, you found the notice, didn’t you?”
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”
I don't think the distinction between "third party service" and "hosting company" is all that clear. You're sending data to a third party service when you host an app on AWS. The only data protection you have is contractual.
> You're sending data to a third party service when you host an app on AWS.
Amazon neither receives nor requires access to the raw underlying data (in this case: data in your database indicating HIV status, or decrypted bodies of requests sent over TLS indicating same) when you host your web services on AWS. While, yes, it's possible for a dedicated attacker to intercept and snoop on this data, it's (a) not easy, and (b) very much outside the scope of the relationship you have with them.
Contrast to the setup described here, where the third parties in question both received and required access to the raw underlying data in order to perform the services they were explicitly contracted for.
You may not think this is an important distinction, but legally, it is, and it makes a world of difference.
I honestly don't get the distinction you're making here. I understand how people _can_ use AWS without ever letting sensitive data touch their disks, but most apps hand everything over wholesale (and frequently in a nicely structured format on RDS).
The legal distinction you're making doesn't sound right to me. Contracts with companies that access your data aren't usually about whether or not an attacker can get at it, but about what kind of access an employee of the service itself has.
Amazon _technically_ has complete access to your data when you run on AWS, but they're contractually limited in how they can use it. The same goes for third party SaaS services. The major difference is "who writes the logic".
But I'm not a lawyer and won't ever have to argue that somewhere it matters.
Amazon is selling an abstraction, and goes to great expense to not have access to customer data. If you are a HIPAA covered entity, they sign a BAA that puts them on the hook.
It's like the difference between putting your papers in a storage locker versus your friend's garage. The storage company ultimately has access to the locker, but is less likely to snoop (either consciously or accidentally) than any of the folks with access to that garage.
AWS does not care about the data, does not want to see the data, and goes out of its way to make it damn hard for it to see the data. The data is a black box to them, and this is by design. You are not sending them the raw data in a format that they require for analysis. You are just sending them bits and bytes that they store for you.
The analytics third parties in this case are the exact opposite. They explicitly require access to the data in a certain format for analysis. In fact, their business fails if they don't have access to this data.
They are both technically third parties but the way they handle the data is completely different.
One has every incentive to avoid reading the data, the other has every incentive to hoover everything it can.
I just don't think that's a meaningful distinction. There's no distinct line between "company that hosts all your data but doesn't analyze it" and "company that does data analytics on your data". It's a gradient, there are all kinds of companies that fit on that gradient, and it's weird to lambast people for using those companies as if it's a technical choice, when what we really want is people making good choices about the data protections their providers have in place.
AWS even has analytics products that require access to your data. I generally trust those more than sketchy analytics companies, but it's entirely because of the contractual protections AWS has in place, not because they're inherently different.
What is the privacy distinction between a third party with a contractual agreement and an employee with a contractual agreement?
Remember that Russian intelligence got a spy hired by Microsoft: https://www.theguardian.com/technology/2010/jul/14/russian-s... Will your interview questions find a foreign spy, or someone who isn't even a spy but is interested in looking at private data for personal amusement?
If Microsoft implemented proper security policies, I imagine that guy didn't have access to all of Microsoft's user data.
So that would be the main difference. Virtually all of a company's employees shouldn't have access to user data at all, and those that do would only have access to parts of it.
>If Microsoft implemented proper security policies, I imagine that guy didn't have access to all of Microsoft's user data.
This is precisely the sort of thing Microsoft takes incredibly seriously internally. Tim Cook may be a more vocal spokesman for treating user data with care but Microsoft is fanatical about it internally. They recognize the risk they face in the event of compromise and have made just enough mistakes in the past to appreciate how hard it is to actually protect their customers’ information.
This might be a naive view, but I think companies that are good at this sort of segmentation will also be good at picking trustworthy third parties and limiting (both technically and legally) what they can do, and conversely, that companies that just send a bunch of sensitive data to third parties out of laziness have no meaningful internal controls either.
"Avoid third parties" is an occasional effect of conscientious care of data, not a cause of it.
The reasonable tradeoff, in this case, would be to continue using the third-party analytics saas, but exclude personally identifiable information, or at the very least, exclude this extremely sensitive information.
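To make that concrete, here's a minimal sketch of what excluding the sensitive fields could look like before an event ever reaches the analytics SDK. (The field names and the send_event call are hypothetical, not any real vendor's API.)

    # Hypothetical filtering of analytics payloads; field names and the
    # analytics client API are made up for illustration.
    SENSITIVE_FIELDS = {"hiv_status", "last_tested_date", "email", "precise_location"}

    def sanitize_event(properties: dict) -> dict:
        """Return a copy of the event properties with sensitive keys dropped."""
        return {k: v for k, v in properties.items() if k not in SENSITIVE_FIELDS}

    def track(analytics_client, name: str, properties: dict) -> None:
        # Only the sanitized payload ever leaves the app.
        analytics_client.send_event(name, sanitize_event(properties))

You still get the product signal ("user edited the status field") without the value itself ever being handed to a third party.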
I think it depends on your reasoning for not sharing data with third parties.
It seems like you're arguing that sharing data is wrong because, in the wrong hands, the data could be used to personally identify someone. In my mind, these are the ways that can happen:
1. The data is sent to an advertiser who can target based on that data. Seems possible, so it's relevant that this data isn't being shared with an ad firm.
2. The data is sent to a third party, whose employees can access and leak the data.
3. The data is sent to a third party, whose data gets compromised.
So the trade-off is: what is the value of having user information in a tool for analytics purposes, versus the chance that (2) or (3) (or some unknown) happens? My argument is that analytics firms are not in the business of leaking or selling data; their business hinges on their clients' data privacy. So to me, this seems like a reasonable trade-off for certain types of data.
As for whether HIV status is the type of data that's unreasonable... I can buy that argument either way. I've never used Grindr but I can imagine it being extremely relevant to its users. And any data that has product impact is useful in an analytics setting. For example, if Grindr has some features that make it easier for HIV-positive or negative people to filter, then they'd be interested in understanding whether it's being used in the product. Then again, I can equally see them deciding it's not worth the risk, and removing it.
If you think sharing sensitive data is wrong under all circumstances, on principle, then you're entitled to your beliefs, but that would seem to me awfully close to religion.
(Copy-pasting the message I already posted in this thread. Seems more relevant here)
I think that adoption of privacy preserving data aggregation/analysis will become the norm. The most immediate applications are 1) telemetry data that is used for monitoring (for example, Google Chrome uses differential privacy for collecting this data), and, 2) services like Google Maps and Tinder-like dating apps. In these applications, essential user information can be represented as integer/boolean values (is user present in location X? True or False. how old is the user? device CPU usage right now? ...)
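For the boolean case, the classic randomized-response trick (which the Chrome/RAPPOR work builds on) fits in a few lines. This is only a toy sketch of the basic mechanism, not RAPPOR itself:

    import random

    def randomized_response(truth: bool, p_honest: float = 0.5) -> bool:
        """With probability p_honest report the truth, otherwise answer at random.
        Any single report is deniable, but the aggregate rate is still recoverable."""
        if random.random() < p_honest:
            return truth
        return random.random() < 0.5

    def estimate_true_fraction(reports, p_honest: float = 0.5) -> float:
        # observed_rate = p_honest * true_rate + (1 - p_honest) * 0.5, so invert:
        observed = sum(reports) / len(reports)
        return (observed - (1 - p_honest) * 0.5) / p_honest

The estimate only becomes accurate with lots of reports, which is part of why this works well for telemetry but, as noted next, falls short when you need exact answers or protection against lying clients.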
Based on my limited understanding* of differential privacy, it falls short on exactness (of aggregate values) and robustness (against malicious clients). I've lately been studying the literature on function secret sharing and I think it is a better alternative to DP. Take this paper: https://www.henrycg.com/files/academic/pres/nsdi17prio-slide....
Prio: Private, Robust and Scalable Computation of Aggregate Statistics
Data collection and aggregation is performed by multiple servers. Every user splits up her response into multiple shares and sends one share to each server. I've understood how private sums can be computed. Let me explain it with a straw-man scheme.
Example (slide 26):
x_1 (user 1 is on Bay Bridge):- true == 1 == 15 + (-12) + (-2)
x_2 (user 2 is on Bay Bridge):- false == 0 == (-10) + 7 + 3 ...
If all users send shares of their data to the servers in this manner, AND as long as at least one server keeps the shares it received private, the servers can safely exchange the sums of the shares they hold. Adding those per-server sums together lets the servers learn how many users are on Bay Bridge without revealing any individual's response.
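A toy version of that straw-man scheme (plain additive sharing over the integers; the real Prio protocol works over a finite field and adds the SNIP proofs mentioned below):

    import random

    NUM_SERVERS = 3

    def split_into_shares(value: int, num_servers: int = NUM_SERVERS) -> list:
        """Split a 0/1 response into random shares that sum back to the value."""
        shares = [random.randint(-100, 100) for _ in range(num_servers - 1)]
        shares.append(value - sum(shares))
        return shares

    # Each user sends one share to each server (e.g. 1 == 15 + (-12) + (-2)).
    users_on_bridge = [True, False, True]          # private inputs
    received = [[] for _ in range(NUM_SERVERS)]    # shares held by each server
    for truth in users_on_bridge:
        for server_shares, share in zip(received, split_into_shares(int(truth))):
            server_shares.append(share)

    # Each server publishes only the sum of its shares; adding those sums
    # recovers the total count without exposing any individual answer.
    print(sum(sum(shares) for shares in received))  # -> 2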
This system can be made robust by using Secret-shared non-interactive proofs (SNIPs)**. This allows servers to test if Valid(X) holds without leaking X.
The authors also bring up the literature on computing interesting aggregates using private sums: average, variance, most popular (approx.), min and max (approx.), quality of regression model R^2, least-squares regression, stochastic gradient descent.
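To make one of those concrete: once you can compute private sums, the average and variance come straight out of basic identities (assuming each client submits shares of both x and x^2):

    def mean_and_variance(sum_x: float, sum_x2: float, n: int):
        """mean = sum(x) / n, population variance = sum(x^2) / n - mean^2."""
        mean = sum_x / n
        return mean, sum_x2 / n - mean ** 2

    # e.g. hidden values 2, 4, 6 -> sum(x) = 12, sum(x^2) = 56
    print(mean_and_variance(12, 56, 3))  # -> (4.0, 2.666...)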
Bottom line: I found the discussion on deployment scenarios very interesting. Data servers with jurisdictional/geographical diversity, app store-app developer collaborations for eliminating risk in telemetry data analysis, enterprises contracting with external auditors for analyzing customer data, etc.
* - I understand the randomized response and, to some extent, the RAPPOR technique (used for collecting Chrome telemetry data) but the other literature in that community goes over my head.
** - This technique is a black box to me at the moment.
If one is HIV positive, it would probably be a draw of the app to find only others who are also positive. Turning that feature off might even push some users toward decisions that are actually illegal, like failing to disclose their status.
Grindr wouldn’t be the only way to declare one’s STD status though. It could be omitted from one’s profile, but declared during chat, for example, or prior to hooking up.
Methinks you don't understand the problem space. Putting it in your profile is intended to save the afflicted from wasting tons of time and energy on (a) talking with people who will immediately nope out when they learn your status, and (b) dealing with a lot of emotional BS from people who want to see themselves as nice but who aren't really ready to deal with you and your situation.
That can be a hard enough conversation to have even if they know. Being straight up rejected by some high percentage of people who started to chat you up and are done the minute you mention HIV would be a dreadful experience. It's possible they are on the app precisely for the ability to pre-screen people for their willingness to hook up with someone HIV positive.
I imagine a lot of folks declared it in their profile without knowing it would be shared, and they probably declared it to try to have a more positive experience over having one dreaded discussion after another. I think those folks have reason to be upset.
I believe in certain places that if you withhold the information of being HIV positive from someone you have sex with and infect them, it's a pretty serious crime. I think it might even be a crime if you don't infect them.
Depends on the state. The majority of states either have laws relating to disclosing known STDs or laws specifically about disclosing known HIV status, or both.