
> privacy-preserving data collection system

I hate this. Viscerally. Why, why, WHY does every “privacy first” system, platform, whatever, start with the presumption that some people should get your data and it’s just a matter of vetting the “good” groups from the “bad” groups.

No. That isn’t privacy.

No one should get my data. And it’s ridiculous that all of these companies try to position their data grabbing projects as “privacy” oriented when what they really mean is they’re not quite as invasive and/or are slightly more transparent about their data theft compared to others.

</rant>




I understand this is a sensitive area, and I understand that reasonable people have good reasons to be concerned. But it seems that you are directing your frustrations at Mozilla unfairly.

I find this to be a good definition of privacy:

> Privacy is the ability of an individual or group to seclude themselves or information about themselves, and thereby express themselves selectively. - Wikipedia

From what I've read, Mozilla's Rally gives people the ability to choose what research studies to participate in.

I think there is plenty of discussion to be had about what level of control and granularity works best for different people, but I have confidence that Mozilla has both the right incentives and technical capability to contribute meaningfully in this space.


Sorry, I definitely did not put as much thought into my comment as I should have and I left out the critical piece of information that really ticked me off. I'm one of the nerds who actually reads the privacy documents and Rally's privacy policy[1] has a section titled "How We Use Your Information" that includes:

> improving Mozilla’s existing products and services

> creating and developing new products

All of the marketing copy is about "donating" your data to important research and how "Big Tech has built its success by exploiting your data." Meanwhile Mozilla is doing the exact same thing they're criticizing "big tech" for doing. Tucked away in the fine print is the fact that your data _isn't_ just going to be used for research studies, it's going to be exploited by yet another for-profit tech company. They've just put a nice warm and fuzzy do-gooder wrapper on it.

If Rally were as transparent about how your data is used as they claim to be, they would either (1) not use your data in that way and exclusively allow the data to be used for research, as advertised, or (2) make it abundantly obvious that it will be used that way.

[1]https://rally.mozilla.org/privacy-policy/index.html


>> improving Mozilla’s existing products and services

>> creating and developing new products

I agree that these are concerning. They seem out of place. If you want to start a petition asking Mozilla to clarify and/or remove these clauses, I would sign it.


I almost opened an issue on their GitHub[1] (one of their privacy-related documents invites people to "call them on it" if you have privacy concerns) but I decided against it because I worried about harassment from sleuthing HN readers finding me on other platforms. Such are the compromises you make as a lady on the internet sometimes.

[1]https://github.com/mozilla-rally


>but I decided against it because I worried about harassment from sleuthing HN readers finding me on other platforms. Such are the compromises you make as a lady on the internet sometimes.

You make it easy for the internet assholes to do this to you. If your HN username is your real name that is a really big problem for your privacy. Your occupation is stated in your profile. Likewise, stating your sex might as well be your privacy's death knell.

Become more anonymous to provide less ammunition to those with nothing more to do than torment others, then continue doing the things you feel are the right things to do without excuse.


I've been interneting-while-woman for three decades now so of course my username isn't my real name. But (like most developers) my GitHub includes my real name, my photo, and my company, hence my hesitation. I have to wonder, though... are you as quick to chastise the "internet assholes" you see harassing women online as you were to chastise me for having the audacity to admit my gender online?


That wasn't chastising you. I don't know you, so there was no way I could have known how long you've been on the internet or your depth of experience with identity within it. I was attempting to point out some of the bigger factors feeding into your complaint about the potential for harassment, with no intent beyond bringing attention to something you may have overlooked, as humans tend to do sometimes.

You're being combative for no reason so I'll leave it here but, in the future, don't always assume malice.


I see both sides here. On the whole, I think both people are trying to contribute and help in their own ways.

Some more specific comments:

> You're being combative for no reason so I'll leave it here but, in the future, don't always assume malice.

Saying "no reason" doesn't ring true to me. What one person considers to be (valid) reasons is subjective. In my view, 'reasons' includes a person's identity and experiences.

With that in mind, how do you think this alternative message would have been received... ?

"It was not my intention to chastise you. I meant well in offering some ways to reduce the chances that trolls come after you. Please don't assume malice. I'm happy to listen if you have suggestions on how I could communicate the message more effectively."


I don’t think it’s in any way “combative” to point out that I am indeed competent and capable of interacting with the online world safely without the patronizing assistance of strangers on the internet. And, whether it was your intention or not, that’s exactly what your comment was: patronizing. With undertones of “well what did you think would happen when you present yourself in such a way”. Whether you’re willing to admit it or not, comments like yours aren’t helpful — they’re part of the problem.


I'd suggest looking up the "principle of charity".

Additionally, insisting that the self-perceived "undertones" are "the real truth" over a clearly (and reasonably) stated subsequent explanation is indeed confrontational, at best. In my opinion and experience.

When X claims they know better what was the intent or tone of Y's message when Y is already there precisely elaborating... X has to be realistic, admit no prior knowledge of Y (nor their idiosyncrasies, nor their style of communication), be aware of the limitations of the medium (no voice tone, no body language), and take things at face value as written (and/or ask for explanation/further details in neutral and a non-confrontational way). Y has to do the same. It really is the only way that maintains functionality of the conversation.

(Also, foreseeing a potential conflict: X and Y have nothing to do with bio-sex; they just signify two unknowns.)

At least, that's the self-defined framework I use, as a probably somewhat autistic/ADHD person (people confuse and frustrate me to no end).


Well said.

This is quite similar to a concept in psychology called Theory of Mind (ToM):

> Theory of mind as a personal capability is the understanding that others have beliefs, desires, intentions, and perspectives that are different from one's own.

I have a theory: It is relatively more difficult for people who have faced adversity (whether it be from systemic bias and/or personal situations) to make unemotional assessments when conditions relate to those adverse situations.


> I have a theory:

I would agree, in principle and mostly in practice. However, if one knows better, then they must also know that the same "better" can be done. It is hard, yes, and it takes practice, but is achievable. Exposure therapy of sorts helps immensely. And, if/when feeling overwhelmed, simply ask for a recess and postponement of the discussion. I call it "processing time", and it usually takes a few days, or even longer. I call upon it, when I sense it is required (on my and other side, too).

I also try to familiarize the other side with my own (aforementioned) communication style and idiosyncrasies. I'd say we all seek at least not to be misunderstood, even if understanding (in the sense of agreement) is not possible... and being frank and upfront about it helps.


> And, whether your intention or not, that’s exactly what your comment was: patronizing.

E: Not quite. You perceived the comment as patronizing. This is not a universal assessment. From my point of view, I didn't find it patronizing. I'm not saying I'm right and you are wrong; I'm simply saying it is far from clear cut.

Here is one definition of patronizing that I find useful:

> apparently kind or helpful but betraying a feeling of superiority; condescending

You may think that someone else feels superior to you. That is your assessment, I respect that, and I'll listen. At the same time, it is subjective and is uncertain, because your knowledge is incomplete.

The principle of charity is useful here. I hope you can see alternative interpretations that show N does not perceive himself as superior. In particular, their commentary, in my view, is by and large very thoughtful, with the exception of a few sharp edges (which everyone has). From what I can tell, N's edgier comments came out because they felt attacked.

That's the pattern I see here. A person feels attacked and their communication becomes less charitable and even abrasive. At least two people fell into this trap in this thread. As a community, we don't benefit when this happens, but this is human nature.

The solutions are not easy. In my view, we should try to observe, be thoughtful, and attempt to deescalate tensions. I believe a vast majority of people are here for positive reasons and have plenty to learn from each other.


Now that's just nuts.


While I can understand that some people may disagree with and/or not understand what she said, it is both unkind and unhelpful to say "Now that's just nuts." The comment does not move the conversation forward in terms of clarification or understanding, and it demonstrates neither patience nor curiosity about other perspectives.


> If Rally is transparent about how your data is used

If Rally was not transparent about this, then you would not have seen that and would not have gotten emotionally triggered by it.

But given that you’ve identified a visceral opposition to that, you should consider not opting-in to that particular study.


I'm one of the very few people who actually read all of the disclosure documents. I wouldn't be surprised if I were the only person who read these documents in their entirety aside from the document drafter(s) themselves. And this wasn't in the privacy notice for a study but for the Rally browser extension. Rather confusingly, Rally has one privacy policy and each individual study will have their own, separate, privacy policy in addition to the Rally privacy policy.


Admirable digging, for sure.


> you would not have seen that and would not have gotten emotionally triggered

Don’t do that here. The comment is about Rally pretending to be about altruistic academic research while actually being a platform for Mozilla product development.


My reply predates this edit from the parent:

> Sorry, I definitely did not put as much thought into my comment as I should have and I left out the critical piece of information that really ticked me off.


Such are the perils of edit logs not being available and/or not quoting what you are responding to.


TBF, if the stuff that would have needed to be quoted was added after he replied...


> But given that you’ve identified a visceral opposition to that, you should consider not opting-in to that particular study.

The commenter referenced Rally's privacy policy. It is not specific to a particular study.


I’m being charitable and leaving room for the possibility that they provide a tighter set of policies for some studies or revise their general policy to make commercial use be a study-by-study determination.

Or, just don’t opt-in to any of the Rally studies. Your call, it’s your data.


> Or, just don’t opt-in to any of the Rally studies. Your call, it’s your data.

First, a caveat. I don't know the people behind the comments in this sub-thread. I have read almost all of them and find them to be informative and thoughtful. So thanks for that.

That said, when I read a comment like the above, what I hear is a mentality of "you are an individual, with power, if you don't like it, act individually". That mentality is not wrong, but is quite limited and incomplete. It overlooks the power and importance of individuals discussing and organizing together, which is often much more powerful than simply "voting with your feet".


As always, with your replies to me, your spirit/style is much appreciated.

> It overlooks the power and importance of individuals discussing and organizing together, which is often much more powerful than simply "voting with your feet".

I did indeed purposefully overlook that, but only out of a working assumption that the consent of the governed is there, overall.

If there truly is no consent of the governed for informed consent being a sufficient tool for empowerment in these cases, then there are deeper topics that need to be discussed. Some of those topics could lead to conclusions that may have catastrophic implications for modern society, if not discussed in a reasonable and considerate manner. Cancelling entire industries in one fell swoop is one of the conclusions being drawn in the current climate, for example.

If there is consent of the governed that opt-in is a sufficient tool for empowerment in these cases, then these cases may ultimately be logically reducible to failures to uphold the laws, as they currently exist.

To further expound on the latter line of reasoning, these cases seem to anecdotally belong to a few categories: 1) I don’t like advertising, 2) I didn’t know that they could do that, 3) I don’t want anyone watching what I do.

Category 1) is a completely understandable sentiment with implications that I currently lack the energy to comment upon.

Category 2) should be taken as that individual’s human rights having been violated. Specifically, that situation could arguably be pursued in the United States under the ADA. Enforced broadly, this could have catastrophic consequences to modern society. Enforced judiciously, this could provide much needed social progress.

Category 3) is a completely understandable sentiment with implications that I currently lack the energy to comment upon.

The category 2) implications do seem to be most relevant to society, at this juncture.

edit: I can imagine that a class-action ADA lawsuit against a carefully selected set of defendants (if legally possible) could lead to resolution of this matter in the courts, without calling on the legislature to comment on topics for which they are under-qualified to comment upon.


> I’m being charitable and leaving room for the possibility that they provide a tighter set of policies for some studies or revise their general policy to make commercial use be a study-by-study determination.

I just reviewed these pages:

[1]: Mozilla Rally Privacy Policy https://rally.mozilla.org/privacy-policy/index.html

[2]: Political and COVID-19 News https://rally.mozilla.org/current-studies/political-and-covi...

Read together, the problematic parts of the general privacy policy are not addressed nor remedied by the specific study's details, because a specific study addresses how that study uses the data.

Perhaps a future study would be different? I doubt it. My take is that the concerning parts of the general privacy policy's language will stand (quoted a few messages above). Here's why I say this... Based on my experience with organizations and lawyers, Mozilla is unlikely to want to modify its general privacy policy based on particular discussions with each organization involved in a study; it would be too time-consuming and expensive, and it would create a path-dependence such that every previous study's details would need to be reevaluated in light of a modification to the general policy. Instead, Mozilla probably crafted their privacy policy in a general way, hoping that it will be acceptable to participants and partners. I expect they will modify it as little as possible.


> Mozilla is unlikely to want to modify its general privacy policy based on particular discussions with each organization involved in a study

They may ultimately have a legal responsibility to do so, depending on the nature of their contracts with their research partners. I’m not a lawyer, but if I were, I’d be digging into the case law to see if Mozilla + (”public university” or “federal funds”) = “a combination that must meet all severally applicable laws”.


Part of the problem with large internet platforms is that parts of 'my data' are inextricably linked to 'your data', even to the extent that 'my data' only exists on some platforms as data points in 'your data'. In that sense any opt-in choice given to another is yet another privacy breach on their 'contacts', for example.

I've seen the opinion expressed that part of the reason society allows this type of surveillance is that so many members of our society don't understand the details or scope. If true, whatever discussions we have about this should include the idea that we're proposing to increase the scope of the problem while researching it.


Yes, calling out linkage as a key challenge for data privacy is very important.

To dig in one level deeper... Have you looked into privacy-preserving record linkage (PPRL) or similar ideas? (I have not, but I'm interested.)

> The process of linking records without revealing any sensitive or confidential information about the entities represented by these records is known as privacy-preserving record linkage (PPRL).

Source: "Privacy-Preserving Record Linkage". DOI: https://doi.org/10.1007/978-3-319-63962-8_17-1

See also: "A taxonomy of privacy-preserving record linkage techniques" at https://www.sciencedirect.com/science/article/abs/pii/S03064...
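To make the PPRL idea concrete, here is a minimal sketch of one common technique from that literature: each party locally encodes a name into a Bloom-filter-style bit set of hashed character bigrams, shares only the bit sets, and similarity is estimated with the Dice coefficient. This is an illustration of the general approach, not anything Mozilla or Rally actually does; all names, parameters, and helper functions are made up for the example.

```python
import hashlib

def bigrams(s):
    """Split a string into overlapping character pairs."""
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]

def bloom_encode(value, size=128, num_hashes=4):
    """Hash each bigram into a fixed-size bit set (a simple Bloom filter)."""
    bits = set()
    for gram in bigrams(value):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
            bits.add(int.from_bytes(digest, "big") % size)
    return bits

def dice_similarity(a, b):
    """Dice coefficient between two bit sets: 2|A∩B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Each party encodes locally and shares only the bit sets, never the names.
alice = bloom_encode("jonathan smith")
bob = bloom_encode("jonathon smith")  # typo variant of the same person
eve = bloom_encode("rebecca jones")

print(dice_similarity(alice, bob))  # high: likely the same person
print(dice_similarity(alice, eve))  # low: different people
```

Because the comparison tolerates typos, two databases can be linked on noisy names without either side revealing them in cleartext — though, as the surveys note, naive Bloom encodings themselves have known re-identification weaknesses.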


> In that sense any opt-in choice given to another is yet another privacy breach on their 'contacts' for example.

That is a non sequitur when we are discussing opting-in to social science research.

Rally is not a social network platform. It is a social science platform. There is no reason for it to be directly, as a platform, concerned with your contacts.

Per their FAQ:

> We abide by a series of principles known as Lean Data Practices. Lean Data means that we collect just the data we need, and do everything we can to protect what we collect. Studies only collect data that is essential to creating a broader understanding of a research topic.

Institutional Review Boards, privacy policies, and the various contractual agreements between parties operating and building the Rally research platform would be held to task by scientific principles of treating participants humanely and ethically.

If an IRB deemed a study unethical because its design indicated that data could be obtained without informed consent, then that study could not be conducted. The research design would have to be modified to correct itself, or that specific research methodology would be considered generally unethical by the wider scientific community, just as the community deems it unethical to do genetic experiments on unwilling human subjects.


>> In that sense any opt-in choice given to another is yet another privacy breach on their 'contacts' for example.

> That is a non sequitur, when we are discussing opting-in to social science research.

As I understand it, the commenter's point does not rest on 'contact' linking being present. Their point is that any kind of data linking presents a re-identification risk.

Regarding the risk of data linkages, how confident are you that Mozilla and others with access to the data will manage it ...

1. ... up to the currently-accepted level of knowledge (including hopefully some theoretical guarantees, if possible, and if not, mitigations with known kinds of risk) and ...

2. ... that the current level is acceptable given that history of data privacy doesn't paint a rosy picture?

To be open, I'm not interested in your confidence level per se, but rather in the reasoning in your risk assessment. I want to weigh the various factors myself, in other words. For example, you appear to have more confidence in IRBs than I do.

Knowing the history of the "arms race" between deidentification and reidentification, I don't put a whole lot of trust in Institutional Review Boards. Many smart, well-meaning efforts have fallen prey to linkage attacks. They are insidious.

P.S. In my view, using "non sequitur" here is a bit strong, perhaps even off-putting. It only reads as a non sequitur because you are making different logical assumptions than the commenter. Another approach would be to say "your conclusion only holds if..." This would make your point without being so pointed, and it also helps show that you want to understand the other person's assumptions.
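The linkage attacks mentioned above can be sketched in a few lines: a "de-identified" release still carrying quasi-identifiers is joined against a public dataset that carries names, and the anonymity evaporates. This is the classic ZIP-code/birth-date/sex pattern from the de-identification literature; all records, names, and diagnoses below are invented for illustration.

```python
# Toy linkage attack: join a "de-identified" medical release with a
# public voter roll on quasi-identifiers (ZIP, birth date, sex).

medical = [  # names removed, but quasi-identifiers retained
    {"zip": "02138", "dob": "1945-07-21", "sex": "F", "dx": "hypertension"},
    {"zip": "02139", "dob": "1980-01-02", "sex": "M", "dx": "asthma"},
]

voters = [  # public record with names attached
    {"name": "J. Doe", "zip": "02138", "dob": "1945-07-21", "sex": "F"},
    {"name": "R. Roe", "zip": "02139", "dob": "1980-01-02", "sex": "M"},
]

def reidentify(medical, voters):
    """Link records whose quasi-identifiers match exactly."""
    keys = ("zip", "dob", "sex")
    index = {tuple(v[k] for k in keys): v["name"] for v in voters}
    matches = {}
    for row in medical:
        key = tuple(row[k] for k in keys)
        if key in index:
            matches[index[key]] = row["dx"]
    return matches

# Every "anonymous" medical record gets a name attached again.
print(reidentify(medical, voters))
```

The insidious part is that neither dataset is sensitive on its own; the breach emerges only from the join, which is why removing direct identifiers is not sufficient and why the deidentification/reidentification "arms race" exists.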


> As I understand it, the commenter's point does not rest on 'contact' linking being present. Their point is that any kind of data linking presents a re-identification risk.

It appears that the parent commenter revised their comment to indicate that the concern was indeed “your data getting mixed with my data, when browsing Facebook”, to paraphrase.

My response there was essentially: ethical review would have to determine if all data must be provided through informed consent of all the originating humans.

Held to the gold standard of ethics, an IRB would likely have to contraindicate a research design if it did not provide a way for every individual human involved to provide informed consent. If any single individual in a data set indicated that they did not consent, then that data set would need to be reshaped to not include that individual. In lieu of that, the entire data set would have to be excluded from study.

Of course, that has some complex implications when it comes to broad categories of data sources for browser usage: social networking sites would be a minefield. Did the website author provide consent for their content to be machine-analyzed for sentiment, etc., if one really wanted to get down to it? You’d have to consider each and every resource location. You can’t assume that all browser traffic is open web traffic — someone could have left their Rally extension running while navigating a corporate confidential network, complex copyrights, etc.

My understanding is that the US Supreme Court is about to decide on whether “if you can read it, you can keep it” as a consequence of Microsoft/LinkedIn vs. hiQ Labs, so don’t forget the “arms race” of justice, either.

> Many smart, well-meaning efforts have fallen prey to linkage attacks. They are insidious.

Indeed, even just basic double-blind medical studies are hard to defend when you consider operational security, let alone information security.


Thank you for your PS feedback, it is appreciated and will be incorporated.

My overall point is that if you don’t want data being captured that may provide data about your contacts, then don’t opt-in to providing it.

Informed consent is the bedrock upon which social science ethics rests.


Sure, I understand your point. Have you dug into the problems of data linkage attacks? (see questions above)


Not yet! I’m vaguely familiar with the basics, but I’m curious about the details and their relation to research design.

Will comment more fully as I find the energy.


In case it is of interest, here is a fairly short article with a short historical look at data de-identification. If nothing else, it is one jumping off point.

"Data De-identification: Possibilities, Progress, and Perils". 2019. https://forge.duke.edu/blog/data-de-identification-possibili...


Rally is entirely opt-in, so I think you're off the mark here?


I see where you are coming from but it is opt-in so I don’t believe that is theft.

All scientific research requires data, and certain types of research require certain types of data to be useful. I’m personally against ad-driven data collection because I don’t think its outcomes, both first order and second order, are in the long-term interests of viewers or society as a whole. However, assuming you can trust that the data collected is used only as stated, studies looking to understand online interactions, where more and more of our lives are being lived, with the aim of improving long-term outcomes seem like a good thing to me. Of course, once the data is out of your hands you lose control over it, so it’s good that Mozilla is doing data minimization and aggregation to help reduce the impact of that.

I guess it really boils down to trust, intent, our ability to choose, and transparency, which have not been respected in many cases, so I very much do understand the skepticism. Here’s hoping this will be different. So far, in my opinion, that looks to be the case.

Some of the current studies [1]:

- Political and COVID-19 News

- Your Time Online and "Doomscrolling"

Edit: I read your comment [2] in another thread about the privacy policy, and that is a good point. I sent an email to mozilla asking for clarification and if I get a reply I will add it here.

1: https://rally.mozilla.org/current-studies/

2: https://news.ycombinator.com/item?id=27633918


> start with the presumption that some people should get your data

Some people do want others to have their data.

Dropbox is a great example of a successful business that is built on the idea of sharing data.


Yes, I want to use Dropbox to share my invoice with my client and my client to share the assets from their designer with me.

I don't want anybody tracking what websites I visit, articles I read, or Facebook videos I watch.

These are two VERY different definitions of "data", mingling the two is not helpful to discourse.


>I don't want anybody tracking what websites I visit, articles I read or facebook videos I watch.

This whole thing is opt-in, is it not? If you don't want to share your information, you simply do not opt-in.


I'm replying to the person who used Dropbox as an example of "Some people do want others to have their data." in order to clear up their confusion.


> These are two VERY different definitions of "data", mingling the two is not helpful to discourse.

Indeed, data has complex nuances that are sometimes missed.

Let’s not mingle data intended for advertisement with data intended for social science research.


> Let’s not mingle data intended for advertisement with data intended for social science research.

Well those are two different USAGES of the same type of data (tracking). That one is good and the other bad is a different discussion.


> That one is good and the other bad is a different discussion.

In my estimation, that is the current discussion.



