Whoever wrote this piece is way too close to... whatever it is Mozilla is doing here. There seems to be an assumption that users will be gleeful to throw their data at legitimate researchers from legitimate institutions doing legitimate work. What "data"? Browsing history? Identity? Something else? Why? What's in it for them? Since when was giving our data to third parties a good idea? There is literally no motivation presented here.
As a nerd I can read between the lines—clearly they have come up with some sort of privacy-preserving data collection system that is useful. But at face value this whole article is just saying "hey use Firefox and give your data to scientists for reasons we don't bother to explain because obviously it's good."
Princeton research collaborator here. Glad to answer questions about Rally.
> What "data"? Browsing history? Identity? Something else?
That depends on the Rally study, since research questions differ and studies are required to practice data minimization. Each study is opt in, with both short-form and long-form explanations. Academic studies also involve IRB-approved informed consent. Take a look at our launch study for an example [1].
> Why? What's in it for them? Since when was giving our data to third parties a good idea? There is literally no motivation presented here.
The motivation is enabling crowdsourced scientific research that benefits society. Think Apple Research [2], NYU Ad Observatory [3], or The Markup's Citizen Browser [4]. There are many research questions at the intersection of technology and society where conventional methods like web crawls, surveys, and social media feeds aren't sufficient. That's especially true for platform accountability research; the major platforms have generally refused to facilitate independent research that might identify problems, and platform problems often involve targeting and personalization that other methods can't meaningfully examine.
These "This Study Will Collect" and "How We Protect You" sections are really good. It probably wouldn't convince me personally to sign up, but it's as comprehensive as I would expect. It's a shame that these comments didn't make it into the blog post.
I think that the motivation of 'enabling citizen science' is not a very strong one. You will get very, very skewed results, more so than the typical WEIRD sample, if you conduct studies on the people for whom that is sufficient motivation.
A stronger motivation would be providing a product or service that tangibly adds value to someone's life.
After reading this, I have no idea how Rally would provide any tangible benefits to me.
Exactly. It is so weird to see all this marketing speak that makes it sound like users will benefit from something, when in the end this is just something that gets people to work and provide data for free to multi-billion-dollar universities.
We don't need any more studies or research to know that the best privacy policy is to not collect any data in the first place.
I know you mean well but I think you completely missed the above commenter's point.
You've replied here with answers to address their (our?) potential concerns, but the commenter never said they had concerns about the project itself, rather that this particular blog post doesn't "sell" or explain the value add well. That's feedback on the project's communication strategy, not on what it's actually doing.
> > Why? What's in it for them? Since when was giving our data to third parties a good idea? There is literally no motivation presented here.
> The motivation is enabling crowdsourced scientific research that benefits society.
You seem to be confusing "theys". The question is what motivates participants, not what motivates researchers.
But if the participants are limited to people who are motivated solely by participating in research, wouldn't that add significant bias to that research?
Nonetheless, much of psychology research conducted in the US has made do with ridiculous sampling bias - the US college student is anecdotally considered to be the most-studied population in the world.
Personally I don't think that researchers have any more business doing this kind of surveillance than Google and company do.
The idea that this will benefit society seems naive to me. I feel like it will only serve to legitimize the practice by putting ostensibly trustworthy faces on the packaging.
Not just surveillance, but conducting research within corporate platforms. Therefore, they would have access to my data and a corporation's engine. If I think that Google knows too much about me, do I get to opt in to whether that hyper-knowledge is shared with researchers (because I won't)?
I hate this. Viscerally. Why, why, WHY does every “privacy first” system, platform, whatever, start with the presumption that some people should get your data and it’s just a matter of vetting the “good” groups from the “bad” groups.
No. That isn’t privacy.
No one should get my data. And it’s ridiculous that all of these companies try to position their data grabbing projects as “privacy” oriented when what they really mean is they’re not quite as invasive and/or are slightly more transparent about their data theft compared to others.
I understand this is a sensitive area, and I understand that reasonable people have good reasons to be concerned. But it seems that you are directing your frustrations at Mozilla unfairly.
I find this to be a good definition of privacy:
> Privacy is the ability of an individual or group to seclude themselves or information about themselves, and thereby express themselves selectively. - Wikipedia
From what I've read, Mozilla's Rally gives people the ability to choose what research studies to participate in.
I think there is plenty of discussion to be had about what level of control and granularity works best for different people, but I have confidence that Mozilla has both the right incentives and technical capability to contribute meaningfully in this space.
Sorry, I definitely did not put as much thought into my comment as I should have and I left out the critical piece of information that really ticked me off. I'm one of the nerds who actually reads the privacy documents and Rally's privacy policy[1] has a section titled "How We Use Your Information" that includes:
> improving Mozilla’s existing products and services
> creating and developing new products
All of the marketing copy is about "donating" your data to important research and how "Big Tech has built its success by exploiting your data." Meanwhile Mozilla is doing the exact same thing they're criticizing "big tech" for doing. Tucked away in the fine print is the fact that your data _isn't_ just going to be used for research studies, it's going to be exploited by yet another for-profit tech company. They've just put a nice warm and fuzzy do-gooder wrapper on it.
If Rally were as transparent about how your data is used as they claim to be, they would either (1) not use your data in that way and exclusively allow the data to be used for research, as advertised, or (2) make it abundantly obvious it will be used that way.
>> improving Mozilla’s existing products and services
>> creating and developing new products
I agree that these are concerning. They seem out of place. If you want to start a petition asking Mozilla to clarify and/or remove these clauses, I would sign it.
I almost opened an issue on their GitHub[1] (one of their privacy-related documents invites people to "call them on it" if you have privacy concerns) but I decided against it because I worried about harassment from sleuthing HN readers finding me on other platforms. Such are the compromises you make as a lady on the internet sometimes.
>but I decided against it because I worried about harassment from sleuthing HN readers finding me on other platforms. Such are the compromises you make as a lady on the internet sometimes.
You make it easy for the internet assholes to do this to you. If your HN username is your real name that is a really big problem for your privacy. Your occupation is stated in your profile. Likewise, stating your sex might as well be your privacy's death knell.
Become more anonymous to provide less ammunition to those with nothing more to do than torment others, then continue doing the things you feel are the right things to do without excuse.
I've been interneting-while-woman for three decades now so of course my username isn't my real name. But (like most developers) my GitHub includes my real name, my photo, and my company, hence my hesitation. I have to wonder, though... are you as quick to chastise the "internet assholes" you see harassing women online as you were to chastise me for having the audacity to admit my gender online?
That wasn't chastising you. I don't know you, so there was no way I could have known how long you've been on the internet nor your depth of experience with identity within it. I was attempting to point out some of the bigger factors feeding into your complaint about the potential of people harassing you, with no intent other than to bring attention to something you may have overlooked, as humans tend to do sometimes.
You're being combative for no reason so I'll leave it here but, in the future, don't always assume malice.
I see both sides here. On the whole, I think both people are trying to contribute and help in their own ways.
Some more specific comments:
> You're being combative for no reason so I'll leave it here but, in the future, don't always assume malice.
Saying "no reason" doesn't ring true to me. What one person considers to be (valid) reasons is subjective. In my view, 'reasons' includes a person's identity and experiences.
With that in mind, how do you think this alternative message would have been received... ?
"It was not my intention to chastise you. I meant well in offering some ways to reduce the chances that trolls come after you. Please don't assume malice. I'm happy to listen if you have suggestions on how I could communicate the message more effectively."
I don’t think it’s in any way “combative” to point out that I am indeed competent and capable of interacting with the online world safely without the assistance patronizing of strangers on the internet. And, whether your intention or not, that’s exactly what your comment was: patronizing. With undertones of “well what did you think would happen when you present yourself in such a way”. Whether you’re willing to admit it or not, comments like yours aren’t helpful — they’re part of the problem.
I'd suggest looking up the "principle of charity".
Additionally, claiming that the self-perceived "undertones" are "the real truth", over a clearly (and reasonably) stated subsequent stance/explanation, is indeed confrontational, at best. In my opinion and experience.
When X claims they know better what was the intent or tone of Y's message when Y is already there precisely elaborating... X has to be realistic, admit no prior knowledge of Y (nor their idiosyncrasies, nor their style of communication), be aware of the limitations of the medium (no voice tone, no body language), and take things at face value as written (and/or ask for explanation/further details in neutral and a non-confrontational way). Y has to do the same. It really is the only way that maintains functionality of the conversation.
(Also, foreseeing a potential conflict: X and Y have nothing to do with bio-sex, they just signify two unknowns.)
At least, that's the self-defined framework I use, as a probably somewhat autistic/ADHD person (people confuse and frustrate me to no end).
This is quite similar to a concept in psychology called Theory of Mind (ToM):
> Theory of mind as a personal capability is the understanding that others have beliefs, desires, intentions, and perspectives that are different from one's own.
I have a theory: It is relatively more difficult for people who have faced adversity (whether it be from systemic bias and/or personal situations) to make unemotional assessments when conditions relate to those adverse situations.
I would agree, in principle and mostly in practice. However, if one knows better, then they must also know that the same "better" can be done. It is hard, yes, and it takes practice, but is achievable. Exposure therapy of sorts helps immensely. And, if/when feeling overwhelmed, simply ask for a recess and postponement of the discussion. I call it "processing time", and it usually takes a few days, or even longer. I call upon it, when I sense it is required (on my and other side, too).
I also try to familiarize the other side with my own (aforementioned) communication style and idiosyncrasies. I'd say we all seek to at least not be misunderstood, if understanding (in sense of the agreement) is not possible... and being frank and upfront about it - helps.
> And, whether your intention or not, that’s exactly what your comment was: patronizing.
E: Not quite. You perceived the comment as patronizing. This is not a universal assessment. From my point of view, I didn't find it patronizing. I'm not saying I'm right and you are wrong; I'm simply saying it is far from clear cut.
Here is one definition of patronizing that I find useful:
> apparently kind or helpful but betraying a feeling of superiority; condescending
You may think that someone else feels superior to you. That is your assessment, I respect that, and I'll listen. At the same time, it is subjective and is uncertain, because your knowledge is incomplete.
The principle of charity is useful here. I hope you can see alternative interpretations that show N does not perceive himself as superior. In particular, their commentary, in my view, is by and large very thoughtful, with the exception of a few sharp edges (which everyone has). From what I can tell, N's edgier comments came out because they felt attacked.
That's the pattern I see here. A person feels attacked and their communication becomes less charitable and even abrasive. At least two people fell into this trap in this thread. As a community, we don't benefit when this happens, but this is human nature.
The solutions are not easy. In my view, we should try to observe, be thoughtful, and attempt to deescalate tensions. I believe a vast majority of people are here for positive reasons and have plenty to learn from each other.
While I can understand that some people may disagree with and/or not understand what she said, it is both unkind and unhelpful to say "Now that's just nuts." The comment does not move the conversation forward in terms of clarification or understanding. The comment does not demonstrate patience, nor does it show curiosity about other perspectives.
I'm one of the very few people who actually read all of the disclosure documents. I wouldn't be surprised if I were the only person who read these documents in their entirety aside from the document drafter(s) themselves. And this wasn't in the privacy notice for a study but for the Rally browser extension. Rather confusingly, Rally has one privacy policy and each individual study will have their own, separate, privacy policy in addition to the Rally privacy policy.
> you would not have seen that and would not have gotten emotionally triggered
Don’t do that here. The comment is about Rally pretending to be about altruistic academic research while actually being a platform for Mozilla product development.
> Sorry, I definitely did not put as much thought into my comment as I should have and I left out the critical piece of information that really ticked me off.
I’m being charitable and leaving room for the possibility that they provide a tighter set of policies for some studies or revise their general policy to make commercial use be a study-by-study determination.
Or, just don’t opt-in to any of the Rally studies. Your call, it’s your data.
> Or, just don’t opt-in to any of the Rally studies. Your call, it’s your data.
First, a caveat. I don't know the people behind the comments in this sub-thread. I have read almost all of them and find them to be informative and thoughtful. So thanks for that.
That said, when I read a comment like the above, what I hear is a mentality of "you are an individual, with power, if you don't like it, act individually". That mentality is not wrong, but is quite limited and incomplete. It overlooks the power and importance of individuals discussing and organizing together, which is often much more powerful than simply "voting with your feet".
As always, with your replies to me, your spirit/style is much appreciated.
> It overlooks the power and importance of individuals discussing and organizing together, which is often much more powerful than simply "voting with your feet".
I did indeed purposefully overlook that, but only out of a working assumption that the consent of the governed is there, overall.
If there truly is no consent of the governed for informed consent being a sufficient tool for empowerment in these cases, then there are deeper topics that need to be discussed. Some of those topics could lead to conclusions that may have catastrophic implications for modern society, if not discussed in a reasonable and considerate manner. Cancelling entire industries in one fell swoop is one of the conclusions being drawn in the current climate, for example.
If there is consent of the governed that opt-in is a sufficient tool for empowerment in these cases, then these cases may ultimately be logically reducible to failures to uphold the laws, as they currently exist.
To further expound on the latter line of reasoning, these cases seem to anecdotally belong to a few categories: 1) I don’t like advertising, 2) I didn’t know that they could do that, 3) I don’t want anyone watching what I do.
Category 1) is a completely understandable sentiment with implications that I currently lack the energy to comment upon.
Category 2) should be taken as that individual’s human rights having been violated. Specifically, that situation could arguably be pursued in the United States under the ADA. Enforced broadly, this could have catastrophic consequences to modern society. Enforced judiciously, this could provide much needed social progress.
Category 3) is a completely understandable sentiment with implications that I currently lack the energy to comment upon.
The category 2) implications do seem to be most relevant to society, at this juncture.
edit: I can imagine that a class-action ADA lawsuit against a carefully selected set of defendants (if legally possible) could lead to resolution of this matter in the courts, without calling on the legislature to weigh in on topics for which it is under-qualified.
> I’m being charitable and leaving room for the possibility that they provide a tighter set of policies for some studies or revise their general policy to make commercial use be a study-by-study determination.
Read together, the problematic parts of the general privacy policy are not addressed nor remedied by the specific study's details, because a specific study addresses how that study uses the data.
Perhaps a future study would be different? I doubt it. My take is that the concerning parts of the general privacy policy's language will stand (quoted a few messages above). Here's why I say this... Based on my experience with organizations and lawyers, Mozilla is unlikely to want to modify its general privacy policy based on particular discussions with each organization involved in a study; it would be too time-consuming and expensive, and it would create a path-dependence such that every previous study's details would need to be reevaluated in the light of a modification to the general policy. Instead, Mozilla probably crafted their privacy policy in a general way, hoping that it will be acceptable to participants and partners. I expect they will modify it as little as possible.
> Mozilla is unlikely to want to modify its general privacy policy based on particular discussions with each organization involved in a study
They may ultimately have a legal responsibility to do so, depending on the nature of their contracts with their research partners. I’m not a lawyer, but if I were, I’d be digging into the case law to see if Mozilla + (”public university” or “federal funds”) = “a combination that must meet all severally applicable laws”.
Part of the problem with large internet platforms is that parts of 'my data' is inextricably linked to 'your data', even to the extent that 'my data' only exists on some platforms as data points in 'your data'. In that sense any opt-in choice given to another is yet another privacy breach on their 'contacts' for example.
I've seen the opinion expressed that part of the reason society allows this type of surveillance is that so many members of our society don't understand the details or scope. If true, whatever discussions we have about this should include the idea that we're proposing to increase the scope of the problem while researching it.
Yes, calling out linkage as a key challenge for data privacy is very important.
To dig in one level deeper... Have you looked into privacy-preserving record linkage (PPRL) or similar ideas? (I have not, but I'm interested.)
> The process of linking records without revealing any sensitive or confidential information about the entities represented by these records is known as privacy-preserving record linkage (PPRL).
> Source: "Privacy-Preserving Record Linkage".
> DOI: https://doi.org/10.1007/978-3-319-63962-8_17-1
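For anyone curious, the core idea in that literature is to encode identifiers into Bloom filters so records can be compared approximately without ever exchanging the raw values. Here's a toy sketch of the idea; the hashing scheme, parameters, and names are purely illustrative, and real PPRL deployments need keyed hashes and hardening against known cryptanalytic attacks:

```python
# Toy sketch of Bloom-filter-based privacy-preserving record linkage (PPRL):
# each party encodes an identifier into a set of bit positions via hashed
# character bigrams, then encodings are compared with a similarity score
# instead of the raw identifiers ever being shared.
import hashlib

def bloom_encode(value: str, size: int = 256, num_hashes: int = 3) -> set[int]:
    """Map character bigrams of `value` onto Bloom-filter bit positions."""
    bigrams = [value[i:i + 2] for i in range(len(value) - 1)]
    bits = set()
    for gram in bigrams:
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % size)
    return bits

def dice_similarity(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two encodings (closer to 1.0 = likely match)."""
    return 2 * len(a & b) / (len(a) + len(b))

# Similar names produce similar encodings; unrelated names do not.
print(dice_similarity(bloom_encode("jane q example"), bloom_encode("jayne q example")))
print(dice_similarity(bloom_encode("jane q example"), bloom_encode("someone else entirely")))
```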
> In that sense any opt-in choice given to another is yet another privacy breach on their 'contacts' for example.
That is a non sequitur when we are discussing opting in to social science research.
Rally is not a social network platform. It is a social science platform. There is no reason for it to be directly, as a platform, concerned with your contacts.
Per their FAQ:
> We abide by a series of principles known as Lean Data Practices. Lean Data means that we collect just the data we need, and do everything we can to protect what we collect. Studies only collect data that is essential to creating a broader understanding of a research topic.
Institutional Review Boards, privacy policies, and the various contractual agreements between the parties operating and building the Rally research platform would be held to account by the scientific principles of treating participants humanely and ethically.
If an IRB deemed a study unethical because its design indicated that data could be obtained without informed consent, then that study could not be conducted. The research design would have to be modified, or that specific methodology would be considered generally unethical by the wider scientific community, just as the community deems it unethical to do genetic experiments on unwilling human subjects.
>> In that sense any opt-in choice given to another is yet another privacy breach on their 'contacts' for example.
> That is a non sequitur when we are discussing opting in to social science research.
As I understand it, the commenter's point does not rest on 'contact' linking being present. Their point is that any kind of data linking provides a reidentification risk.
Regarding the risk of data linkages, how confident are you that Mozilla and others with access to the data will manage it ...
1. ... up to the currently-accepted level of knowledge (including hopefully some theoretical guarantees, if possible, and if not, mitigations with known kinds of risk) and ...
2. ... that the current level is acceptable given that history of data privacy doesn't paint a rosy picture?
To be open, I'm not interested in your confidence level per se, but rather the reasoning in your risk assessment. I want to weight the various factors myself, in other words. For example, you appear to have more confidence in IRBs than I do.
Knowing the history of the "arms race" between deidentification and reidentification, I don't put a whole lot of trust in Institutional Review Boards. Many smart, well-meaning efforts have fallen prey to linkage attacks. They are insidious.
P.S. In my view, using "non sequitur" here is a bit strong, perhaps even off-putting. It is only a "non sequitur" because you are making different logical assumptions than the commenter. Another approach would be to say "your conclusion only holds if..." This would make your point without being so pointed. It also helps show that you want to understand the other person's assumptions.
> As I understand it, the commenter's point does not rest on 'contact' linking being present. Their point is that any kind of data linking provides a reidentification risk.
It appears that the parent commenter revised their comment to indicate that the concern was indeed “your data getting mixed with my data, when browsing Facebook”, to paraphrase.
My response there was essentially: ethical review would have to determine if all data must be provided through informed consent of all the originating humans.
Held to the gold standard of ethics, an IRB would likely have to contraindicate a research design if it did not provide a way for every individual human involved to provide informed consent. If any single individual in a data set indicated that they did not consent, then that data set would need to be reshaped to not include that individual. In lieu of that, the entire data set would have to be excluded from study.
Of course, that has some complex implications when it comes to broad categories of data sources for browser usage: social networking sites would be a minefield. Did the website author provide consent for their content to be machine-analyzed for sentiment, etc., if one really wanted to get down to it? You'd have to consider each and every resource location. You can't assume that all browser traffic is open web traffic - someone could have left their Rally extension running while navigating to a corporate confidential network, complex copyrights, etc.
My understanding is that the US Supreme Court is about to decide on whether “if you can read it, you can keep it” as a consequence of Microsoft/LinkedIn vs. hiQ Labs, so don’t forget the “arms race” of justice, either.
> Many smart, well-meaning efforts have fallen prey to linkage attacks. They are insidious.
Indeed, even just basic double-blind medical studies are hard to defend when you consider operational security, let alone information security.
In case it is of interest, here is a fairly short article with a short historical look at data de-identification. If nothing else, it is one jumping off point.
I see where you are coming from but it is opt-in so I don’t believe that is theft.
All scientific research requires data and certain types of research require certain types of data to be useful.
I’m personally against ad-driven data collection because I don’t think its outcomes, both first order and second order, are in the long-term interests of viewers or society as a whole.
However, assuming you can trust that the data collected and how it is used is only as stated, then studies looking to understand online interactions, where more and more of our lives are being lived, with the interest of improving long-term outcomes seem like a good thing to me. Of course, once the data is out of your hands you lose control over it, so it’s good that Mozilla is doing data minimization and aggregation to help reduce the impacts of that.
I guess it really boils down to trust, intent, our ability to choose, and transparency which has not been respected in many cases so I very much do understand the skepticism. Here is to hoping this will be different. So far, in my opinion, this looks to be the case.
Some of the current studies [1]:
- Political and COVID-19 News
- Your Time Online and "Doomscrolling"
Edit: I read your comment [2] in another thread about the privacy policy, and that is a good point. I sent an email to mozilla asking for clarification and if I get a reply I will add it here.
I read it as "we're protecting your privacy by increasing the number of people who can monitor your activity on our browser." Presumably, the users will be well-endowed and tax-advantaged institutions who could have just bought the information from data-aggregators anyway. I'm starting to see a theme of papering over their technology products with a lot of modern art and hyperbolic language.
"Computer scientists, social scientists and other researchers will be able to launch groundbreaking studies about the web and invite you to participate." Wow. I'm about to make history with a browser add-on!
"Our first study is “Political and COVID-19 News” and comes from the Princeton team that helped us develop the Rally research initiative." Groundbreaking! College students can now make sure that I am adequately fact-checked if I err from the path of truth.
> Presumably, the users will be well-endowed and tax-advantaged institutions who could have just bought the information from data-aggregators anyway.
Nope. This is an important point: the type of crowdsourced science that Rally enables is something that researchers couldn't do before. (With the exception of a very small number of teams who made massive investments in building single-purpose crowdsourcing infrastructure from the ground up.)
Common research methods have significant limitations. Web crawls, for instance, usually don't realistically simulate user activity and experiences. Lab studies often involve simplified systems that don't generalize to the real world. Surveys yield self-reported data, which can be very unreliable.
Rally studies, by contrast, reflect real-world user activity and experiences. In science jargon, Rally enables field studies and intervention experiments with excellent ecological validity.
1. Do you expect the opt-in nature of these studies to impact their findings?
2. To compensate for the voluntary nature of the studies, do you think researchers in general will still be incentivized to find data sources that are less respectful of people's privacy and don't require an opt-in to the study?
> 1. Do you expect the opt-in nature of these studies to impact their findings?
The Rally participant population is not representative of the U.S. population—these are users who run Firefox (other browsers coming soon), choose to join Rally, and choose to join a study. In research jargon, there's significant sampling bias.
For some studies, that's OK, because the research doesn't depend on a representative sample. For other studies, researchers can approximate U.S. population demographics. When a user joins Rally, they can optionally provide demographic information. Researchers can then use the demographics with reweighting, matching, subsampling, and similar methods to approximate a representative population. Those methods already appear throughout social science; whether they're sufficient also depends on the study.
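To make the reweighting idea concrete, here is a toy post-stratification sketch. The column names, strata, and population shares are hypothetical, not Rally's actual schema or targets:

```python
# Hypothetical post-stratification sketch: assign each opt-in participant a
# weight so the weighted sample matches population-level demographics.
# Strata and target shares are illustrative only.
import pandas as pd

# Opt-in participants with self-reported demographics (toy data).
participants = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "age_group": ["18-29", "18-29", "30-49", "30-49", "50+", "18-29"],
})

# Target population shares (e.g., from census estimates).
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

# Observed sample shares per stratum.
sample_share = participants["age_group"].value_counts(normalize=True)

# Weight = population share / sample share: over-represented groups are
# down-weighted and under-represented groups are up-weighted.
participants["weight"] = participants["age_group"].map(
    lambda g: population_share[g] / sample_share[g]
)

# Study-level estimates are then computed as weighted averages.
print(participants)
```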
> 2. To compensate for the voluntary nature of the studies, do you think researchers in general will still be incentivized to find data sources that are less respectful of people's privacy and don't require an opt-in to the study?
Rally is designed to provide a new research capability that didn't exist before. I don't expect a substitution effect like that.
Regarding 2. that would run afoul of many ethics boards at universities. Generally they require that (informed) consent has been given to take part in the study.
> Rally studies, by contrast, reflect real-world user activity and experiences. In science jargon, Rally enables field studies and intervention experiments with excellent ecological validity.
Rally users are all opt-in. How does that impact the design of a Rally study and the conclusions you can draw from it?
I was really excited for most of the time reading thinking “A privacy-first general data store that lets us donate insights for the common good, and may even end up being a marketplace where we can choose to sell data. This is so cool! I hope they have a mobile app so I can record things like what I ate for supper or how long I looked at a screen today”
But, it seems like they’re just tracking browser behaviour, and there’s absolutely zero place for people to add their own data?
Why bother giving it a branded name then? Why not “crowd-sourced browser behaviour study in partnership with ____”?
Mozilla strikes me as an organisation with too many resources for its own good. They keep pursuing these random side gigs with limited unifying theme other than an excuse to spend money.
Reminder that donations to Mozilla are not donations to Firefox development. The Mozilla Corp (makes Firefox) and Mozilla Foundation (gets donations) are mostly separate.
> Mozilla strikes me as an organisation with too many resources for its own good. They keep pursuing these random side gigs with limited unifying theme other than an excuse to spend money.
Why does this appear to be "random" to you? What is your definition of "random"?
Have you tried "putting on a Mozilla hat" and thinking about their priorities and goals? If you do, I think you'll likely see some connections. After you do this for a few minutes, I'm interested if perhaps you can see another point of view.
It makes me wonder about their management style occasionally, like taking over two decades to fix the issue with the accidental quit shortcut on desktop. People were celebrating that it was done, but I was wondering why it hadn't been fixed earlier if it was that important (I had accidentally triggered that shortcut myself at least ten times before).
Also reminds me that their mobile division seems painfully understaffed.[1] After using Firefox Mobile for a year there still isn't even a way to search your history, along with a dozen other pain points I put up with daily. And the most painful issues I bring up either get closed or go unaddressed for months, if not forever. It makes Chromium-based browsers even more tempting to me, and in some cases I've had no choice but to use them for things like caching bugs with stale development assets, but Chrome is really all that's left if we lose Firefox.
I really, really hope that they do not sink the current iteration of mobile Firefox. When I see resources being used on projects like these, I just wish the more pressing issues were prioritized. Firefox is the face of Mozilla.
The fact they have significantly more than 250 employees in the first place is a bit mind boggling. As far as I can tell most of the spending is unrelated to Firefox. What do all of these people do and why are they needed?
>The fact they have significantly more than 250 employees in the first place is a bit mind boggling. As far as I can tell most of the spending is unrelated to Firefox. What do all of these people do and why are they needed?
I sometimes have the same reaction when learning the number of people employed for what I would have otherwise thought to be some meager thing. Then I get reminded about the folks over in external/internal support; marketing; design; IT/infrastructure; legal; compliance; etc.
I understand there are a lot of roles needed for non-trivial organizations to function, but Mozilla has over 1000 employees and, in my opinion, does not deliver a corresponding level of value. I would actually donate if I could be sure my money is going to Firefox. I do not care about any of their other initiatives.
>> Whoever wrote this piece is way too close to... whatever it is Mozilla is doing here.
> It does appear that this piece was written by Mozilla, themselves… on their own blog.
Haha! Captain Obvious strikes again. Joking aside, I think the other commenter knew the post was written by someone at Mozilla on the Mozilla web site. Their point, I think, was that the author didn't do a good job of selling Rally.
It's not easy to understand, especially from the blog post alone, but as far as I understand it, the proposition is the following: Big companies have the ability to do research on users all the time, either by doing anonymous studies or by tracking you for their ad networks.
This is a luxury many researchers that work outside of these big tech companies don't have, which creates a scientific power imbalance. Mozilla Rally is meant to give these capabilities to everyone, and the platform is meant to ensure that you always know what you sign up for and what data is being used.
If I understand the Princeton example correctly: They want to figure out how people consume and spread misinformation. Social networks like Facebook have all that data but won't share it. Now you can opt-in to a Rally study where independent researchers can examine the data.
> This is a luxury many researchers that work outside of these big tech companies don't have, which creates a scientific power imbalance.
The power imbalance goes far beyond science. Independent research is foundational for platform accountability. An example: when I was working on the Senate staff, before I started teaching at Princeton, a recurring challenge was the lack of rigorous independent research on platform problems. We were mostly compelled to rely on anecdotes, which made oversight and building a factual record for legislation difficult.
> Would appropriately rigorous independent scholarship be considered as a trustworthy source within your sphere?
Definitely. Academia doesn't have a monopoly on excellent technology and society research. The Markup's data-driven investigative journalism, for example, is outstanding.
1. In your experience, what is the maturity level of Solid?
2. Would you mind sketching out how you would do the integration with Solid? I'm reading over https://solidproject.org/users/get-a-pod but haven't spend a lot of brain cycles on it yet.
I’m in the same boat as you - haven’t spent many brain cycles on it yet.
Mozilla Rally would have been the perfect proof-of-concept for putting that idea to use and giving it a shakedown.
Generally, Solid seems to be pretty grounded in its attempt to resolve data sharing privacy concerns.
My understanding is that the implementations would be about the same as using browser storage.
I believe developers would need to approach data access from the perspective of “an infinitely sharded document database - shards are globally-uniquely identifiable - shards are remotely distributed - shards have their own IAM - for each shard, you must register as an authorized user and authenticate to access remote data”
Looking through the FAQ, it seems like Rally requires users to send their raw data straight to the aggregation service, with the only privacy guarantees being that the data is encrypted during transport, and a "promise" that they will run internal audits to make sure private data isn't released from their servers.
IMO this seems to provide worse privacy than even Google and Micro$oft's telemetry, which at least use differential privacy to make sure that each individual's privacy is somewhat protected (the data you send is randomised so even if the aggregator is compromised by a malicious third party (e.g. NSA) individuals have some degree of plausible deniability).
Sure, Mozilla's intentions may be more "pure" (or is that just their propaganda speaking?), but in terms of privacy guarantees this seems like it is a strict downgrade, that abuses their goodwill to hide its deficiencies.
> with the only privacy guarantees being that the data is encrypted during transport, and a "promise" that they will run internal audits to make sure private data isn't released from their servers.
There's much more than that, including: privacy and security review before a study launches, a data minimization requirement, a sandboxed data analysis environment with strict access controls, and IRB oversight for academic studies.
> IMO this seems to provide worse privacy than even Google and Micro$oft's telemetry, which at least use differential privacy to make sure that each individual's privacy is somewhat protected (the data you send is randomised so even if the aggregator is compromised by a malicious third party (e.g. NSA) individuals have some degree of plausible deniability).
The vast majority of Google and Microsoft telemetry does not involve local differential privacy. Google, in fact, has almost entirely removed local differential privacy (RAPPOR) from Chrome telemetry [1].
We've been examining the feasibility of local differential privacy for Rally. The challenge for us—and why local differential privacy has limited deployment—is that the level of noise makes answering most (often all) research questions impossible.
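To illustrate the noise problem, here is a toy randomized-response sketch, the textbook local differential privacy mechanism. It is not Rally's (or any browser's) actual implementation, and the parameters are arbitrary:

```python
# Toy randomized response (a classic local differential privacy mechanism).
# Each participant flips their true yes/no answer with some probability before
# reporting it; the aggregator can only debias the population-level estimate.
import random

def randomized_response(truth: bool, p_truth: float = 0.75) -> bool:
    """Report the true answer with probability p_truth, otherwise the opposite."""
    return truth if random.random() < p_truth else not truth

def debiased_rate(reports: list[bool], p_truth: float = 0.75) -> float:
    """Recover an unbiased estimate of the true 'yes' rate from noisy reports."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)

truths = [random.random() < 0.3 for _ in range(1000)]   # true rate: ~30%
reports = [randomized_response(t) for t in truths]
print(debiased_rate(reports))  # roughly 0.3, but with substantial variance
```

Even with 1,000 reports, the debiased estimate in this toy example can easily be off by a few percentage points, and the problem gets much worse once the question is more fine-grained than a single yes/no.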
Have you thought about using central/global differential privacy (which tends to have much less noise) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?
E.g. from the FAQ: "We do intend to release aggregated data sets in the public good to foster an open web. When we do this, we will remove your personal information and try to disclose it in a way that minimizes the risk of you being re-identified."
It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix competition, AOL search dataset, Public Transportation in Victoria, etc. case studies of how non-formal attempts at anonymization can fail users.
> Have you thought about using central/global differential privacy (which tends to have much less noise) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?
Yes. Central differential privacy is a very promising direction for datasets that result from studies on Rally.
> It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix competition, AOL search dataset, Public Transportation in Victoria, etc. case studies of how non-formal attempts at anonymization can fail users.
I've done a little re-identification research, and my faculty neighbor at Princeton CITP wrote the seminal Netflix paper, so we take this quite seriously.
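For concreteness, here is a minimal sketch of what a centrally differentially private release of a single aggregate could look like, with Laplace noise calibrated to the query's sensitivity added once, server-side, before publication. The epsilon and sensitivity values are illustrative, and this is not Rally's actual release mechanism:

```python
# Toy central differential privacy: add Laplace noise to an aggregate count
# before releasing it. Parameters are illustrative only.
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(true_count: int, epsilon: float = 1.0, sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-DP, assuming one user changes it by at most `sensitivity`."""
    return true_count + laplace_noise(sensitivity / epsilon)

# e.g. "participants who visited at least one paywalled news site" (made-up number)
print(dp_count(4213, epsilon=0.5))
```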
Interesting. I can see that RAPPOR seems to be deprecated in favor of something else called ukm (Url-keyed metrics) but not why this change is being made. Is there somewhere I can read more about it?
I am not aware of any public announcement or explanation. Which is... probably intentional, since Google is removing a headline privacy feature from Chrome.
Our team looked closely at the Google, Microsoft, and Apple local differential privacy implementations when building Rally. It helped that we have friends who worked on RAPPOR.
I like that you can run your own versions of these studies, e.g. https://github.com/mozilla-rally/rally-study-01 to collect your own data. I think it would be a novel idea to make this the primary focus of the project, enabling users to see and understand their own data before donating it to research projects.
> I don't have much of an idea on how they plan to quantify doomscrolling via attention or audio events.
It seems that what they’ve done is told us exactly what data they will collect but have left it unanswered as to what specific social science methodologies they will use to analyze the data.
It is very likely that their intention is to not set that expectation with the data crowdsourcers so that the researchers have the flexibility to adjust their methodological approaches in an iterative fashion.
So they don't have enough manpower to maintain features people actually want (like the JavaScript toggle in settings which Chrome apparently has) but they do to make things no one will use that will get shut down in a few years?
Surprisingly difficult to understand what this thing is or does. Seems like you can "donate" your browsing behavior to science for doing studies in a kinda anonymous privacy protecting way.
It doesn't even sound anonymous. It sounds like they just audit that the partner aggregates the data once delivered to them. Some sort of pinky-promise.
But, realistically, what recourse do users have if there's a breach?
A few years ago I got a letter from Washington State University saying that they'd obtained my personal health data (which I never directly consented to, nor had I interacted with WSU in any way prior to receiving this letter) and subsequently allowed it to be stolen by unknown malicious actors. In case you're curious, they kept the data on an unencrypted hard drive in a physical safe that was then physically stolen.
There was some class action pittance that meant nothing to me, and WSU does not seem to have been subject to any meaningful consequences. It seems to be viewed as a cost of doing business sort of thing. For all of us who had their data stolen, the horse has left the barn, and I see no real deterrent effect. This seems to be the norm when data breaches happen.
So while "pinky promise" might be a bit hyperbolic, there is a lot of truth to it in general and I don't know how this case is supposed to be any different. If there is some paradigm-breaking accountability mechanism built in, I'd love to hear more about it.
> There was some class action pittance that meant nothing to me, and WSU does not seem to have been subject to any meaningful consequences.
Class action suits, from my understanding, are about 1) compensating the initial plaintiffs, 2) setting an appropriate punitive damage to redress the harm to society, and 3) distributing that punitive damage in an equitable fashion.
You are benefiting from 2) and 3) as a claimant in a class-action. 2) is where the meaningful consequence happens, in my estimation.
I agree. I haven't read beyond the linked page but my take is that it doesn't do anything about the data that you're already "sharing", but provides an additional mechanism for you to share data with specific parties in a controlled manner.
All this seems reasonable if that's the case, but the page is misselling it.
> All this seems reasonable if that's the case, but the page is misselling it.
There are limits to how much info you can fit into one page. Marketing pages are built in such a way that they guide you to where the detailed pages are.
>Then why have the linked page? If it's literally not informative enough then it's just a poorly written press release.
Usually marketing pages will include just enough information and imagery to grab your attention. There will be some links embedded which ultimately lead to more and more information.
Basically, if you're interested in learning more, you'll be motivated to read/click further. If it doesn't interest you, you leave having taken a fraction of the time than had the page been filled with all the details.
I guess what I'm getting at is: if article does not provide enough background to even be able to participate in a discussion about the subject, it's a bad article.
> Softer fields are likely to be found, even if perhaps at lower frequencies, in the physical and the biological sciences. Conversely, there is no reason why high-consensus fields should not exist in the social sciences, too.
Whenever someone says they want to study anything relating to "misinformation", it's almost a guarantee they're looking to solve human problems with technical solutions.
Oh look: "Rally has the potential to help address societal problems we could not solve before."
No, I don't think it does. Especially when the first thing you tout is a study on "misinformation about COVID".
The fact that Americans are locked in a bitter political cold war with each other is not a technological problem, and it does not have a technological solution.
It's like trying to help a couple on the brink of divorce by updating their phones with new firmware and changing how Facebook's algorithm works. The reason the couple is threatening divorce is 1% what happens ON their phones, and 99% what happens off their phones.
We have to break away from this stupid idea that if only we studied what was happening on Twitter and Facebook we could fix our problems.
> Whenever someone says they want to study anything relating to "misinformation", it's almost a guarantee they're looking to solve human problems with technical solutions.
What in your experience motivates you to say this?
Not the OP but I think it is because of a lack of trust in institutions, due to an escalating culture war and tribalism.
As an educated person interested in facts and science, I see a surprising lack of due diligence in the media around a lot of the current hot-topic issues (not going to name them). Every time I do my own research on one of these issues, it is obvious the media and our institutions are more interested in supporting an official narrative than diving deep into the facts. Opposing opinions are just not 'allowed' anymore in some areas. I personally don't even trust people using the word 'misinformation' anymore, as it has been so abused.
> lack of trust in institutions, due to an escalating culture war and tribalism
Precisely. Whenever someone talks about "disinformation" as being the cause of this, it would be insulting, if it weren't so detached from reality and Orwellian.
"Gosh, these dumb people are reverting to tribal behavior because they're getting lied to! The real problem is they can't think for themselves -- we just need to figure out how to tweak the algorithm so the disinformation doesn't spread, and they get fed the right truths. Then we'll have harmony again!"
People need to get off their computers, and stop trying to understand (and fix) society through Twitter.
> We have to break away from this stupid idea that if only we studied what was happening on Twitter and Facebook we could fix our problems.
Analogously, to demonstrate absurdity, consider this statement: we have to break away from this stupid idea that if we studied what was happening on our crop fields and our roads we could fix our problems with agriculture and traffic.
This is exactly the data a modern Cambridge Analytica would want (e.g. shares, time spent on each post, all correlated to demographics). I hope this platform has controls to ensure that study data isn't misused post-study for non-study purposes, because the FAQ answer isn't so encouraging[0].
"With Mozilla’s permission, researchers may retain aggregated or de-identified datasets for their analyses. Mozilla may also retain aggregated data sets which we may release in the public good to foster an open web."
Shouldn't you ask the users for permission on using their aggregate data for purposes that could be different to the study they enrolled to?
>They do, given that you’d have been informed of that before you provided your consent by opting in to Rally data crowdsourcing.
I haven't tried registering for the study, but I note that an FAQ answer is not enough. Users cannot be expected to search for the FAQ. This needs to be mentioned in the consent form, and not done implicitly 'by opting in' like corporations do.
>Kind of nice that it’d be in the hands of Mozilla, Princeton University, other trusted research partners, and the open web, isn’t it?
The 'open web' is a bit of a nebulous concept, and the CA data was supposedly in the hands of a data scientist at the University of Cambridge.
> It'd be much better to not collect that data at all.
That may not be possible to enforce without shuttering the entire computer networking infrastructure of the world and rebuilding it on new computer science foundations.
>What if, instead of companies taking your data without giving you a say, you could select who gets access to your data and put it to work for public good?
Implied false dichotomy. What if I refuse to use companies that do not fully encrypt my data in such a way that they have no access to it?
Also, Mozilla isn't going even half this far. They are relying on a contractual obligation with their partners and internal audits to protect my identity. I'm skeptical of their ability to protect my data to the point I will never use this service.
So they are now practicing collecting more data, but with the motivation that it is somehow for the good of society? It is still the one thing I don't want Mozilla to do: embrace data collection, no matter the banner or motivation.
Also universities through history have done numerous unethical studies, they don't automatically gain my trust or data because they pinky promise.
I think this is an interesting idea, but I don't know how well it scales. I think it is a high bar for individuals to explicitly manage every single donation of personal data.
What I have thought might work is that individuals could donate their data to a non-profit entity that would have a legal responsibility to protect contributors' data. The non-profit would act as a data broker for researchers. It could sell aggregate data, but the sales would go to charitable causes. Contributors could opt out of specific research or specify specific charities.
The result is a little like a non-profit data brokerage. It would put experts in charge of data protection and confidentiality, and it explicitly acknowledges that data has value. The broker would have legal recourse and financial means to track down data abuse.
Sorry if this is too off topic, but is there a name for the art style they are using in the illustrations? It seems to be all over corporate sites these days.
Interesting. I spent the past two days building a prototype which solves this problem for a specific use case “sharing search results”. Going to Show HN in a few days.
This article is light on technical details, or even a high-level explanation of how it works.
I wonder how they solve the authenticity / validity of the data.
Funnily enough the solution I’ve come up with isn’t compatible with Firefox due to api limitations.
I think that it is important to create a precedent of mass data collection that reveals to the users what is collected and gives tools to manage it. The legislators can use it as a blueprint to require a similar level of disclosure from other companies that do online surveillance. However, I do not see what value Rally offers to the enrolled users.
I am failing to see the motivation for anyone to use this. If you cared about privacy, you wouldn't share your data in the first place. If you didn't care about privacy, there is still no reason to opt in.
You aren't getting my data, stop trying. That may seem harsh, but that is where I'm at. I'm really put off by the idea of sharing my data these days. I don't trust Mozilla's internal audit process to keep it safe, but it isn't specific to them. Outside of maybe two companies that are 'encryption-first', I don't trust any of them.
Yeah, I’m big on this stuff because I like scientific studies to be biased towards me. So like, imagine there is some treatment for a disease that only works on people who live in San Francisco.
My ideal outcome is that the study is accidentally biased to serve us and that all of America pays for this medicine.
Ideally, all outcomes are tailored to me through sheer participation bias.
Sure sure, everyone knows that sampling bias exists but there is a systemic human bias to what has been measured. I intend to exploit that.
And as for my data, Equifax got the most sensitive stuff so I’m not worried about anything else.
One example, without having read further into that particular study, but knowing what misinformation research tries to do, generally, would be the following:
Identifying the specific mechanisms behind the dark patterns of social engineering.
There is a body of research, for example, that considers the spread of rumors by making use of the same mathematical models that epidemiology uses to model the spread of infectious disease (virality, in particular).
As I recall, that body of research led to the conclusion that the best weapon against misinformation may be the liberal and broad application of truth-telling.
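As a rough illustration of what that modeling looks like, here is a toy SIR-style simulation with made-up parameters, where 'spreaders' pass a rumor to susceptible users and eventually stop spreading it:

```python
# Toy SIR-style model of rumor spread, mirroring the epidemiological analogy
# used in that body of research. All parameters are made up for illustration.
def simulate_rumor(days: int = 60, n: float = 10_000.0,
                   beta: float = 0.4, gamma: float = 0.1):
    """beta: spreading rate per contact, gamma: rate at which spreaders stop."""
    susceptible, spreading, stopped = n - 1.0, 1.0, 0.0
    history = []
    for _ in range(days):
        new_spreaders = beta * susceptible * spreading / n
        newly_stopped = gamma * spreading
        susceptible -= new_spreaders
        spreading += new_spreaders - newly_stopped
        stopped += newly_stopped
        history.append((susceptible, spreading, stopped))
    return history

# Broad "truth-telling" can be modeled as raising gamma (people stop spreading
# sooner) or lowering beta (corrections blunt transmission).
peak = max(s for _, s, _ in simulate_rumor())
print(f"peak share actively spreading: {peak / 10_000:.1%}")
```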
> What is the criteria for becoming a partner?
Currently, we are working with a small number of reputable research teams who match our commitment to your safety and trust and who also have the technical expertise to help us build Rally’s scientific capabilities.
All partners are required to sign an agreement with Mozilla. This agreement upholds the researcher’s rights to independence in their research, while mandating the safeguards they must follow in order to protect our users.
In the future, we will expand beyond our pilot partners. Please contact us if you are interested in becoming a partner!
Nothing on whether studies based on this will be published as Open Access. I can see that Jonathan Mayer's latest publications are all published without OA, so that doesn't bode well for the COVID news study. And ironically, the 'Beyond the Paywall' researchers are no better at publishing open research.
Yet again though we're being asked to give it up for free...
>But for too long, online services have pilfered, swapped, and exploited your data without your awareness. Privacy violations and filter bubbles are all consequences of a surveillance data economy. But what if, instead of companies taking your data without giving you a say, you could select who gets access to your data and put it to work for public good?
So taking and benefitting from my data is alright as long as I'm aware of it?
>But, being “data-empowered” also requires the ability to choose who you want to access your data.
No...being data empowered would mean I'm profiting off my own data, not giving it to companies to profit off of.
>We’re kickstarting the Mozilla Rally research initiative with our first two research collaborator studies. Our first study is “Political and COVID-19 News” and comes from the Princeton team that helped us develop the Rally research initiative. This study examines how people engage with news and misinformation about politics and COVID-19 across online services.
And the first study in this is about politics, that great word again 'misinformation' that gets tossed around everywhere.
So, the first study will be using user data on the news they read and how they interact on social media in regards to politics for what purpose exactly?
This makes the world better how?
>Soon, we’ll also be launching our second academic study, “Beyond the Paywall”, a study, in partnership with Shoshana Vasserman and Greg Martin of the Stanford University Graduate School of Business. It aims to better understand news consumption, what people value in news and the economics that could build a more sustainable ecosystem for newspapers in the online marketplace.
So, the second study is 'how do we make news organizations more money?"
I'm kind of noticing a pattern here...
>With Rally, we’ve built an innovative, consent-driven data sharing platform that puts power back into the hands of people.
No, you've found a way to convince people they should give you their data for 'research that benefits the world' that's in reality thinly veiled market research.
It would be nice if a company could just be fucking honest for once.
This whole blog post should just be:
"Look, at Mozilla we realized we can make money collecting data, we're going to pretend we couldn't just take it from you and give you the option to opt in to make us money."
FF Send, FF Notes, Positron, FF OS, Appmaker, and Deuxdrop to name a few. They tend to keep more stuff around than Google, but still cancel things that I like frequently.
You keep accusing people of being nihilistic / cynical in this thread. It isn't cynical to be extremely skeptical of any company's ability to keep your data private and safe given the constant barrage of breaches and ethical violations.
Maybe I'm in the minority, but Mozilla's brand has been permanently tarnished in recent years and I do not think of them as a 'privacy first' organization.
> It isn't cynical to be extremely skeptical of any company's ability to keep your data private and safe
By dictionary definition, cynicism is an attitude of scornful or jaded negativity, especially a general distrust of the integrity or professed motives of others.
Skepticism is defined as the doubt as to the truth of something.
Extreme doubt as to the integrity of a company’s data privacy practices is the dictionary definition of cynicism.
> I do not think of them as a 'privacy first' organization
I won’t pry, but I suspect that you are not a web developer, by trade.
It is common knowledge in the web development field that Mozilla is a, if not the, standard bearer for consumer privacy protection in the browser wars.
Like I mentioned, there are perfectly legitimate reasons and needs to do it.
However, the best privacy practice is to never share it. If you share, it is no longer privacy first. The goals one wants to achieve come first. I'm not saying it will not have adequate privacy protection. But it's not "privacy first" because it introduces a risk to someone's privacy for the benefit of something. That something comes first.
How exactly are academic researchers supposed to guarantee that their research will be applied for the good of humanity?
Government is you and me and everyone else.
If you want government to care about it, write to them and tell them to care about it.
They probably aren’t going to care about random comments on HN.
Research on ethical machines has been taking place since the dawn of the drone age and has already guided military policy. You can find the supporting evidence of that for yourself if you search for it.
Not the OP, but no, the government doesn't and can't represent me. We are systematically locked into a two party system due to winner-takes-all voting in the USA and you seem to think that two (very similar) parties are enough for me to have representation. They aren't.
It also isn't nihilistic to have a very high bar for sharing any data these days. I personally am beyond considering sharing my data and go to great lengths to keep my important data safe. I don't trust anyone with it.