Whoever wrote this piece is way too close to... whatever it is Mozilla is doing here. There seems to be an assumption that users will be gleeful to throw their data at legitimate researchers from legitimate institutions doing legitimate work. What "data"? Browsing history? Identity? Something else? Why? What's in it for them? Since when was giving our data to third parties a good idea? There is literally no motivation presented here.
As a nerd I can read between the lines—clearly they have come up with some sort of privacy-preserving data collection system that is useful. But at face value this whole article is just saying "hey use Firefox and give your data to scientists for reasons we don't bother to explain because obviously it's good."
Princeton research collaborator here. Glad to answer questions about Rally.
> What "data"? Browsing history? Identity? Something else?
That depends on the Rally study, since research questions differ and studies are required to practice data minimization. Each study is opt in, with both short-form and long-form explanations. Academic studies also involve IRB-approved informed consent. Take a look at our launch study for an example [1].
> Why? What's in it for them? Since when was giving our data to third parties a good idea? There is literally no motivation presented here.
The motivation is enabling crowdsourced scientific research that benefits society. Think Apple Research [2], NYU Ad Observatory [3], or The Markup's Citizen Browser [4]. There are many research questions at the intersection of technology and society where conventional methods like web crawls, surveys, and social media feeds aren't sufficient. That's especially true for platform accountability research; the major platforms have generally refused to facilitate independent research that might identify problems, and platform problems often involve targeting and personalization that other methods can't meaningfully examine.
These "This Study Will Collect" and "How We Protect You" sections are really good. It probably wouldn't convince me personally to sign up, but it's as comprehensive as I would expect. It's a shame that these comments didn't make it into the blog post.
I think that the motivation of 'enabling citizen science' is not a very strong one. You will get very, very skewed results, more so than the typical WEIRD sample, if you conduct studies on the people for whom that is sufficient motivation.
A stronger motivation would be providing a product or service that tangibly adds value to someone's life.
After reading this, I have no idea how Rally would provide any tangible benefits to me.
Exactly. It is so weird to see all this marketing speak that makes it sound like users will benefit from something, when in the end this is just something that gets people to work and provide data for free to multi-billion-dollar universities.
We don't need any more studies or research to know that the best privacy policy is to not collect any data in the first place.
I know you mean well but I think you completely missed the above commenter's point.
You've replied here with answers to address their (our?) potential concerns, but the commenter never said they had concerns about the project itself, rather that this particular blog post doesn't "sell" or explain the value add well. That's feedback on the project's communication strategy, not on what it's actually doing.
> > Why? What's in it for them? Since when was giving our data to third parties a good idea? There is literally no motivation presented here.
> The motivation is enabling crowdsourced scientific research that benefits society.
You seem to be confusing "theys". The question is what motivates participants, not what motivates researchers.
But if the participants are limited to people who are motivated solely by participating in research, wouldn't that add significant bias to that research?
Nonetheless, much of psychology research conducted in the US has made do with ridiculous sampling bias - the US college student is anecdotally considered to be the most-studied population in the world.
Personally I don't think that researchers have any more business doing this kind of surveillance than Google and company do.
The idea that this will benefit society seems naive to me. I feel like it will only serve to legitimize the practice by putting ostensibly trustworthy faces on the packaging.
Not just surveillance, but conducting research within corporate platforms. Therefore, they would have access to my data and a corporation's engine. If I think that Google knows too much about me, do I get to opt in to whether that hyper-knowledge is shared with researchers (because I won't)?
I hate this. Viscerally. Why, why, WHY does every “privacy first” system, platform, whatever, start with the presumption that some people should get your data and it’s just a matter of vetting the “good” groups from the “bad” groups.
No. That isn’t privacy.
No one should get my data. And it’s ridiculous that all of these companies try to position their data grabbing projects as “privacy” oriented when what they really mean is they’re not quite as invasive and/or are slightly more transparent about their data theft compared to others.
I understand this is a sensitive area, and I understand that reasonable people have good reasons to be concerned. But it seems that you are directing your frustrations at Mozilla unfairly.
I find this to be a good definition of privacy:
> Privacy is the ability of an individual or group to seclude themselves or information about themselves, and thereby express themselves selectively. - Wikipedia
From what I've read, Mozilla's Rally gives people the ability to choose what research studies to participate in.
I think there is plenty of discussion to be had about what level of control and granularity works best for different people, but I have confidence that Mozilla has both the right incentives and technical capability to contribute meaningfully in this space.
Sorry, I definitely did not put as much thought into my comment as I should have and I left out the critical piece of information that really ticked me off. I'm one of the nerds who actually reads the privacy documents and Rally's privacy policy[1] has a section titled "How We Use Your Information" that includes:
> improving Mozilla’s existing products and services
> creating and developing new products
All of the marketing copy is about "donating" your data to important research and how "Big Tech has built its success by exploiting your data." Meanwhile Mozilla is doing the exact same thing they're criticizing "big tech" for doing. Tucked away in the fine print is the fact that your data _isn't_ just going to be used for research studies, it's going to be exploited by yet another for-profit tech company. They've just put a nice warm and fuzzy do-gooder wrapper on it.
If Rally were as transparent about how your data is used as they claim to be, they would either (1) not use your data in that way and exclusively allow the data to be used for research, as advertised, or (2) make it abundantly obvious it will be used that way.
>> improving Mozilla’s existing products and services
>> creating and developing new products
I agree that these are concerning. They seem out of place. If you want to start a petition asking Mozilla to clarify and/or remove these clauses, I would sign it.
I almost opened an issue on their GitHub[1] (one of their privacy-related documents invites people to "call them on it" if you have privacy concerns) but I decided against it because I worried about harassment from sleuthing HN readers finding me on other platforms. Such are the compromises you make as a lady on the internet sometimes.
>but I decided against it because I worried about harassment from sleuthing HN readers finding me on other platforms. Such are the compromises you make as a lady on the internet sometimes.
You make it easy for the internet assholes to do this to you. If your HN username is your real name that is a really big problem for your privacy. Your occupation is stated in your profile. Likewise, stating your sex might as well be your privacy's death knell.
Become more anonymous to provide less ammunition to those with nothing more to do than torment others, then continue doing the things you feel are the right things to do without excuse.
I've been interneting-while-woman for three decades now so of course my username isn't my real name. But (like most developers) my GitHub includes my real name, my photo, and my company, hence my hesitation. I have to wonder, though... are you as quick to chastise the "internet assholes" you see harassing women online as you were to chastise me for having the audacity to admit my gender online?
That wasn't chastising you. I don't know you, so there was no way I could have known how long you've been on the internet nor your depth of experience with identity within it. I was attempting to point out some of the bigger factors feeding into your complaint about the potential of people harassing you, with no intent other than to bring attention to something you may have overlooked, as humans tend to do sometimes.
You're being combative for no reason so I'll leave it here but, in the future, don't always assume malice.
I see both sides here. On the whole, I think both people are trying to contribute and help in their own ways.
Some more specific comments:
> You're being combative for no reason so I'll leave it here but, in the future, don't always assume malice.
Saying "no reason" doesn't ring true to me. What one person considers to be (valid) reasons is subjective. In my view, 'reasons' includes a person's identity and experiences.
With that in mind, how do you think this alternative message would have been received... ?
"It was not my intention to chastise you. I meant well in offering some ways to reduce the chances that trolls come after you. Please don't assume malice. I'm happy to listen if you have suggestions on how I could communicate the message more effectively."
I don’t think it’s in any way “combative” to point out that I am indeed competent and capable of interacting with the online world safely without the assistance patronizing of strangers on the internet. And, whether your intention or not, that’s exactly what your comment was: patronizing. With undertones of “well what did you think would happen when you present yourself in such a way”. Whether you’re willing to admit it or not, comments like yours aren’t helpful — they’re part of the problem.
I'd suggest looking up the "principle of charity".
Additionally, claiming that the self-perceived "undertones" are "the real truth", over a clearly (and reasonably) stated subsequent stance/explanation, is indeed confrontational, at best. In my opinion and experience.
When X claims they know better what was the intent or tone of Y's message when Y is already there precisely elaborating... X has to be realistic, admit no prior knowledge of Y (nor their idiosyncrasies, nor their style of communication), be aware of the limitations of the medium (no voice tone, no body language), and take things at face value as written (and/or ask for explanation/further details in neutral and a non-confrontational way). Y has to do the same. It really is the only way that maintains functionality of the conversation.
(Also, foreseeing a potential conflict: X and Y have nothing to do with bio-sex, they just signify two unknowns.)
At least, that's the self-defined framework I use, as a probably somewhat autistic/ADHD person (people confuse and frustrate me to no end).
This is quite similar to a concept in psychology called Theory of Mind (ToM):
> Theory of mind as a personal capability is the understanding that others have beliefs, desires, intentions, and perspectives that are different from one's own.
I have a theory: It is relatively more difficult for people who have faced adversity (whether it be from systemic bias and/or personal situations) to make unemotional assessments when conditions relate to those adverse situations.
I would agree, in principle and mostly in practice. However, if one knows better, then they must also know that the same "better" can be done. It is hard, yes, and it takes practice, but is achievable. Exposure therapy of sorts helps immensely. And, if/when feeling overwhelmed, simply ask for a recess and postponement of the discussion. I call it "processing time", and it usually takes a few days, or even longer. I call upon it, when I sense it is required (on my and other side, too).
I also try to familiarize the other side with my own (aforementioned) communication style and idiosyncrasies. I'd say we all seek to at least not be misunderstood, if understanding (in sense of the agreement) is not possible... and being frank and upfront about it - helps.
> And, whether your intention or not, that’s exactly what your comment was: patronizing.
E: Not quite. You perceived the comment as patronizing. This is not a universal assessment. From my point of view, I didn't find it patronizing. I'm not saying I'm right and you are wrong; I'm simply saying it is far from clear cut.
Here is one definition of patronizing that I find useful:
> apparently kind or helpful but betraying a feeling of superiority; condescending
You may think that someone else feels superior to you. That is your assessment, I respect that, and I'll listen. At the same time, it is subjective and is uncertain, because your knowledge is incomplete.
The principle of charity is useful here. I hope you can see alternative interpretations that show N does not perceive himself as superior. In particular, their commentary, in my view, is by and large very thoughtful, with the exception of a few sharp edges (which everyone has). From what I can tell, N's edgier comments came out because they felt attacked.
That's the pattern I see here. A person feels attacked and their communication becomes less charitable and even abrasive. At least two people fell into this trap in this thread. As a community, we don't benefit when this happens, but this is human nature.
The solutions are not easy. In my view, we should try to observe, be thoughtful, and attempt to deescalate tensions. I believe a vast majority of people are here for positive reasons and have plenty to learn from each other.
While I can understand that some people may disagree with and/or not understand what she said, it is both unkind and unhelpful to say "Now that's just nuts." The comment does not move the conversation forward in terms of clarification or understanding. The comment does not demonstrate patience, nor does it show curiosity about other perspectives.
I'm one of the very few people who actually read all of the disclosure documents. I wouldn't be surprised if I were the only person who read these documents in their entirety aside from the document drafter(s) themselves. And this wasn't in the privacy notice for a study but for the Rally browser extension. Rather confusingly, Rally has one privacy policy and each individual study will have their own, separate, privacy policy in addition to the Rally privacy policy.
> you would not have seen that and would not have gotten emotionally triggered
Don’t do that here. The comment is about Rally pretending to be about altruistic academic research while actually being a platform for Mozilla product development.
> Sorry, I definitely did not put as much thought into my comment as I should have and I left out the critical piece of information that really ticked me off.
I’m being charitable and leaving room for the possibility that they provide a tighter set of policies for some studies or revise their general policy to make commercial use be a study-by-study determination.
Or, just don’t opt-in to any of the Rally studies. Your call, it’s your data.
> Or, just don’t opt-in to any of the Rally studies. Your call, it’s your data.
First, a caveat. I don't know the people behind the comments in this sub-thread. I have read almost all of them and find them to be informative and thoughtful. So thanks for that.
That said, when I read a comment like the above, what I hear is a mentality of "you are an individual, with power, if you don't like it, act individually". That mentality is not wrong, but is quite limited and incomplete. It overlooks the power and importance of individuals discussing and organizing together, which is often much more powerful than simply "voting with your feet".
As always, with your replies to me, your spirit/style is much appreciated.
> It overlooks the power and importance of individuals discussing and organizing together, which is often much more powerful than simply "voting with your feet".
I did indeed purposefully overlook that, but only out of a working assumption that the consent of the governed is there, overall.
If there truly is no consent of the governed for informed consent being a sufficient tool for empowerment in these cases, then there are deeper topics that need to be discussed. Some of those topics could lead to conclusions that may have catastrophic implications for modern society, if not discussed in a reasonable and considerate manner. Cancelling entire industries in one fell swoop is one of the conclusions being drawn in the current climate, for example.
If there is consent of the governed that opt-in is a sufficient tool for empowerment in these cases, then these cases may ultimately be logically reducible to failures to uphold the laws, as they currently exist.
To further expound on the latter line of reasoning, these cases seem to anecdotally belong to a few categories: 1) I don’t like advertising, 2) I didn’t know that they could do that, 3) I don’t want anyone watching what I do.
Category 1) is a completely understandable sentiment with implications that I currently lack the energy to comment upon.
Category 2) should be taken as that individual’s human rights having been violated. Specifically, that situation could arguably be pursued in the United States under the ADA. Enforced broadly, this could have catastrophic consequences to modern society. Enforced judiciously, this could provide much needed social progress.
Category 3) is a completely understandable sentiment with implications that I currently lack the energy to comment upon.
The category 2) implications do seem to be most relevant to society, at this juncture.
edit: I can imagine that a class-action ADA lawsuit against a carefully selected set of defendants (if legally possible) could lead to resolution of this matter in the courts, without calling on the legislature to weigh in on topics for which it is under-qualified.
> I’m being charitable and leaving room for the possibility that they provide a tighter set of policies for some studies or revise their general policy to make commercial use be a study-by-study determination.
Read together, the problematic parts of the general privacy policy are not addressed nor remedied by the specific study's details, because a specific study addresses how that study uses the data.
Perhaps a future study would be different? I doubt it. My take is that the concerning parts of the general privacy policy's language will stand (quoted a few messages above). Here's why I say this... Based on my experience with organizations and lawyers, Mozilla is unlikely to want to modify its general privacy policy based on particular discussions with each organization involved in a study; it would be too time-consuming and expensive, and it would create a path-dependence such that every previous study's details would need to be reevaluated in the light of a modification to the general policy. Instead, Mozilla probably crafted their privacy policy in a general way, hoping that it will be acceptable to participants and partners. I expect they will modify it as little as possible.
> Mozilla is unlikely to want to modify its general privacy policy based on particular discussions with each organization involved in a study
They may ultimately have a legal responsibility to do so, depending on the nature of their contracts with their research partners. I’m not a lawyer, but if I were, I’d be digging into the case law to see if Mozilla + (”public university” or “federal funds”) = “a combination that must meet all severally applicable laws”.
Part of the problem with large internet platforms is that parts of 'my data' is inextricably linked to 'your data', even to the extent that 'my data' only exists on some platforms as data points in 'your data'. In that sense any opt-in choice given to another is yet another privacy breach on their 'contacts' for example.
I've seen the opinion expressed that part of the reason society allows this type of surveillance is that so many members of our society don't understand the details or scope. If true, whatever discussions we have about this should include the idea that we're proposing to increase the scope of the problem while researching it.
Yes, calling out linkage as a key challenge for data privacy is very important.
To dig in one level deeper... Have you looked into privacy-preserving record linkage (PPRL) or similar ideas? (I have not, but I'm interested.)
> The process of linking records without revealing any sensitive or confidential information about the entities represented by these records is known as privacy-preserving record linkage (PPRL).
> Source: "Privacy-Preserving Record Linkage".
> DOI: https://doi.org/10.1007/978-3-319-63962-8_17-1
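For anyone curious, the core idea in that literature is to encode identifiers into Bloom filters so records can be compared approximately without ever exchanging the raw values. Here's a toy sketch of the idea; the hashing scheme, parameters, and names are purely illustrative, and real PPRL deployments need keyed hashes and hardening against known cryptanalytic attacks:

```python
# Toy sketch of Bloom-filter-based privacy-preserving record linkage (PPRL):
# each party encodes an identifier into a set of bit positions via hashed
# character bigrams, then encodings are compared with a similarity score
# instead of the raw identifiers ever being shared.
import hashlib

def bloom_encode(value: str, size: int = 256, num_hashes: int = 3) -> set[int]:
    """Map character bigrams of `value` onto Bloom-filter bit positions."""
    bigrams = [value[i:i + 2] for i in range(len(value) - 1)]
    bits = set()
    for gram in bigrams:
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % size)
    return bits

def dice_similarity(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two encodings (closer to 1.0 = likely match)."""
    return 2 * len(a & b) / (len(a) + len(b))

# Similar names produce similar encodings; unrelated names do not.
print(dice_similarity(bloom_encode("jane q example"), bloom_encode("jayne q example")))
print(dice_similarity(bloom_encode("jane q example"), bloom_encode("someone else entirely")))
```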
> In that sense any opt-in choice given to another is yet another privacy breach on their 'contacts' for example.
That is a non sequitur when we are discussing opting in to social science research.
Rally is not a social network platform. It is a social science platform. There is no reason for it to be directly, as a platform, concerned with your contacts.
Per their FAQ:
> We abide by a series of principles known as Lean Data Practices. Lean Data means that we collect just the data we need, and do everything we can to protect what we collect. Studies only collect data that is essential to creating a broader understanding of a research topic.
Institutional Review Boards, privacy policies, and the various contractual agreements between the parties operating and building the Rally research platform would be held to account by the scientific principles of treating participants humanely and ethically.
If an IRB deemed a study unethical because its design indicated that data could be obtained without informed consent, then that study could not be conducted. The research design would have to be modified, or that specific methodology would be considered generally unethical by the wider scientific community, just as the community deems it unethical to do genetic experiments on unwilling human subjects.
>> In that sense any opt-in choice given to another is yet another privacy breach on their 'contacts' for example.
> That is a non sequitur when we are discussing opting in to social science research.
As I understand it, the commenter's point does not rest on 'contact' linking being present. Their point is that any kind of data linking provides a reidentification risk.
Regarding the risk of data linkages, how confident are you that Mozilla and others with access to the data will manage it ...
1. ... up to the currently-accepted level of knowledge (including hopefully some theoretical guarantees, if possible, and if not, mitigations with known kinds of risk) and ...
2. ... that the current level is acceptable given that history of data privacy doesn't paint a rosy picture?
To be open, I'm not interested in your confidence level per se, but rather the reasoning in your risk assessment. I want to weight the various factors myself, in other words. For example, you appear to have more confidence in IRBs than I do.
Knowing the history of the "arms race" between deidentification and reidentification, I don't put a whole lot of trust in Institutional Review Boards. Many smart, well-meaning efforts have fallen prey to linkage attacks. They are insidious.
P.S. In my view, using "non sequitur" here is a bit strong, perhaps even off-putting. It is only a "non sequitur" because you are making different logical assumptions than the commenter. Another approach would be to say "your conclusion only holds if..." This would make your point without being so pointed. It also helps show that you want to understand the other person's assumptions.
> As I understand it, the commenter's point does not rest on 'contact' linking being present. Their point is that any kind of data linking provides a reidentification risk.
It appears that the parent commenter revised their comment to indicate that the concern was indeed “your data getting mixed with my data, when browsing Facebook”, to paraphrase.
My response there was essentially: ethical review would have to determine if all data must be provided through informed consent of all the originating humans.
Held to the gold standard of ethics, an IRB would likely have to contraindicate a research design if it did not provide a way for every individual human involved to provide informed consent. If any single individual in a data set indicated that they did not consent, then that data set would need to be reshaped to not include that individual. In lieu of that, the entire data set would have to be excluded from study.
Of course, that has some complex implications when it comes to broad categories of data sources for browser usage: social networking sites would be a minefield. Did the website author provide consent for their content to be machine-analyzed for sentiment, etc., if one really wanted to get down to it? You'd have to consider each and every resource location. You can't assume that all browser traffic is open web traffic - someone could have left their Rally extension running while navigating to a corporate confidential network, complex copyrights, etc.
My understanding is that the US Supreme Court is about to decide on whether “if you can read it, you can keep it” as a consequence of Microsoft/LinkedIn vs. hiQ Labs, so don’t forget the “arms race” of justice, either.
> Many smart, well-meaning efforts have fallen prey to linkage attacks. They are insidious.
Indeed, even just basic double-blind medical studies are hard to defend when you consider operational security, let alone information security.
In case it is of interest, here is a fairly short article with a short historical look at data de-identification. If nothing else, it is one jumping off point.
I see where you are coming from but it is opt-in so I don’t believe that is theft.
All scientific research requires data and certain types of research require certain types of data to be useful.
I’m personally against ad-driven data collection because I don’t think its outcomes, both first order and second order, are in the long-term interests of viewers or society as a whole.
However, assuming you can trust that the data collected and how it is used is only as stated, then studies looking to understand online interactions, where more and more of our lives are being lived, with the interest of improving long-term outcomes seem like a good thing to me. Of course, once the data is out of your hands you lose control over it, so it’s good that Mozilla is doing data minimization and aggregation to help reduce the impacts of that.
I guess it really boils down to trust, intent, our ability to choose, and transparency which has not been respected in many cases so I very much do understand the skepticism. Here is to hoping this will be different. So far, in my opinion, this looks to be the case.
Some of the current studies [1]:
- Political and COVID-19 News
- Your Time Online and "Doomscrolling"
Edit: I read your comment [2] in another thread about the privacy policy, and that is a good point. I sent an email to mozilla asking for clarification and if I get a reply I will add it here.
I read it as "we're protecting your privacy by increasing the number of people who can monitor your activity on our browser." Presumably, the users will be well-endowed and tax-advantaged institutions who could have just bought the information from data-aggregators anyway. I'm starting to see a theme of papering over their technology products with a lot of modern art and hyperbolic language.
"Computer scientists, social scientists and other researchers will be able to launch groundbreaking studies about the web and invite you to participate." Wow. I'm about to make history with a browser add-on!
"Our first study is “Political and COVID-19 News” and comes from the Princeton team that helped us develop the Rally research initiative." Groundbreaking! College students can now make sure that I am adequately fact-checked if I err from the path of truth.
> Presumably, the users will be well-endowed and tax-advantaged institutions who could have just bought the information from data-aggregators anyway.
Nope. This is an important point: the type of crowdsourced science that Rally enables is something that researchers couldn't do before. (With the exception of a very small number of teams who made massive investments in building single-purpose crowdsourcing infrastructure from the ground up.)
Common research methods have significant limitations. Web crawls, for instance, usually don't realistically simulate user activity and experiences. Lab studies often involve simplified systems that don't generalize to the real world. Surveys yield self-reported data, which can be very unreliable.
Rally studies, by contrast, reflect real-world user activity and experiences. In science jargon, Rally enables field studies and intervention experiments with excellent ecological validity.
1. Do you expect the opt-in nature of these studies to impact their findings?
2. To compensate for the voluntary nature of the studies, do you think researchers in general will still be incentivized to find data sources that are less respectful of people's privacy and don't require an opt-in to the study?
> 1. Do you expect the opt-in nature of these studies to impact their findings?
The Rally participant population is not representative of the U.S. population—these are users who run Firefox (other browsers coming soon), choose to join Rally, and choose to join a study. In research jargon, there's significant sampling bias.
For some studies, that's OK, because the research doesn't depend on a representative sample. For other studies, researchers can approximate U.S. population demographics. When a user joins Rally, they can optionally provide demographic information. Researchers can then use the demographics with reweighting, matching, subsampling, and similar methods to approximate a representative population. Those methods already appear throughout social science; whether they're sufficient also depends on the study.
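To make the reweighting idea concrete, here is a toy post-stratification sketch. The column names, strata, and population shares are hypothetical, not Rally's actual schema or targets:

```python
# Hypothetical post-stratification sketch: assign each opt-in participant a
# weight so the weighted sample matches population-level demographics.
# Strata and target shares are illustrative only.
import pandas as pd

# Opt-in participants with self-reported demographics (toy data).
participants = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "age_group": ["18-29", "18-29", "30-49", "30-49", "50+", "18-29"],
})

# Target population shares (e.g., from census estimates).
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

# Observed sample shares per stratum.
sample_share = participants["age_group"].value_counts(normalize=True)

# Weight = population share / sample share: over-represented groups are
# down-weighted and under-represented groups are up-weighted.
participants["weight"] = participants["age_group"].map(
    lambda g: population_share[g] / sample_share[g]
)

# Study-level estimates are then computed as weighted averages.
print(participants)
```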
> 2. To compensate for the voluntary nature of the studies, do you think researchers in general will still be incentivized to find data sources that are less respectful of people's privacy and don't require an opt-in to the study?
Rally is designed to provide a new research capability that didn't exist before. I don't expect a substitution effect like that.
Regarding 2. that would run afoul of many ethics boards at universities. Generally they require that (informed) consent has been given to take part in the study.
> Rally studies, by contrast, reflect real-world user activity and experiences. In science jargon, Rally enables field studies and intervention experiments with excellent ecological validity.
Rally users are all opt-in. How does that impact the design of a Rally study and the conclusions you can draw from it?
I was really excited for most of the time reading thinking “A privacy-first general data store that lets us donate insights for the common good, and may even end up being a marketplace where we can choose to sell data. This is so cool! I hope they have a mobile app so I can record things like what I ate for supper or how long I looked at a screen today”
But, it seems like they’re just tracking browser behaviour, and there’s absolutely zero place for people to add their own data?
Why bother giving it a branded name then? Why not “crowd-sourced browser behaviour study in partnership with ____”?
Mozilla strikes me as an organisation with too many resources for its own good. They keep pursuing these random side gigs with limited unifying theme other than an excuse to spend money.
Reminder that donations to Mozilla are not donations to Firefox development. The Mozilla Corp (makes Firefox) and Mozilla Foundation (gets donations) are mostly separate.
> Mozilla strikes me as an organisation with too many resources for its own good. They keep pursuing these random side gigs with limited unifying theme other than an excuse to spend money.
Why does this appear to be "random" to you? What is your definition of "random"?
Have you tried "putting on a Mozilla hat" and thinking about their priorities and goals? If you do, I think you'll likely see some connections. After you do this for a few minutes, I'm interested if perhaps you can see another point of view.
It makes me wonder about their management style occasionally, like taking over two decades to fix the issue with the accidental quit shortcut on desktop. People were celebrating that it was done, but I was wondering why it hadn't been fixed earlier if it was that important (I had accidentally triggered that shortcut myself at least ten times before).
Also reminds me that their mobile division seems painfully understaffed.[1] After using Firefox Mobile for a year there still isn't even a way to search your history, along with a dozen other pain points I put up with daily. And the most painful issues I bring up either get closed or go unaddressed for months, if not forever. It makes Chromium-based browsers even more tempting to me, and in some cases I've had no choice but to use them for things like caching bugs with stale development assets, but Chrome is really all that's left if we lose Firefox.
I really, really hope that they do not sink the current iteration of mobile Firefox. When I see resources being used on projects like these, I just wish the more pressing issues were prioritized. Firefox is the face of Mozilla.
The fact they have significantly more than 250 employees in the first place is a bit mind boggling. As far as I can tell most of the spending is unrelated to Firefox. What do all of these people do and why are they needed?
>The fact they have significantly more than 250 employees in the first place is a bit mind boggling. As far as I can tell most of the spending is unrelated to Firefox. What do all of these people do and why are they needed?
I sometimes have the same reaction when learning the number of people employed for what I would have otherwise thought to be some meager thing. Then I get reminded about the folks over in external/internal support; marketing; design; IT/infrastructure; legal; compliance; etc.
I understand there are a lot of roles needed for non-trivial organizations to function, but Mozilla has over 1000 employees and, in my opinion, does not deliver a corresponding level of value. I would actually donate if I could be sure my money is going to Firefox. I do not care about any of their other initiatives.
>> Whoever wrote this piece is way too close to... whatever it is Mozilla is doing here.
> It does appear that this piece was written by Mozilla, themselves… on their own blog.
Haha! Captain Obvious strikes again. Joking aside, I think the other commenter knew the post was written by someone at Mozilla on the Mozilla web site. Their point, I think, was that the author didn't do a good job of selling Rally.
It's not easy to understand, especially from the blog post alone, but as far as I understand it, the proposition is the following: Big companies have the ability to do research on users all the time, either by doing anonymous studies or by tracking you for their ad networks.
This is a luxury many researchers that work outside of these big tech companies don't have, which creates a scientific power imbalance. Mozilla Rally is meant to give these capabilities to everyone, and the platform is meant to ensure that you always know what you sign up for and what data is being used.
If I understand the Princeton example correctly: They want to figure out how people consume and spread misinformation. Social networks like Facebook have all that data but won't share it. Now you can opt-in to a Rally study where independent researchers can examine the data.
> This is a luxury many researchers that work outside of these big tech companies don't have, which creates a scientific power imbalance.
The power imbalance goes far beyond science. Independent research is foundational for platform accountability. An example: when I was working on the Senate staff, before I started teaching at Princeton, a recurring challenge was the lack of rigorous independent research on platform problems. We were mostly compelled to rely on anecdotes, which made oversight and building a factual record for legislation difficult.
> Would appropriately rigorous independent scholarship be considered as a trustworthy source within your sphere?
Definitely. Academia doesn't have a monopoly on excellent technology and society research. The Markup's data-driven investigative journalism, for example, is outstanding.
1. In your experience, what is the maturity level of Solid?
2. Would you mind sketching out how you would do the integration with Solid? I'm reading over https://solidproject.org/users/get-a-pod but haven't spend a lot of brain cycles on it yet.
I’m in the same boat as you - haven’t spent many brain cycles on it yet.
Mozilla Rally would have been the perfect proof-of-concept for putting that idea to use and giving it a shakedown.
Generally, Solid seems to be pretty grounded in its attempt to resolve data sharing privacy concerns.
My understanding is that the implementations would be about the same as using browser storage.
I believe developers would need to approach data access from the perspective of “an infinitely sharded document database - shards are globally-uniquely identifiable - shards are remotely distributed - shards have their own IAM - for each shard, you must register as an authorized user and authenticate to access remote data”
Looking through the FAQ, it seems like Rally requires users to send their raw data straight to the aggregation service, with the only privacy guarantees being that the data is encrypted during transport, and a "promise" that they will run internal audits to make sure private data isn't released from their servers.
IMO this seems to provide worse privacy than even Google and Micro$oft's telemetry, which at least use differential privacy to make sure that each individual's privacy is somewhat protected (the data you send is randomised so even if the aggregator is compromised by a malicious third party (e.g. NSA) individuals have some degree of plausible deniability).
Sure, Mozilla's intentions may be more "pure" (or is that just their propaganda speaking?), but in terms of privacy guarantees this seems like it is a strict downgrade, that abuses their goodwill to hide its deficiencies.
> with the only privacy guarantees being that the data is encrypted during transport, and a "promise" that they will run internal audits to make sure private data isn't released from their servers.
There's much more than that, including: privacy and security review before a study launches, a data minimization requirement, a sandboxed data analysis environment with strict access controls, and IRB oversight for academic studies.
> IMO this seems to provide worse privacy than even Google and Micro$oft's telemetry, which at least use differential privacy to make sure that each individual's privacy is somewhat protected (the data you send is randomised so even if the aggregator is compromised by a malicious third party (e.g. NSA) individuals have some degree of plausible deniability).
The vast majority of Google and Microsoft telemetry does not involve local differential privacy. Google, in fact, has almost entirely removed local differential privacy (RAPPOR) from Chrome telemetry [1].
We've been examining the feasibility of local differential privacy for Rally. The challenge for us—and why local differential privacy has limited deployment—is that the level of noise makes answering most (often all) research questions impossible.
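To illustrate the noise problem, here is a toy randomized-response sketch, the textbook local differential privacy mechanism. It is not Rally's (or any browser's) actual implementation, and the parameters are arbitrary:

```python
# Toy randomized response (a classic local differential privacy mechanism).
# Each participant flips their true yes/no answer with some probability before
# reporting it; the aggregator can only debias the population-level estimate.
import random

def randomized_response(truth: bool, p_truth: float = 0.75) -> bool:
    """Report the true answer with probability p_truth, otherwise the opposite."""
    return truth if random.random() < p_truth else not truth

def debiased_rate(reports: list[bool], p_truth: float = 0.75) -> float:
    """Recover an unbiased estimate of the true 'yes' rate from noisy reports."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)

truths = [random.random() < 0.3 for _ in range(1000)]   # true rate: ~30%
reports = [randomized_response(t) for t in truths]
print(debiased_rate(reports))  # roughly 0.3, but with substantial variance
```

Even with 1,000 reports, the debiased estimate in this toy example can easily be off by a few percentage points, and the problem gets much worse once the question is more fine-grained than a single yes/no.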
Have you thought about using central/global differential privacy (which tends to have much less noise) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?
E.g. from the FAQ: "We do intend to release aggregated data sets in the public good to foster an open web. When we do this, we will remove your personal information and try to disclose it in a way that minimizes the risk of you being re-identified."
It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix competition, AOL search dataset, Public Transportation in Victoria, etc. case studies of how non-formal attempts at anonymization can fail users.
> Have you thought about using central/global differential privacy (which tends to have much less noise) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?
Yes. Central differential privacy is a very promising direction for datasets that result from studies on Rally.
> It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix competition, AOL search dataset, Public Transportation in Victoria, etc. case studies of how non-formal attempts at anonymization can fail users.
I've done a little re-identification research, and my faculty neighbor at Princeton CITP wrote the seminal Netflix paper, so we take this quite seriously.
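For concreteness, here is a minimal sketch of what a centrally differentially private release of a single aggregate could look like, with Laplace noise calibrated to the query's sensitivity added once, server-side, before publication. The epsilon and sensitivity values are illustrative, and this is not Rally's actual release mechanism:

```python
# Toy central differential privacy: add Laplace noise to an aggregate count
# before releasing it. Parameters are illustrative only.
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(true_count: int, epsilon: float = 1.0, sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-DP, assuming one user changes it by at most `sensitivity`."""
    return true_count + laplace_noise(sensitivity / epsilon)

# e.g. "participants who visited at least one paywalled news site" (made-up number)
print(dp_count(4213, epsilon=0.5))
```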
Interesting. I can see that RAPPOR seems to be deprecated in favor of something else called ukm (Url-keyed metrics) but not why this change is being made. Is there somewhere I can read more about it?
I am not aware of any public announcement or explanation. Which is... probably intentional, since Google is removing a headline privacy feature from Chrome.
Our team looked closely at the Google, Microsoft, and Apple local differential privacy implementations when building Rally. It helped that we have friends who worked on RAPPOR.
I like that you can run your own versions of these studies, e.g. https://github.com/mozilla-rally/rally-study-01 to collect your own data. I think it would be a novel idea to make this the primary focus of the project, enabling users to see and understand their own data before donating it to research projects.
> I don't have much of an idea on how they plan to quantify doomscrolling via attention or audio events.
It seems that what they’ve done is told us exactly what data they will collect but have left it unanswered as to what specific social science methodologies they will use to analyze the data.
It is very likely that their intention is to not set that expectation with the data crowdsourcers so that the researchers have the flexibility to adjust their methodological approaches in an iterative fashion.
So they don't have enough manpower to maintain features people actually want (like the JavaScript toggle in settings which Chrome apparently has) but they do to make things no one will use that will get shut down in a few years?
Surprisingly difficult to understand what this thing is or does. Seems like you can "donate" your browsing behavior to science for doing studies in a kinda anonymous privacy protecting way.
It doesn't even sound anonymous. It sounds like they just audit that the partner aggregates the data once delivered to them. Some sort of pinky-promise.
But, realistically, what recourse do users have if there's a breach?
A few years ago I got a letter from Washington State University saying that they'd obtained my personal health data (which I never directly consented to, nor had I interacted with WSU in any way prior to receiving this letter) and subsequently allowed it to be stolen by unknown malicious actors. In case you're curious, they kept the data on an unencrypted hard drive in a physical safe that was then physically stolen.
There was some class action pittance that meant nothing to me, and WSU does not seem to have been subject to any meaningful consequences. It seems to be viewed as a cost of doing business sort of thing. For all of us who had their data stolen, the horse has left the barn, and I see no real deterrent effect. This seems to be the norm when data breaches happen.
So while "pinky promise" might be a bit hyperbolic, there is a lot of truth to it in general and I don't know how this case is supposed to be any different. If there is some paradigm-breaking accountability mechanism built in, I'd love to hear more about it.
> There was some class action pittance that meant nothing to me, and WSU does not seem to have been subject to any meaningful consequences.
Class action suits, from my understanding, are about 1) compensating the initial plaintiffs, 2) setting an appropriate punitive damage to redress the harm to society, and 3) distributing that punitive damage in an equitable fashion.
You are benefiting from 2) and 3) as a claimant in a class-action. 2) is where the meaningful consequence happens, in my estimation.
I agree. I haven't read beyond the linked page but my take is that it doesn't do anything about the data that you're already "sharing", but provides an additional mechanism for you to share data with specific parties in a controlled manner.
All this seems reasonable if that's the case, but the page is misselling it.
> All this seems reasonable if that's the case, but the page is misselling it.
There are limits to how much info you can fit into one page. Marketing pages are built in such a way that they guide you to where the detailed pages are.
>Then why have the linked page? If it's literally not informative enough then it's just a poorly written press release.
Usually marketing pages will include just enough information and imagery to grab your attention. There will be some links embedded which ultimately lead to more and more information.
Basically, if you're interested in learning more, you'll be motivated to read/click further. If it doesn't interest you, you leave having taken a fraction of the time than had the page been filled with all the details.
I guess what I'm getting at is: if article does not provide enough background to even be able to participate in a discussion about the subject, it's a bad article.
> Softer fields are likely to be found, even if perhaps at lower frequencies, in the physical and the biological sciences. Conversely, there is no reason why high-consensus fields should not exist in the social sciences, too.
Whenever someone says they want to study anything relating to "misinformation", it's almost a guarantee they're looking to solve human problems with technical solutions.
Oh look: "Rally has the potential to help address societal problems we could not solve before."
No, I don't think it does. Especially when the first thing you tout is a study on "misinformation about COVID".
The fact that Americans are locked in a bitter political cold war with each other is not a technological problem, and it does not have a technological solution.
It's like trying to help a couple on the brink of divorce by updating their phones with new firmware and changing how Facebook's algorithm works. The reason the couple is threatening divorce is 1% what happens ON their phones, and 99% what happens off their phones.
We have to break away from this stupid idea that if only we studied what was happening on Twitter and Facebook we could fix our problems.
> Whenever someone says they want to study anything relating to "misinformation", it's almost a guarantee they're looking to solve human problems with technical solutions.
What in your experience motivates you to say this?
Not the OP but I think it is because of a lack of trust in institutions, due to an escalating culture war and tribalism.
As an educated person interested in facts and science, I see a surprising lack of due diligence in the media around a lot of the current hot-topic issues (not going to name them). Every time I do my own research on one of these issues, it is obvious the media and our institutions are more interested in supporting an official narrative than diving deep into the facts. Opposing opinions are just not 'allowed' anymore in some areas. I personally don't even trust people using the word 'misinformation' anymore, as it has been so abused.
> lack of trust in institutions, due to an escalating culture war and tribalism
Precisely. Whenever someone talks about "disinformation" as being the cause of this, it would be insulting, if it weren't so detached from reality and Orwellian.
"Gosh, these dumb people are reverting to tribal behavior because they're getting lied to! The real problem is they can't think for themselves -- we just need to figure out how to tweak the algorithm so the disinformation doesn't spread, and they get fed the right truths. Then we'll have harmony again!"
People need to get off their computers, and stop trying to understand (and fix) society through Twitter.
> We have to break away from this stupid idea that if only we studied what was happening on Twitter and Facebook we could fix our problems.
Analogously, to demonstrate absurdity, consider this statement: we have to break away from this stupid idea that if we studied what was happening on our crop fields and our roads we could fix our problems with agriculture and traffic.
This is exactly the data a modern Cambridge Analytica would want (e.g. shares, time spent on each post, all correlated to demographics). I hope this platform has controls to ensure that study data isn't misused post-study for non-study purposes, because the FAQ answer isn't so encouraging[0].
"With Mozilla’s permission, researchers may retain aggregated or de-identified datasets for their analyses. Mozilla may also retain aggregated data sets which we may release in the public good to foster an open web."
Shouldn't you ask the users for permission on using their aggregate data for purposes that could be different to the study they enrolled to?
>They do, given that you’d have been informed of that before you provided your consent by opting in to Rally data crowdsourcing.
I haven't tried registering for the study, but I note that an FAQ answer is not enough. Users cannot be expected to search for the FAQ. This needs to be mentioned in the consent form, and not done implicitly 'by opting in' like corporations do.
>Kind of nice that it’d be in the hands of Mozilla, Princeton University, other trusted research partners, and the open web, isn’t it?
The 'open web' is a bit of a nebulous concept, and the CA data was supposedly in the hands of a data scientist at the University of Cambridge.
> It'd be much better to not collect that data at all.
That may not be possible to enforce without shuttering the entire computer networking infrastructure of the world and rebuilding it on new computer science foundations.
>What if, instead of companies taking your data without giving you a say, you could select who gets access to your data and put it to work for public good?
Implied false dichotomy. What if I refuse to use companies that do not fully encrypt my data in such a way that they have no access to it?
Also, Mozilla isn't going even half this far. They are relying on a contractual obligation with their partners and internal audits to protect my identity. I'm skeptical of their ability to protect my data to the point I will never use this service.
So they are now practicing collecting more data, but with the motivation that it is somehow for the good of society? It is still the one thing I don't want Mozilla to do: embrace data collection, no matter the banner or motivation.
Also universities through history have done numerous unethical studies, they don't automatically gain my trust or data because they pinky promise.
I think this is an interesting idea, but I don't know how well it scales. I think it is a high bar for individuals to explicitly manage every single donation of personal data.
What I have thought might work is that individuals could donate their data to a non-profit entity that would have a legal responsibility to protect contributors' data. The non-profit would act as a data broker for researchers. It could sell aggregate data, but the sales would go to charitable causes. Contributors could opt out of specific research or specify specific charities.
The result is a little like a non-profit data brokerage. It would put experts in charge of data protection and confidentiality, and it explicitly acknowledges that data has value. The broker would have legal recourse and financial means to track down data abuse.
Sorry if this is too off topic, but is there a name for the art style they are using in the illustrations? It seems to be all over corporate sites these days.
Interesting. I spent the past two days building a prototype which solves this problem for a specific use case “sharing search results”. Going to Show HN in a few days.
This article is light on technical details, or even a high-level explanation of how it works.
I wonder how they solve the authenticity / validity of the data.
Funnily enough the solution I’ve come up with isn’t compatible with Firefox due to api limitations.
I think that it is important to create a precedent of mass data collection that reveals to the users what is collected and gives tools to manage it. The legislators can use it as a blueprint to require a similar level of disclosure from other companies that do online surveillance. However, I do not see what value Rally offers to the enrolled users.
I am failing to see the motivation for anyone to use this. If you cared about privacy, you wouldn't share your data in the first place. If you didn't care about privacy, there is still no reason to opt in.
You aren't getting my data, stop trying. That may seem harsh, but that is where I'm at. I'm really put off by the idea of sharing my data these days. I don't trust Mozilla's internal audit process to keep it safe, but it isn't specific to them. Outside of maybe two companies that are 'encryption-first', I don't trust any of them.
Yeah, I’m big on this stuff because I like scientific studies to be biased towards me. So like, imagine there is some treatment for a disease that only works on people who live in San Francisco.
My ideal outcome is that the study is accidentally biased to serve us and that all of America pays for this medicine.
Ideally, all outcomes are tailored to me through sheer participation bias.
Sure sure, everyone knows that sampling bias exists but there is a systemic human bias to what has been measured. I intend to exploit that.
And as for my data, Equifax got the most sensitive stuff so I’m not worried about anything else.
One example, without having read further into that particular study, but knowing what misinformation research tries to do, generally, would be the following:
Identifying the specific mechanisms behind the dark patterns of social engineering.
There is a body of research, for example, that considers the spread of rumors by making use of the same mathematical models that epidemiology uses to model the spread of infectious disease (virality, in particular).
As I recall, that body of research led to the conclusion that the best weapon against misinformation may be the liberal and broad application of truth-telling.
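As a rough illustration of what that modeling looks like, here is a toy SIR-style simulation with made-up parameters, where 'spreaders' pass a rumor to susceptible users and eventually stop spreading it:

```python
# Toy SIR-style model of rumor spread, mirroring the epidemiological analogy
# used in that body of research. All parameters are made up for illustration.
def simulate_rumor(days: int = 60, n: float = 10_000.0,
                   beta: float = 0.4, gamma: float = 0.1):
    """beta: spreading rate per contact, gamma: rate at which spreaders stop."""
    susceptible, spreading, stopped = n - 1.0, 1.0, 0.0
    history = []
    for _ in range(days):
        new_spreaders = beta * susceptible * spreading / n
        newly_stopped = gamma * spreading
        susceptible -= new_spreaders
        spreading += new_spreaders - newly_stopped
        stopped += newly_stopped
        history.append((susceptible, spreading, stopped))
    return history

# Broad "truth-telling" can be modeled as raising gamma (people stop spreading
# sooner) or lowering beta (corrections blunt transmission).
peak = max(s for _, s, _ in simulate_rumor())
print(f"peak share actively spreading: {peak / 10_000:.1%}")
```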
> What is the criteria for becoming a partner?
Currently, we are working with a small number of reputable research teams who match our commitment to your safety and trust and who also have the technical expertise to help us build Rally’s scientific capabilities.
All partners are required to sign an agreement with Mozilla. This agreement upholds the researcher’s rights to independence in their research, while mandating the safeguards they must follow in order to protect our users.
In the future, we will expand beyond our pilot partners. Please contact us if you are interested in becoming a partner!
Nothing on whether studies based on this will be published as Open Access. I can see that Jonathan Mayer's latest publications are all published without OA, so that doesn't bode well for the COVID news study. And ironically, the 'Beyond the Paywall' researchers are no better at publishing open research.
Yet again though we're being asked to give it up for free...
>But for too long, online services have pilfered, swapped, and exploited your data without your awareness. Privacy violations and filter bubbles are all consequences of a surveillance data economy. But what if, instead of companies taking your data without giving you a say, you could select who gets access to your data and put it to work for public good?
So taking and benefitting from my data is alright as long as I'm aware of it?
>But, being “data-empowered” also requires the ability to choose who you want to access your data.
No...being data empowered would mean I'm profiting off my own data, not giving it to companies to profit off of.
>We’re kickstarting the Mozilla Rally research initiative with our first two research collaborator studies. Our first study is “Political and COVID-19 News” and comes from the Princeton team that helped us develop the Rally research initiative. This study examines how people engage with news and misinformation about politics and COVID-19 across online services.
And the first study in this is about politics, that great word again 'misinformation' that gets tossed around everywhere.
So, the first study will be using user data on the news they read and how they interact on social media in regards to politics for what purpose exactly?
This makes the world better how?
>Soon, we’ll also be launching our second academic study, “Beyond the Paywall”, a study, in partnership with Shoshana Vasserman and Greg Martin of the Stanford University Graduate School of Business. It aims to better understand news consumption, what people value in news and the economics that could build a more sustainable ecosystem for newspapers in the online marketplace.
So, the second study is 'how do we make news organizations more money?"
I'm kind of noticing a pattern here...
>With Rally, we’ve built an innovative, consent-driven data sharing platform that puts power back into the hands of people.
No, you've found a way to convince people they should give you their data for 'research that benefits the world' that's in reality thinly veiled market research.
It would be nice if a company could just be fucking honest for once.
This whole blog post should just be:
"Look, at Mozilla we realized we can make money collecting data, we're going to pretend we couldn't just take it from you and give you the option to opt in to make us money."
FF Send, FF Notes, Positron, FF OS, Appmaker, and Deuxdrop to name a few. They tend to keep more stuff around than Google, but still cancel things that I like frequently.
You keep accusing people of being nihilistic / cynical in this thread. It isn't cynical to be extremely skeptical of any company's ability to keep your data private and safe given the constant barrage of breaches and ethical violations.
Maybe I'm in the minority, but Mozilla's brand has been permanently tarnished in recent years and I do not think of them as a 'privacy first' organization.
> It isn't cynical to be extremely skeptical of any company's ability to keep your data private and safe
By dictionary definition, cynicism is an attitude of scornful or jaded negativity, especially a general distrust of the integrity or professed motives of others.
Skepticism is defined as the doubt as to the truth of something.
Extreme doubt as to the integrity of a company’s data privacy practices is the dictionary definition of cynicism.
> I do not think of them as a 'privacy first' organization
I won’t pry, but I suspect that you are not a web developer, by trade.
It is common knowledge in the web development field that Mozilla is a, if not the, standard bearer for consumer privacy protection in the browser wars.
Like I mentioned, there are perfectly legitimate reasons and needs to do it.
However, the best privacy practice is to never share it. If you share, it is no longer privacy first. The goals one wants to achieve come first. I'm not saying it will not have adequate privacy protection. But it's not "privacy first" because it introduces a risk to someone's privacy for the benefit of something. That something comes first.
How exactly are academic researchers supposed to guarantee that their research will be applied for the good of humanity?
Government is you and me and everyone else.
If you want government to care about it, write to them and tell them to care about it.
They probably aren’t going to care about random comments on HN.
Research on ethical machines has been taking place since the dawn of the drone age and has already guided military policy. You can find the supporting evidence of that for yourself if you search for it.
Not the OP, but no, the government doesn't and can't represent me. We are systematically locked into a two party system due to winner-takes-all voting in the USA and you seem to think that two (very similar) parties are enough for me to have representation. They aren't.
It also isn't nihilistic to have a very high bar for sharing any data these days. I personally am beyond considering sharing my data and go to great lengths to keep my important data safe. I don't trust anyone with it.