Sure. But I have a question: why? Why should we opt out of the telemetry? To me, this idea seems to not just be admitting defeat, it's ensuring defeat right from the start.
Telemetry should always be opt-in. Yes, that means vendors will get much less data. It's on them to deal with it.
On a related note, I wonder how long it will take until one of the vendors of popular CLI tools or desktop apps gets fined for a GDPR violation. I wonder how much existing telemetry already crosses the "informed consent" threshold. I'll definitely be filing a complaint if I find a tool that doesn't ask for consent when, by law, it should.
I'll play the devil's advocate. Most people will shoot support e-mails at you which are more or less "app crashes". If you have not already encountered the problem, you have to walk them through a tedious debugging process. If you collect crash reports, you have probably already fixed the problem.
For usage data, it allows developers to focus on features that matter and know which ones you can remove.
For example I don't collect any data in my app, but it also means that I fear removing any features that are slowing me down, because I have no idea about how people use it.
As for why sometimes this would be better as opt-out, well, on iOS crash reports are opt-in, and only about 20-30% of users have them enabled. That is fine for huge programs or ones with little surface.
Your point is basically "surveillance data is useful". And, well, yes. There would be zero debate over surveillance if there were literally no desirable reasons to have it.
Calling it surveillance is misleading in most cases.
The most recent crash reporting system I worked with was little more than a stack trace without any user data recorded at all. Not even IP addresses of the report.
We didn’t care who was crashing and we didn’t collect any PII at all. It was a simple report about where in our code the crash occurred.
It was very useful for fixing bugs. No surveillance or PII involved.
If that was the norm then people would opt in. But the trust is gone now, bad actors have ruined it for everyone. And the solution to that is not to enable it by default.
I think this might be the winning argument. There may not be any meaningful reason for telemetry to outweigh the bad actors and damage they've done.
For me, I'd love to enable telemetry for some of my better-liked FOSS apps - but even with those, my immediate question is "What are you sending?"
Without some way to monitor it, are they sending filenames? Are they sending file contents? How much is it? Etc., etc.
To satisfy those questions I'd need some sort of router-enabled monitoring of all telemetry-specific traffic, so I can individually approve types of info... and that seems difficult. But the days of blanket allowances from me are long gone, thanks to the bad actors.
Excellent point. As a user, here are my requirements if you want me to opt into your data collection scheme:
1. All exfiltration of data must be under my direct control, each time it happens. You can collect all the data you want in the background, but any time it is transmitted to the company, I must give consent and issue a command to do it (or click a button).
2. All data that is exfiltrated must be described in detail before it is exfiltrated. "Diagnostic data" isn't good enough. List everything. Stack trace? Crash report? Memory dump? Personal info (list them all out)? Location information? Images (what are they, screenshots? from my camera?) Time stamps from each collection. If it's nebulous "feature usage data" then list each activity that is being logged (localized in my language). Lay them all out for me or I'm not going to press that Submit button.
3. I need to be able to verify that #2 is accurate, so save that dump to disk somewhere I can analyze later.
4. The identifiers used to submit this data should be disclosed. Is a unique user id required to upload? Do you link subsequent uploads to the same unique id? Is that id in any way associated with me as an account or as a person in your backend? I want you to disclose all of this.
5. For how long do you retain the data I sent? How is this enforced? Is there a way for me to delete the data? How can I ensure that the data gets deleted when I request it to be deleted?
6. Do you monetize the data in any way, and if so, am I entitled to compensation for it?
I don't know of many (if any) data collection schemes that meet this bar, yet.
If you install an app via Google Play, crash reporting & some telemetry are silent and enabled by default. This is in addition to any crash reporting built into the app.
Analogy time: in the early days of the Internet, any IP address could send an e-mail by directly connecting via SMTP to the mail exchange host listed for the domain of the To: address. This was a good, nice thing.
Bad actors - spammers - ruined that. The trust is gone, and now it's a terrible idea to accept SMTP connections from random IP addresses.
Legitimate senders have to "opt in" to the SMTP sending system by getting a static IP of good repute, or else using a forwarding host which has those attributes.
That, and additionally, if there's an opportunity to monetize data from (or derived from) telemetry further, companies won't hesitate selling it (or access to it) to third parties.
That, and additionally, if the data is stored, it's a question of when, not if, it'll become a part of some data breach.
To developers and entrepreneurs feeling surprised by the pushback here: please take an honest look at the state of the software industry. Any trust software companies enjoyed by default is gone now.
I think the old crash-report screen will be the answer for the paranoid crowd now. The app crashes, a stack-trace window appears with the data and a button you click to e-mail the stack trace to the developers. That used to be the norm before privacy data became a lucrative business.
It sounds like there must be a standard of privacy for certain apps.
I work with UL a lot, and they have lists of standards and specifications that help us meet the safety requirements for electronic devices. These standards are then used to meet customers' demand for a high level of safety. Customers in my field do not even consider products that don't have UL certification. This strategy better informs the consumer, while the standards are maintained by an independent firm whose incentives are aligned with preserving its credibility.
I am not deep in the software field but I can imagine that groups like the EFF or similar orgs have a standard. The issue is that the consumers of these products don’t seem to care about this outside of the privacy advocate world.
In Linux, we use `abrt` for that, but the user needs to review and submit the bug report with the stack trace manually, so nobody complains about it. Yep, it's very useful.
How do users know if that's all the data that's submitted? Auditing every program to see exactly what gets sent (and repeating the process for every update) is way too much work; it's safer to just opt-out by default.
Now that I think about it, it's safer to simply not use software that opts users into studies (like telemetry analysis) without informed consent.
> How do users know if that's all the data that's submitted?
That's the thing, isn't it? They'll never know. They can't; it takes deep technical knowledge to even be able to conceptualize what data could be sent, and how it could potentially be misused.
Which is to say, it's all a matter of trust. Shipping software with opt-out telemetry (or one you can't disable) isn't a good way to earn that trust.
Even with deep technical knowledge and severely pared-down telemetry, PII embedded in side-channel-like outlets can be missed. Think a simple stack trace with no data is PII-free? Probably. But are you sure the stack of functions called doesn't depend on the bytes in your name?
The anti-tracking point in the era of Do-Not-Track was “building advertising profiles that follow people around the web is evil.” Software telemetry is neither a personal profile, nor for advertising, nor correlated across more than one surface. As such I think it’s a little bit of a stretch to put this in the same bucket as 3rd party JS adtech crap.
> Your point is basically "surveillance data is useful". And, well, yes. There would be zero debate over surveillance if there were literally no desirable reasons to have it.
Well, not exactly. The original comment was that "surveillance data" is useful to the user and their installation. For instance in getting better response to your error report. Or, I'd say, in keeping your software patched and up to date (check for updates is included in the OP as suppressed by 'do not track', since it necessarily reveals IP address).
Even if none of these things that make it useful to the actual users were in effect... surveillance would still be happening, because it lets the vendor monetize the users' data.
Useful to whom and for what seems relevant, instead of just aggregating it all as "sure, it's useful in some way to someone or it wouldn't happen."
> For usage data, it allows developers to focus on features that matter and know which ones you can remove.
I find tools that don't do this are generally more powerful because they allow for deep expertise and provide a ton of payoff if you put in the effort.
E.g: Vim. 80%+ of users probably don't use macros. Hell, I use them <1% of the time. But I'm sure glad they're there when I need them.
No, you can't remove it. Even though I use it rarely, its existence might be the reason I use the tool at all, so that when I need it, the feature is available.
This came up with Audacity. There I have my set of standard filters I run all the time; even though they don't bring much benefit, they are there and nice. They will be at the top of any usage statistics. Then there are the filters I need for a special effect or to repair something really broken. Those I use hardly ever, but when I do, they make the difference.
Or when talking command line: `ls` without options I use a lot (well, actually a lie, I have some aliases in my shell rc), and sometimes I use `-a` or `-l`. This doesn't mean that maintainers should remove `-i`, since once a year or so I need it to compare inodes with log entries or something, and then it's important that the flag exists.
You need qualified information about what features are important. Not unqualified statistics.
Ah, so the question of whether to remove feature A or feature B is solved by simply not removing either. That's brilliant!
I get that this might not be a popular sentiment, but resources are finite. If we have a situation where we can't maintain both features, which one do we focus on? Usage metrics can absolutely be beneficial there.
I don't think it's usually a problem of "remove either A or B", it's usually "should we remove A or invest work in making it compatible with changes ahead?".
I see how usage telemetry can be useful in deciding whether or not it's worth it to keep supporting a feature, but I offer two counterbalancing points:
1. What people may be worried about - what I myself am worried about - is the methodology creep; it's too easy to end up having telemetry drive feature removal decisions, as in, "monthly report says feature X is used by less than 1% of users, therefore let's schedule its removal for the next sprint". The problem here is, telemetry alone will likely lead you astray. It's useful as a data source, not as the optimization function of product development.
2. If a feature you're worrying about has significant use, you most likely already know it without telemetry - all it takes is following on-line discussions mentioning your product (yes, someone might need to do it full-time). If removing the feature will have major impact on your maintenance budget, and non-telemetry sources don't flag this feature as being actively used, you can just axe it - revenue hit from lost userbase you've missed is unlikely to be big.
From this follows that the telemetry is most useful for deciding the fate of features that aren't used much, and don't cost much to maintain. At which point, I wonder, do you really have such low margins that you can't afford to carry the feature a while longer? I'm strongly biased here, because I'm going only by my personal experience - but I'm yet to see a software company[0] that doesn't have ridiculous amounts of slack. Between the complexity, management mess-ups, piles of technical debt and the nature of knowledge work being high-variance, having a feature slow your current development down by half[1] won't have much long-term impact.
--
The question thus is, are the gains from usage telemetry really worth the risk and potential ethical compromise? Would those gains be significantly lessened, if the telemetry was opt-in, and the company put more work into getting to know the users better? I suspect the answer is, respecting your users this way won't hurt you much, and may even benefit you in the long run.
--
[0] - Other than one outsourcing code farm I briefly worked in (my boss loaned me for a couple weeks to a friend, to help him meet a tight deadline), but these kind of companies don't make product decisions, they just close tickets as fast as possible.
[1] - And hopefully leading some devs notice the need for a refactoring, in order for that feature to not be a prolonged maintenance burden.
I'm a hundred percent certain that 1 is what Spotify does. It seems every release they remove or hide features that I use occasionally (e.g. clicking on the currently playing album picture to reveal the active playlist). It's extremely frustrating that since power users now are in a minority they are completely ignored.
At the first release like this they basically said "This new version has a lot less features, but you can vote on which ones we will add back!". They added none of the ones that got votes, and eventually removed the feature voting system altogether.
> Most people will shoot support e-mails at you which are more or less "app crashes". (...) If you collect crash reports, you have probably already fixed the problem.
Fair enough. Still, there are two separate steps here: collecting crash reports and sending them. What if the app asked if it can send the report, letting you optionally review it? Many programs today do that, I think it's an effective compromise. Additionally, the app could store some amount of past crash reports, and the places for the users to get the support e-mail (a form, a button, an address in a help file...) could request you to check for, or automatically call up, those past crash reports, and give the user choice to include them. The way I see it, the app should give users near-zero friction to opt-in, but still have them opt-in.
It won't solve the problem of bad support requests completely, but nothing ever does - random people will still write you with problems for which you have no data (e.g. network was down when crash occurred), or for which no data exists (because requester is a troll).
> For usage data, it allows developers to focus on features that matter and know which ones you can remove.
I accept this as an argument in favor, though personally, I don't consider it a strong one. I feel that "data-driven development" tends to create worse software, as companies end up optimizing for metrics they can measure, in lieu of actually checking things with real users, and thus tend to miss the forest for the trees.
Picking good metrics is hard, especially in terms of usage. The most powerful and useful features are often not the ones frequently used. Like, I may not use batch processing functionality very often, but when I do, it's critical, because it lets me do a couple day's worth of work in a couple minutes.
So, for me, can usage telemetry improve software? Shmaybe. Is it the only way? No, there are other effective - if less convenient - methods. Is the potential improvement worth sacrificing users' privacy? No.
> on iOS crash reports are opt-in, and only about 20-30% of users have them enabled. That is fine for huge programs or ones with little surface.
I feel the main reason this is a problem is because of the perverse incentives of app stores, where what you're really worried about is not crashes, but people giving you bad reviews because of them. Mobile space is tricky. But then, forcing everyone into opt-in telemetry doesn't alter the playing field in any way.
This would be tricky - the nature of software means some people would try to keep crashing their programs on purpose, sending repetitive crash reports, in order to make money. Developers would now have to deal with a flood of spam in their crash reports.
I think the only way crash reporting can work, outside of support contracts, is as a favor by the user to the vendor. But, to maximize the amount of such favors, the vendor would have to treat users with respect - which is pretty much anathema to the industry these days.
Valid concern, but bug bounties are a thing; It’s up to the developer to decide if the bug is worthy of a payout. Maybe make it so that if it crashes, and they provide a useful log (or steps), then you pay out.
From what I know, bug bounties already have a spam problem. I definitely saw some devs in my circles complaining about people repeatedly sending garbage submissions in hopes of getting a payout.
What bug bounties also have is a big barrier to entry. You generally need to be at least marginally competent in software development, and do plenty of leg work, to make money with them. Turning regular crash reports into bug bounties removes that barrier, amplifying the spam problem.
One point here - support emails do not help you identify problems that less invested users may be having with your product.
For example, I used to work developer relations on TensorFlow. We wanted to make the framework accessible to enterprise data scientists. The problem was that these users were not familiar with the tools that we commonly used to get feedback - GitHub issues, the mailing list, etc.
Most of them were using TensorFlow on Windows via Jupyter, which wasn't well-represented among the users that we had frequent contact with.
It was really hard to understand the universe of issues that prevented most of these users from getting beyond the "Getting Started" experiences. Ultimately, these users are better served by easier to use frameworks like PyTorch, but I think a big reason that TensorFlow couldn't adapt to their needs is that we didn't understand what their needs were.
Another big problem is that it takes a certain level of technical sophistication to know how to send maintainers a useful crash report. If you rely on this mechanism, you will have a very biased view of your potential user base.
Sure, and I can spy on girls I like just so I can learn how to offer them a better experience when they meet me.
Having good intents does not justify skipping consent. The “opt-out” mentality is a very slippery slope, since you’re already stating that consent does not need to be explicit (hint: it’s not consent if not given explicitly AND freely).
I don’t think the simile fails for that reason. “Informed consent” is generally accepted as the standard, and it’s arguable that click-through EULAs are not in any way “informed consent” for the average person.
To bring the original simile full circle, let’s say a person signs a contract with another person, and then that person forces themselves on them. I doubt the presence of a clause in the contract saying “I agree to have sex with X” would absolve them of guilt.
There certainly is a difference in harm between forcing sex and sending unique user IDs, and it’s not unreasonable to make a contract about the latter if one wishes, is it?
This is rather a reason to prefer free software licenses. Culture has yet to catch up to this, but in the long run I hope the collective consciousness learns to distrust and avoid complicated proprietary software licences.
Why not collect the data locally and only share it after a crash when the user agreed?
Why not add a splash screen on the program start up that informs your user of upcoming plans so they can intervene? Like "Hey we are planning to remove feature X to speed up the program, do you agree?"
And this is because actual usage metrics don't really translate to opinions. I have features in programs that I use maybe once every two years, but then I really need them. Then there are other features I use daily and I really hate them with a passion.
That certainly is the devil's side. Unfortunately, too many firms have affixed phony halos and then exfiltrated the People's personal data. Opt-in is the only way the People will be able to choose whom they trust.
Telemetry alone doesn't tell you how valuable a given functionality is. A critical problem, one imposed through the tyranny of the minimum viable user,[1] is that the high-value creator community of any tool or service is going to be small. Quite often 1%, or a very small fraction of that.
Your telemetry data will show you only that such features see little use. They won't tell you how much value is derived from that use, or what the effects of removing such functionality will be on the suitability and viability of your application.
Default telemetry is demonstrable harm for negative gains.
You have basically just given a justification that crash reporting can be conducted based on legitimate interest instead of consent, and as such does not require opt-in.
Many people mistakenly believe consent is the only possible justification for data processing under GDPR, whereas there are actually 6 possibilities, and you can ask a lawyer which one can apply for a given data processing flow.
Note that whereas I do believe that crash reporting can indeed be considered legitimate interest, I wouldn't consider plain telemetry ("phone home without a technical good reason") to fall under that umbrella...
Not everyone is on a flat rate, and thus both crash reporting and telemetry might cost the sender some money (or count against their high-speed quota). Some people seem to expect that data transfers are always free...
That's one reason (besides privacy) why I have Netguard running as a firewall on our Android phones and set to block traffic by default for each app, unless the app's creator convincingly explains why their app should be allowed to access the net.
Actually, this is why I want telemetry to be opt-in. I have a consistent policy of providing telemetry and I want the software to be biased to my needs. I want them to conclude that some feature used only by some privacy-conscious user is unused and should be axed because I want the software to be hyper-tailored to me.
I want the software to be streamlined, have no features except what I'll use, and for the community to be specifically people like me. I want other people to not use the software and use up dev bandwidth.
And I love it when telemetry biases the stats towards me. That way all devs will eventually be making software for people just like me.
I know you're sarcastic, but it wouldn't be a bad outcome. Sure, the vendor would have to be particularly dumb in their usage of telemetry[0], but the result would be... software that is useful to you. All the professional features you need would be in there, with none of the glitter.
Of course, I would opt-in too, with the same mindset but different use cases, and the software would provide equally for both of us. Add in a few more people like us, and we'd end up with a quality tool, offering powerful and streamlined workflows. Those who don't like it would start using a competing product, and tailor it towards their needs. Everyone wins.
Reality of course is not that pretty, but at face value, it still beats software optimized to lowest common denominator, serving everyone a little bit, but sucking out the oxygen from the market, preventing powerful functionality from being available anywhere.
--
[0] - It's a mistake that's much easier to make when you're flooded with data from everyone, rather than having a small trickle of data from people who bothered to opt in.
I am actually not being sarcastic. It is a life goal for me to have most policy organized to serve me or people like me (along whatever genetic/social/cultural/economic grouping is most likely to benefit me). i.e. I encourage communities not like me to refuse participation in medical research, forcing participants to be in my ethnic group; I encourage stringent data sharing norms and a culture of fear around what is done with health data in socio-economic and ethnic groups that are not mine; I encourage organizations to have strict opt-in requirements, in general, which I have no problem meeting, so that tools are built to be best used by me and adequate for others.
My dream is that everything is above the adequacy threshold for everyone else so that they don't build their own equivalent tool but that everything is also past my pleasantness threshold. I think the most effective means of doing this is to focus existing products into being past my pleasantness threshold while ignoring others since high switching costs keep most people on the same path they were before, and because things like medical research they don't really get to re-optimize.
I understand that this sounds sarcastic, but it is not.
Well... then I apologize for assuming. I'm not sure how such philosophy sits with me, I need to think about it more. One thing for sure, what you say makes you the model of an ideal free market participant :).
My experience is that third-party products/add-ins (specifically McAfee Enterprise Suite) cause most of the crashes I have with Office products. Most OS blue screens are from non-Microsoft hardware drivers.
I was mainly referring to their apps on Mac and Android. They crash a LOT, especially Teams for Mac and Outlook on Android. And both use lots and lots of telemetry - lots and lots of traffic to hockeyapp.com (a MS telemetry platform) and various telemetry URLs at microsoft.com.
Yeah, I understand this becomes necessary to minimize maintenance effort on rarely used features, but it feels like even more often it's used to turn useful applications into barren wastelands. Firefox, perhaps the best example, is a fraction of what it used to be.
This may be a noble goal but then it may also lead to the program never being updated. If you need to do a major update (underlying system library no longer works, dependency is deprecated and has a security hole, a new direction requires overhaul of backend) you may need to prioritise what to keep and what to let go.
A problem I encountered was also localisation. Once you localise your program, adding any string multiplies the work across every language. In this case removing features can give you a lot of slack.
> For usage data, it allows developers to focus on features that matter and know which ones you can remove.
Sorry, but if devs require THAT tight a connection with end users just to MAINTAIN software, they should probably stop and leave. It's impossible to figure out a new feature from such a reactive approach, so they would have to resort to more traditional ways of interacting with end users anyway. Thus... making coverage analysis a totally redundant thing.
Tighter user connection is suitable for enterprise software, not for general deployment.
And why so much worry about removing working(!) features?
I agree wholeheartedly with the idea that it should be opt-in but both approaches are equally unenforceable. The inverse of what's suggested in the article would be:
export DO_TRACK=0
Project owners that want to track you simply won't take any notice of these flags anyway.
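For what it's worth, honoring an opt-in flag is trivial for a well-intentioned vendor. A minimal sketch in Python, assuming a hypothetical `DO_TRACK` variable where anything other than "1" means no telemetry:

```python
import os

def telemetry_enabled() -> bool:
    """Opt-in semantics: absent or anything but "1" means no tracking.

    `DO_TRACK` is a hypothetical flag name, not an established standard.
    """
    return os.environ.get("DO_TRACK", "0") == "1"

def maybe_send_report(report: dict) -> bool:
    """Send the report only if the user opted in; return whether it was sent."""
    if not telemetry_enabled():
        return False
    # ... transmit `report` to the vendor's endpoint here ...
    return True
```

Of course, as noted above, nothing forces a vendor to ship that `if` statement - which is why legal enforcement matters.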
I agree. The approach I'd like to see is, standardizing on some kind of DO_TRACK for convenience [0], and then doubling down on legal enforcement of opt-in telemetry. Project owners should be incentivized to seek consent, by threat of legal action from data protection authorities - and then, standardizing on some sort of DO_TRACK flag would be a no-brainer for them.
As it is, letting the industry standardize on a DNT opt-out is just making telemetry more established as a standard practice, making it harder to argue that it should, in fact, be opt-in.
The problems we have with tracking on the web are in big part because it was an established practice before appropriate legislation against it was drafted. In the CLI space, we have an opportunity to nip it in the bud, because it's not - as of yet - standard practice for console tools to silently spy on you.
--
[0] - And while we're at it, standardizing on a browser-provided consent UI, instead of each site providing its own, with its own dark patterns. It's the same idea.
> And while we're at it, standardizing on a browser-provided consent UI, instead of each site providing its own, with its own dark patterns. It's the same idea.
We've already been there and it was basically shelved because users were indifferent and companies wanted the data regardless.
> Project owners should be incentivized to seek consent, by threat of legal action from data protection authorities
Definitely agreed, and I'd want to have some form of strict liability for data breaches, based on what kind of information has been leaked. Currently, a company holding data about me (e.g. name, email address, phone number, credit history) causes a large amount of risk to me, but themselves carry no risk in case of a data breach. They are the ones who can decide to collect less information, keep shorter retention policies, or restrict access to prevent a breach, but they have no incentives to do so.
Yes, this would be a complete up-ending of many business models, but if your business model relies on collecting data without collecting the associated risk, it's a business model that society shouldn't allow to exist.
The problem is a simple one. Most telemetry tools (Mixpanel, Sentry, etc.) don't give the developers who are adding them into their products the ability to quickly add respectful consent flows.
This really needs to be a feature of the telemetry tools in the first place. Because, ultimately, most telemetry is being implemented by startup engineers who are burning the midnight oil to complete the telemetry JIRA ticket before going back to the long list of other stuff they have to implement.
I have experienced this from all three sides - as a software engineer implementing telemetry, as a product manager consuming telemetry, and now as a founder who is building a tool to collect telemetry in the most respectful manner possible.
> now as a founder who is building a tool to collect telemetry in the most respectful manner possible.
Thank you for taking being respectful to users seriously.
I'd be very interested in learning how your consent flows look, and what other aspects of your product are driven by the goal to "collect telemetry in the most respectful manner possible". I couldn't see much on it on the landing page, so if you have a moment, could you provide additional information, either here or in private?
Consent is checked at the time each report is sent back. This means that your users can grant and revoke their consent on a per-report basis, which is the only respectful way to do things.
We are also building programs which will deidentify reports on the client side, before any data is even sent back to our servers. This work is still in the early stages, but here's v0.0.1 of the Python stack trace deidentifier: https://www.kaggle.com/simiotic/python-tracebacks-redactor/e...
Besides Python, we also support Javascript, Go, and we added Java support last week.
I would really love to hear any feedback you have.
That is, are you relying entirely on trust and/or contractual obligations, or do you have some means of enforcing that the user of your SDK isn't cheating?
> Consent is checked at the time each report is sent back. This means that your users can grant and revoke their consent on a per-report basis, which is the only respectful way to do things.
Correct. I like how you think about this. I assume the SDK user will be ultimately responsible for prompting the end-user for consent; I wonder if you have any "best practices" documents for the software authors, so that they don't have to reinvent respectful consent flow UX from scratch?
> We are also building programs which will deidentify reports on the client side, before any data is even sent back to our servers.
I don't see any code in that Kaggle notebook you linked (I'm not very familiar with Kaggle, I might be clicking wrong). Should I assume your approach is based on training a black-box ML model? Or do you use some heuristics to identify what data to cut?
Thanks for looking at the code, and for your feedback!
Here is a recipe for adding error reporting (reporting of all uncaught exceptions) in a Python project. The highlighted line shows that, when you instantiate a reporter, you have to pass a consent mechanism:
https://github.com/bugout-dev/humbug/blob/main/python/recipe...
We allow you to create a consent mechanism that always returns true:
> consent = HumbugConsent(True)
Of course, someone can always create their own subclass of HumbugConsent which overrides that check. We don't have a good way to prevent this, nor would we want to restrict anyone's freedom to modify code.
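For illustration, here is a minimal sketch of the per-report consent pattern being described. The names (`ConsentMechanism`, `Reporter`) are invented for this example, not the actual Humbug API; the point is that consent is a callable re-evaluated at send time, so revoking it immediately stops all future reports.

```python
from typing import Callable

class ConsentMechanism:
    """Wraps a consent check that is re-evaluated on every send."""
    def __init__(self, check: Callable[[], bool]):
        self._check = check

    def granted(self) -> bool:
        return self._check()

class Reporter:
    def __init__(self, consent: ConsentMechanism):
        self.consent = consent
        self.sent = []  # stand-in for an HTTP POST to the telemetry server

    def report(self, payload: dict) -> bool:
        # Consent is checked per report, not once at startup.
        if not self.consent.granted():
            return False
        self.sent.append(payload)
        return True

flag = {"granted": True}
reporter = Reporter(ConsentMechanism(lambda: flag["granted"]))
reporter.report({"event": "crash"})   # sent: consent currently granted
flag["granted"] = False               # user revokes consent
reporter.report({"event": "crash"})   # silently dropped
```

With a startup-time check instead, a revocation would only take effect on the next launch; checking at send time is what makes revocation immediate.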
Our emphasis is on building simple programs that we can reasonably expect to run on any reasonable client without using an exorbitant amount of CPU or memory. For this reason, we aren't using black box ML models. Rather, we analyzed the data and came up with some simple regex based rules on how to deidentify stack traces for our v1 implementation.
We are in the process of doing this for more languages and building this into a proper deidentification library that can be imported into any runtime - Python + Javascript + Go + etc.
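As a rough sketch of what regex-based client-side deidentification of a Python traceback can look like (these two patterns are invented for this example; the real library's rules are presumably far more thorough):

```python
import re

# Strip home-directory usernames (Linux and macOS style paths).
USER_PATH = re.compile(r"/(?:home|Users)/[^/]+")
# Keep only the basename of each file in 'File "..."' traceback lines.
FILE_PATH = re.compile(r'File "([^"]*/)?([^"/]+)"')

def deidentify(traceback_text: str) -> str:
    redacted = USER_PATH.sub("/<user>", traceback_text)
    redacted = FILE_PATH.sub(lambda m: f'File "{m.group(2)}"', redacted)
    return redacted

tb = '''Traceback (most recent call last):
  File "/home/alice/project/app.py", line 10, in main
    run()
ValueError: bad value'''
print(deidentify(tb))
# The username and directory structure are gone; the crash location
# ('app.py', line 10, in main) and exception survive.
```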
"If you want me to test your app(lication), pay me."
- so-called "power user"
What many of today's software authors want/expect is free testing.
"To me, this seems to not just be admitting defeat, it's ensuring defeat right from the start."
While I do not use any of the example programs mentioned, it seems like these environment variables would be appropriate if the user wants to toggle between tracking and no tracking. However, for users who would never want to enable tracking, "no tracking" should be a compile-time option. It would not surprise me if that is not even possible with these programs. How is the user supposed to verify that "Do Not Track" is being honoured?
> What many of today's software authors want/expect is free testing.
Many apps and tools are open source and free. While I assume everyone wants to provide the best experience, it's hard for me to justify being angry about bugs and problems in tools that I got for free rather than bought.
Secondly, the industry realized that going fast, releasing often, measuring results, and improving over time is a winning strategy. No matter how often we as users will complain that "they changed something again", we still want to get things fast. Deploying new version once per year is not something we would really like in most cases.
And a fast development cycle inevitably comes with bugs, but they can be fixed quickly, not in the next year. Because even if you spend two months testing your app, it will still contain bugs that surface after the first real user touches it.
Let's imagine the ls command with telemetry. What happens when you make an error like this?
$ ls all-the-pr0n
ls: cannot access 'all-the-pr0n': No such file or directory
Um, what did I just tell the ls vendor? Who are they sharing that data with?
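To make the leak concrete, here is a hypothetical sketch (not ls, not any real tool) of the difference between a naive error report, which ships whatever the user typed, and a scrubbed one, which keeps only the error category:

```python
def make_report(command: str, args: list, errno_name: str, scrub: bool) -> dict:
    payload = {"command": command, "error": errno_name}
    if not scrub:
        # Leaks user-supplied arguments - here, a filename the user
        # presumably never intended to share with the vendor.
        payload["args"] = args
    return payload

print(make_report("ls", ["all-the-pr0n"], "ENOENT", scrub=False))
# {'command': 'ls', 'error': 'ENOENT', 'args': ['all-the-pr0n']}
print(make_report("ls", ["all-the-pr0n"], "ENOENT", scrub=True))
# {'command': 'ls', 'error': 'ENOENT'}
```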
> Telemetry should always be opt-in
Opt in needs to be very precise and spell out exactly what is being shipped. For a lot of command line tools, telemetry is going to create more problems than it is worth.
> I wonder how much of existing telemetry already crosses the "informed consent" requirement threshold.
OPT-IN is so much better for users. But if analytics is helpful, then probably opt in means you get no data, unless you keep bugging people asking people to opt in - which would be horrible too
> But if analytics is helpful, (...) [opt-in telemetry] would be horrible too
I don't question that analytics can be helpful. I do question the degree to which it is, relative to other methods of gaining the same insights (such as better QA, user panels, surveying people, etc.).
I also don't think it would be horrible too. Inconvenient, yes. But horrible? People used to ship working software before opt-out analytics became a thing.
What I meant is that the experience would be horrible - CLI apps halting and asking on every execution, much like websites do with newsletters, cookie banners, or paywalls.
How can users be incentivized to opt-in, particularly with a free-to-use application? I can see a case for ad-supported software, a developer could reduce or eliminate ads for that user in exchange for telemetry data...
Ask nicely. After all, you want them to do you a favor.
If you're less into respect and more into manipulation, offer them a meaningless trinket. A sticker on the app's home screen saying "I helped", or something.
Yeah, this makes opting out my responsibility. If a company collects the names of all the files in my home directory, it's my fault for not setting some random variable correctly. Oh, and you did remember to also set it in your crontabs, right? If not, oopsie! You're gettin' spied on!
This proposal is terrible and comes at the problem from the exact wrong direction. If someone wants to come up with a "export GO_AHEAD_AND_SPY=yes" envvar that enables telemetry, fine.
If you think GDPR always requires consent, you would be wrong. Consent is just one of many possible legal bases, and usually the one you use only when you can't use any of the rest. In this case, and more widely, it's not at all clear which types of personal data processing do or do not require consent.
I know consent is only one of the possible bases. My comment was not about GDPR per se. I mentioned this law because of the spirit behind it - the law itself sets the bar very low (arguably below the point of what I'd consider ethical behavior). The other reason I mentioned it is because GDPR is currently the only stick we (at least, some of us) have to push back on the invasive telemetry. It's not nearly enough.
GDPR notwithstanding, I'm of firm belief that any kind of telemetry in software should be strictly opt-in and require informed consent. I say should, it's an ethical view, not legal.
If the variable is widely implemented, then that provides the nice, single point of control that the users need.
In the FOSS world, we typically have distros between the applications and the users. If the applications honor the variable, then that's all the control that is required. A distro can implement an opt-in model by defining the variable with a value of 1 in the base system, so that it's present right from boot.
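A sketch of that model, assuming applications honour a DO_NOT_TRACK-style variable (the name follows the convention discussed in the article; the helper is illustrative): the distro defines the variable with a value of 1 in the base system, so tracking is off unless the user deliberately removes it.

```python
import os

def tracking_allowed(env=None) -> bool:
    """Telemetry is permitted only when DO_NOT_TRACK is absent or not '1'."""
    env = os.environ if env is None else env
    return env.get("DO_NOT_TRACK", "0") != "1"

# With the distro shipping DO_NOT_TRACK=1 in the base system,
# every honouring application defaults to no tracking:
assert tracking_allowed({"DO_NOT_TRACK": "1"}) is False
# Only a user who explicitly unsets it opts in:
assert tracking_allowed({}) is True
```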
It's not a big enough of a change for people to actually choose Linux distributions over it - those who know about the variable will set it themselves, those who don't will be stuck with a bad default.
My issue isn't with the point of control - it's with the default. Telemetry of all kinds should be opt-in. People shouldn't have to worry that they're constantly being watched. They shouldn't have to hope that every single telemetry stream is operated by competent and careful software engineers, guided by honest and law-abiding managers. You know how this industry works; it's a rare case where a data collection scheme doesn't overreach, accidentally ingest too much, leak data, or turn malicious and pass it to bad actors.
I am installing Windows to a new machine right now. And here's the "Privacy Settings" setup step;
Title: "Diagnostic Data"
Explanation: "Send all Basic diagnostic data, along with info about the websites you browse and how you use apps and features, plus additional info about device health, device activity, and enhanced error reporting."
Right on point. Also, it's at the mercy of the application, and even then there must be a trusted third party who can certify that an application follows the spec. This is not practical.
The reliable and practical way is to have an ad blocker at the kernel level, similar to browser ad blockers.
I really do believe the best solution is a checkbox at install, but the checkbox starts filled. Yes, that's technically OPT-OUT, but it's extremely obvious and in front of you to anyone who actually cares about opting out.
Maybe contractors should take whatever they want from your house while they are doing their job. If you don't like it, you should monitor them more closely and opt out, or do business with others.
The software you write is yours. My data is not. You have every right to include or not include features, but you have no right to take my data without my permission. Your rights end where mine begin.
>> I wonder how long it takes until one of the vendors of popular CLI tools or desktop apps get fined for GDPR violation.
How would GDPR help with anonymous data? Say you have a CLI that sends back the frequency of usage for all top-level commands daily. If the user doesn't log into the tool, or that information isn't sent, then the developer would only have the IP address. If they discard that, how would it land under the remit of GDPR?
I'm curious because I think it's easy for small developers to try and jump on this bandwagon. The big companies will all have vetted their telemetry strategy with their legal teams and have compliance reviews in place, as well as people who will handle cleanup from data spills. Bob is less likely to have this for his popular CLI tool.
> Say you have a CLI that sends back the frequency of usage for all top level commands daily. If the user doesn't log into the tool, or that information isn't sent then the developer would have IP address. If they discard that, how would it land under the remit of GDPR?
I think it wouldn't, given proper handling of the IP address.
Where I'd expect your Bob to land in trouble is in mishandling crash reporting, in particular wrt. logging. It's very common for log files to accidentally acquire passwords or PII, or potentially other secrets protected by different laws. To be safe here, you'd have to ensure no user-provided data, or data derived from user input, ever enters the log files - which may include things like IP addresses and hostnames, names of files stored on the machine, etc.
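As a sketch of the kind of defensive measure that helps here, a logging filter can redact likely PII before anything is written out. The patterns below are invented for illustration and nowhere near complete coverage; real rules would need auditing against actual log output.

```python
import logging
import re

PATTERNS = [
    (re.compile(r"password=\S+"), "password=<redacted>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),   # IPv4 addresses
    (re.compile(r"/(?:home|Users)/[^/\s]+"), "/<user>"),    # home directories
]

class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        for pattern, repl in PATTERNS:
            msg = pattern.sub(repl, msg)
        record.msg, record.args = msg, None
        return True

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger.addHandler(handler)
logger.warning("login failed for 203.0.113.7 password=hunter2 at /home/alice/app")
# logs: login failed for <ip> password=<redacted> at /<user>/app
```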
Because massive chunks of the population will never turn it on not due to ideological commitment, but simply due to no knowledge that it exists. Furthermore, if we define “tracking” as broadly as “any crash reports and any checking for updates” that effectively means these features will not work, and the open-source maintainers will have a much harder time tracking down bugs and encouraging people to update to less buggy or more secure versions of their software.
Why not simply fork or choose not to install code you don’t like, rather than forcing your beliefs about what does or does not constitute acceptable code on the developers?
You are not entitled to crash reports - that applies to open-source developers just as to anyone else. If you want crash reports, have some kind of wizard or command to submit them and point to that when a crash happens, but you must always gather informed consent before submitting that data.
I like how games typically handle this; if the application crashes a dialog appears asking if you want to send a crash report, and often tells you what's included in said report.
On a more CLI oriented design, take a look at Debian's report-bug program. It's completely transparent, and still gathers enough information that most times a one-line description of the bug is enough for anybody to understand everything that happened on your system.
Many versions of Android do the same (although the details may differ; that's the beauty and the curse of open source). After an app crashes (on some devices, only after it crashes a few times), you get a popup that says something like "<app name> appears to have stopped unexpectedly, would you like to report this to the developer?"
If you click on the details button, you can see almost everything outside of pure hexdumps of RAM.
There's no need for automatic sample submission if you respect your users' privacy.
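The ask-first flow described above is small to implement. Here's a sketch for a Python CLI: the user sees exactly what would be sent and must actively say yes. `submit_report` is a hypothetical stand-in for the actual upload; here it just records what would have been sent.

```python
import sys
import traceback

SENT = []

def submit_report(text: str) -> None:
    SENT.append(text)  # stand-in for the real upload

def crash_hook(exc_type, exc, tb, ask=input):
    report = "".join(traceback.format_exception(exc_type, exc, tb))
    print("The application crashed. The following report can be sent:")
    print(report)
    # Default is No: silence means no data leaves the machine.
    if ask("Send this report to the developers? [y/N] ").strip().lower() == "y":
        submit_report(report)

# Install as the handler for uncaught exceptions.
sys.excepthook = crash_hook
```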
If it’s so important to the project the devs can ask users, educate them, and receive informed consent. There are plenty of ways a dev can force a user’s attention for a few minutes to hear their “pitch” and if the users still don’t want to opt in after hearing the reasons perhaps the reasons aren’t nearly as compelling as the devs believe them to be.
I don't believe it's unreasonable for it to be opt-out. Building software is very hard, and even something as inane as "automatically report crashes to the developers so we can fix them quicker" or "tell us how many users are on each version so we can estimate the blast radius of some backward-incompatible change" would be categorized as tracking.
Here's the problem: people are idiots. You can manage or visit the GitHub issues page of any major project for ten minutes and recognize that even our industry is not immune from this. People also, overwhelmingly, use the defaults. When presented with the option to turn on tracking, most people won't, despite the fact that for most developers, it's a legitimate good which benefits the user over the long term.
You can say "well, if people want to be idiots, that's their right". But idiocy never remains in isolation. If they refuse to update the app, then update Windows, and the app stops working, users don't throw their hands up and say "oh well, that's my bad". They don't complain to Microsoft. They complain to the app's developers. That becomes a ticket, written far too often from a place of anger and hate, and triaged by, usually, overworked volunteers.
Telemetry is not all bad. There is no "ensuring defeat" right from the start, as if it's some war. Most developers just want to deliver a working project; telemetry enables that. Giving users the ability to opt out, maybe even fine-grained control over what kinds of telemetry is sent, is fantastic.
> even something as inane as "automatically report crashes to the developers so we can fix them quicker" or "tell us how many users are on each version so we can estimate the blast radius of some backward-incompatible change" would be categorized as tracking.
Devil is in the details. Unless you are very careful, even a basic crash report may leak PII (commonly, through careless logging).
> Here's the problem: people are idiots.
I know what you're referring to, but I have a similarly broad and fractally detailed counter-generalization for you: companies are abusive. They treat individual customers as cattle, to be exploited at scale. They will lie, cheat, and steal at every opportunity, skirting the boundaries of what's legal and treating occasional breaches into outright fraud as a cost of doing business.
Yes, I know not all companies are like that - just like not all users are technically illiterate. But the general trend is obvious in both cases.
What this means is, I don't trust software companies. If a company asks me to opt into telemetry, with only a generic "help improve experience" blurb, I'm obviously going to say no. It would be stupid to agree; "help us improve the experience" is the single most cliché line of bullshit in the software industry. There's hardly a week without a story of "some well-known company selling data to advertisers". Introduction of GDPR revealed the true colors of the industry - behind each consent popup with more than two switches there is an abusive, user-hostile company feeding data to a network of their abusive partners. So sorry, you have to do better than tell me it's in the long-term benefit of the users - because every scoundrel says that too, and I have no way of telling you and them apart.
And now for the fractal details part:
> Idiocy never remains in isolation. If they refuse to update the app, then update Windows and it stops working, Users don't throw their hands up and say "oh well that's my bad". They don't complain to Microsoft. They complain to AppDevs.
Yes, idiocy on both ends. There's a reason why users refuse to update the app. It's because developers mix security patches, bugfixes, performance improvements, and "feature" updates in the same update stream - with the latter often being a downgrade from the POV of the user. I'm one of those people who keep auto-update disabled, because I've been burned too many times. I update on my own schedule now, because I can't trust the developers not to permanently replace my application with a more bloated, less functional version.
(Curiously, if usage telemetry is so useful, why does software so often get worse from version to version?)
Secondly, if the user updates Windows and your app stops working, it's most likely your fault. Windows cares deeply about not breaking end-user software; historically, it bent over backwards to maintain compatibility even with badly written software. It's entirely reasonable to expect software on Windows to keep working after Windows updates, or even after switching major Windows versions.
> Most developers just want to deliver a working project; telemetry enables that.
Telemetry does not enable that. Plenty of developers delivered working projects before telemetry was a thing. What enables delivery of working projects is care and effort. Telemetry is just a small component of that, a feature that gives the team some data which would otherwise require more effort to collect - data that is just as likely to lead you astray as to improve your product.
> Giving users the ability to opt-out, maybe even fine-grained control over what kinds of telemetry is sent, is fantastic.
Yes, and all that except making the telemetry opt-in is even more fantastic. You want the data? Ask for it, justify your reasons for it, and give people reasons to trust you - because the average software company is absolutely not trustworthy.
The standard of "care and effort" in software engineering today begins with observability.
An error-handling branch without a counter on it will not get through code review. That an incident was detected through user reporting and not telemetry/alerting is a deeply embarrassing and career-limiting admission in a postmortem. That logs were insufficiently detailed to reproduce the problem will be a serious and high-priority defect for the team. Something like an entire app without any crash reporting is gross negligence on the part of the Senior VP on whose watch it happened.
I'm not really remarking whether this is good or bad, you're free to think this is a bad move, but from my perspective it is definitely the way the industry moved. Among my colleagues, releasing without a close eye on thorough telemetry is some childish cowboy amateur-hour shit.
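For concreteness, the "counter on every error branch" convention might look like this. A plain `Counter` stands in for a real metrics client (Prometheus, StatsD, etc.); the function and metric names are illustrative.

```python
from collections import Counter

METRICS = Counter()  # stand-in for a real metrics client

def parse_port(value: str) -> int:
    try:
        port = int(value)
    except ValueError:
        # The counter makes this failure visible on dashboards and in
        # alerts, instead of surfacing only through user support emails.
        METRICS["parse_port.invalid_input"] += 1
        raise
    if not (0 < port < 65536):
        METRICS["parse_port.out_of_range"] += 1
        raise ValueError(f"port out of range: {port}")
    return port
```

Note this counts only that an error occurred, not what the user typed, so it sits at the benign end of the telemetry spectrum being debated here.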
You consent to a scope of work. If you want line item control over exactly how the work gets done, what tools the workman gets to bring, what creature comforts he is and isn’t allowed on the jobsite, that’s something you can negotiate.
Like if you’re Amish or under some weird historic preservation regime or working near a delicate billion-dollar scientific instrument. Perhaps you really need carpentry done with hand tools. You can find a contractor who wants to do that. You don’t hire a normal firm and then get mad that they failed to seek consent before plugging in their table saw.
Not if users get something in return. Like with the windows insider program. Opt in to get betas and provide automated feedback. If MS turned off telemetry for regular users they'd be doing a great job.