Over the years I've noticed this privacy defeatism, among other excuses, used to push back against security and privacy enhancements.
Another is security chicken and egg. "Because A isn't secure, there's no point in securing B. And because B isn't secure, there's no point in securing A." I've seen arguments along those lines used against improving the security of, for example, DNS and SNI. Or just DNS in general. "There's no point in securing DNS because your ISP can see what IP addresses you connect to."
The all-or-nothing argument has been used to argue against HTTPS. "State actors like China can circumvent HTTPS, since they control trusted CAs! So why use HTTPS at all?"
Then there are the arguments against security because "it makes development harder!" People have argued against mandatory HTTPS saying that it makes developing websites harder.
The list goes on.
Luckily it seems that the pro-security and pro-privacy positions succeed in the end, for the most part. But those who use these pathological arguments have certainly delayed things longer than they should have.
I'm not a privacy defeatist, but I am a fingerprinting defeatist. Here's why: I don't think it's realistic for caching and anti-fingerprinting to co-exist, and given those two options users will always pick the former because the latter would be perceived as slow.
The classic example is:
<script src="foobar.js"></script>
where foobar.js just returns something like "var id=0x1234567". A user who doesn't want to be fingerprinted cannot cache this script, because it could be uniquely generated per user.
It would be simple enough to rely on resource hashes so you never actually contact the sources of files, and then to skip (or explicitly approve) fetching any hash you can't get from your preferred shared caches. Thus a unique file gets ignored by the most anonymity-conscious.
I think there is a chicken and egg here combined with IP incentives to not fix how web browsers work.
I hoped the great firewall would have accidentally fixed this by making it preferable to refer to hashes that can be found in peer caches regardless of CDN status.
I think you're forgetting that resources cannot be shared if they're marked private. You cannot get account_statement.js from your shared cache, it has to be unique to you.
Your example actually isn't incorrectly marked private, so that is already covered in my previous comment.
The most anonymity-conscious would realize they are trying to do banking in what is supposed to be their anonymous session and never fetch the file... assuming they somehow missed that they were entering auth details?
The way things need to work on the web involve choices that you apply differently (or IE applies for Windows users.) The defeatist response is not to implement any choices. A typical user will want a small number of PII sites, so let's have only PII mode and autofill their details into forms in any blog!
There are resources that shouldn't be saved in a shared cache and shouldn't be seen in an anonymous session. It is not a coincidence that they are both and it is great that they are explicitly marked.
But, if the server sends you a private resource, this implies the server must already know your identity. Being private it's not supposed to send it to anyone else, so it needs to identify you in order to know "does this user have access to this resource?".
The point is that other scripts on the page can find out that id=0x1234567, and use that value for tracking requests. Since you've cached it, they can track you across sessions.
The idea is that if you visit two sites, and on both sites you use the same token, the ad-provider (or whoever) can associate you across the two origins because of the cached token.
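The cached-token tracking described above can be sketched as a small simulation. This is purely illustrative: `makeServer` and `makeBrowser` are made-up names standing in for a tracking endpoint and a browser's HTTP cache, not any real API.

```javascript
// Simulation of tracking via a cached, per-user script response.
// The "server" issues a unique id the first time a client fetches
// foobar.js, like the <script src=foobar.js> example above.
function makeServer() {
  let nextId = 0x1234567;
  return {
    fetchScript() {
      return `var id=0x${(nextId++).toString(16)}`;
    },
  };
}

// The "browser" caches the response, so every later page load --
// on any origin -- sees the same id, which page scripts can read
// and attach to tracking requests.
function makeBrowser(server) {
  let cached = null;
  return {
    loadPage() {
      if (cached === null) cached = server.fetchScript();
      return cached;
    },
  };
}

const server = makeServer();
const alice = makeBrowser(server);
const bob = makeBrowser(server);

const aliceOnSiteA = alice.loadPage();
const aliceOnSiteB = alice.loadPage();
console.log(aliceOnSiteA === aliceOnSiteB); // true: same user, linkable across origins
console.log(aliceOnSiteA === bob.loadPage()); // false: the id distinguishes users
```

The cache behaves exactly like a cookie here, which is why an anti-fingerprinting browser can't safely cache uniquely generated resources.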
I've been thinking that the "State actors like..." argument is rather weak because it seems like the more we learn about "state actors" from anywhere, the more it seems like they can get around anything. If they happen to be focused on you or your organization, it seems like it's just game over no matter what you try to do.
Luckily, I guess, securing yourself against state level actors keeps you safe from everyone else that's trying to do bad things, so, win/win?
A "state actor" focusing on you should be considered at least as dangerous (and semi-impossible) to guard against as a dedicated adversary that wants to infiltrate you specifically. And just as with a dedicated adversary, that's no justification for not making things hard for the average snoop or opportunistic criminal and what they can easily come up with.
We don't have locks on our front doors to make it impossible for unwanted people to get inside our houses; we have them to keep out the average person, deter the interested criminal, and hopefully give us warning when a break-in is happening or has happened.
> Luckily, I guess, securing yourself against state level actors
To be clear, there is no securing yourself against state level actors. You can harden yourself, and reduce your target area and/or footprint, but there's no way to actually make yourself secure.
> To be clear, there is no securing yourself against state level actors. You can harden yourself, and reduce your target area and/or footprint, but there's no way to actually make yourself secure.
Yes. Your best bet is hiding. Once you're identified, it's basically over. It's just another facet of the state monopoly on force.
Resource monopoly is rather a subset of force monopoly, no?
I mean, consider guns. In the US, citizens can own guns. And armed private security is OK, too. But there's consistent push-back against private militias, armies, etc. Unless they work only outside the country, in acceptable contexts.
> To be clear, there is no securing yourself against state level actors. You can harden yourself, and reduce your target area and/or footprint, but there's no way to actually make yourself secure.
You can, if you gain the support of an opposing, comparable state actor. That's the strategy Snowden employed. But 100% security is still a myth, not only against state actors... reality itself is just not like that. Unless you can hide inside a black hole.
Ironically, HTTPS deployment in China is going ... not badly at all.
First- and second-tier websites already switched it on long ago, because in China we had a long tradition of ISPs sniffing and modifying users' traffic to (for example) inject their own affiliate codes into web requests. Those websites hate it, of course.
On the "CAs are controlled by the state" side, some privacy enthusiasts in China already removed those CAs from their systems long ago. Go give them some hugs and support: https://github.com/chengr28/RevokeChinaCerts
And of course, in the context of CAs, what's truly important is transparency in the audit process and continuous monitoring. Catching them red-handed will end their business[0] once and for all. And so far, most of them are rule-abiding.
Absolutely. Heard that with stateless IPv6 address autoconfiguration (SLAAC). "If you can fingerprint the browser, why does it matter that one can use the MAC-derived component of the IPv6 address to track people across networks?" Heard on HN...
> I don't see the flaw in that one. It does seem kind of extra to secure DNS. What's the argument in favor?
If ISPs control your DNS, then they can block domains they don't want you to reach or redirect unknown domains to their own ad pages like Comcast has in the past. If you use someone else's DNS, your ISP could still block network packets sent to certain IP addresses, but they wouldn't want to block all AWS IP addresses just to block one site on AWS.
> when Chinese CA's abuse their position they get blacklisted [in Chrome]
AFAIK, Chrome has very small market share in China. Most Chinese users use chimeric browsers that combine Chromium and IE's Trident engines. Many Chinese bank and government sites require NPAPI plugins for custom crypto, so the popular browsers can seamlessly switch a tab from Chromium to Trident for sites that are known to require plugins.
>If you use someone else's DNS, your ISP could still block network packets sent to certain IP addresses, but they wouldn't want to block all AWS IP addresses just to block one site on AWS.
Well, that's sort of what Russia did to try to ban Telegram...
Russia may decide they're OK with blocking AWS. Comcast probably isn't going to do that. Is it really essential that we solve the Russia problem before we solve the Comcast problem?
Even on the Russia side of things there are advantages.
If you use your ISP's DNS, they can block a specific site or service with very high granularity and zero cost. If you don't, they can still block things, but they have to resort to cruder methods that risk annoying customers/citizens and degrading their overall network health.
Over time these extra annoyances might become a kind of death by 1000 cuts, allowing ISPs and countries that don't censor to outperform them in ways that even ordinary, privacy/freedom ambivalent people care about.
This is all quite proper. It's healthy for censorship (even when it's possible) to have costs. Where possible, we should try to build technological infrastructure so that networks that censor won't be able to compete on an even ground with the networks that don't.
Well, there isn't much benefit to securing DNS or using third-party name servers, unless you also use VPN services and/or Tor to prevent your ISP from seeing what sites you access. However, securing DNS can somewhat interfere with geolocation by websites, given mechanisms that point you to the nearest servers.
But if you are using VPNs and/or Tor, it's crucial to secure DNS. Tor exits do DNS lookups for clients. And properly configured VPN services do as well.
Even so, can't ISPs still see what sites you connect to? Or is everything after the initial connection to Cloudflare hidden in HTTPS? And how many sites are typically reachable through a given Cloudflare IPv4 address?
In TLS 1.2 (what a good browser and site uses today) the Server Name Indication sent from your client is plaintext, as is the certificate sent back by the server and the choices made by both sides during key agreement.
In TLS 1.3 (not yet officially published as a standard but basically finished and already drafts are used by Firefox, Chrome, and Cloudflare) only SNI remains plaintext.
So yes, the FQDN you're connecting to will still be revealed, but unlike the certificate, an adversary can't trust the SNI value: you could be lying about it, if the remote server lets you.
If you scroll back a few days, HN discussed the Internet Draft about SNI encryption (it's a problem statement rather than a proposed solution). So they want it; it's just not clear how it could be done (there are lots of bad/ineffective options).
"There's no point in securing DNS because your ISP can see what IP addresses you connect to."
Perhaps the ISP has fingerprinted the pages of every website on every shared IP address so they can easily determine which website the user is visiting? Why rely on unencrypted DNS or unencrypted SNI when it is so trivial to map IP addresses to domains?
A poor example, but consider this single IP address 216.239.36.21 with a reverse lookup returning a 23M list of 1,283,151 domains.
The biggest barrier for privacy is not a technical one. It's that most people don't care. We see outrage in a few intellectual or tech-savvy niches, like HN, but the average Joe has many, many other priorities before this.
I see a parallel between this and digital rights management. Eliminating probabilistic fingerprinting for browsers is always going to be a technical arms race.
"your ISP can see what IP addresses you connect to"
Why not make some browser extensions that randomly connect to all sorts of websites and download random web pages?
There's plenty of bandwidth for us to do that these days, and your real browsing habits should be able to hide in the noise.
Of course, for something more sophisticated, the "random" browsing should statistically match real web browsing, to make traffic analysis more difficult.
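A minimal sketch of such a decoy-traffic scheduler, under stated assumptions: the URL list and timing range are invented for illustration, and a uniform random choice like this would NOT statistically match real browsing — the comment above is right that a real extension would need a much smarter traffic model.

```javascript
// Hypothetical decoy-browsing scheduler. A real extension would
// actually fetch these pages on the computed schedule.
const DECOYS = [
  "https://example.com/",
  "https://example.org/news",
  "https://example.net/blog",
];

// Pick a decoy URL and a randomized delay before the next fetch.
function nextDecoy(rng = Math.random) {
  const url = DECOYS[Math.floor(rng() * DECOYS.length)];
  const delayMs = 5000 + Math.floor(rng() * 55000); // 5-60 s apart, arbitrary
  return { url, delayMs };
}

const { url, delayMs } = nextDecoy();
console.log(url, delayMs);
```

Against an adversary doing serious traffic analysis, the hard part is entirely in making the decoy distribution indistinguishable from real browsing, not in the plumbing shown here.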
I’m a privacy defeatist because the vast majority of people who argue for things like “https everywhere” and secure DNS nevertheless still use CC payments and buy most of their things online, which practically defeats the whole purpose: your CC-processor or bank now know when and if you’re going to have sex (you bought some condoms with your CC), know your political and ideological affiliations (based on the books you’ve purchased online) etc etc.
It’s all pretty much a farce, I’d call it “voodoo privacy”.
It's ironic that you responded with exactly the kind of points whose spirit the parent rebukes.
The issues you are raising are indeed real issues, that will be eventually addressed. And that "eventually" largely depends on the pressure that privacy-minded consumers create.
And just because they are not addressed yet, it doesn't mean we shouldn't care about fixing other weak links.
It’s not ironic at all, I’ve read about this discourse about data privacy for at least 10 years, nothing has changed for the better since then, quite the contrary. I for one take active steps in combating it (I very rarely use my CC for purchases, I buy almost nothing online etc), it’s tiring to be lectured on privacy by people who act one way and say a totally different thing.
Just because one person or entity (the CC company in this case) has all my data doesn’t mean every other person or entity should have it too. The fewer that have your data the better and the more it becomes possible to slowly improve your privacy level.
Yes, my CC company can infer a lot about me. Yes, they may even be selling that information to others. That doesn’t mean that I want anybody who can MITM my requests to be able to see what I’m doing (or, worse, send me forged content for whatever purpose).
That is a big reason I stopped using chrome. I block all the cookies and scripts I can but they were still collecting data from other shit and serving me obvious directed ads.
There are two ways to defeat fingerprinting:
1) Be common: have your data match what others are sending
2) Be unique on every page request: scramble your data
Browsers in category (2) are not trackable, even though they appear to be fingerprintable. Each page request is a different fingerprint.
What is unclear from the article is whether the studies mentioned (EFF, INRIA) considered that and tested that. Does anyone know? Because privacy does not require non-unique fingerprints, it just requires untrackability.
Firefox has two about:config settings: one stops canvas fingerprinting, the other notifies you.
Other settings stop giving out a list of plugins or fonts. And e.g. uBlock or uMatrix can rotate your user agent.
It’s not perfect, but with a few tweaks Firefox is much much harder to fingerprint - though IPv6 tends to undo all of that because many providers assign a prefix per customer which will never change and can be used by anyone to correlate.
> why browsers can still stop fingerprinting (emphasis mine)
> privacy defenses don’t need to be perfect. Many researchers and engineers think about privacy in all-or-nothing terms
So browsers can't stop fingerprinting but they can reduce it, and the article author's title falls in the same all-or-nothing trap that they criticize others for. Too often practicality is viewed as defeatism. I don't remember defeatism, I remember practical limitations to stopping all uniqueness vectors but realization that some can be stopped.
I remember somebody who worked on Firefox saying that even the idea of not revealing the OS and hardware platform by default received too much opposition.
If you are not uniquely identifiable but 99% of users are, that alone is enough to identify you. That's why the US Navy funded the Tor project: if their people were the only ones who could hide from other governments, then those other governments would know who the unidentified people were.
This is also a big issue with NoScript and similar plugins. A very small percentage of real users (ignoring bots and headless browsers and such) intentionally disable JavaScript, so they're painting a target on themselves while trying to do the opposite.
This is true. However, most users of NoScript/uMatrix/uBlock are blocking the script that actually does the fingerprinting. So while the server could infer information because the script didn't run, usually they get no notification and tracking simply fails.
For sure. I use NoScript for browser security, not for fingerprint protection. I would still definitely recommend people use NoScript/uMatrix/etc. It's just an unfortunate side effect.
The article mentions that the biggest impact comes from limiting the highest-entropy information.
It wouldn't be hard to rank the various attributes by entropy and then you - individually - could calculate the probability that you'd be uniquely identifiable.
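A back-of-envelope version of that calculation, assuming illustrative per-attribute entropy values (the bit counts and population below are made up, not measured):

```javascript
// Rough uniqueness estimate: sum the entropy each leaked attribute
// contributes, and compare with the bits needed to single out one
// user in the population. All numbers here are assumptions.
const attributeBits = {
  userAgent: 10.0,
  fonts: 13.9,
  plugins: 15.4,
  timezone: 3.0,
  screen: 4.8,
};

const totalBits = Object.values(attributeBits).reduce((s, b) => s + b, 0);

// With N users, singling one out takes about log2(N) bits.
const population = 2e9; // ~2 billion browsers, assumed
const bitsNeeded = Math.log2(population);

console.log(totalBits.toFixed(1), "bits leaked vs", bitsNeeded.toFixed(1), "bits needed");
```

If the leaked total comfortably exceeds the needed bits, a user with typical attributes is likely unique, and the table makes visible why suppressing the highest-entropy attributes (plugins, fonts) buys the most.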
As the percentage of users that are identifiable drops, eventually the value isn't worth the cost. If only 5% of users are identifiable, that may be low enough that tracking isn't worth doing at all.
Note that identifiable comes in degrees. If they can't tell me from my wife, that doesn't matter for most purposes.
Dropping the average number means that trackers make less money which means that they have less resources to invest in tracking which means that there's less of a chance that you specifically will be tracked.
And the author sees it as an indicator that preventing fingerprinting is possible:
> Only a third of users had unique fingerprints, despite the researchers' use of a comprehensive set of 17 fingerprinting attributes.
To me, the 17 attributes do not seem comprehensive at all. For example, they don't make use of the user's IP. So much can be derived from the IP: carrier, approximate location, etc. They also don't use the local IP, which is leaked via WebRTC.
They also don't seem to measure the performance of CPU, RAM and GPU when performing different tasks.
But yes: Browsers should do more to prevent fingerprinting. But it seems they have no inclination to do so. That they don't plug the WebRTC hole that leaks the local IP is a strong indicator for me that privacy is low on the list of the browser vendors. Or maybe not on the list at all.
There's generally a tradeoff between usability and performance, and resistance to fingerprinting. If your browser has WebGL enabled, the machine (not just the browser) can be fingerprinted. If it caches resources, adversaries can discover browsing history.
Re WebGL, I gather that canvas fingerprints are based on graphics hardware and drivers. So all browsers on a given machine have the same canvas fingerprint. I've used https://browserleaks.com/webgl for testing. And with VMs, it's even worse. I found that all Debian-family VMs on a given host have the same canvas fingerprint. But Windows, macOS, CentOS, Arch and TrueOS (aka PC-BSD) VMs each have distinct fingerprints.
About cached resources, it's my impression that adversaries can exploit XSS vulnerabilities for detection. Most simply, you just measure load time.
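The load-time probe mentioned above can be sketched as a classifier. This is a simulation, not a working attack: the threshold and timings are invented, and a real attacker page would time an actual cross-origin load (e.g. an image's onload) rather than call a function with a canned number.

```javascript
// Hypothetical cache-timing classifier: a resource served from the
// browser cache completes far faster than one fetched over the
// network. The 20 ms threshold is an assumption for illustration.
function classifyLoad(loadTimeMs, thresholdMs = 20) {
  return loadTimeMs < thresholdMs ? "probably cached" : "probably not cached";
}

console.log(classifyLoad(3));   // fast load: likely already in cache
console.log(classifyLoad(180)); // slow load: likely a network fetch
```

Since "in cache" implies "visited before", a page that times loads of resources unique to other sites can probe your browsing history, which is the privacy cost of caching.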
I don't know. In my testing, I don't recall that I even used related Debian and Ubuntu releases. So I doubt that just updating the graphics driver would change the fingerprint.
However, I was using VirtualBox VMs, so it's possible that my results were artifacts caused by restricted choice in virtual graphics drivers. Testing that would be rather tedious, and I'd appreciate correction.
> But there’s another pill that’s harder to swallow: the recent study was able to test users in the wild only because the researchers didn’t ask or notify the users. [2] With Internet experiments, there is a tension between traditional informed consent and validity of findings, and we need new ethical norms to resolve this.
So the result of their privacy-advocating study was itself only obtainable by breaching the privacy of their participants. (I.e. making them participants without them knowing or consenting to it)
In particular, canvas would become useless in an opt-in scenario and would just foster permission fatigue (people clicking "Allow" just to make those dialogs go away).
Another option might be to define closely (pixel-by-pixel) how a canvas should look after a specific action. That way vendors would have less room for 'their way' of drawing things, but the result would look the same everywhere and would be useless for fingerprinting.
Similarly, one could define a list of fonts which every browser should ship with, and all other fonts would be loaded from a server. It would eliminate the 'you have special fonts installed' problem completely.
The canvas wouldn’t be opt in, only the pixel reading feature would be - and that’s an edge case that I can’t imagine being used in even one in a thousand uses of Canvas. Just drawing to a canvas has no privacy implications and covers almost all uses.
There's many amazing things only possible in canvas because you can read the pixels directly (especially in displaying complex pixel art), so removing that capability out-right would be highly disappointing.
I suppose a better solution would be to mark the canvas as tainted if any text is drawn onto it, so that pixel reading capabilities are blocked. Another aggressive solution could be to redraw the entire canvas without gpu acceleration if pixel data is requested. Together these would severely limit canvas fingerprinting.
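The taint-on-text idea can be sketched with a mock context. This runs nowhere near a real DOM: the method names merely mirror CanvasRenderingContext2D, and the returned "pixel buffer" is a placeholder.

```javascript
// Mock 2D context implementing the proposal above: drawing text
// taints the canvas, and tainted canvases refuse pixel reads.
function makeGuardedContext() {
  let tainted = false;
  return {
    fillRect(x, y, w, h) {
      // Plain shapes render identically everywhere; no taint.
    },
    fillText(text, x, y) {
      tainted = true; // text rendering leaks font/antialiasing details
    },
    getImageData(x, y, w, h) {
      if (tainted) throw new Error("canvas tainted: pixel read blocked");
      return new Uint8ClampedArray(w * h * 4); // stand-in RGBA buffer
    },
  };
}

const ctx = makeGuardedContext();
ctx.fillRect(0, 0, 10, 10);
console.log(ctx.getImageData(0, 0, 2, 2).length); // 16: reads allowed before any text
ctx.fillText("fingerprint me", 0, 0);
try {
  ctx.getImageData(0, 0, 2, 2);
} catch (e) {
  console.log(e.message); // read blocked once text has been drawn
}
```

This mirrors how browsers already taint canvases after cross-origin image draws, so the precedent for the mechanism exists; the open question is how much legitimate pixel-reading code a text-taint rule would break.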
Or just block it and display a warning that the user can accept before the operation is allowed. And as you say: all canvas impls should come in 2 flavors, one “reference” software impl that guarantees the exact same pixels given the exact same operations. Using any other impl should block readpixel or redraw with the reference impl before reading.
I realize there are edge use cases that need pixel reading; I just don't agree with the idea that they are more important than limiting tracking. So I think the canvas-limiting mode (used in FF safe mode) should be the default.
"Another lesson is that privacy defenses don't need to be perfect. Many researchers and engineers think about privacy in all-or-nothing terms: a single mistake can be devastating, and if a defense won't be perfect, we shouldn't deploy it at all."
This "all-or-nothing" perspective is rampant on www forums discussing computer topics and certainly HN is no exception. It is particularly acute in any discussions of "privacy" or "security".
There are countless examples.
Earlier this week the topic of SNI rose again to HN's front page.
A minor percentage [1] of TLS-enabled websites require SNI. An unfortunate side effect of SNI is that it makes it easier for third parties to observe which websites users are accessing via TLS, because it sends domain names unencrypted in the first packet.
Forum commenters will thus argue that because there are other, more difficult means for some third parties to observe these domain names, e.g., through traffic analysis, the unencrypted SNI is therefore not an issue worth addressing.
All-or-nothing. If the privacy achieved by some proactive measure is not "perfect" then to these commenters it is worthless.
But the HN front page reference suggested otherwise: It was an RFC describing how the IETF is taking a proactive measure, trying to "fix" SNI, encrypting it to prevent third parties from using it in ways detrimental to users.
There is an easier proactive measure. The popular browsers send SNI by default, even if the website does not require it. The default behaviour is to accommodate a minority of TLS-enabled websites at the expense of all users, including those who may not be using this minority of websites.
To make an analogy to fingerprinting, imagine sending 17 unique identifiers with every HTTP transaction when, say, only 5 are actually needed. The all-or-nothing perspective adopted by forum commenters would dictate that it makes no sense to reduce the number unless the number can be reduced to zero.
Amongst the security folks there is a concept sometimes called "defense in depth". Commenters in discussions about security often agree there is no such thing as "perfect" security and they cannot rely on a single, "silver bullet". They must use multiple tactics.
Is privacy somehow different? There are many tactics users can take that, cumulatively, can make things more difficult for the data collectors.
[1] Survey of websites currently appearing on HN
Number of unique urls: 367
Number of http urls: 43
Number of https urls: 324
Number of https urls requiring SNI: 38
Number of https urls requiring correct SNI: 26
"Requiring correct SNI" means SNI must match Host header.
Summary
One can fetch 286 of the 324 https urls currently posted on HN with a HTTP client that does not send SNI.
An additional 12 can be retrieved by sending a decoy SNI name that does not match the Host header.
What is needed is a community-written and community-funded privacy browser. Mozilla, the billion-dollar non-profit, and Firefox, their so-called privacy-first browser, are the biggest exfiltrators of user data. Their default ~5000 user settings are all geared towards feeding them and Google a real-time feed of user activity. They have moved from being Google's data bi.ch to a major data-collection thug themselves. As such, fingerprinting becomes at best a tangential issue.
Sets up a straw man of "privacy defeatism", then knocks it down with an analysis of some French website's logs--as though any of this were a technological problem in the first place.
This is not a case of "defense in depth" or "something is better than nothing." This is the case of a surveillance capitalism asset performing to spec.
Privacy in the browser is/was f-ed, plain and simple. Patch one hole, make two more. That's how it's been for at least 20 years now. I see no reason for that to change until the browser is obsolesced by something even better at fleecing the peasants.