The Web Never Forgets: Persistent Tracking Mechanisms In The Wild (esat.kuleuven.be)
134 points by jcr on July 21, 2014 | 47 comments



Tl;dr wow, just wow.

So we have three tracking mechanisms. One, canvas fingerprinting, I had not even heard of or frankly suspected, but it explains a lot about why many designers who strive for pixel perfection are constantly frustrated. The original paper is fascinating (cseweb.ucsd.edu/~hovav/dist/canvas.pdf), but here is one simple takeaway: testing 300 different user systems (via Mechanical Turk, quite clever), a simple sentence in Arial rendered 50 different ways! So by combining different renderings (sentences, a line drawn on a canvas element), they find that user systems (hardware, drivers, etc.) give off results distinct enough to tag you.
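A minimal sketch of the hashing step, assuming the pixel bytes read back from a canvas are already available as a `bytes` object. The `fingerprint` helper and the sample buffers are made up for illustration; real fingerprinting scripts run as JavaScript in the browser, not Python.

```python
import hashlib


def fingerprint(pixels: bytes) -> str:
    """Reduce a rendered pixel buffer to a short, stable identifier."""
    return hashlib.sha256(pixels).hexdigest()[:16]


# Hypothetical RGBA buffers: the same sentence rendered on two systems whose
# anti-aliasing differs in one channel of one pixel.
system_a = bytes([0, 0, 0, 255, 120, 120, 120, 255])
system_b = bytes([0, 0, 0, 255, 121, 120, 120, 255])

same_machine = fingerprint(system_a) == fingerprint(system_a)   # stable across visits
other_machine = fingerprint(system_a) == fingerprint(system_b)  # differs between systems
print(same_machine, other_machine)
```

Nothing has to be stored on the client for this to work: the identifier is recomputed from the rendering on every visit.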

I would be interested in their results on iPhones (in fact, is anyone interested in a quick experiment here on HN?)

And cookies: cookie syncing, where fucking trackers collaborate and share/copy each other's IDs, and evercookies, where they store an ID in a regular cookie, in JavaScript-accessible storage, and anywhere else they can, and keep respawning it.
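The respawning idea can be sketched roughly as follows. The store names and the `respawn` helper are hypothetical stand-ins for whatever storage a tracker can reach; this is an illustration of the logic, not code from any real evercookie library.

```python
import uuid
from typing import Dict, Optional

# Hypothetical stand-ins for the places a tracker can write to.
Stores = Dict[str, Dict[str, str]]


def respawn(stores: Stores, key: str = "uid") -> str:
    """Find a surviving copy of the ID (or mint one) and rewrite it everywhere."""
    surviving: Optional[str] = next(
        (store[key] for store in stores.values() if key in store), None
    )
    uid = surviving or uuid.uuid4().hex
    for store in stores.values():
        store[key] = uid
    return uid


stores: Stores = {"http_cookie": {}, "local_storage": {}, "flash_lso": {}}
first = respawn(stores)          # first visit: ID written to every store
stores["http_cookie"].clear()    # user clears cookies only
second = respawn(stores)         # next visit: ID restored from the other stores
```

As long as one copy survives, clearing the others achieves nothing, which is why clearing cookies alone does not help.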

Little fuckers

I propose two solutions:

Cookies should be limited to no more than 128 bits - enough to store one randomly generated UUID. Screw the backward compatibility.

And we should be able to allow only signed, hashed JavaScript to run on our machines. For example, I want to allow only, say, Bootstrap, some overlay, and a config or data file. No one gets to run unsigned JS on my machine.
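A rough sketch of the "hashed" half of that proposal, assuming an allowlist of SHA-256 digests. The script bodies and the `may_run` helper are hypothetical, and signature verification (the "signed" half) would need a public-key library and is left out here.

```python
import hashlib


def sha256_hex(source: bytes) -> str:
    """Hex digest used as the identity of a script body."""
    return hashlib.sha256(source).hexdigest()


# Hypothetical script the user has chosen to trust.
TRUSTED_SCRIPT = b"console.log('trusted library');"
ALLOWED_HASHES = {sha256_hex(TRUSTED_SCRIPT)}


def may_run(source: bytes, allowed=ALLOWED_HASHES) -> bool:
    """Allow a script only if its exact bytes hash to an allowlisted digest."""
    return sha256_hex(source) in allowed


print(may_run(TRUSTED_SCRIPT))                # unmodified script is allowed
print(may_run(TRUSTED_SCRIPT + b" track();")) # any modification is rejected
```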

Have I just reinvented Microsoft code signing?

But boy I am in a bad mood


Does anyone else wish all these smarts people bake into per-browser plugins... could instead be part of a localhost proxy?

Then the browser-plugin component could be entirely optional, just a more-convenient way to command the proxy (via some HTTP call that it recognizes and intercepts) to block a given URL or pattern.


As someone who's used a local filtering proxy (Proxomitron) and multiple browsers for a long time, I completely agree - browsers are getting more and more complex, and it's difficult to use a plugin to block things that the browser would already have fetched and loaded. The fact that it works for all browsers on the system, and applications that incidentally include non-configurable browsers, is also nice.

On the other hand page modification that involves DOM manipulation/JS interactions would definitely be better handled in a browser plugin, since a proxy is more of a streaming filter device.

SSL is a bit of a pain (especially certificate pinning) since this is essentially a "benevolent MITM attack" but there are workarounds for it.


How does that work with SSL?


The proxy does the SSL checking (it validates the upstream certificate), the browser trusts the proxy's CA certificate, and the proxy re-signs each site's certificate on the fly.


This is more difficult with mobile devices, and also doesn't it display warning dialogs?


The warning dialogs do not display if you add the authority certificate used by the proxy to your trust store, and the proxy properly re-signs everything.


How do you distinguish between, e.g., EV SSL and non-EV?


That may indeed be a problem. You could use an out-of-browser GUI, or possibly configure your browser to trust your own CA as EV-capable somehow and stick your own EV ID in the generated certificate. I don't think this is configurable in most common browsers at the moment, though.


HTTPSwitchBoard [1] seems to do a decent job of blocking most (not all) of these methods. I currently have it configured to allow scripts only from the domain I'm visiting, and quite a few of the methods mentioned rely on scripts/cookies hosted by third-party websites. Granted, HTTPSwitchBoard is far from foolproof, but it's a step in the right direction.

[1] https://github.com/gorhill/httpswitchboard
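The "only allow scripts from the domain I'm visiting" rule can be sketched like this. It is a simplification and not how HTTPSwitchBoard is implemented: the `is_first_party` helper is hypothetical, and real tools typically compare registrable domains (using the Public Suffix List) rather than raw hostnames.

```python
from urllib.parse import urlsplit


def is_first_party(page_url: str, resource_url: str) -> bool:
    """True if the resource host is the page host or one of its subdomains."""
    page_host = (urlsplit(page_url).hostname or "").lower()
    res_host = (urlsplit(resource_url).hostname or "").lower()
    if not page_host or not res_host:
        return False
    return res_host == page_host or res_host.endswith("." + page_host)


page = "https://example.com/article"
print(is_first_party(page, "https://static.example.com/app.js"))  # same site
print(is_first_party(page, "https://tracker.example.net/t.js"))   # third party
```

One limitation of this naive rule: a page on `www.example.com` loading a script from `example.com` would be treated as third-party.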


I'm still waiting for a Firefox version, especially to avoid Adblock's overhead. I currently have Privoxy installed as a replacement, but it's too cumbersome.


There's always the RequestPolicy plugin, though it does not appear quite as nuanced.


Random Agent Spoofer is another great tool for blocking browser fingerprinting: https://addons.mozilla.org/en-US/firefox/addon/random-agent-... (the git version is newer: https://github.com/dillbyrne/random-agent-spoofer). Always use it in conjunction with the classic recipe of ABP + NoScript.


We're in a state right now where if you are doing anything that suggests you're willing to spend money on a specific set of things, you have to do it in a private browsing window or put up with retargeting for weeks. Kind of sad.


I just clear my browser history constantly.


The canvas fingerprint is immune to this because your browser paints the fonts the same regardless of your history, and the fingerprint itself is not stored locally.


It's also not very specific. Your user agent string is likely to be more specific.


It's kind of sad that when you actually want to spend money on specific things, you're met with puke upon puke of advertising, infested forums, and scammed reviews. It's a hellhole just to find a good router, for example.

I guess it's an effect of having so much money, or so many people willing to potentially spend money, in one giant network: the internet. You just can't make a review site or forum that is immune to scamming once sellers start hiring people to puppet around their marketing message.


    > It's a hellhole just to find a good router, for example
Unless you think Amazon is compromised, it's pretty easy to search on there, and order by average customer rating. That's pretty much the only way I buy things now.


Amazon's product search must be a textbook example of dysfunctional. I always search elsewhere first, because unless I know exactly what I want (or am just browsing), it's hardly possible to narrow down based on product characteristics (there are a few filters, but far too few to really work well).


The prices on Amazon are very different from those in the country I actually order from.


Research like this makes me question why I use a cookie manager. Staying ahead of these intrusions takes more time than I have available (job, life, etc.) and a cookie manager is no longer the bare minimum.

In fact, I'm not sure what the bare minimum is anymore, nor do I know whether or not a cookie manager even makes a dent in privacy intrusions.


a VM still seems pretty good? what are they gonna do, benchmark it? (yes.)

it's also super quick to install and update. just snapshot before browsing anything and restore to there. you can install anything you like, before browsing anything (flash etc). there's no logical way to save any state if you throw away the hard drive snapshot.


While this is probably very close to foolproof, it is just insane that our best method to accomplish "DNT" is to... throw away all changes at an OS level :|

It is incredibly alarming to consider how much control over our own damned gear we've lost over the past 10-15 years.


This change is partly because of web developers' push for a "richer" and more full-featured -- that is, closer to native code parity -- web platform. There's a spectrum in terms of how conscious web developers and web standards developers may have been of the privacy risks (and many people in W3C committees have worked hard to remove fingerprinting risks from their specs before deployment), but there's been a steady tension between privacy and new functionality for web developers in the web platform.

Of course, things don't have to have been designed as tracking mechanisms in order to end up as tracking mechanisms. One of my favorite (?) examples is HSTS, which can be used as a cookie because you can tell a browser to load 00001.example.com, 00100.example.com, and 01000.example.com securely (while not mentioning 10000.example.com or 00010.example.com). Then if you tell a browser to load one-pixel images from the HTTP versions 00001, 00010, 00100, 01000, and 10000.example.com, you can see which subdomains it adds the HTTPS on and which it doesn't. (This risk is mentioned in section 14.9 of RFC 6797.)
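The encoding described above can be sketched as a pair of helpers, one choosing which subdomains to pin and one reading the ID back. The function names are hypothetical; the five-bit width and `example.com` labels follow the example in the comment.

```python
from typing import List, Set

BITS = 5  # matches the five-subdomain example above


def hosts_to_pin(user_id: int, domain: str = "example.com") -> List[str]:
    """Subdomains to load over HTTPS (with an HSTS header) to store user_id."""
    return [
        f"{1 << bit:0{BITS}b}.{domain}"
        for bit in range(BITS)
        if user_id & (1 << bit)
    ]


def read_back(upgraded_hosts: Set[str], domain: str = "example.com") -> int:
    """Rebuild the ID from the subdomains the browser upgraded to HTTPS."""
    user_id = 0
    for bit in range(BITS):
        if f"{1 << bit:0{BITS}b}.{domain}" in upgraded_hosts:
            user_id |= 1 << bit
    return user_id


pinned = hosts_to_pin(0b01101)       # 00001, 00100 and 01000, as in the example
recovered = read_back(set(pinned))   # which HTTP image loads came back as HTTPS
print(pinned, recovered)
```

Each subdomain carries one bit, so a tracker wanting a longer identifier would simply use more subdomains.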

An important historical example that stuck around for a long time was the CSS history bug (Javascript in a web page could query to see if a link's style had changed to match the style of a visited link).

An example that shows completely unintended tracking consequences were sneaking into web protocols a long time ago was Martin Pool's "meantime", described back in 2000. (He has a broken link that suggests that someone may have expressed concern about this back in 1995.)

http://sourcefrog.net/projects/meantime

And there are a lot of unintended browser fingerprinting effects from various UA properties that sites can ask about using Javascript:

https://panopticlick.eff.org/

It's apparently possible to break many of these tracking methods, as the Tor Browser systematically tries to do, but you have to give up a lot of local history and a bit of web platform functionality.

https://www.torproject.org/projects/torbrowser/design/#Imple...

Given what I've heard about web developer pushback against fixing the CSS history query bug, I'm scared to imagine the response to trying to mainstream some of the Tor Browser's fixes in popular browsers!


Could you do it using a docker container for your web browser instead?


You could run something like TAILS in a VM or docker, I guess.


Except the canvas fingerprint wouldn't change


In Firefox, install Self Destructing Cookies, BetterPrivacy and Ghostery. Ghostery will blacklist most of these tracker sites, the other two plugins should do their best at removing all traces of those that get through.


What can be done about the Canvas security vulnerability?


Browsing with NoScript, while inconvenient, offers safe-by-default browsing for cases like these. If I see a site I actually want to interact with, I'll fire up a different browser profile and look at it there. If it's a site I want to use regularly, I just add it to NoScript's whitelist.


I've been using NoScript for several months now. Seemed crazy at first, but now it doesn't really feel all that inconvenient. If anything, I just tend to get upset with the serious proliferation of needless Javascript out there. I understand why you need Javascript to make a full-featured web-app like GMail or Facebook, but most of the time it's being used in place of CSS for simple layout placement or something.

I'm also somewhat shocked at how much JS seems to be directly included from a third-party domain. Again, I understand why you'd want a separate domain for some things (many larger sites tend to have some JS hosted on the main domain and some offloaded to a CDN domain, for example), but a shocking number of sites are offloading basic design elements of their site to some Google Ajax server for whatever reason.


"... now it doesn't really feel all that inconvenient ..." because you white-listed so many domains?


Partially that - I don't visit many web sites regularly that require Javascript, but most of the ones I do visit regularly are more or less trustworthy. The biggest problem I have with the whole whitelisting thing is that so many people host code on Google's domains that I'd really prefer a GUI element for "Allow <domain> on this domain only", so I'm not executing any code hosted on Google's servers on every page I go to, just because I occasionally navigate to a Google site. I think there's a way to configure this in the back-end, but it's not really convenient to do so temporarily.

The other big part of it is that I'm much more used to seeing a site, noticing if something's missing, then making the decision about whether I really want to let it execute arbitrary code on my machine. I'm not all that confused when a new site I go to is subtly broken.


Adding RequestPolicy as well increases the inconvenience factor somewhat, but also dramatically cuts down on the chances of issues.

The biggest headaches I've run into so far with it are sites that use multiple cloudfront addresses or that pull in required content from host or domain names that don't bear any resemblance to the original site. I can figure out if I'm on "chicagotribune.com" that "trbimg.com" and "trb.com" are probably related to it, but there are a lot of sites where there's no clear name relationship.


The sites highlighted in the article are generally blocked by plugins such as Ghostery, and so their attempts to track via Canvas won't even be run.

If you block the known trackers, it doesn't really matter what techniques they try to use on you.



It's a cat-and-mouse game, for the time being. Until there's a real solution, use the available tech:

https://prism-break.org/en/


The EFF has a good write-up about this, too.

https://www.eff.org/deeplinks/2014/07/white-house-website-in...

Big takeaway is to avoid AddThis.com, and that NoScript - along with other tools - is an effective defense.

It's also noteworthy how this is at odds with the White House's own policy on cookies.

The White House cookie policy notes that, “as of April 18, 2014, content or functionality from the following third parties may be present on some WhiteHouse.gov pages,” listing AddThis. While it does not identify which pages, we have yet to find one without AddThis, whether open or hidden.

On the same page that is loading the AddThis scripts, the White House third-party cookie policy makes a promise: “We do not knowingly use third-party tools that place a multi-session cookie prior to the user interacting with the tool.” There is no indication that the White House knew about this function before yesterday's report.

As usual, Bruce Schneier (and his commenters) has a useful thread on the topic:

https://www.schneier.com/blog/archives/2014/07/fingerprintin...


Does this technique work for iOS? There just aren't that many variants of iOS devices, and you can already tell them apart by other browser settings.

Safari on iOS will have the same hardware, renderer and fonts for every user with the same iOS version and device type. So surely it can't track an individual user?


Why are there "subtle differences in the rendering of the same text" when rendering on a canvas? I get how there can be differences between different browsers and even different builds of the same browser, but what more plays in? How many bits for identification does this actually provide?


They render text (and, in the paper, a WebGL scene) to a canvas and read the pixels back. The OS, browser and video card can produce different pixel values.

See Pixel Perfect: Fingerprinting Canvas in HTML5 (http://cseweb.ucsd.edu/~hovav/papers/ms12.html).
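On the "how many bits" question, a back-of-the-envelope sketch: if a test yields N distinct renderings, it can carry at most log2(N) bits, and fewer when some renderings are much more common than others. The figure of 50 renderings is the one quoted earlier in this thread for a single Arial sentence; the skewed group sizes below are invented for illustration and are not the paper's data.

```python
import math
from typing import Sequence


def max_bits(distinct_outcomes: int) -> float:
    """Upper bound: every outcome equally likely."""
    return math.log2(distinct_outcomes)


def shannon_bits(group_sizes: Sequence[int]) -> float:
    """Entropy of an observed distribution of users across outcomes."""
    total = sum(group_sizes)
    return -sum((n / total) * math.log2(n / total) for n in group_sizes)


arial_upper_bound = max_bits(50)             # roughly 5.64 bits
skewed = shannon_bits([251] + [1] * 49)      # hypothetical: one dominant rendering
print(round(arial_upper_bound, 2), skewed < arial_upper_bound)
```

Separate tests add up only to the extent that their results are independent, so the combined figure depends on how correlated the renderings are.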


Privacy Badger - https://www.eff.org/privacybadger - might be able to help. It's like Disconnect or Ghostery, but based on behavior, rather than a blacklist.


The "Evercookies" are one more reason to turn "Click to Play" on for all browser plugins (most notably Flash and Java).

Both Internet Explorer and Chrome have supported "Click to play" natively since forever. Only Firefox shamefully doesn't.

Unlike many other security measures it is pretty intuitive. Just click on the applets you wish to load or unblock them like a pop-up blocker from the URL bar.

In Chrome you can also whitelist entire domains like this:

[*.]youtube.com

However, be careful not to go too whitelist-crazy. As I think this article makes clear, a lot of those "Share this" applets are tracking you, and many otherwise innocent sites host them.
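As I understand the `[*.]youtube.com` pattern above, the `[*.]` prefix covers the domain itself plus any subdomain. Here is a simplified sketch of that matching rule; `matches` is a made-up helper, not Chrome's code, and it ignores schemes, ports and paths.

```python
def matches(pattern: str, host: str) -> bool:
    """Match a host against a pattern with an optional "[*.]" subdomain prefix."""
    host = host.lower().rstrip(".")
    if pattern.startswith("[*.]"):
        base = pattern[4:].lower()
        return host == base or host.endswith("." + base)
    return host == pattern.lower()


print(matches("[*.]youtube.com", "www.youtube.com"))  # subdomain is covered
print(matches("[*.]youtube.com", "notyoutube.com"))   # lookalike is not
```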


> Only Firefox shamefully doesn't.

Of course it does. It has had an "Ask to activate" option for every plug-in since some mid-20s release, if not earlier.


As with most user-friendly browser innovations, it was Opera that did it first. I was using it as early as 2001 for its ability to easily turn off annoying Flash content unless specifically requested.

It wasn't quite as simple as click-to-play or controllable per site until a few versions later; it was an application-wide toggle to enable plugins or not, but always very easy to use just one click away on the right-click quick preference menu.


seems like this (and many other tracking mechanisms) can be stopped by using a blacklisting HOSTS file, like http://winhelp2002.mvps.org/hosts.htm
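To make the HOSTS-file approach concrete, here is a small sketch that parses hosts-style text and collects the names pointed at a sinkhole address. The sample entries and the `blocked_hosts` helper are hypothetical and are not taken from the linked list.

```python
from typing import Set

# Hypothetical entries in the usual "address name [name...]" layout.
SAMPLE_HOSTS = """\
# comment line
127.0.0.1  localhost
0.0.0.0    tracker.example
127.0.0.1  ads.example   # inline comment
"""

SINKHOLES = {"0.0.0.0", "127.0.0.1"}


def blocked_hosts(hosts_text: str) -> Set[str]:
    """Return every hostname mapped to a sinkhole address."""
    blocked: Set[str] = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split columns
        if len(fields) >= 2 and fields[0] in SINKHOLES:
            blocked.update(n.lower() for n in fields[1:] if n != "localhost")
    return blocked


print(sorted(blocked_hosts(SAMPLE_HOSTS)))
```

A hosts file matches exact names only, so `sub.tracker.example` would not be covered by the `tracker.example` entry; each hostname needs its own line.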



