Some of these are probably fingerprinting but the Twitch one isn't (I worked on the video system at Twitch for a number of years).
"player-core-variant-....js" is the Javascript video player and it uses WebGL as a way to guess what video format the browser can handle. A lot of the times mobile android devices will say "I can play video X, Y, Z" but when sent a bitstream of Y it fails to decode it. WebGL allows you to get a good indication of the supported hardware features which you can use to guess what the actual underlying hardware decoder is.
This is sadly the state of a lot of "lower end" mobile SoCs. They will pretend to do anything but in reality it just... doesn't.
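A hedged illustration of the kind of probe described above (not Twitch's actual code): ask WebGL for the unmasked renderer string and treat it as a hint about the underlying SoC and hardware decoder. The "Adreno" example value is purely illustrative.

    // Query the unmasked renderer via the WEBGL_debug_renderer_info extension.
    function guessRenderer() {
      const gl = document.createElement("canvas").getContext("webgl");
      if (!gl) return null;
      const ext = gl.getExtension("WEBGL_debug_renderer_info");
      if (!ext) return null;
      return gl.getParameter(ext.UNMASKED_RENDERER_WEBGL); // e.g. "Adreno (TM) 530"
    }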
JavaScript legitimately needs to know about its runtime environment. The problem with fingerprinting is not the act of examining the environment itself, but rather sending the results of that examination to the server as a form of identification. I would rather confront the second problem more directly with a "per tab Little Snitch" type solution that eliminates communication that is not in the user's interest, rather than eliminate fingerprinting, for precisely the reasons you give, slashink.
Unfortunately web apps can defeat that by bundling unwanted telemetry type stuff in with other API calls, directly or by batching. So, it would need to be a complex tool to deal with that or suffer limited applicability. Perhaps if it stayed under the radar it could be effective in many cases without instigating countermeasures by site owners.
Let's not make perfect the enemy of good. Counter-measures like user-interactive, request-specific firewalls (like Little Snitch) can always be defeated by a motivated malefactor willing to commit resources. That does not mean it isn't worth doing.
Consider that virtually all physical locks are trivial to pick by someone who knows how (see youtube.com/lockpickinglawyer), and yet we still use locks. Pickable locks improve security because they increase the cost to the attacker enough that it deters most attacks.
I write webapps for a living. If a browser plugin wanted to selectively allow XHR/fetch calls based on payload, there is very little I could do about it.
The implementation might be to have your plugin content script wrap the DOM XHR/fetch in a proxy. The proxy runs a predicate on the payload, and if true, lets it go through. The predicate would be something like "No PII", which would also imply that the traffic be unencrypted.
An app could remove the proxy. But it seems to me that most people wouldn't bother. It's also possible that there are other mechanisms, for example a special Plugin IO API that cannot be changed by content scripts.
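A minimal sketch of the proxy idea above, assuming a content script and a hypothetical looksLikePII predicate (the regex is purely illustrative, not a real policy):

    // Wrap window.fetch in a Proxy and block requests whose payload fails a predicate.
    const looksLikePII = (body) =>
      typeof body === "string" && /email|renderer|gpu|fingerprint/i.test(body);

    const realFetch = window.fetch;
    window.fetch = new Proxy(realFetch, {
      apply(target, thisArg, args) {
        const [, init] = args;
        if (init && looksLikePII(init.body)) {
          // A real tool might prompt the user here instead of rejecting outright.
          return Promise.reject(new TypeError("Request blocked by policy"));
        }
        return Reflect.apply(target, thisArg, args);
      },
    });

An XMLHttpRequest.prototype.send wrapper would need the same treatment.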
I’d imagine most people in this thread do or have. Myself included. It’s a pretty massive industry :)
What you’re missing is that whatever you do to remove fingerprints does itself add a unique metric to fingerprint. This is also compounded by how easy, cheap and legal it is to add fingerprinting tech to one’s site. Literally the only way to break fingerprinting would be if the majority of the web browsing population ran systems that randomised fake responses. But as it stands at the moment, it’s possible to:
1. Identify when a plug-in is overloading a builtin function
2. Identify which users are consistently doing so, because so few are, and because there are methods of fingerprinting that exist outside of your JS VM.
I don’t have the link to hand but there’s a website that you can visit and it tells you how identifiable you are. I used to think it was possible to hide until I visited that site and then I realised just how many different metrics they collect and how a great many of them are literally impossible to block or rewrite without breaking the website entirely.
It goes into more detail than the EFF link where it breaks down your uniqueness per each metric (and how much entropy each metric adds) as well as giving you an overall uniqueness.
The cover your tracks implementation also breaks down uniqueness per category, per metric.
They are nice resources, but don't get too scared!
Frankly, both are exaggerating a little - e.g. including stuff like browser version numbers which only appear as unique as they do because the time-span they cover is long enough to overlap update cycles (AmIUnique even seems to have it cover the entire history by default??? That's just noise), yet not stable for more than a short period of time. AmIUnique includes the exact referer, which is likely not nearly as useful as that would make it seem.
Then there's stuff like "Upgrade Insecure Requests" and "Do not track", which is likely extremely highly correlated with browser version choice.
Both sites can't really tell you how reliable the identification is, only how unique you are at this moment. And that matters a lot, because if identification is unreliable (i.e. the same person in some metric has multiple distinct fingerprints) the end result is that for reliable overall identification a fingerprinter may need many times as many bits of entropy as a naive estimate might assume, especially if visits are occasionally sparse and thus changes to fingerprints may frequently come all at once.
Clearly over the very short term you are likely uniquely identifiable as a visitor. However, it's less clear how stable that is.
uMatrix. It does what you describe and I always use it.
But the solution isn't good per se.
It provides a high level of granularity, and it could theoretically provide even more. But it's already an advanced tool that an average user will never use.
Not all JavaScript does, and the kind that does isn't necessarily something I asked for. I would be plenty happy if GitHub and Stripe couldn't show their 3D globe animations until I request them, for the sake of privacy.
After realizing what an unbelievable CPU hog the Github globe is, I simply added an element hiding rule for that crap. Not sure if element hiding rules help prevent fingerprinting from such sources.
>I can imagine that having several typical configs and switching between them at random would help blend in.
You have to be careful with that too. An anti-anti-fingerprinting implementation can record the values and compare them across visits to see whether they stay the same. If they change every few months that's reasonable (eg. changing hardware), but if they change every day or every week there's most certainly spoofing involved.
Unless a major anti-fingerprinting solution uses the same list of GPUs as you, doing this puts you in a tiny bucket and provides massive entropy to trackers, possibly even enough to exactly identify you given many WebGL calls.
You could seed your random number generator with a hash of the hostname, guaranteeing consistency across all the random values you return to the one host.
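A rough sketch of that idea, using a simple FNV-1a hash and a mulberry32 PRNG (both just illustrative choices, not part of any existing extension):

    // Derive a deterministic per-site seed so spoofed values stay consistent
    // across visits to the same host.
    function fnv1a(str) {
      let h = 0x811c9dc5;
      for (let i = 0; i < str.length; i++) {
        h ^= str.charCodeAt(i);
        h = Math.imul(h, 0x01000193);
      }
      return h >>> 0;
    }
    function mulberry32(seed) {
      return function () {
        seed = (seed + 0x6d2b79f5) | 0;
        let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
        t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
      };
    }
    // Same host -> same sequence of "random" spoofed values on every visit.
    const rand = mulberry32(fnv1a(location.hostname));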
Don't bother. It's hard to do it correctly. If you look through the snippets (or the MDN docs[1]), the value is retrieved using the getParameter() function. You might be tempted to override the function by doing something like
gl.getParameter = () => "test"
but that's easily detectable. If you run
gl.getParameter.toString()
You get back
"() => "test""
whereas for the original function you get back
"function getParameter() { [native code] }"
In general, don't try to fix fingerprinting via content scripts[2]. It's very much detectable. Your best bet is a browser that handles it natively.
You can easily hide it by hijacking Function.prototype.toString to check whether `this == fake gl.getParameter or this == fake toString`. Then the JS code needs to find a real Function.prototype.toString by creating an iframe, but then you can detect that too. Then I'm out of ideas on how to rescue the original toString.
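A sketch of the toString-hiding trick described above (spoofed values and heuristics are illustrative; this is not any extension's actual code):

    // Spoof gl.getParameter, then patch Function.prototype.toString so both the
    // spoof and the patched toString claim to be native code.
    const realGetParameter = WebGLRenderingContext.prototype.getParameter;
    const fakeGetParameter = function getParameter(pname) {
      if (pname === 0x9245 /* UNMASKED_VENDOR_WEBGL */) return "Google Inc.";
      if (pname === 0x9246 /* UNMASKED_RENDERER_WEBGL */) return "ANGLE (Generic Renderer)";
      return realGetParameter.call(this, pname);
    };
    WebGLRenderingContext.prototype.getParameter = fakeGetParameter;

    const realToString = Function.prototype.toString;
    const fakeToString = function toString() {
      if (this === fakeGetParameter || this === fakeToString) {
        return "function " + this.name + "() { [native code] }";
      }
      return realToString.call(this);
    };
    Function.prototype.toString = fakeToString;

As noted, a page can still sidestep this by pulling a clean Function.prototype.toString out of a freshly created iframe.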
So the issue is that the fingerprinting code can detect the anti-fingerprinting code? Doesn't that mean the best solution is for everyone to override the same functions with the same dummy information?
Agreed: fingerprinting is using ways one browser or device consistently differs from others to derive a stable identifier.
Several others on this list are also not used for fingerprinting, and are instead detecting robots/spam/abuse. Unfortunately, there isn't any technical way for the public to verify that, because client-side all it looks like is collecting a bunch of high-entropy signals.
All the major browsers have said they intend to remove identifying bits to where fingerprinting is not possible, which will also make these other uses stop working.
That was always a short-term hack anyway. The limiting case is that spammers simply pay humans somewhere to continue whatever "abuse" was formerly automated, as currently happens with captchas.
That's a fair take. In a sense it is a "fingerprinting" method, although I personally think fingerprinting implies using this data to track devices between contexts.
If you're interested why this data was exposed in the first place, the MDN docs has good info https://developer.mozilla.org/en-US/docs/Web/API/WEBGL_debug... . "Generally, the graphics driver information should only be used in edge cases to optimize your WebGL content or to debug GPU problems."
Sadly that's the reality of some of these tools. The intent was good and in many cases they are a necessity to create a web experience that works on every device. On the flip side this allows people to use this data to fingerprint.
As per my sibling (cousin?) comment, pretty much any legitimate capability-determining data can be used for fingerprinting.
This can only be solved with legislation, IMO. There is no way for an industry to self-regulate something like this; the candy bowl is too big and the candy too sweet.
Yeah, I think the world would be better off without ads altogether. Just allow people to search for what they need or want by themselves, that's enough.
>...and pay out of pocket for the use of a search engine?
That's the only way service providers will see a cent from me going forward. Ads present not only a privacy risk but they're increasingly becoming a security risk too. I will not allow them on any of the devices I own or that connect to my home WiFi.
I think that static ads served from the first-party CDN are security-wise no worse than the content itself.
Blocking scripts and requests by third-party ad networks makes complete sense from security perspective, though.
Affiliate links going directly to relevant item pages in a store are fine by me, too. They have to be relevant for anyone to click on them, they don't play video or make sound, etc. They do give some tracking opportunity, but without third-party cookies and third-party requests, it's hard to achieve anything resembling the privacy-invading precise tracking which current ad networks routinely do.
In any case, I much prefer the absence of ads and an honest donation button.
More like (the spirit of) GDPR, where data collection itself becomes a legal and financial liability to point where it's not worth it to collect and retain it for a typical entity.
So absolutely no data on the device obtained via webGL is used for marketing or other BI workloads? None of it is shared with 3rd parties (especially advertisers)? It's entirely used for the user experience and then discarded when no longer relevant?
FWIW while I'm playing hardball here I really appreciate your answer and expertise.
I don't work at Twitch anymore so I can't answer your question without guessing and I rather not.
There is always a likelihood that data gets used for reasons beyond the original purpose. In the best of worlds, the hardware that runs on consumers' devices would do the right thing, which would allow the web to be a perfect sandbox. I think we're slowly getting there; in terms of video it's getting to a point where H.264 support is "universally true" rather than a minefield, although VP9 and AV1 are a bit of a return to square one.
I think the spirit of my original comment was not to say "I promise that X company isn't doing Y" more to explain why this code existed in the first place. A search engine doesn't need to know what WebGL capabilities a device has as it doesn't deal with rendering whereas a site that has to work with hardware decoders most likely does need to know.
I didn't interpret the comment as an accusation against my original comment.
It's good to ask the hard questions and even though I'm not able to answer it in detail I still think that 'tmpz22' brought up a good point in that data can be used for both good and bad at the same time.
Considering companies have lied before when it comes to privacy & ad tracking (see Facebook's "promise" of not using 2FA phone numbers for advertising purposes) his concerns are totally reasonable.
I understand the point you are trying to make, with the question incorporating an accusation (i.e. that the person beat their wife in the past). That is different from asking pointed questions about potential actions (e.g. "have you ever beat your wife?").
So the problem here is a bit different. It's not that devices will say "I can play Format X" and then not play it. It's that devices say "I can play Format X at Resolution A, B, C". When you give the device resolution A and B it succeeds but at resolution C it fails to decode it.
In H.264 this would be the "Level" https://en.wikipedia.org/wiki/Advanced_Video_Coding#Levels . A device may say that it can decode Level 4.2 but in reality it can only do 4.1. That means it can play back 1080p30 but not 1080p60. The only way to know is to actually try and observe the failure (which, btw, is often a silent failure from the browser's point of view, meaning you need to rely on user reports).
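For illustration, the Level lives in the codec string the player hands to MSE; the exact strings below are just examples:

    // avc1.64002A = High profile, Level 4.2; avc1.640029 = High profile, Level 4.1.
    // A device may answer true for the 4.2 string and still silently fail to decode it.
    const claims42 = MediaSource.isTypeSupported('video/mp4; codecs="avc1.64002A"');
    const claims41 = MediaSource.isTypeSupported('video/mp4; codecs="avc1.640029"');
    console.log({ claims42, claims41 });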
Wouldn't it be just as easy to test videos in formats A, B, and C and see if they play? You could check that video.currentTime advances. If it lies about that, you could draw to a canvas and check the values. That seems more robust than checking WebGL.
The issue here is the architectural difference between the hardware decoder and the GPU. What happens under the hood with MSE ( https://developer.mozilla.org/en-US/docs/Web/API/Media_Sourc...) is that you are responsible for handing off a buffer to the hardware decoder as a bitstream. Underneath, the GPU sets up a texture and sends the bitstream to the hardware decoder that's responsible for painting the decoded video into that texture.
What often ends up happening is that the GPU driver says "yes, the hardware decoder can do this", accepts the bitstream, and sets up the texture for you, which is bound against your canvas in HTML. It starts playing the video and moves the timeline playhead, but the actual buffer is just an empty black texture. From the software's point of view the pipeline is doing what it's supposed to; because the hardware decoder is a black box from the Javascript perspective, it's impossible to know if it "actually" worked. Good decoders will throw errors or refuse to advance the PTS; bad decoders won't.
Knowing this, your second suggestion was to read back the canvas and detect video. That would work, but the problem here is "what constitutes working video". We can detect if the video is just a black box, but what if the video plays back at 1 frame per second, or plays back with the wrong colors? It's impossible to know without knowing the exact source content, a luxury that a UGC platform like Twitch does not have.
For this reason just doing heuristics with WebGL is often the "best" path to detecting bad actors when it comes to decoders.
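For context, a minimal MSE hand-off looks roughly like this (codec string and URL are placeholders); note that nothing here tells the JS whether frames are actually being painted:

    const video = document.querySelector("video");
    const ms = new MediaSource();
    video.src = URL.createObjectURL(ms);
    ms.addEventListener("sourceopen", async () => {
      const sb = ms.addSourceBuffer('video/mp4; codecs="avc1.64002A"'); // placeholder codec
      const seg = await (await fetch("/segments/init.mp4")).arrayBuffer(); // placeholder URL
      sb.appendBuffer(seg); // hand the bitstream to the (black-box) hardware decoder
    });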
My point with the video-to-canvas idea is that if you create samples of the various formats in various resolutions, then you can check a video with known content (solid red on top, solid green on left, solid blue on right, solid yellow on bottom) and check whether that video works. If it does, then other videos of the same format/res should render? I've written conformance tests that do this.
At worst it seems like you'd need to do this once per format per user per device, but only if that user hasn't already had the test for that video size/format (save a cookie/IndexedDB/local-storage entry recording that their device supports that format), so after that only new sizes and formats need to be checked.
Just an idea. No idea what problems would crop up.
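A rough sketch of that conformance-style check (sample points and tolerance are made up for illustration; the test video must be CORS-clean or getImageData will throw):

    // Draw the current video frame into a canvas and verify the known test pattern.
    function closeTo([r, g, b], [er, eg, eb], tol = 40) {
      return Math.abs(r - er) < tol && Math.abs(g - eg) < tol && Math.abs(b - eb) < tol;
    }
    function frameMatchesPattern(video) {
      const canvas = document.createElement("canvas");
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      const ctx = canvas.getContext("2d");
      ctx.drawImage(video, 0, 0);
      const px = (x, y) => ctx.getImageData(Math.floor(x), Math.floor(y), 1, 1).data;
      const w = canvas.width, h = canvas.height;
      return closeTo(px(w / 2, h * 0.1), [255, 0, 0]) &&   // red on top
             closeTo(px(w * 0.1, h / 2), [0, 255, 0]) &&   // green on left
             closeTo(px(w * 0.9, h / 2), [0, 0, 255]) &&   // blue on right
             closeTo(px(w / 2, h * 0.9), [255, 255, 0]);   // yellow on bottom
    }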
That said, surely this "functional" information can also be valuable fingerprinting data, no? What's stopping an enterprising data science team from pulling it into their data lake, using to build ad models for logged-out users, maybe submitting it to 3rd-party machine learning vendors, etc. and generally making it undeletable?
Since there is no way to 100% fingerprint a device, there is no way to uniquely identify anyone with 100% confidence using pure fingerprinting techniques.
My view is that fingerprinting is a set of tools which can be used for "good or evil" if that makes sense. If you are gathering meta-data to determine the capabilities of the device, then this is part of the wider framework of data points which can, in principle, be used for fingerprinting a user. This data can be imported into a completely different system by a sophisticated adversary, so it needs to be treated as a security vector, imho
>Since there is no way to 100% fingerprint a device, there is no way to uniquely identify anyone with 100% confidence using pure fingerprinting techniques.
Pedantic point, so forgive me, but 100% uniquely identifying a device does not imply 100% uniquely identifying the user of the device. We call them User-Agents for a reason. Anyone could be using it.
It's critical people not fall into the habit of conflating users and user-agents. Two completely different things, and increasingly, law enforcement has gotten more and more gung-ho about surreptitiously forgetting the difference.
Ad networks and device/User-Agent based surveillance only makes it worse.
There are several initiatives to implement UUIDs for devices. There is the Android Advertising ID, systemd's machine-id file, and Intel burns a unique identifier into every CPU.
IPv6 (without address randomization) would also work as a poor man's UUID.
It's frighteningly easy, and you'll be surprised how unintentionally one can be implementing something seemingly innocent and end up furthering the purposes of those seeking to surveil.
- look at the statistical behavior of how they operate the mouse
- estimate their reading speed based on their scrolling
- for mobile devices, use the IMU to fingerprint their walking gait and the angle at which they hold the phone (the IMU needs no permissions; see the sketch after this list)
- measure how the IMU responds at the exact moment a touch event occurs. this tells you a quite a bit about how they hold their phone
- if they ever accidentally drop their phone, use the IMU to detect that and measure the fall time, which tells you the distance from the ground to the height they held the phone. then assuming the phone is held normal to the eyes, you can use the angle they held the phone to extrapolate the location of the eyes and estimate the user's approximate height
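A sketch of how the raw IMU data above reaches a page (many Android browsers show no prompt; iOS Safari gates it behind DeviceMotionEvent.requestPermission()):

    // Collecting raw accelerometer samples is all a gait/drop estimator needs to start.
    const samples = [];
    window.addEventListener("devicemotion", (e) => {
      const a = e.accelerationIncludingGravity;
      if (!a) return;
      samples.push({ t: e.timeStamp, x: a.x, y: a.y, z: a.z });
    });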
That's a lot of extraneous data to be adding to a stream leaving the phone. (Or dumping to a locally stored db file.), but you're technically correct, though not infallibly so.
The level of noise is incredibly problematic. My leading cause of dropped phone, for instance, is forgetting I have it in my shirt pocket, on my lap, off my desk, or from my back pocket if I don't put it in just right. Am I a different person in each of those circumstances? The statistical answer would be no, but that answer only comes from widening the scope of collected data. Suppose I fiddle with it? Dance with it? Have a habit of leaving it in a car? Without a control, you have a different set of relative patterns. At best you know there is a user with X. Yes, you can make some statistical assumptions, but at best, when it really counts, it still needs to line up with a hell of a lot of other circumstantial datapoints to hold water.
Furthermore, I guarantee not a single person would dare make any high impact assumption based on that metadata given that once it gets out, it's so adversarially exploitable it isn't even funny. Imagine a phone unlock you could do just by changing your gait. Or worse, a phone that locks the moment you get a cramp or blister. Madness. Getting different ads because you started walking like someone else for a bit. Do I become a different person because I try to read something without my glasses, or dwell on a passage to re-read it several times? Or blaze through a section because I already know where it is going? These are not slam dunk "fingerprints" by a long shot. They're more like corroborating data than anything else, and in that sense even more dangerous, because people are not at all naturally inclined to look at these things with a sense of perspective. It can lead a group of non-data-savvy folks to thinking there is a much cleaner, tighter case than there necessarily is, and on top of that, it mandates that people be okay with the gathering of that data in the first place, which has only been acceptable up until now because there was no social imperative to disclose that collection.
Going off on a tangent here, so I'll close with the following.
There is the argument to be made that that exact kind of practice is why defensive software analysis should be taught as a matter of basic existence nowadays. If I find symbols that line up with libraries or namespaces that access those resources, why should I be running that software in the first place?
I can't overstate this: over 90% of the software I come across I won't even recommend anymore without digging into it first. There's just too much willingness to spread data around and repurpose it for revenue extraction. It does more harm than good. What people don't know can most certainly hurt them, and software is a goldmine for creating profitable information asymmetries.
> My leading cause of dropped phone, for instance, is forgetting I have it in my shirt pocket, on my lap, off my desk, or from my back pocket if I don't put it in just right. Am I a different person in each of those circumstances? The statistical answer would be no, but that answer only comes from widening the scope of collected data. Suppose I fiddle with it? Dance with it? Have a habit of leaving it in a car? Without a control,
Oh, but all of these can be added to your statistical model and learned over time! If we figure out that you suddenly walk with a limp, and all the other metrics match, we can recommend painkillers! Or if the other metrics match and you start dancing, we start recommending dance instructors! Hell, we can even figure out how well you dance using the IMU and recommend classes of the appropriate skill level.
For a recommendation system, like ads, the consequences of mis-identification wouldn't be that high either. You'd still target much better than random, which is the alternative in the absence of fingerprinting.
Fingerprinting works because devices are surprisingly easy to identify just by enumerating their capabilities. If you are collecting this data, you are likely fingerprinting (read: uniquely identifying) machines even if you aren't trying to.
The same is true of humans, by the way. Even something as innocuous as surname, gender, and county of residence could be enough.
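A back-of-envelope illustration (all counts are rough assumptions for the sake of the example, not census figures):

    // gender ~1 bit, ~3,000 US counties ~11.5 bits, a moderately common surname ~10 bits
    const combos = 2 * 3000 * 1000;            // rough number of distinct combinations
    console.log(Math.log2(combos).toFixed(1)); // ≈ 22.5 bits
    console.log(combos);                       // ≈ 1 in 6,000,000, before any correlations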
It would only be fingerprinting if the "fingerprint" is persisted alongside some other information about you as a user, and subsequently used in attempts to identify other activity as belonging to said user. That is not at all what was implied by the approach described above (which would just be used at the time of initializing every video streaming session).
There is zero evidence any of these are being used for fingerprinting, which is defined as building up an entire set of capabilities to uniquely identify a user.
Essentially all of these seem to be attempting to identify the graphics card, and all of these could be related to feature detection for an embedded video player, for example. Feature detection is not fingerprinting.
Fingerprinting is pretty easy to confirm, because it collects a large combination of data points (fonts installed, canvas rendering quirks, etc.) and either reports or hashes them all together. (That doesn't mean it's easy to find the code that does the fingerprinting, but once you've found it it's quite obvious what it's doing.)
It is deeply irresponsible and misleading of the author to claim fingerprinting without looking for that type of combination. And slapping "No claims made about accuracy" at the top of the page isn't an excuse.
> [...] and all of these could be related to feature detection for an embedded video player, for example.
I agree the OP's analysis itself is a bit shallow, but I don't see how that can be related. If you embed a video and do not do anything with its output then WebGL is simply unrelated. In fact most of the websites in question do not seem to use WebGL visibly. That fact, combined with a direct check for WEBGL_debug_renderer_info and so on, should raise suspicion. Explicit alternative explanations would be needed to prove innocence.
Literally the current top comment for this article explains how WebGL values are used to more accurately determine which video stream is best to send to the video player, since other methods can return inaccurate or insufficient information. Because you want to make sure the video format has hardware acceleration for decoding, to not destroy a mobile user's battery.
So that's how they're related.
And let's not assume guilt and then require proving innocence? It works the other way around. It's the burden of the person making the accusation to form a strong case. Which is not done here at all.
Oh, I missed that comment---I thought the top comment was same at that time, but it has changed since then. That suffices as "explicit alternative explanations". Thank you for pointing it out.
I still believe that the presumption of innocence should not apply in this case, because fingerprinting is already close to guilty (if you don't agree to this premise there is no point for further discussion, so please refrain from arguing against this specific point). You are right that feature detection is not fingerprinting but they are virtually indistinguishable; given how widespread fingerprinting has been, it was enough to suspect so.
It appears that your null hypothesis embraces the benevolence of tech companies. Is this a reasonable assumption? How, after all, do they make their money?
You could take literally any line of code and make an unfounded claim. "It calculates a hash, and fingerprints use hashes!" "It stores a variable, and analytics uses variables!"
It's on the burden of the person making an accusation of bad behavior to actually demonstrate that. Otherwise it's no different from me declaring you're an evil hacker because you comment on Hacker News, guilty until proven innocent.
The null hypothesis determines the "unfounded claim". For example, judicially, the null hypothesis is, "You are innocent until proven guilty in a court of law." Similarly, commercially, the null hypothesis is, "If it's profitable and mostly legal, corporations will compete to do it better."
Fingerprinting is both profitable and legal. It is so profitable and so legal that today's most dominant corporations, entities representing trillions of dollars of value, are founded on its premise.
The "unfounded claim", therefore, is yours. Or do you have any evidence that you are not being surveilled?
Author doesn't actually know what this information is being used for. Sure there can be a discussion about whether this information should be recorded at all (say, to determine if your site doesn't work on certain hardware), but claiming it _is_ fingerprinting is baseless.
It's quite tough to know for certain if you're being fingerprinted.
I worked on finding ways around fingerprinting at a previous job. The problem is that sites go out of their way to hide fingerprinting, like performing it in an arbitrary redirect and then never doing it again, or only doing it after you've used the site for a bit, and they prefer doing it before an important operation like making a payment.
Did you find any realistic way a person can prevent being fingerprinted? Most ideas I have seen, like the Tor browser, focus on changing the fingerprint and not so much on making the fingerprint non-unique. But it is always easy to connect a changed fingerprint to the former fingerprint, so what we really need is to blend in with the same fingerprint as other people.
I don't know how it can be fully prevented other than disabling javascript, but in my opinion firefox with "privacy.resistFingerprinting" is a good start, though you'll still stand out because few people are using it.
I've seen a script that performed two canvas fingerprints - a complex one, and a simpler one. The latter was so simple it always returned the same value regardless of the browser, so it was there to check whether you had altered the canvas value. That's why changing your fingerprint might still leave you trackable.
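A sketch of that double-check (the expected pixel value is the whole point: a solid fill must read back exactly):

    // If a trivial, deterministic render reads back with any noise, an
    // anti-fingerprinting extension is probably randomizing canvas output.
    function canvasLooksSpoofed() {
      const c = document.createElement("canvas");
      c.width = c.height = 4;
      const ctx = c.getContext("2d");
      ctx.fillStyle = "#ff0000";
      ctx.fillRect(0, 0, 4, 4);
      const [r, g, b, a] = ctx.getImageData(0, 0, 1, 1).data;
      return !(r === 255 && g === 0 && b === 0 && a === 255);
    }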
> most idea i have seen, like a tor browsing, is focusing on changing fingerprint and not so much on making fingerprint non-unique.
Not sure if I exactly understand what you're trying to say here, but the Tor Browser itself certainly focuses on making its users' fingerprints identical. At least it's the only browser I know of that passes fingerprint tests (Panopticlick and friends) with JavaScript enabled.
As I'm repeating myself too often regarding this: the statistical norm is to be trackable. Every browser that isn't will get flagged.
Privacy does only exist if you seem to look exactly like any other person visiting the website.
That is why user agent, web apis, asset loading order and behaviours, http stack behaviours, tcp fingerprints all have to look like they came from the Browser Engine you're trying to identify with.
If you don't, you're watched. It's as simple as that. Don't kid yourself into safety if you think that the Content-Security-Policy hack of adblockers works to prevent tracking and fingerprinting. Sure, you send less data, but less data makes you stand out more from the norm.
As there was not a single concept that could fix this (in regards to looking like another browser), I started to build my own "web filtering browser" [1] that is able to emulate fingerprints and sticks them to specific domains and their originating uncached CDN requests, so that even if you have the same IP for the same website, they cannot correlate it to previous visits.
The most effective way to not get tracked is to not visit the website. And I think that there's a huge potential in offloading traffic to peers that have the same URL already in their caches. Peer-to-peer Browser Caches would fix so many issues on their own, I don't understand why nobody is doing that. To the end user it doesn't matter where the assets come from, as long as the website is rendered in the same way afterwards.
I still think there is an opportunity for a hosted browser vpn type service or a generic contained browser. It would look like every other person because every other person uses the same setup.
Except tracking services know about centralized VPN IP ranges that do not rotate. That's why everybody in algotrading switched to mobile apps and 3G/4G proxies that are installed on actual smartphones, and usually they have dozens of SIM cards laying around.
Er, wait, your comment sounds really interesting, but what is the threat model for algorithmic traders? Who are they trying to hide from and why?
Almost none of the web breaks simply because you're using a (very well known, very old) VPN IP. Purchases using a credit card are pretty much the only thing that is likely to get blocked. And you face a few more captchas sometimes.
Do algorithmic traders need to use something on the web that blocks VPN IPs more aggressively than this? They must, because juggling all those SIM cards sounds like a huge headache, and cell phone data has awful latency. I'm wondering what they're scraping that the average person doesn't use, and why they want to look like an average person if that's the case.
AFAIK mostly details, news, stories and stuff related to public knowledge about a company (which they factor in to the real HFT data) due to - as you already said - high latencies in mobile networks.
The issue that arises is mostly cloudflare-related, due to them having a huge influence on hosting, and the forced recaptchas when they detect anomalies in traffic behaviour makes a web scraping workflow real shitty.
From an algotrader's point of view it's a fix to make their web scrapers work again. I'm not sure how Chrome/ium headless could fix this (if it could). I'm a bit sceptical as it's just yet another cat and mouse game.
But as of late I've seen a huge scene start to develop around extension-building for headless Chrome specifically, so that they can run headless and still get the data as they want it to by integrating a content script that sends the data to another service.
Ah, thanks, now I get it: it doesn't bother them that fingerprinting lets them be tracked; it bothers them that fingerprinting (and VPN IPs) mean their scrapers get hit with captchas.
FWIW, it takes very little effort to completely conceal the fact that firefox is running headless under marionette/webdriver/geckodriver. Chrom(ium) takes more effort, but these guys have solved it (and built a business around it):
Of course neither of these address fingerprinting -- all your scrape requests will have the same or similar fingerprint, which will lead to captchas pretty quickly.
This might help (and might even be part of the reason for buying piles of cellphones):
A peer-to-peer consensus algorithm always has to be trustless, meaning you have to be able to verify things cryptographically so that there can be no 51% attack on each peer bucket. How that cryptography can be "safe" in the sense that nobody is able to fake it is a different story, because on the web lots of things can be statistically true or untrue, but stochastically the opposite (e.g. a website being down right now, or unreachable, or rendered differently for each country, etc).
In my case I'm using existing TLS infrastructure so that each peer can use their own certificates to communicate with other peers directly.
So far, the only things which haven't reported my graphics card to me are the Tor browser (good), KHTML (good, but probably because it doesn't support WebGL), and Lynx (of course, no js). Firefox reported much less info than Chrome did.
Firefox already does the same thing for html5 canvas [1], enabled I think via privacy.resistFingerprinting (in about:config). A similar feature would make sense for WebGL. In the meantime, power users can disable WebGL entirely via webgl.disabled (but without a user-friendly prompt to warn you when it's being blocked).
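For reference, the equivalent user.js lines for the prefs mentioned above:

    user_pref("privacy.resistFingerprinting", true);
    user_pref("webgl.disabled", true);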
The problem with popups like that is that too many of them, or ones where the user doesn't understand why something is blocked, lead to fatigue where they just start approving everything.
Agreed, but it would be nice to have an "advanced (paranoid) mode". E.g., I run little snitch and while it is super noisy, I don't mind the overhead, but I wouldn't force it on everyone.
Paradoxically, that backfires. If we only let a small subset of users disable WebGL... well, having WebGL disabled _becomes_ the identifying characteristic.
"Do you want to let the website use WebGL? It's either for rendering 3D graphics, such as 3D games, or an attempt to exploit bugs in WebGL to maliciously take full control over your machine, which may succeed if you consent"
Safari did this in the original implementation - it turns out to be absolutely useless. A person cannot know whether they should be clicking yes, even if they are well versed in the technology.
Until poorly-constructed websites and/or 3rd-party ad spyware added by the marketing dept starts requiring it or the site breaks. Just like Javascript, third-party cookies, canvas, localstorage...
Also Google runs the browser standards by way of their Chrome market share, and they would surely never implement something like this until they were confident that they didn't need WebGL for fingerprinting, at which point they'd just stop supporting WebGL entirely in favor of something that they invented.
It's not Embrace Extend Extinguish with Google; it's more like Embrace Replace Extinguish.
You'd be surprised how many websites use WebGL now. You'd probably end up with some sort of whitelist like the Google one for web audio, otherwise end users would complain incessantly.
Web Audio's autoplay block was implemented in a very poor way though, basically guaranteeing that existing content will be broken with the user having no way to unbreak it.
I've had webgl disabled for almost two years now because I find that webgl provides no benefit that I care about, yet lets websites slow down my laptop and drain my battery.
I disabled wasm a few months ago for the same reason.
I've had WebGL disabled forever, I don't use Twitch (as the other comment describes a valid use for) and really have never missed having it enabled for many years.
It's frequently included in "secure yourself" tuning guides: Firefox -> about:config -> webgl.disabled: true
It's been awhile, but IIRC the recommendation to disable it is based on security, not privacy. Other comments here explain it better than I could, not a programmer by trade.
The official registration page for vaccinations in Germany ( impfterminservice.de ) uses Akamai's Bot Manager, which in turn uses WebGL for fingerprinting. It does a lot more, computes timings of certain tasks and stuff, all in all, pretty impressive work, but it is doing fingerprinting.
The privacy policy of the website doesn't mention this technique, which effectively is a replacement for cookies, nor the use of any of Akamai's services. The hostname is even CNAMEd to Akamai.
So basically Akamai's service is left out of this list for no reason at all.
I stumbled upon this while trying to automate the querying for available slots to get notified via email, and one of the POST calls was sending my graphics card model to the server.
Have you told the government how this abuse of tech is making it less accessible and also a subtle privacy invasion? It's a bit disturbing that what would've probably been a simple HTML form a decade or two ago, and would still work just fine today, instead requires this extra crap.
If all hardware and software is the same, you should get the same print. Which doesn't make the fingerprint globally unique, but unique enough to make you "one of these 10000 people" instead of "any of this billion people". Combine it with other data, and the combined set identifies you uniquely.
I would guess nobody uses webgl exclusively for fingerprinting, but rather, in combination with other things. Apple machines are notoriously homogeneous hardware wise. You can imagine there's a much wider spread of webgl fingerprints for android phones and non-apple desktops/laptops.
It might work for simple fingerprinting scripts, but not for anti-anti-fingerprinting scripts. See: https://palant.info/2020/12/10/how-anti-fingerprinting-exten.... Since you're already on firefox, you're probably better off enabling resistfingerprinting instead.
Getting a random value each time is trivially detectable. That's a better signal than what you're blocking, if more people share your video card model than install the extension.
We need more of those “do you want to allow....” confirmations. They are trivial to accept in the rare cases when webgl is actually required (online game or something), but will introduce enough friction and “default off” behavior that sites should stop using them by default.
This would probably help with security as well; there has been a fair share of WebGL exploits.
The issue with this and all similar permission dialogs is that they often go the other way, you're training the user to just hit "allow" and swat the dialog away so they can get at the task/site they wanted. It's one thing to give the user options and ask for everything, but you've got to consider whether they've got the knowledge/context behind it to make a decision and whether it's presented in a way where they won't swat it away.
Remember those things are non-blocking.. for users who do not care, it is just another bar at the top they can ignore and proceed with their life.
Also from a "user privacy" point of view, this is still OK. Right now, disabling webgl is so rare it makes the user stand out a lot, and can be used as a tracking signal. Also, websites do not expect this to be disabled, and can break.
But if a non-trivial fraction of the users (say 10%) start refusing WebGL, then websites have to keep working without it; and the fact that it is disabled can no longer be used as tracking indicator.
I think this can be solved via UX. When the browser wants WebGL, notification privs, etc:
1. Pend the request
2. Notify the user "This site wants access to the WebGL graphics API to display complex graphics. This is disabled by default to prevent browser fingerprinting. If the site fails to work properly, click the lock icon and allow it."
3. Deny access until the user explicitly enables it on a temporary or permanent basis
More granular browser API permissions would be fantastic, but the current interface for grant/deny in all the major browsers is annoying enough as is. The prompts need to be moved to a less in-your-face place, or at least have the option to be. I almost always find myself clicking "deny" because no, I don't want to give your recipe site permission to show notifications.
Somewhat related is the new "sign in with Google" prompt I've been seeing on lots of websites. I accidentally clicked it when visiting the Seattle Times website and have been inundated with spam emails even after clicking unsubscribe.
Forgive a bit of ignorance on this, but I'm not 100% sure I know what browser fingerprinting actually is. I remember reading something by the DuckDuckGo founder mentioning that it could be a problem even if you use a VPN and incognito mode, but I had some trouble actually figuring out what that actually meant.
Browser fingerprinting allows you to identify uniquely a browser and thus the user. This usually means that VPN and incognito mode will not help you to "change you identity".
audiocontext fingerprinting is less identifying than you'd think. It pretty much boils down to "what's your browser compiled with and what FPU implementation are you running".
Imagine gathering screen size, installed fonts, graphics card, processor model, extensions, plugins, operating system, ... and so on. Perhaps you could gather enough system properties to actually identify someone uniquely. It’s like a fingerprint.
Some of these are more interesting than others. OS is exposed through the user agent. I'm not sure if you can actually capture the processor model and graphics card, and plugins were more interesting when people installed plugins like Flash and Java.
WebGL fingerprinting usually works by rendering something off-screen that exposes GPU differences, then captures the rendered image as the fingerprint.
This tracking method (WebGL) disregards VPNs, user agent changes and incognito mode.
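A simplified sketch of that technique (real fingerprinting scripts render far more elaborate scenes, and the hash here is a toy, not cryptographic):

    // Render a small gradient triangle off-screen, read the pixels back, and hash them.
    function webglFingerprint() {
      const canvas = document.createElement("canvas");
      canvas.width = canvas.height = 64;
      const gl = canvas.getContext("webgl");
      if (!gl) return "no-webgl";

      const compile = (type, src) => {
        const s = gl.createShader(type);
        gl.shaderSource(s, src);
        gl.compileShader(s);
        return s;
      };
      const prog = gl.createProgram();
      gl.attachShader(prog, compile(gl.VERTEX_SHADER,
        "attribute vec2 p; varying vec2 v; void main(){ v = p; gl_Position = vec4(p, 0.0, 1.0); }"));
      gl.attachShader(prog, compile(gl.FRAGMENT_SHADER,
        "precision mediump float; varying vec2 v; void main(){ gl_FragColor = vec4(abs(v), 0.5, 1.0); }"));
      gl.linkProgram(prog);
      gl.useProgram(prog);

      const buf = gl.createBuffer();
      gl.bindBuffer(gl.ARRAY_BUFFER, buf);
      gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([-1, -1, 1, -1, 0, 1]), gl.STATIC_DRAW);
      const loc = gl.getAttribLocation(prog, "p");
      gl.enableVertexAttribArray(loc);
      gl.vertexAttribPointer(loc, 2, gl.FLOAT, false, 0, 0);
      gl.drawArrays(gl.TRIANGLES, 0, 3);

      const px = new Uint8Array(64 * 64 * 4);
      gl.readPixels(0, 0, 64, 64, gl.RGBA, gl.UNSIGNED_BYTE, px);
      let hash = 0;
      for (const b of px) hash = (Math.imul(hash, 31) + b) | 0;
      return (hash >>> 0).toString(16);
    }

Subtle differences in GPUs, drivers, and rasterization cause the read-back pixels (and thus the hash) to differ between machines while staying stable for any one machine.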
Browser fingerprinting is the ability of the website you are accessing to differentiate between you and other people because of a "fingerprint": a hash string that uniquely identifies your current browser and, since it's using WebGL, your computer, because of its configuration and graphical capabilities.
Using the way that your browser renders a page to uniquely identify you amongst other visitors. This can take the form of measuring how your computer generates random numbers, timing how long it takes to compute certain things, minute differences in how your GPU renders some pixels via webgl, etc. With enough signals you can pierce through the noise.
It's when a site is able to take a variety of Javascript-accessible state that in isolation is benign (such as your reported graphics driver, as the case here) but together form a unique identifier for a user given the high dimensionality involved. This allows identifying users without their consent and avoids some methods of anti-tracking.
My understanding is it's simply checking the level of support for different Web APIs. Since that is invariant with respect to the route between you and the website, a VPN wouldn't save you from this.
Basically, each browser has an unique set of features so each device can be uniquely identified and profiled across different domains, regardless if you block cookies, use incognito mode and other techniques.
Some of the web APIs expose information that in aggregate can give an almost unique fingerprint to that device. Using fingerprinting techniques you can then track what the device is doing.
basically no two systems are the same, and if the website can gather enough information about your system, it can identify you, even through a VPN or in incognito mode.
Popular information for fingerprinting include: your browser, the country you are in, your language(s), your graphic card (through webgl), the fonts installed on your computer, the size of your screen (or your browser window), ...
window.botguard on google.com could very well be a way to detect bots running WebDriver to impersonate humans. Since WebDriver is headless, there would be detectable differences in WebGL support. This would also explain why the code is "very obfuscated".
That's exactly what it is. BotGuard isn't designed for identifying individual users, that's what cookies are for. It's designed to detect programs that are pretending to be normal web browsers being used but which aren't (because they're scripts, or rendering engines being automated, etc).
WebGL is a very useful feature, but the top sites by traffic are not exactly where I'd look for good examples of WebGL use. It is a bit more of a niche feature, but for some use cases you can't really replace it at all.
One thing that continues to confuse me is the fact that all this remains speculative. Surely on HN, there are people who have worked in FAANGs. Doesn't anyone have first hand knowledge of how tracking is performed at various websites, and whether people are really using webGL to track? And if so, how that's stored and correlated across different sites?
I don't doubt this is a thing, but the place to look for anything concerning is information returned from these API calls going out over the wire. Very few places encrypt or obfuscate their network requests with anything but base64, URL encoding, or maybe punycode; I've seen a few others but nothing difficult to figure out.
Most bare metal features introduced into what used to be browsers, but are now virtual machines for running applications, are and will be used for malicious purposes. The idea that you should allow random strangers to run code on your machine, especially this low level, is even more crazy than opening every email attachment you get.
Except Firefox is getting irrelevant, with Edge Chromium overtaking it, and Apple would naturally rather developers use iOS APIs for exactly the same purpose, while collecting some revenue from Apple hardware bought for development purposes.
FF should focus on a better UX. Try FF and Edge on Windows 10, Edge seems much snappier because it boosts the mouse scroll speed. A friend wanted to switch because of that, and increasing the overall mouse scroll speed on W10 did the trick for FF. It's little tricks like those that make non techy people switch.
How would you define bare-metal? WebGL is proxied via ANGLE, so there's no risk of crashing or other forms of leaky processes, which is typically the primary thing I think of when I worry about low-level APIs.
ANGLE is not a virtual machine but a translation layer from OpenGL ES (on which WebGL is based) to Direct3D. Inappropriate translation would result in a direct exploit unlike VM, since AFAIK there is no practical means to sandbox shaders. In some sense it is akin to JIT in a kernel which has the roughly same security ramification.
>Inappropriate translation would result in a direct exploit unlike VM, since AFAIK there is no practical means to sandbox shaders.
What does it mean to sandbox shaders in this context? GL ES shaders are sent down in a high-level language and can't really do much besides computation on input parameters to generate output in a pipeline. I wish they were more general purpose and worthy of sandboxing, but WebGL shaders are really limited.
The first link demonstrates reading from uncleared buffers to capture screen contents - and is from the Windows XP era.
The second one is about running a bitcoin miner on the GPU, hypothetical memory breaches, and GPU DoS. You can stall the graphics pipeline without shaders and run a miner on the CPU, and I'm going to need a demonstration of a useful GPU memory breach with WebGL. On top of that, any such breach is likely driver dependent and will only work on some specific driver/HW combination - that's such a low attack surface I doubt anyone will bother developing exploits targeting it.
I didn't bother going through the third because I wasted enough time on the first and second; I don't see how anything here suggests you need to sandbox shaders.
Modern GPUs have process isolation just like CPUs, and if that is not enough for you, GPU APIs allow an application (in this case the browser) to request bounds-checked execution for extra safety.
ANGLE is used to do a source-to-source translation irrespective of backend - it normalizes the source input to the various GPUs (dropping things that have historically been funky like comment parsing, #controls, dodgy constructs, disallowed constructs, etc.)
[Source: I worked on the original spec and spent a lot of time trying to prevent the more egregious security and privacy problems]
... but Google Chrome is not running on game consoles anyway? Blink likely can, but then it does not necessarily use ANGLE; for instance QtWebEngine definitely does not use ANGLE on Mac / Linux but makes native GL calls instead.
I don't understand your point. Yes, it's using WebGL. Do you think that it's going to do WebGL -> OpenGL ES 2/3 calls -> ANGLE -> Direct3D -> PS4 graphics driver (like it does with e.g. Chrome on Windows)? No, it's likely doing WebGL -> PS4 proprietary graphics API -> PS4 graphics driver, not going through ANGLE or whatnot.
But why would it go through ANGLE's libGLESv2.so and add a backend to that for the PS4, when the PS4 team could most likely provide their own libGLESv2 which maps directly to their driver (since they were doing that already for the PS3)? This is what Chrome actually calls: GLES functions.
Unless I missed something, when were they doing that with PS3?
The only effort to do any GL like stuff was on the PS2, with GL ES 1.0 + Cg, an effort that was dropped when almost no dev cared to use it.
Other than that there was the PSGL library for PS2 Linux.
Anyway, I did a bit more research and Chrome was actually only used for development purposes, they created their own WebGL engine on top of PhyreEngine for the console OS.
Regardless of how you define bare-metal, OP has a point here. Aside from bespoke web applications (which should be fully packaged and signed, btw), execution of Turing-complete code inside the browser is pointless. At best it helps to work around certain limitations (bandwidth, browser quirks) or improves the UX of a page. But mostly it is just getting in the way or outright used against the user.
How is this possible at all? Don’t allow readout of canvas (or similar) at all without asking me if it’s ok. Problem solved. I don’t care if it becomes annoying on the off website that does have a valid use case for it. This just shouldn’t happen.
Sadly this sort of readback is necessary for various features and scenarios, so disabling it across the board is a big hassle. Brave has an approach to this where they scramble the output some to undermine its usefulness for fingerprinting, at least.
Again - I don’t care if it’s a hassle. It should be a hassle. If I’m on a web based image editor I accept the readout, if I’m on YouTube I definitely don’t.
There are some valid reasons for using fingerprinting.
In the UK, gambling companies use fingerprinting to enforce exclusion lists (i.e. gambling addicts who self-exclude from gambling websites), and stop people from outside the UK gambling illegally (the latter is used generally, although desktop apps are generally more secure).
I believe it is also used in online retail to combat bots (self-evidently, not everyone is doing this).
I think it is a concern when websites use this kind of thing indiscriminately. I have worked in web development roles, and I have had devs tell me point blank (once, a lead dev) that fingerprinting is impossible, whilst using products that expose their customers to fingerprinting... so it is important to be aware of it. But there are legitimate uses of fingerprinting specifically, separate from the APIs whose features some apps rely on (e.g. WebGL). It is sometimes important to know who someone is.
Another compelling reason to have JS disabled by default. If you need to browse those sites listed on the page, consider having a dedicated laptop that would have a specific fingerprint separate from the machine you would do most of your daily browsing on, and compartment away those sites that are known to track you across the web.
I actually rarely have to enable JS on pages now. Most JS-only sites can be found in places like https://news.ycombinator.com/show where people showcase JS-only sites / apps. Most are innocuous enough and are not trying to track you, though, so I sometimes turn on JS just to test them out. I have a dedicated laptop for Youtube, Reddit etc. I'm well aware that sites like Reddit track you via JS, so I compartment / silo sites like that with a dedicated machine.
The sites that are looking for the GPU details may have been trying to detect which CPU an iOS device has - until Apple obscured the information, it used to be a pretty reliable way of discovering that.
What about the situations with banks or financial related field? I think a lot of people would prefer the site collect information to check for any irregularities on who is making accounts or logging in so they can be warned if anything stands out. Fraudulent accounts pass the cost to every other customer.
Maybe I haven't thought of every situation but don't you want to be able to prevent odd behavior to an account?
I think they are, because I used to google "enjo kosai love hotels" in 2015 and it used to show me several establishments, but now they block the results saying it is offensive.
Since when the hell did Google, an American corporation, become the arbiter of morality and truth?
This fingerprinting is closely tied to this. The sad part is that the web scraping technology I've been given access to easily bypasses it.
"player-core-variant-....js" is the Javascript video player and it uses WebGL as a way to guess what video format the browser can handle. A lot of the times mobile android devices will say "I can play video X, Y, Z" but when sent a bitstream of Y it fails to decode it. WebGL allows you to get a good indication of the supported hardware features which you can use to guess what the actual underlying hardware decoder is.
This is sadly the state of a lot of "lower end" mobile SoCs. They will pretend to do anything but in reality it just... doesn't.