>Since there is no way to 100% fingerprint a device, therefore there is no way to uniquely identify anyone with 100% confidence using pure fingerprinting techniques.
Pedantic point, so forgive me, but 100% uniquely identifying a device does not imply 100% uniquely identifying the user of the device. We call them User-Agents for a reason. Anyone could be using it.
It's critical people not fall into the habit of conflating users and user-agents. Two completely different things, and increasingly, law enforcement has gotten more and more gung-ho about surreptitiously forgetting the difference.
Ad networks and device/User-Agent based surveillance only make it worse.
There are several initiatives to implement UUIDs for devices. There is the Android Advertising ID, systemd's machine-id file, and Intel burns a unique identifier into every CPU.
IPv6 (without address randomization) would also work as a poor man's UUID.
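To make the IPv6 point concrete, here is a minimal sketch, assuming the address was built with the classic modified EUI-64 scheme, where the interface's MAC address is embedded in the low 64 bits. The function name `mac_from_eui64` and the sample addresses are made up for illustration (they use the 2001:db8::/32 documentation prefix).

```python
import ipaddress
from typing import Optional


def mac_from_eui64(addr: str) -> Optional[str]:
    """Return the MAC embedded in a modified EUI-64 IPv6 address, else None."""
    # The interface identifier is the low 64 bits of the address.
    iid = int(ipaddress.IPv6Address(addr)) & ((1 << 64) - 1)
    b = iid.to_bytes(8, "big")
    # Modified EUI-64 inserts ff:fe between the two halves of the 48-bit MAC.
    if b[3:5] != b"\xff\xfe":
        return None
    # Flip the universal/local bit back to recover the original first octet.
    mac = bytes([b[0] ^ 0x02]) + b[1:3] + b[5:8]
    return ":".join(f"{octet:02x}" for octet in mac)


stable = mac_from_eui64("2001:db8::211:22ff:fe33:4455")
randomized = mac_from_eui64("2001:db8::1c2d:3e4f:5a6b:7c8d")
```

Here `stable` recovers the hardware address no matter which network prefix the device picks up, while `randomized` yields nothing. Note the check is only a heuristic: a randomized identifier could contain `ff:fe` in that position by chance.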
It's frighteningly easy, and you'd be surprised how readily one can implement something seemingly innocent and unintentionally end up furthering the purposes of those seeking to surveil.
- look at the statistical behavior of how they operate the mouse
- estimate their reading speed based on their scrolling
- for mobile devices, use the IMU to fingerprint their walking gait and angle at which they hold the phone (IMU needs no permissions)
- measure how the IMU responds at the exact moment a touch event occurs. this tells you quite a bit about how they hold their phone
- if they ever accidentally drop their phone, use the IMU to detect that and measure the fall time, which tells you the height at which they were holding the phone. then, assuming the phone is held normal to the eyes, you can use the angle at which they held the phone to extrapolate the location of the eyes and estimate the user's approximate height
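The drop-height idea in the last bullet is just free-fall arithmetic, h = g·t²/2. A rough sketch follows, assuming no air resistance, a phone that starts at rest, and a made-up viewing distance of 0.35 m between screen and eyes. The function names, the tilt convention (degrees from horizontal), and that default are all illustrative assumptions, not anything a real tracker is known to use.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def drop_height_m(fall_time_s: float) -> float:
    """Height fallen from rest in fall_time_s seconds, ignoring air resistance."""
    return 0.5 * G * fall_time_s ** 2


def eye_height_m(fall_time_s: float, tilt_deg: float,
                 viewing_distance_m: float = 0.35) -> float:
    """Estimate eye height, assuming the eyes sit on the screen's normal.

    tilt_deg is the screen's tilt from horizontal: 0 means lying flat
    (eyes directly above), 90 means upright (eyes level with the phone).
    viewing_distance_m is an assumed screen-to-eye distance.
    """
    rise = viewing_distance_m * math.cos(math.radians(tilt_deg))
    return drop_height_m(fall_time_s) + rise


hand_drop = drop_height_m(0.50)   # roughly 1.23 m
desk_drop = drop_height_m(0.39)   # roughly 0.75 m
```

The two sample values differ only by about a tenth of a second of fall time, so the estimate says something about the user only if the drop really started from their hand and the IMU timing is precise.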
That's a lot of extraneous data to be adding to a stream leaving the phone (or dumping to a locally stored db file), but you're technically correct, though not infallibly so.
The level of noise is incredibly problematic. My leading cause of a dropped phone, for instance, is forgetting I have it in my shirt pocket or on my lap, knocking it off my desk, or having it slip from my back pocket if I don't put it in just right. Am I a different person in each of those circumstances? The statistical answer would be no, but that only comes from widening the scope of collected data. Suppose I fiddle with it? Dance with it? Have a habit of leaving it in a car? Without a control, you have a different set of relative patterns. At best you know there is a user with X. Yes, you can make some statistical assumptions, but when it really counts, it still needs to line up with a hell of a lot of other circumstantial datapoints to hold water.
Furthermore, I guarantee not a single person would dare make any high-impact assumption based on that metadata, given that once it gets out, it's so adversarially exploitable it isn't even funny. Imagine a phone unlock you could do just by changing your gait. Or worse, a phone that locks the moment you get a cramp or blister. Madness. Getting different ads because you started walking like someone else for a bit. Do I become a different person because I try to read something without my glasses, or dwell on a passage to re-read it several times? Or blaze through a section because I already know where it is going?
These are not slam-dunk "fingerprints" by a long shot. More like corroborating data than anything else, and in that sense even more dangerous, because people are not at all naturally inclined to look at these things with a sense of perspective. It can lead a group of non-data-savvy folks to think there is a much cleaner, tighter case than there necessarily is. On top of that, it mandates that people be okay with the gathering of that data in the first place, which has only been acceptable up until now because there was no social imperative to disclose that collection.
Going off on a tangent here, so I'll close with the following.
There is the argument to be made that that exact kind of practice is why defensive software analysis should be taught as a matter of basic existence nowadays. If I find symbols that line up with libraries or namespaces that access those resources, why should I be running that software in the first place?
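As a rough illustration of that kind of check, here is a minimal sketch that pulls printable strings out of a binary blob (much like the `strings` utility) and flags any that mention motion-sensor APIs. The watchlist is a small hand-picked assumption covering Android's `SensorManager`, the web `DeviceMotionEvent`, and iOS's `CMMotionManager`. The helper name `flagged_symbols` and the sample bytes are hypothetical.

```python
import re
from typing import List

# Hand-picked, deliberately incomplete watchlist of motion-sensor API names.
WATCHLIST = (b"SensorManager", b"DeviceMotionEvent", b"CMMotionManager")


def flagged_symbols(data: bytes, min_len: int = 6) -> List[str]:
    """Return watchlist names found in runs of printable ASCII within data."""
    # Runs of at least min_len printable ASCII characters (space through ~).
    printable = re.compile(b"[ -~]{" + str(min_len).encode() + b",}")
    hits = set()
    for run in printable.findall(data):
        for needle in WATCHLIST:
            if needle in run:
                hits.add(needle.decode())
    return sorted(hits)


# Made-up bytes standing in for the contents of a real binary.
sample = b"\x00\x01android.hardware.SensorManager\x00junk\x00CMMotionManager\x00"
found = flagged_symbols(sample)
```

A hit only says the name appears in the file, not that the code path ever runs, and obfuscated or dynamically resolved symbols will slip past a scan like this. It is a prompt to dig further, not a verdict.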
I can't overstate this: over 90% of the software I come across, I won't even recommend anymore without digging into it first. There's just too much willingness to spread data around and repurpose it for revenue extraction. It does more harm than good. What people don't know can most certainly hurt them, and software is a goldmine for creating profitable information asymmetries.
> My leading cause of a dropped phone, for instance, is forgetting I have it in my shirt pocket or on my lap, knocking it off my desk, or having it slip from my back pocket if I don't put it in just right. Am I a different person in each of those circumstances? The statistical answer would be no, but that only comes from widening the scope of collected data. Suppose I fiddle with it? Dance with it? Have a habit of leaving it in a car? Without a control,
Oh, but all of these can be added to your statistical model and learned over time! If we figure out that you suddenly walk with a limp, and all the other metrics match, we can recommend painkillers! Or if the other metrics match and you start dancing, we start recommending dance instructors! Hell, we can even figure out how well you dance using the IMU and recommend classes of the appropriate skill level.
For a recommendation system, like ads, the consequences of misidentification wouldn't be that high either. You'd still target much better than random, which is the alternative in the absence of fingerprinting.