You could fingerprint the user as well: - look at the statistical behavior of ho...

salawat · on April 18, 2021

That's a lot of extraneous data to be adding to a stream leaving the phone (or dumping to a locally stored db file), but you're technically correct, though not infallibly so.

The level of noise is incredibly problematic. My leading cause of dropped phone, for instance, is forgetting I have it in my shirt pocket, on my lap, off my desk, or from my back pocket if I don't put it in just right. Am I a different person in each of those circumstances? The statistical answer would be no, but that only comes from widening the scope of collected data. Suppose I fiddle with it? Dance with it? Have a habit of leaving it in a car? Without a control, you have a different set of relative patterns. At best you know there is a user with X. Yes, you can make some statistical assumptions, but at best, when it really counts, it still needs to line up with a hell of a lot of other circumstantial datapoints to hold water.

Furthermore, I guarantee not a single person would dare make any high-impact assumption based on that metadata, given that once it gets out, it's so adversarially exploitable it isn't even funny. Imagine a phone unlock you could do just by changing your gait. Or worse, a phone that locks the moment you get a cramp or blister. Madness. Getting different ads because you started walking like someone else for a bit. Do I become a different person because I try to read something without my glasses, or dwell on a passage to re-read it several times? Or blaze through a section because I already know where it is going? These are not slam dunk "fingerprints" by a long shot. More like corroborating data than anything else, and in that sense, even more dangerous, because people are not at all naturally inclined to look at these things with a sense of perspective. It can lead a group of non-data-savvy folks to thinking there is a much cleaner, tighter case than there necessarily is, and on top of that, it mandates that people be okay with the gathering of that data in the first place, which has only been acceptable up until now because there was no social imperative to disclose that collection.

Going off on a tangent here, so I'll close with the following.

There is the argument to be made that that exact kind of practice is why defensive software analysis should be taught as a matter of basic existence nowadays. If I find symbols that line up with libraries or namespaces that access those resources, why should I be running that software in the first place?
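As a rough illustration of that kind of check, here is a minimal sketch that scans a binary blob for names associated with motion-sensor access. The watchlist below is my own assumption and far from complete; the function names are hypothetical. A hit only means the name appears somewhere in the file, not that the code path is ever used, and a miss proves nothing if the app is obfuscated or loads code at runtime. For an Android app you would run this on an extracted `classes.dex`, not the compressed APK.

```python
# Hypothetical watchlist: names commonly associated with motion-sensor access.
# Illustrative only; extend it for the platform and resources you care about.
SENSOR_MARKERS = [
    b"android/hardware/SensorManager",        # Android sensor service class
    b"android/hardware/SensorEventListener",  # Android sensor callback interface
    b"CMMotionManager",                       # iOS Core Motion class
    b"DeviceMotionEvent",                     # Web API used by embedded web views
]


def find_sensor_markers(blob: bytes, markers=SENSOR_MARKERS) -> dict[str, int]:
    """Count non-overlapping occurrences of each marker in a binary blob."""
    hits = {}
    for marker in markers:
        count = blob.count(marker)
        if count:
            hits[marker.decode("ascii")] = count
    return hits


def scan_file(path: str) -> dict[str, int]:
    """Read a file as raw bytes and report which markers it contains."""
    with open(path, "rb") as fh:
        return find_sensor_markers(fh.read())
```

Something like `scan_file("classes.dex")` returns a dict of marker names to counts, which is a starting point for deciding what to dig into, not a verdict.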

I can't overstate this: over 90% of the software I come across, I won't even recommend anymore without digging into it first. There's just too much willingness to spread data around and repurpose it for revenue extraction. It does more harm than good. What people don't know can most certainly hurt them, and software is a goldmine for creating profitable information asymmetries.

dheera · on April 18, 2021

> My leading cause of dropped phone, for instance, is forgetting I have it in my shirt pocket, on my lap, off my desk, or from my back pocket if I don't put it in just right. Am I a different person in each of those circumstances? The statistical answer would be no, but that only comes from widening the scope of collected data. Suppose I fiddle with it? Dance with it? Have a habit of leaving it in a car? Without a control,

Oh, but all of these can be added to your statistical model and learned over time! If we figure out that you suddenly walk with a limp, and all the other metrics match, we can recommend painkillers! Or if the other metrics match and you start dancing, we start recommending dance instructors! Hell, we can even figure out how well you dance using the IMU and recommend classes of the appropriate skill level.

For a recommendation system, like ads, the consequences of misidentification wouldn't be that high either. You'd still target much better than random, which is the alternative in the absence of fingerprinting.
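To put rough arithmetic on "better than random": a back-of-the-envelope sketch, assuming a misidentified user just sees an effectively random ad. The function name and all the rates used with it are made up for illustration, not measured.

```python
def expected_relevance(id_accuracy: float,
                       targeted_rate: float,
                       random_rate: float) -> float:
    """Expected chance that a shown ad is relevant.

    Assumption: with probability id_accuracy the fingerprint picks the right
    user and the ad is relevant at targeted_rate; otherwise the ad is no
    better than a random one (random_rate).
    """
    return id_accuracy * targeted_rate + (1.0 - id_accuracy) * random_rate


# Hypothetical numbers: fingerprint right 70% of the time, targeted ads
# relevant 5% of the time, random ads relevant 1% of the time.
blended = expected_relevance(0.7, 0.05, 0.01)
baseline = expected_relevance(0.0, 0.05, 0.01)  # no fingerprinting at all
print(f"blended={blended:.3f} baseline={baseline:.3f}")
```

Under that assumption, any accuracy above zero beats the random baseline as long as targeted ads outperform random ones, which is why a noisy fingerprint can still be worth collecting for ads even though it would be useless for something like unlocking a phone.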