Privacy Implications of Accelerometer Data [pdf] (acm.org)
127 points by gtirloni on May 6, 2021 | 43 comments



GPL-licensed Gadgetbridge for Android [1] allows syncing various fitness trackers to your phone without an internet connection.

[1]: https://codeberg.org/Freeyourgadget/Gadgetbridge/


User endorsement:

I use this with a Lenovo WatchX and it works well. It is pretty basic, so you can join the party and hack on it if you feel like it. The WatchX is very cheap: ~USD $30 on AliExpress.

I would never go near the Lenovo watch Android app, which I consider to be on par with radioactive toxic waste.

Using Gadgetbridge & the WatchX is superior to the Apple Watch & its ecosystem at about 1/10th the cost. Recommended.


When I search, I'm finding the S2 instead of the WatchX. Is that what you meant? Or if you do mean the WatchX, do you have a link?



I use this with a Mi Band 4! The initial steps to get the Xiaomi DRM Pairing key to actually connect it are a bit of a pain (which isn't the developers' fault), but once you've done that once it's a seamless experience. Huge props to the developers!


I used this with a Mi Band, but then Xiaomi went full Google and now requires their watches to connect to their network before first use.

I wonder if I can buy a second-hand Mi Band and get around that.


You can still get the key from Huami's servers and say goodbye to Xiaomi scientists checking your data.


Can you do that without installing their app which will have access to your contacts and location upon first access?


Well, you need to install the app at least once, but there's no need to give it full permissions. You might also use a burner device. Once your device is paired, you can remove the app and fetch the key.

Search for huami-token on GitHub, or for Huafetcher (the Android version of the same) on Codeberg.

And read our wiki

Disclaimer: I did the Kivy wrapping of huami-token for Android.


If only Wyze Watch were supported...


I want to share a small project, closely related to the paper. You can try it out if you have 5-10 free minutes and an Android device: https://edge-ml.dmz.teco.edu/

You log in; create a so-called workspace; add some labels like walking, jumping, tapping on the device, etc. (simple actions for testing purposes); collect data for your labels with your mobile device; and create a machine learning model with a single click. Then scan the QR code of the model and your actions are predicted by the model in real time. It's really interesting how accurate the predictions are even with a small sample size.


This is cool.

Possibly-pathological usage report: training with default parameters against three samples each of tapping the left and right edges of my device ~rapidly (2.5Hz?) doesn't produce a model that can accurately infer which side of my device I'm tapping.

I wouldn't be surprised if adjusting the model parameters or adding more samples remedied this. (By all means go dig out the samples and model created under "i336_" if you're interested.)

The UI is also notable. I initially visited on desktop because that's what I was already using, then realized the site is actually designed to be used in a desktop/device split scenario. Nice.

Some pedantic UI nits, in case you want them :) -

- The backend very occasionally 503s, which creates a toast/popup full of HTML in the UI. I only noticed this on desktop, but it can probably occur on the device too.

- The current approach of tap-label-then-begin-typing-name-to-rename is cool but doesn't catch the backspace key, and I also had to guess that I could type. Create-textbox-on-mousedown would be more intuitive ("ah yes clicking this does let me rename it") and would also eliminate the gotta-catch-all-the-keycodes problem.

- I can't preset the countdown and duration, or have them persistently saved on mobile. (I wanted to set the countdown to ~1-2 seconds.)

- If you submit an empty countdown in the mobile UI the site gets very confused :) A full page reload was needed.

- Once I'd hit Train I was like, "...okay, now what? Oh, the header parts ('Workspaces / blah / blah') are clickable, I click in there to go back. Right then." Back and/or obvious navigation buttons everywhere would probably be helpful for orientation.

- After leaving the tab for some time then coming back to it I was met with a bouncing "Unauthorized" at the top of the page. The requests for .../samples, .../models and .../workspaces are all 401ing.

That's actually everything. The UI is overall refreshingly straightforward/to-the-point (ie, not full of unnecessary layers of indirection!), and fast.

For example, I didn't realize I needed to create a layer to begin with, then when I created it on desktop it immediately showed up on my phone. Nice. (Next step, long-polling :P)


I was hesitant to share because it's quite new and still in development, but I'm glad someone from outside actually tried it! It has some issues, as you have pointed out, and I really appreciate the detailed report; I've noted them.

There is currently no explanation of the ML parameters, since I still use the website mostly for manual testing. Sadly, some of them are crucial for model accuracy and are not easy for the user to configure manually, so the plan is to integrate meta-learning that finds the best possible parameters on its own. I will take a look at the model now :)

Edit: After a quick inspection I see that the problem is actually sample size. Your samples are about 3 seconds long, and for a consistent model I need at least 30 seconds of samples for each label.

Maybe some background: you can simply set the recording duration to 30 seconds and keep tapping for 30 seconds; you don't need 10 samples of 3 seconds each. The server splits the recordings into 0.3-second windows for training, and the model tries to interpolate the function between these windows and their labels.

In your case, only 13 windows could be extracted from your recordings, so the model has only 13 data points to learn the mapping from and tries to guess the function based on them. The beginning and end of a recording are also cut, because the sensors are not very reliable during those periods, so a 3-second recording becomes about 2 seconds long after that. This is why tiny recordings are generally not very useful for model training (and I should definitely change the default recording duration, which is 3 seconds!). From a 30-second recording we can extract ~100 data points, and that is usually enough for a somewhat reliable model.
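
(For anyone curious, here's a rough Python/numpy sketch of the windowing described above. The 0.3 s window comes from the explanation; the trim length and the mean/std features are my own assumptions for illustration, not the site's actual code.)

    import numpy as np

    def make_windows(recording, fs=100, window_s=0.3, trim_s=0.5):
        # recording: N x 3 accelerometer samples for one labeled recording
        trim = int(trim_s * fs)
        data = recording[trim:len(recording) - trim]   # drop the unreliable start/end
        win = int(window_s * fs)                       # samples per window
        n = len(data) // win                           # whole windows only
        windows = data[:n * win].reshape(n, win, -1)
        # toy per-window features: mean and std per axis
        return np.concatenate([windows.mean(axis=1), windows.std(axis=1)], axis=1)

    # A 30 s recording at 100 Hz gives roughly (30 - 1) / 0.3 ~ 96 windows,
    # while a 3 s recording gives only ~6, which matches the "tiny recordings" problem.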

But of course none of this is obvious (not even close...) and I should do a much better job of explaining the process to the user. I learnt a lot from your experience, so thanks again :)


Adding this several days later, so I realize you may never see this, but if (when? :D) you do happen to notice, feel free to email me at the address in my profile (click my username), regardless of how far in the future. :)


Really glad I decided to have a look through my comment history. Not sure when you added the edit, just noticed it now.

I forgot to ask - what are you actually using this for? What's the actual use case? I'm guessing some kind of specific research situation, but I'm curious exactly what.

Thanks for the explanation. So much more goes into this kind of thing than intuition would suggest...

Thanks for having a look at the data too. I recorded some new 35-second samples (at 100 Hz, since why not) into a new workspace, but then decided to make things interesting by recording both fast and slow taps :) I'm not sure if this was a bad idea, lol.

(By all means poke at these too)

If I'm honest (and I think honesty is probably most useful here), it's difficult for me to say the model is meaningfully more correct now. It still:

- oscillates between tap side (and speed)

- doesn't correctly pick out which side a single tap was made on, or makes the correct prediction for 8.4 milliseconds and then rapidly follows it with all other possible predictions :)

- continues to flip through predictions after I stop tapping (possibly because the prediction system examines/averages the entire 5-second classification window?)

- thinks placing my device on a flat surface means "fast right", "left"

- thinks holding my device mostly (say 95%) steady (ie, as little deliberate movement as possible) for several seconds means "right", "left", "fast right", "right", "fast left", "right", "left", "right"... etc

There seem to be occasional spots of consistency, but I can't deny that's probably my brain wanting it to work :P

I'm still learning about sensor analysis and ML, and the above is what my gut instinct says might be most useful to report.

While playing with the system again I had a thought: if you displayed toggle buttons (that showed as buttons but functioned as radio selectors) for all sample labels during classification, let the user select which "actual" input they were providing in real time, and recorded the raw sensor data against that label selection, you could store the recording and then repeatedly replay it through the model during development, eg to show prediction vs user-specified actual input (rough sketch below).

Given that model training only seems to take a few seconds (and can perhaps be run on higher-end equipment during development to make the process even faster), such a highly iterative workflow might viably be rerun on every source file save. (On Linux I personally use `inotifywait` for this; for a wall of text on how much I like inotifywait, see also https://news.ycombinator.com/item?id=26555366.)
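
(A rough Python sketch of that replay idea, in case it's useful; the log format and the `model.predict` call are hypothetical, just to show the shape of the loop.)

    import json

    def evaluate_replay(model, log_path):
        # Each log line holds one classification window's features plus the
        # label the user had toggled at that moment, e.g.
        # {"features": [...], "label": "left_tap"}
        hits, total = 0, 0
        with open(log_path) as f:
            for line in f:
                entry = json.loads(line)
                pred = model.predict([entry["features"]])[0]
                hits += int(pred == entry["label"])
                total += 1
        return hits / max(total, 1)

    # Rerun against every new model build (e.g. triggered by inotifywait on
    # source save) to watch accuracy move as the training code changes.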

--

On a completely unrelated note, I thought I'd ask: for a while I've wanted to measure my real-time speed on trips on the (...underground...) fastish metro/subway system that was installed a little while ago in NSW, Australia. All the conventional wisdom I can find suggests that accelerometers are utterly useless for this type of task, but GPS generally needs to see the sky :). I was curious whether there are any hacks/workarounds/cute tricks I might be able to use that aren't widely known.


It's strange: I look at my Google Maps history, and it shows when I traveled by car vs. by motorcycle. In a state like California, where lane splitting is legal, I figured they might use [going faster than surrounding stopped traffic] to determine that, but since I've moved, it continues to classify trips correctly. The only way I can imagine they're doing this is via the accelerometer.


Yes. Activity detection is performed by occasionally sampling the accelerometer in Android phones. IIRC, it is turned on for a few seconds every 60 seconds or so, and the samples are sent through a rather simple ML model that produces: {activity as one of: {biking, walking, running, driving, motorcycling, unknown, <a few I do not remember>}, confidence: float [0..1)}. Source: worked on this long ago at Google.
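
(Not the actual Google code, which isn't public; just a Python sketch of the output shape described above, with hypothetical names.)

    from dataclasses import dataclass

    ACTIVITIES = ["biking", "walking", "running", "driving", "motorcycling", "unknown"]

    @dataclass
    class ActivityEstimate:
        activity: str      # one of ACTIVITIES
        confidence: float  # in [0.0, 1.0)

    def detect_activity(accel_samples):
        # Stand-in for the "rather simple ML model": takes a few seconds of
        # accelerometer samples and returns an activity plus a confidence.
        return ActivityEstimate(activity="unknown", confidence=0.0)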


How does one turn this off?


On Android, if you go into Permissions, your Google Maps app will probably have 'Physical Activity' as an allowed permission. You can change it to 'Deny'. In fact you can do the same for 'Google Maps' and 'Google'.

Whether denying that permission actually blocks the app's access to the accelerometer, or simply stops the data being associated with your profile, is beyond me.

At least on a Pixel I cannot deny 'Google Play Services' access to Physical Activity, so it seems like a feel-good permission and they still grab whatever data they want.

Perhaps someone working in Maps can answer that?


The actual processing happens on a microcontroller, not on the main application core. In some cases it is a small Hexagon core on the main SoC, in other cases an actually separate Cortex-M3 (or the like). Doing this sort of thing on the main CPU would cause too much power draw due to frequent wakes.

The microcontroller OS code lives here: https://android.googlesource.com/device/google/contexthub/+/... It supports loading binary "micro-apps" (with optional encryption and/or signing) and provides them an API for accessing sensor data. Sensor drivers are also loaded as micro-apps, as is the "communicate with the main Android CPU" code. The OS itself is little more than an event loop, some power management, a timer provider, and a loader for micro-apps in a particular format.

Activity detection is one such micro-app and is, as far as I know, not itself open source.

Source: I was the TL for this project back in the day


Wow, I never noticed that. I wonder how they do it. iOS activity recognition only offers "automotive", and Android has "IN_VEHICLE" and "TILTING". You could classify motorcycling pretty easily offline with sensor data, but I'm not really familiar with how motion data is collected, or how often.


The forces are balanced when turning on a motorcycle, just like turning in an airplane. The phone’s gyroscope could notice the roll, but the accelerometer will just read slightly higher g forces in the “down” direction.

That said, you can differentiate between a car and a motorcycle: cars don't tilt, so turning causes lateral acceleration.
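
(A quick back-of-the-envelope check in Python; the speed and turn radius are made-up numbers, just to illustrate the difference in what the phone would sense.)

    import math

    g = 9.81   # m/s^2
    v = 20.0   # m/s, ~72 km/h (illustrative)
    r = 80.0   # m, turn radius (illustrative)

    a_lat = v ** 2 / r   # centripetal acceleration, ~5.0 m/s^2

    # Car: the body stays level, so the accelerometer sees g "down" plus a_lat sideways.
    car_down, car_lateral = g, a_lat

    # Motorcycle: the bike leans until gravity plus the centripetal force line up
    # with its own "down" axis, so lateral reads ~0 but "down" reads higher than g.
    lean_deg = math.degrees(math.atan2(a_lat, g))   # ~27 degrees
    moto_down, moto_lateral = math.hypot(g, a_lat), 0.0

    print(f"car:  down={car_down:.1f} m/s^2, lateral={car_lateral:.1f} m/s^2")
    print(f"moto: down={moto_down:.1f} m/s^2, lateral={moto_lateral:.1f} m/s^2, lean={lean_deg:.0f} deg")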


From this point on I shall lean into my turns in my car


> iOS activity recognition only offers "automotive"

I assume you mean in terms of motorised movement? Because it definitely tracks cycling, running and walking separately.


Right, of the relevant activities, I meant. Android also has others like walking, bus, etc.


As in, the bike ride is shakier?


If Hacker News had tags, the "anxiety" tag would be increasingly active these days (prions, anyone?), but I feel like this right here should worry people more than it does, given how many devices most people carry around at all times, over-confident because they use a password manager and SMS for 2FA.


Doesn’t mention the most prominent one: coordinated activity measured on two wrist accelerometer devices in proximity is a decent-confidence indicator of sexual activity.


Or a simultaneous drop to zero: both took their watches off.


I took my watch off some 30 years ago. I wonder if my phone's accelerometer is sending data to some company. I'm pretty careful about which apps I install and I log out of everything (not Google Play, of course), and yet I don't trust every single app on my phone (banks; there is little I can do) or Google.


What about playing basketball or running together? Unless you can get a measurement of extreme proximity.


"Grand-inquisitor I am not-guilty of the capital crime of adultery defiling the will of god. My explanation is that I met Ms Nin at the motel in question at lunch time to play basketball and go for a run."

We hope our world will continue to change incrementally rather than sharply and suddenly. History tells us that all periods of incremental change come to an end.

We continue building the tech for a future in which there exists a turn-key surveillance system the Stasi could only dream of. It doesn't matter so much just now, we think. Just like having no insurance usually doesn't matter at the moment you make that decision.

It seems really risky to me.


When this sort of data can distinguish between walking normally and walking drunk, identify the sex of the wearer, tell whether they're carrying heavy objects, etc., then yes, it can distinguish between playing basketball, running, and sex.

This data can be 9D (accelerometer/gyroscope/compass), it's much finer-grained than "moving a bunch or not", and it wouldn't even take those kinds of movement analyses to differentiate between "horizontal, active, and highly correlated with person B (estimate: sex)" and "vertical, active, highly correlated with person B (estimate: dancing)".
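
(A minimal Python sketch of the "highly correlated with person B" part, assuming two roughly time-aligned accelerometer streams; the intensity feature is my own choice, not from the paper.)

    import numpy as np

    def activity_correlation(accel_a, accel_b):
        # accel_a, accel_b: N x 3 accelerometer samples from two devices
        mag_a = np.linalg.norm(accel_a, axis=1)   # motion intensity, device A
        mag_b = np.linalg.norm(accel_b, axis=1)   # motion intensity, device B
        n = min(len(mag_a), len(mag_b))
        return np.corrcoef(mag_a[:n], mag_b[:n])[0, 1]

    # Sustained high correlation plus orientation/intensity cues is the kind of
    # signal being described here; context still matters for interpretation.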


They could also be wrestling, e.g., brothers and sisters.

So, as with all sensing data: context is critical to interpreting it.


or sparring for boxing or martial arts, or track and field heats, rowing, ...


The accelerometer patterns and activity duration are very different for basketball vs sex.


Do you usually run in sync with others?


The site doesn't work without JavaScript and/or cookies. Ironic, for a paper about privacy.


What do you mean? At least the current URL [0] returns a response with content-type application/pdf, so your browser's built-in PDF viewer should display it, no? I know admins sometimes change URLs tho.

[0]: https://dl.acm.org/doi/pdf/10.1145/3309074.3309076



Huh, interesting, I can confirm. If you add the JSESSIONID cookie manually via the -b param to the curl request, it works tho. No JS needed, pure curl.
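
(For anyone who'd rather script it: a Python sketch of the same trick using requests instead of curl. The Session keeps the JSESSIONID cookie from the first response and sends it on the retry, which has the same effect as passing it manually with -b; that the first request actually hands out the cookie is an assumption about the site's current behavior.)

    import requests

    url = "https://dl.acm.org/doi/pdf/10.1145/3309074.3309076"

    with requests.Session() as s:
        s.get(url)                # first request: server sets the JSESSIONID cookie
        resp = s.get(url)         # retry with the cookie attached automatically
        resp.raise_for_status()
        if resp.headers.get("content-type", "").startswith("application/pdf"):
            with open("paper.pdf", "wb") as f:
                f.write(resp.content)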


Even so, I'm not interested in providing or generating something random. If the site doesn't want to work with someone who values privacy then I don't want to work with that site. :)


Yeah, it only works if you have cookies enabled (it works with JS disabled). You could use containers, or CookieAutoDelete if you use Firefox, to auto-delete cookies later. Or just configure the browser to always be in incognito mode with only third-party cookies disabled.



