In other news, there's a Google MediaPipe hand tracking example.[1][2] It's still documented as iOS/Android only, but there's now a hand_tracking directory under the Linux desktop examples![3] Results have been mixed.[4]
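For anyone who wants to poke at it without the Bazel desktop build, here's a minimal Python sketch using the hands solution in the `mediapipe` pip package (the package and API names here are my assumption of the simplest entry point, not the desktop example in [3]):

    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    mp_draw = mp.solutions.drawing_utils

    cap = cv2.VideoCapture(0)  # default webcam
    with mp_hands.Hands(max_num_hands=2,
                        min_detection_confidence=0.5,
                        min_tracking_confidence=0.5) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB; OpenCV captures BGR.
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                for hand in results.multi_hand_landmarks:
                    mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
            cv2.imshow("hand_tracking", frame)
            if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
                break
    cap.release()
    cv2.destroyAllWindows()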
It’s worth paying attention to the deaf community’s thoughts on any tech that comes along for translating sign language (gloves, etc.). Short version: ASL is more than hand gestures, and past efforts at solving this tend to focus more on assisting the hearing than the deaf.
Thank you for the constructive reply and the link; it will certainly be useful. It is hard for me to respond properly when this topic (sign language machine translation) comes up. You cannot encode sign language as text! You generally cannot understand what two people are talking about informally if you don't know them. Oh well...
My high school had a fairly large deaf program. I took Latin with a majority of the students being deaf/hearing impaired. Best part was watching the translator relay my smart-ass jokes to the other students. I’d say something and a half minute later get people grinning at me. It was a fun experience and helped prepare me for performing in other countries.
I remember learning the Bloods sign in middle school, but are gang signs thrown around often enough that they need identifying?
Maybe this can be a pitch to MLB teams to learn the opposing team's catcher's pitch signs? If it weren't only hands, I'm not sure whether this could offer anything for analyzing third-base coach signs as well.
This is one problem where getting the results in software alone is very impressive, but it becomes much simpler with just a modicum of extra hardware.
LeapMotion[1] devices accomplish this in real time with nothing more than a pair of cameras in a matchbox. And this kind of hardware is already becoming standard on cell phones and laptops.
Still, the killer obstacle for the applications I was trying to make was not precision, but latency.
Amazing work on HAMR - I wish they had the latency numbers in the article too!
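In the absence of published numbers, a rough way to gauge per-frame latency for the MediaPipe pipeline mentioned upthread is to time the inference call around a webcam loop. A sketch (this measures only the model call, not capture or display, so treat it as a lower bound; the Python solutions API here is an assumption, not what the HAMR authors used):

    import time
    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture(0)
    timings_ms = []
    with mp.solutions.hands.Hands(max_num_hands=2) as hands:
        for _ in range(300):  # sample a few hundred frames
            ok, frame = cap.read()
            if not ok:
                break
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            start = time.perf_counter()
            hands.process(rgb)  # palm detection + hand landmark model
            timings_ms.append((time.perf_counter() - start) * 1000.0)
    cap.release()

    if timings_ms:
        timings_ms.sort()
        print(f"median: {timings_ms[len(timings_ms) // 2]:.1f} ms, "
              f"p95: {timings_ms[int(len(timings_ms) * 0.95)]:.1f} ms")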
LeapMotion, last I looked, was a stereo pair of cameras and three IR emitters, plus their own special software algorithms. I don't think this makes the problem simpler if the problem you're trying to solve is a low-cost, extremely robust hand detection system that can be used with any device that has a camera.
Those three IR emitters aren't standard on any device or category that I'm aware of. (Apple's iPhone uses an IR pattern emitter.)
A software-only solution has the advantage of potentially improving dramatically with new ML models. LeapMotion's physical hardware, revolutionary when it launched, now seems like a disadvantage given the pace of progress in human-model detection (range of the IR sensors, being locked into lower-resolution cameras, etc.).
Solving it in software, assuming there's not a performance hit, is usually the better solution, in my opinion.
This isn't research about hands; this is research about recognizing 3D reality in a 2D image, filling in occluded details based on knowledge about complex objects.
Hands do make a great subject for the experiment, but are not the end goal. The results are impressive and a step forward.
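That 3D-from-2D point is visible directly in the tracker output: in the MediaPipe pipeline mentioned upthread, each of the 21 hand landmarks comes with a relative depth (z) predicted from a single RGB frame. A small sketch that prints it for a still image (the file name is a placeholder, and again this uses the Python solutions API as an assumed entry point):

    import cv2
    import mediapipe as mp

    image = cv2.imread("hand.jpg")  # placeholder file name
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if results.multi_hand_landmarks:
        for i, lm in enumerate(results.multi_hand_landmarks[0].landmark):
            # x and y are normalized image coordinates; z is depth relative to
            # the wrist, inferred from a single 2D frame (more negative = closer).
            print(f"landmark {i:2d}: x={lm.x:.3f} y={lm.y:.3f} z={lm.z:.3f}")
    else:
        print("no hand detected")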
[1] https://ai.googleblog.com/2019/08/on-device-real-time-hand-t...
[2] https://github.com/google/mediapipe/blob/master/mediapipe/do...
[3] https://github.com/google/mediapipe/tree/master/mediapipe/ex...
[4] https://www.youtube.com/watch?v=ZwgjgT9hu6A (Aug 31)