HAMR – 3D Hand Shape and Pose Estimation from a Single RGB Image (fritz.ai)
91 points by posnererez on Oct 16, 2019 | 18 comments



In other news, there's a Google MediaPipe hand-tracking example.[1][2] It's still documented as iOS/Android-only, but there's now a hand_tracking directory under the Linux desktop examples![3] Results have been mixed.[4]

[1] https://ai.googleblog.com/2019/08/on-device-real-time-hand-t... [2] https://github.com/google/mediapipe/blob/master/mediapipe/do... [3] https://github.com/google/mediapipe/tree/master/mediapipe/ex... [4] https://www.youtube.com/watch?v=ZwgjgT9hu6A (Aug 31)
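For anyone who wants to poke at it without building the desktop example: MediaPipe later shipped a Python package wrapping the same hand-tracking graph. A minimal sketch, assuming `pip install mediapipe opencv-python` (the Python API postdates this thread, so treat it as an assumption):

    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture(0)
    with mp.solutions.hands.Hands(max_num_hands=2,
                                  min_detection_confidence=0.5) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB; OpenCV delivers BGR.
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            for hand in results.multi_hand_landmarks or []:
                # 21 landmarks per hand, coordinates normalized to [0, 1].
                wrist = hand.landmark[0]
                print(f"wrist: ({wrist.x:.2f}, {wrist.y:.2f})")
    cap.release()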


I didn't see any mention of how long it takes to recover a mesh from an image. I imagine it's a significant amount of time, not including training.
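One way to answer that, for anyone who gets the code running, would be to time the forward pass directly. A minimal sketch in PyTorch; the stand-in model and the 256x256 input size are assumptions for illustration, not anything from the paper:

    import time
    import torch

    # Stand-in for the real network; swap in the actual HAMR model here.
    model = torch.nn.Conv2d(3, 64, kernel_size=3).eval()
    x = torch.randn(1, 3, 256, 256)  # assumed input crop size

    with torch.no_grad():
        for _ in range(10):   # warm-up passes before timing
            model(x)
        t0 = time.perf_counter()
        for _ in range(100):
            model(x)
        dt = (time.perf_counter() - t0) / 100
    print(f"{dt * 1000:.2f} ms per image")

On a GPU you'd also want torch.cuda.synchronize() around the timed region, since kernel launches are asynchronous.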

What I wonder is: if this technology were fast enough, could it be used to caption sign language?

The full publication: https://arxiv.org/abs/1902.09305


This looks amazing and really useful for real-time sign language translation!


It's worth paying attention to the deaf community's thoughts on any tech (gloves, etc.) that comes along for translating sign language. Short version: ASL is more than hand gestures, and past efforts at solving this tend to focus more on assisting the hearing than the deaf.

This is a really good write-up on the issue of signing gloves and tech: https://www.theatlantic.com/technology/archive/2017/11/why-s...


Thank you for the constructive reply and the link; it will certainly be useful. It is hard for me to respond properly when this topic (sign language machine translation) comes up. You cannot encode sign language as text! You generally cannot understand what two people are talking about informally if you don't know them. Oh well...


My high school had a fairly large deaf program. I took Latin with a majority of the students being deaf/hearing-impaired. The best part was watching the translator relay my smart-ass jokes to the other students. I'd say something and half a minute later get people grinning at me. It was a fun experience and helped prepare me for performing in other countries.


My point was that it will assist the hearing, not the deaf. So yes, this seems like it's not useful at all to the deaf community.


Real-time identification of gang affiliations doesn't seem to be too far of a leap either.


I remember learning the Bloods sign in middle school, but are gang signs thrown around often enough that they need identifying?

Maybe this could be a pitch to MLB teams for learning the opposing catcher's pitch signs? If it weren't limited to hands, it might offer something for analyzing third-base coaches' signs as well.



Maybe the Soli radar is just unnecessary if you have a low-light camera or simply illuminate with IR.


This is one problem where getting the results in software is very impressive, but the problem becomes much simpler with just a modicum of extra hardware.

LeapMotion[1] devices accomplish this with nothing more than a pair of cameras in a matchbox, in real time. And this kind of hardware is already becoming standard on cell phones and laptops.
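For intuition on why the second camera helps so much: with a calibrated stereo pair, depth falls out of plain triangulation instead of having to be inferred by a model. A toy sketch with made-up numbers (roughly LeapMotion-scale baseline):

    def stereo_depth_m(focal_px, baseline_m, disparity_px):
        # Classic pinhole stereo: Z = f * B / d.
        return focal_px * baseline_m / disparity_px

    # 700 px focal length, 4 cm baseline, a fingertip shifted
    # 80 px between the two views -> about 0.35 m away.
    print(stereo_depth_m(700, 0.04, 80))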

Still, the killer obstacle for the applications I was trying to make was not precision, but latency.

Amazing work on HAMR - I wish they had the latency numbers in the article too!

[1] https://www.leapmotion.com/


LeapMotion, the last time I looked, was a stereo pair of cameras and three IR emitters, plus their own special software algorithms. I don't think that makes the problem simpler if what you're trying to build is a low-cost, extremely robust hand-detection system that can be used with any device that has a camera.

Those three IR emitters aren't standard on any device or category that I'm aware of. (Apple's iPhone uses an IR pattern emitter.)

A software-only solution has the advantage of potentially improving dramatically with new ML models. LeapMotion's physical hardware, revolutionary when it launched, now seems like a disadvantage given the pace of human-model detection (the range of the IR sensors, being locked into lower-resolution cameras, etc.).

Solving it in software, assuming there's not a performance hit, is usually the better solution, in my opinion.


Last time I tried LeapMotion it was terrible. Has it improved?


Interesting. I didn't notice any significant latency when I was using it.

Or were you trying to make musical instruments?


>Or were you trying to make musical instruments?

Bingo :) The latency was pretty low, and good for UI, but you'd want <10ms total latency for musical instruments.
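For context on that budget: the audio buffer alone eats a chunk of it before tracking even enters the picture. A back-of-the-envelope sketch (the numbers are illustrative):

    def buffer_latency_ms(buffer_samples, sample_rate_hz):
        # One buffer must be filled before the audio device can play it.
        return 1000.0 * buffer_samples / sample_rate_hz

    # 128 samples at 48 kHz is ~2.7 ms per buffer, leaving only a few
    # milliseconds for camera exposure, tracking, and synthesis inside
    # a 10 ms end-to-end budget.
    print(buffer_latency_ms(128, 48_000))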


I can relate. My biggest disappointment in this regard was Android devices: such flexibility and potential! But still, they couldn't deliver.

This goes to show that consumer-grade hardware (and associated software) is not up to par with the requirements of making music.


This isn't research about hands; it's research about recognizing 3D reality in a 2D image, filling in occluded details based on knowledge of complex objects.

Hands do make a great subject for the experiment, but are not the end goal. The results are impressive and a step forward.



