HAMR – 3D Hand Shape and Pose Estimation from a Single RGB Image (fritz.ai)
91 points by posnererez on Oct 16, 2019 | 18 comments



In other news, there's a Google MediaPipe hand-tracking example.[1][2] It's still documented as iOS/Android-only, but there's now a hand_tracking directory under the Linux desktop examples![3] Results have been mixed.[4]

[1] https://ai.googleblog.com/2019/08/on-device-real-time-hand-t... [2] https://github.com/google/mediapipe/blob/master/mediapipe/do... [3] https://github.com/google/mediapipe/tree/master/mediapipe/ex... [4] https://www.youtube.com/watch?v=ZwgjgT9hu6A (Aug 31)
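For anyone who wants to poke at it without building the desktop example: MediaPipe later shipped a Python package wrapping the same hand-tracking graph. A minimal sketch, assuming `pip install mediapipe opencv-python` (the Python API postdates this thread, so treat it as an assumption):

    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture(0)
    with mp.solutions.hands.Hands(max_num_hands=2,
                                  min_detection_confidence=0.5) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB; OpenCV delivers BGR.
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            for hand in results.multi_hand_landmarks or []:
                # 21 landmarks per hand, coordinates normalized to [0, 1].
                wrist = hand.landmark[0]
                print(f"wrist: ({wrist.x:.2f}, {wrist.y:.2f})")
    cap.release()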


I didn't see any mention of how long it takes to recover a mesh from an image. I imagine it's a significant amount of time, not including training.
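One way to answer that, for anyone who gets the code running, would be to time the forward pass directly. A minimal sketch in PyTorch; the stand-in model and the 256x256 input size are assumptions for illustration, not anything from the paper:

    import time
    import torch

    # Stand-in for the real network; swap in the actual HAMR model here.
    model = torch.nn.Conv2d(3, 64, kernel_size=3).eval()
    x = torch.randn(1, 3, 256, 256)  # assumed input crop size

    with torch.no_grad():
        for _ in range(10):   # warm-up passes before timing
            model(x)
        t0 = time.perf_counter()
        for _ in range(100):
            model(x)
        dt = (time.perf_counter() - t0) / 100
    print(f"{dt * 1000:.2f} ms per image")

On a GPU you'd also want torch.cuda.synchronize() around the timed region, since kernel launches are asynchronous.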

What I wonder is: if this technology were fast enough, could it be used to caption sign language?

The full publication: https://arxiv.org/abs/1902.09305


This looks amazing and really useful for real-time sign language translation!


It's worth paying attention to the deaf community's thoughts on any tech (gloves, etc.) that comes along for translating sign language. Short version: ASL is more than hand gestures, and past efforts at solving this tend to focus more on assisting the hearing than the deaf.

This is a really good write-up on the issue of signing gloves and tech: https://www.theatlantic.com/technology/archive/2017/11/why-s...


Thank you for the constructive reply and the link; it will certainly be useful. It is hard for me to respond properly when this topic (sign language machine translation) comes up. You cannot encode sign language as text! You generally cannot understand what two people are talking about informally if you don't know them. Oh well...


My high school had a fairly large deaf program. I took Latin with a majority of the students being deaf/hearing-impaired. The best part was watching the translator relay my smart-ass jokes to the other students. I'd say something and half a minute later get people grinning at me. It was a fun experience and helped prepare me for performing in other countries.


My point was that it will assist the hearing, not the deaf. So yes, this seems like it's not useful at all to the deaf community.


Real-time identification of gang affiliations doesn't seem to be too far of a leap either.


I remember learning the Bloods sign in middle school, but are gang signs thrown around often enough that they need identifying?

Maybe this could be a pitch to MLB teams for learning the opposing catcher's pitch signs? If it weren't limited to hands, it might offer something for analyzing third-base coaches' signs as well.



Maybe the Soli radar is just unnecessary if you have a low-light camera or simply illuminate with IR.


This is one problem where getting the results in software is very impressive, but the problem becomes much simpler with just a modicum of extra hardware.

LeapMotion[1] devices accomplish this with nothing more than a pair of cameras in a matchbox, in real time. And this kind of hardware is already becoming standard on cell phones and laptops.
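For intuition on why the second camera helps so much: with a calibrated stereo pair, depth falls out of plain triangulation instead of having to be inferred by a model. A toy sketch with made-up numbers (roughly LeapMotion-scale baseline):

    def stereo_depth_m(focal_px, baseline_m, disparity_px):
        # Classic pinhole stereo: Z = f * B / d.
        return focal_px * baseline_m / disparity_px

    # 700 px focal length, 4 cm baseline, a fingertip shifted
    # 80 px between the two views -> about 0.35 m away.
    print(stereo_depth_m(700, 0.04, 80))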

Still, the killer obstacle for the applications I was trying to make was not precision, but latency.

Amazing work on HAMR - I wish they had the latency numbers in the article too!

[1] https://www.leapmotion.com/


LeapMotion, the last time I looked, was a stereo pair of cameras and three IR emitters, plus their own special software algorithms. I don't think that makes the problem simpler if what you're trying to build is a low-cost, extremely robust hand-detection system that can be used with any device that has a camera.

Those three IR emitters aren't standard on any device or category that I'm aware of. (Apple's iPhone uses an IR pattern emitter.)

A software-only solution has the advantage of potentially improving dramatically with new ML models. LeapMotion's physical hardware, revolutionary when it launched, now seems like a disadvantage given the pace of human-model detection (the range of the IR sensors, being locked into lower-resolution cameras, etc.).

Solving it in software, assuming there's not a performance hit, is usually the better solution, in my opinion.


Last time I tried LeapMotion it was terrible. Has it improved?


Interesting. I didn't notice any significant latency when I was using it.

Or were you trying to make musical instruments?


>Or were you trying to make musical instruments?

Bingo :) The latency was pretty low, and good for UI, but you'd want <10ms total latency for musical instruments.
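For context on that budget: the audio buffer alone eats a chunk of it before tracking even enters the picture. A back-of-the-envelope sketch (the numbers are illustrative):

    def buffer_latency_ms(buffer_samples, sample_rate_hz):
        # One buffer must be filled before the audio device can play it.
        return 1000.0 * buffer_samples / sample_rate_hz

    # 128 samples at 48 kHz is ~2.7 ms per buffer, leaving only a few
    # milliseconds for camera exposure, tracking, and synthesis inside
    # a 10 ms end-to-end budget.
    print(buffer_latency_ms(128, 48_000))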


I can relate. My biggest disappointment in this regard was Android devices: such flexibility and potential! But still, they couldn't deliver.

This goes to show that consumer-grade hardware (and associated software) is not up to par with the requirements of making music.


This isn't research about hands; it's research about recognizing 3D reality in a 2D image, filling in occluded details based on knowledge of complex objects.

Hands do make a great subject for the experiment, but are not the end goal. The results are impressive and a step forward.



