
> Without going into a long technical explanation, our mobile monocular SLAM implementation, semantic segmentation, structure from motion and relocalization (loop closure) systems are firmly in the Machine Learning/Hard Tech sphere.

I am sure it was hard for you guys but aren't these systems no longer technical unknowns? There are lone PhD students who managed to build them and open sourced their work. [1]

This is no longer hard in the sense of "I'll need a research team and five years -- and we might not figure it out." It seems like it's just academic technology transfer where you need people who can read the latest papers and implement them.

[1] http://vision.in.tum.de/research/vslam/lsdslam (The author is now at Oculus Research)




Well, first off, no lone PhD student has built a SLAM system. LSD-SLAM itself took years and a whole team. The same goes for ORB-SLAM.

Note also that those systems are built to run on robots and standup machines, not mobile handsets. That sounds like a small difference if you aren't in the field, but it is critically harder, so the approach is actually different. It's not about just "reading the latest papers and implement them." It's somewhat offensive to assume that is the case. I'd challenge you to find a mobile monocular SLAM system that is up to our capabilities, let alone usable. The teams that built one have all been acquired (Apple, Facebook, Intel). The reason you can't just copy-paste implementations is that they are non-deterministic in optimization.

Second, SLAM isn't the only thing we do. In fact, it's not even the hardest thing we do. The majority of what we do isn't something I'll go into in depth, but it actually falls into the category of "I'll need a research team and five years -- and we might not figure it out" - though now we're in year three and have made enough progress that it's starting to come out of the realm of "we might not figure it out."


> It's not about just "reading the latest papers and implement them." Somewhat offensive to assume that is the case.

I wouldn't take too much offense to it. From my (much smaller) experience in computer vision and pattern recognition, everything in CV sounds easy until you actually do it AND make it work in the real world. Real data, poor lighting, low contrast, realtime, etc. There are just so, so many factors that make this an extremely challenging field.

When I was in grad school (2012–2013) the textbooks we used said things like "generalized object recognition is an unsolvable problem". In a lot of ways since then "unsolvable" problems have been solved. The field is just changing so rapidly.


If you've never tried to implement academic papers, I think you'll find reimplementations have a very high failure rate. Some of it is probably due to different datasets, some of it is due to papers not disclosing the entire algorithm or crucial implementation choices, and some of it is probably due to the bits in the code where "magic goes here" that never get published: crucial tweaks to optimization algorithms, smart choices of initialization for iterative algorithms, etc. For example, saying you used L-M (Levenberg-Marquardt) doesn't really tell a practitioner precisely what you did; it's more of a family of techniques.
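To make the "family of techniques" point concrete, here is a minimal sketch in pure Python. Everything in it is an assumption for illustration, not anything from a particular paper or SLAM system: the toy model y = a * exp(b * x), the function names, and the knobs `damping`, `lam`, `up` and `down`. The point is only that "we used L-M" leaves all of those choices, plus the initial guess, unspecified.

```python
import math


def residuals(params, xs, ys):
    """Residuals of the (hypothetical) toy model y = a * exp(b * x)."""
    a, b = params
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(params, xs):
    """One row (dr/da, dr/db) per sample."""
    a, b = params
    return [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]


def levenberg_marquardt(xs, ys, init, damping="marquardt",
                        lam=1e-3, up=10.0, down=0.1,
                        max_iter=200, tol=1e-12):
    """Two-parameter L-M. Returns ((a, b), iterations_used).

    damping="levenberg" adds lam * I to J^T J;
    damping="marquardt" adds lam * diag(J^T J).
    lam, up and down control the damping schedule. None of these are
    pinned down by the phrase "we used L-M".
    """
    a, b = init
    r = residuals((a, b), xs, ys)
    cost = sum(v * v for v in r)
    for it in range(1, max_iter + 1):
        J = jacobian((a, b), xs)
        # Entries of J^T J and the gradient J^T r.
        h11 = sum(j[0] * j[0] for j in J)
        h12 = sum(j[0] * j[1] for j in J)
        h22 = sum(j[1] * j[1] for j in J)
        g1 = sum(j[0] * v for j, v in zip(J, r))
        g2 = sum(j[1] * v for j, v in zip(J, r))
        if damping == "marquardt":
            m11, m22 = h11 * (1.0 + lam), h22 * (1.0 + lam)
        else:
            m11, m22 = h11 + lam, h22 + lam
        det = m11 * m22 - h12 * h12
        if det <= 0.0:
            lam *= up  # degenerate system: damp harder and retry
            continue
        # Solve the damped 2x2 normal equations M * delta = -g.
        da = (-g1 * m22 + g2 * h12) / det
        db = (g1 * h12 - g2 * m11) / det
        try:
            r_new = residuals((a + da, b + db), xs, ys)
            cost_new = sum(v * v for v in r_new)
        except OverflowError:
            cost_new = float("inf")  # treat a blown-up step as rejected
        if cost_new < cost:
            improvement = cost - cost_new
            a, b, r, cost = a + da, b + db, r_new, cost_new
            lam *= down  # step accepted: trust the model more
            if improvement < tol:
                return (a, b), it
        else:
            lam *= up  # step rejected: damp harder
    return (a, b), max_iter


# Noise-free synthetic data from a = 2.0, b = 0.8 (made up for this sketch).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.8 * x) for x in xs]

for variant in ("levenberg", "marquardt"):
    params, iters = levenberg_marquardt(xs, ys, init=(1.0, 0.0), damping=variant)
    print(variant, params, iters)
```

On a well-behaved toy problem like this, both variants should end up at roughly the same answer, though not necessarily along the same path. On a hard, poorly conditioned problem the damping rule, the schedule and the starting point can decide whether you converge at all, and those are exactly the details that tend not to make it into the paper.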



