Not a physical reason, no, but an AI one.

Humans and most other animals don't rely solely on stereoscopic vision to navigate the world; we rely on a model of the world in which we recognize objects in the image we perceive, know their real size from experience, and use that, as well as stereoscopic hints, to approximate distances and speeds. We additionally use our understanding of basic physics to assist: we distinguish between an object and its shadow, we can tell the approximate weight of something by the way it moves in the wind (to know whether we need to avoid an obstacle on the road), and there are other hints we take into account.

We also take into account our knowledge of the likely behavior of these objects to judge relative speeds (e.g. the car is moving away; it's not the tree coming closer).

Without this crucial aspect of object recognition and experience of the world, our vision is actually very bad at navigation. If you put us in an artificial environment with, say, pure geometric shapes at various distances, no or fake shadows, and objects with unrealistic proportions, we will have much more trouble navigating and not bumping into things even at walking speeds. And this is the level the AI is currently operating at, more or less.

And if you don't believe me, note that humans with one eye, while having impaired depth perception, are still perfectly able to drive safely, with ~0 physical mechanisms for measuring distance (I believe the focusing of the lens may still give some very subtle hints about distance as you move your eye around, but that is minimal compared to stereoscopic vision). A LOT of our depth perception is just 2D image + object recognition + knowledge about those objects.
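To illustrate the size-based part of that estimate: once you recognize an object and know its real size, a single 2D image gives you distance via the pinhole camera model. A minimal sketch (mine, not the commenter's; the focal length and car height are made-up numbers):

    # Distance from a single image, given a recognized object of known size.
    # Pinhole model: distance = focal_length * real_size / image_size.
    FOCAL_LENGTH_PX = 1000.0  # assumed camera focal length, in pixels
    CAR_HEIGHT_M = 1.5        # typical sedan height, known "from experience"

    def distance_from_known_size(pixel_height: float) -> float:
        return FOCAL_LENGTH_PX * CAR_HEIGHT_M / pixel_height

    print(distance_from_known_size(50.0))  # a car 50 px tall is ~30 m away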




While all of this may be true, it doesn't explain why stereoscopic vision wouldn't work where a LIDAR would. Both provide identical geometric information, and neither has anything to do with AI. Neither tells you the approximate weight of things, nor lets you judge, based on human experience, how things might move in the future depending on their type (tree vs. car), or anything like that. And if you swap one system providing geometric information for another that provides identical information, I don't see how this makes the cognition of any AI later in the pipeline magically any better, no matter how good or bad that AI was previously.

However, one benefit that long-baseline stereoscopic vision (for example, cameras in the corners of the front windscreen) would have over short-baseline stereoscopic vision (a human) or a point measurement (LIDAR), and that could be relevant for safety, is the ability to peek slightly around the vehicle in front of you from either side. Admittedly, this is probably a small-ish benefit overall, but it does provide strictly more information (if only slightly) than a LIDAR would.
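The peeking effect is plain occlusion geometry. A toy calculation (my own numbers and simplifications: both cameras and the occluder reduced to a single horizontal plane):

    # From what depth can at least one of two cameras, spaced by a baseline,
    # see a point directly behind an occluder of a given width?
    def min_visible_depth(baseline_m: float, occluder_width_m: float,
                          occluder_dist_m: float) -> float:
        if baseline_m <= occluder_width_m:
            return float("inf")  # neither camera ever sees past the occluder
        return occluder_dist_m / (1.0 - occluder_width_m / baseline_m)

    # Cameras 1.4 m apart, a 0.8 m wide motorcycle 10 m ahead: the centerline
    # behind it becomes visible from ~23.3 m onward. Human eyes (0.065 m
    # baseline) never see it.
    print(min_visible_depth(1.4, 0.8, 10.0))    # ~23.3
    print(min_visible_depth(0.065, 0.8, 10.0))  # inf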


Well, LIDAR uses very well-understood physics to give you precise measurements of distance to the world around you, without any need for object recognition. It is not enough on its own, but it is an excellent safety technology. Based on LIDAR input, it's basically impossible to run into an object that's moving slowly enough to be avoided.
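That recognition-free guarantee is easy to picture: brake whenever any return falls inside the vehicle's stopping corridor. A minimal sketch (mine, with made-up thresholds):

    import numpy as np

    CORRIDOR_HALF_WIDTH_M = 1.2  # half the vehicle width plus margin
    STOPPING_DISTANCE_M = 25.0   # assumed for the current speed

    def must_brake(points_xy: np.ndarray) -> bool:
        # points_xy: N x 2 array of (forward, lateral) LIDAR returns, in
        # meters. No classification step: any point in the corridor counts.
        forward, lateral = points_xy[:, 0], points_xy[:, 1]
        in_corridor = ((forward > 0) & (forward < STOPPING_DISTANCE_M)
                       & (np.abs(lateral) < CORRIDOR_HALF_WIDTH_M))
        return bool(in_corridor.any())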

Stereoscopic vision first relies on recognizing objects among the elements of the picture taken by each camera, then on identifying which objects are the same between the two pictures, and only THEN do you get to do the simple physical calculation to compute distance. If your object-recognition algorithm fails to recognize an object in one of the images, or if the higher-level AI fails to recognize that something is the same object in the two pictures, then the stereoscopy buys you nothing, and you end up running into a bicycle rider crossing the street unsafely.
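For reference, that final calculation is standard rectified-stereo triangulation (illustrative numbers, not any vendor's parameters):

    # Once a correspondence is established, depth follows from disparity:
    # depth = focal_length * baseline / disparity.
    FOCAL_LENGTH_PX = 1000.0  # assumed focal length, in pixels
    BASELINE_M = 0.3          # assumed camera separation

    def depth_from_disparity(disparity_px: float) -> float:
        return FOCAL_LENGTH_PX * BASELINE_M / disparity_px

    print(depth_from_disparity(20.0))  # a 20 px shift means ~15 m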

LIDAR does have limitations of its own (for example, it can't work in snowy conditions, since it will detect the snowflakes; not sure if the same applies to rain), but the regimes under which it is guaranteed to work are well understood, and the safety promises it can make in those regimes don't rely on ML methods.


> Well, LIDAR uses very well-understood physics to give you precise measurements of distance to the world around you, without any need for object recognition. It is not enough on its own, but it is an excellent safety technology. Based on LIDAR input, it's basically impossible to run into an object that's moving slowly enough to be avoided.

Again, claiming that LIDARs make things magically safer sounds like a lot of snake oil to me. Both LIDARs and stereoscopic systems use well-understood physics. Stereoscopic rangefinders were used in both World Wars for gun-laying, and you can hardly claim that gun-laying doesn't need precise distance measurements.

> Stereoscopic vision first relies on recognizing objects among the elements of the picture taken by each camera, then on identifying which objects are the same between the two pictures, and only THEN do you get to do the simple physical calculation to compute distance. If your object-recognition algorithm fails to recognize an object in one of the images, or if the higher-level AI fails to recognize that something is the same object in the two pictures, then the stereoscopy buys you nothing

As for whether stereoscopic vision relies on object recognition, that seems like a mild stretch to me. Generally it, like SfM (of which it is a special case), relies on local textures and features for individual data points -- and in the rectified stereoscopic case the search for a match is constrained to a single scanline, so matching features from SIFT or SURF is far simpler than in the general SfM case. Those individual data points in no way require individual objects to be recognized and separated.

I have NOT seen in my life an SfM solution that would fail to give you a point cloud because it failed to separate objects -- in fact, SfM software doesn't even try to identify objects when generating a point cloud, because it doesn't operate at such a high level. Note that this provides exactly the same information as a LIDAR would, namely a point cloud with no insight into how the points relate to each other.
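To make that concrete, here is roughly what such a pipeline looks like with OpenCV's stock semi-global block matcher -- purely local texture correlation along scanlines, with no notion of objects anywhere (file names and camera parameters are placeholders):

    import cv2
    import numpy as np

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Dense disparity from local block matching; StereoSGBM returns
    # fixed-point disparities scaled by 16.
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                    blockSize=5)
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0

    # Turn disparities into per-pixel depth -- effectively a point cloud,
    # the same kind of output a LIDAR sweep would give you.
    FOCAL_LENGTH_PX = 1000.0  # assumed
    BASELINE_M = 0.3          # assumed
    depth_m = np.where(disparity > 0,
                       FOCAL_LENGTH_PX * BASELINE_M
                       / np.maximum(disparity, 1e-6),
                       np.inf)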

Pretty much the only situation where stereoscopic vision or SfM fails to provide depth information is a surface of highly uniform color, completely devoid of texture. Whether this could be solved with structured light is an interesting problem.


Human stereoscopic vision can also be fooled by specifically designed optical illusions, like those in science museums. We just avoid them when designing roads.



