
Lidar can create a great 3D representation of an environment, but then you're back to the same problem as cameras: you need some sort of AI to identify objects in the data. So the question is how much of an advantage does lidar give you in object identification? It's hard enough to identify objects with confidence in 2D images. Where does the confidence level stand for identifying and/or discerning 3D objects with AI?



Sure, lidar gives you way less resolution. Take the Velodyne Alpha Puck for instance: 300k points/sec. If you are moving at 60 mph, one second covers 88 feet. So the puck spreads 300k points over 88 feet of highway.

The Tesla system has 10 cameras, but I believe only two of them look forward. I believe they are 1280x960 @ 30 FPS, or about 36M pixels/sec each. With two of them, that's roughly 73M pixels/sec. Each pixel is in color (a lidar return is just a distance).

So the Tesla system has WAY more information about the environment. Granted, distance has to be inferred, but it also has radar to help with that.
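A quick back-of-the-envelope in Python, using only the figures quoted above (the Alpha Puck's 300k points/sec and two assumed 1280x960 @ 30 FPS forward cameras); nothing here is a spec beyond what's already in this thread:

    # Rough data-rate comparison from the figures quoted above.
    lidar_points_per_sec = 300_000            # Velodyne Alpha Puck, points/sec
    cam_pixels_per_sec = 1280 * 960 * 30      # one camera: ~36.9M pixels/sec
    forward_cameras = 2

    mph = 60
    feet_per_sec = mph * 5280 / 3600          # 88 ft/s at 60 mph

    print(f"lidar:   {lidar_points_per_sec:,} samples/sec")
    print(f"cameras: {cam_pixels_per_sec * forward_cameras:,} pixels/sec")
    print(f"lidar points per foot of travel: {lidar_points_per_sec / feet_per_sec:,.0f}")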

Additionally, being inherently more similar to eyesight, a car using cameras is likely to get along with human drivers better: slowing down when it's foggy or rainy, seeing at similar ranges to humans, and being able to use color for additional context. Is that a UPS truck or an ambulance? Is that a reflection of a police car with its light on or just a window reflection? Is that a boulder or a wind-filled trash bag?


> So the Tesla system has WAY more information about the environment. Granted, distance has to be inferred, but it also has radar to help with that.

I'd argue the contrary. Intelligence is not primarily about the amount of data, but about the amount _and_ quality of the data you receive. If I had a magic sensor giving me obstacles in segmented form, that would be a couple of KB, and it would beat any other sensor on the market.

Inferring distance from stereo images has its own failure modes, and they are not as easy to account for as with LiDAR. LiDAR also gives you reflectivity, so you will be able to differentiate between a UPS truck and an ambulance.

> Is that a reflection of a police car with its light on or just a window reflection?

Fun thing: to my knowledge, reflections are a major unsolved problem for vision. It is easier for LiDAR, as you can rely on the distance measurement: a reflected object will appear somewhere outside of any reasonable position (e.g. underground, behind a wall). Depending on the lidar, the glass might even register as a second (actually primary) return.

Yes, you need cameras (likely color) to be able to recognise any light-based signalling (traffic lights, ambulance/police lights...), so LiDAR is not a panacea. But having the lidar tell you that there is a window and that the police car is behind it is likely vastly more robust.

Also, the difficulty is that you have to see arbitrary objects on the road and possibly stop for them. As long as an object is larger than maybe a couple of centimeters (or an inch), it will show up on the LiDAR; with stereo vision, you need a couple of pixels of texture to infer it.
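For a rough sense of what "a couple of pixels of texture" means at range, here's a sketch that assumes a hypothetical ~60 degree horizontal FoV on a 1280-pixel-wide sensor (the FoV is my assumption, not any vendor's spec):

    import math

    # Approximate pixels subtended by a small object at a given distance,
    # assuming a hypothetical 60-degree horizontal FoV on a 1280px-wide sensor.
    def pixels_subtended(object_size_m, distance_m, h_fov_deg=60.0, h_pixels=1280):
        angle_rad = 2 * math.atan(object_size_m / (2 * distance_m))
        pixels_per_radian = h_pixels / math.radians(h_fov_deg)
        return angle_rad * pixels_per_radian

    # A 5 cm object (the "couple of centimeters" case) at 30 m and 60 m:
    for d in (30, 60):
        print(f"{d} m: {pixels_subtended(0.05, d):.1f} px across")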


I've worked with lidar data a fair bit in a VR environment. It can be quite hard to tell what's going on in any kind of complex environment. The data is so sparse, and the datasets I was working on were static.

300,000 points per second... if you are trying to figure out what's going on in 1/20th of a second, that's only 15,000 points. Assuming you're scanning 3 lanes (3.7 m each) out to 100 meters, say 3 meters high, that's about 3,330 cubic meters. Only part of the 360-degree sweep covers that wedge, so lidar gives you on the order of 2 points per cubic meter. Not exactly going to be easy to tell a bicycle from a motorcycle, or an ambulance from a UPS truck.
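The same estimate in code; the only step beyond the numbers above is assuming roughly a third of the 360-degree sweep faces the road ahead, which is my assumption:

    # Point density estimate from the figures above.
    points_per_sec = 300_000
    window_s = 1 / 20
    points = points_per_sec * window_s                         # 15,000 points

    lanes, lane_width_m, length_m, height_m = 3, 3.7, 100, 3
    volume_m3 = lanes * lane_width_m * length_m * height_m     # ~3,330 m^3

    forward_fraction = 1 / 3   # assumed share of the sweep covering the road ahead
    print(f"{points * forward_fraction / volume_m3:.1f} points per cubic meter")   # ~1.5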

From what I can tell, machine learning has led to near-human levels of object identification on camera images, but it's not nearly as far along for things like sparse monochrome point clouds.

At 65 MPH, to be able to avoid something you need some lead time, which means distance. The lidar data I've seen is pretty sparse that far out. Of course, the sexy, almost-real-looking detailed landscapes from lidar come from tripod mounts and long sample times.

Which leads me to the relevant question: do you have any reason to think that machine learning will handle lidar at roughly 190 feet of range (2 seconds at 65 mph) better than a pair of color cameras running at 1280x960 @ 30 FPS?


layman here, does lidar necessarily have to sample the whole environment uniformly?

I ask because, as I understand it, humans actually have quite poor visual acuity through most of our FOV, with a small very precise region near the center. the visual cortex does some nifty postprocessing to stitch together a detailed image, but it seems to me that human vision is mainly effective because we know what to pay attention to. when I'm driving, I'm not constantly swiveling my head 360 degrees and uniformly sampling my environment; instead, I'm looking mostly in the direction of travel, identifying objects of interest, and taking a closer look when I don't quite understand what something is.

is it possible for a lidar system to work this way? maybe start with a sparse pass of the whole environment at the start of a "cycle", and then more densely sample any objects of interest?


Lidar units generally operate by rotating a laser that pulses at a fixed rate, using optics to sweep the beam up and down to get a reasonable vertical FoV 360 degrees around the car. They output a stream of samples - basically a continuous series of timestamped (theta-rotation, phi-rotation, distance) tuples - that software can reconstruct into a point cloud.

But! The lidar data is useless by itself since the car is moving through space at an unpredictable rate. Each sample has to be cross-referenced by timestamp with the best-estimate location of the car in order to be brought into a constant frame of reference. This location estimation is a complex problem (GPS and accelerometers get us most of the way there but aren't quite high-fidelity enough without software help) so it can't be done onboard the lidar.
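A minimal sketch of that cross-referencing step: each timestamped (theta, phi, distance) sample is transformed by the car's best-estimate pose at that instant before being added to the cloud. The sample format and the pose_at function are simplified stand-ins, not any vendor's actual API:

    import math

    def sample_to_world(t, theta, phi, dist, pose_at):
        # Beam direction in the sensor frame (theta = azimuth, phi = elevation).
        sx = dist * math.cos(phi) * math.cos(theta)
        sy = dist * math.cos(phi) * math.sin(theta)
        sz = dist * math.sin(phi)

        # Look up where the moving sensor was when this sample fired,
        # then rotate/translate the point into a fixed world frame.
        x, y, z, yaw = pose_at(t)
        wx = x + sx * math.cos(yaw) - sy * math.sin(yaw)
        wy = y + sx * math.sin(yaw) + sy * math.cos(yaw)
        return (wx, wy, z + sz)

    # Example: car driving straight down +x at ~29 m/s (65 mph), level road.
    pose = lambda t: (29.0 * t, 0.0, 0.0, 0.0)
    samples = [(0.000, 0.00, 0.0, 20.0), (0.050, 3.14, 0.0, 20.0)]
    cloud = [sample_to_world(t, th, ph, d, pose) for (t, th, ph, d) in samples]
    print(cloud)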

So to do what you suggest, the lidar would need a control system that allows its operational parameters to be dynamically updated by the car. But what parameters? Since the laser is already pulsing at least hundreds of thousands of times per second, there's probably not much room for improvement there without driving up cost, and if we could go higher we'd just do that all the time anyway. The only other option would be to slow down the rotation of the unit while it sweeps over the field of view we've decided is interesting.

That way is a little more conceivable, but I doubt it would work out in practice. If the unit averages 10 rotations per second, it would be subject to 20 acceleration/deceleration events per second (slowing into and out of the interesting wedge on every rotation), which would be a significant increase in wear and tear on the unit. It would also make it harder to reliably estimate the unit's rotation at any point in time, again driving up costs.

All this can't grant you much more than, say, a 100% increase in point density on the road (assuming 120 degrees of "interesting" FoV and a 1/6th sample rate on the uninteresting parts). If these things are to be produced at scale, I imagine it would be easier to increase point density by just buying a second lidar, which would also bring better coverage and some redundancy in case of failure.
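Checking that intuition with the numbers above (fixed pulse rate, fixed 10 Hz average rotation, a 120-degree "interesting" wedge, and 1/6 the angular density elsewhere):

    # Fixed point budget per rotation, redistributed by a variable spin rate.
    points_per_rotation = 300_000 / 10          # fixed pulse rate, 10 Hz rotation

    uniform_density = points_per_rotation / 360             # points per degree

    # Foveated: d_uninteresting = d_interesting / 6
    # 120 * d_i + 240 * (d_i / 6) = budget  ->  160 * d_i = budget
    foveated_density = points_per_rotation / 160

    print(f"uniform:  {uniform_density:.0f} pts/deg")
    print(f"foveated: {foveated_density:.0f} pts/deg "
          f"(+{foveated_density / uniform_density - 1:.0%})")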


It definitely works pretty well. I'm currently doing work with a robotic arm and hand doing random, unplanned tasks. I'm testing out different scenarios with lidar units placed on the ceiling and walls, and directly on the arm and hand, in combination with traditional camera-based object detection.

Combining the real-time data is straightforward, but figuring out how to take optimal advantage of it all is definitely a challenge. Having the extra dimension or sense definitely helps, though.

I see it helping most in tasks that require a lot of dexterity, such as multiple digits working together, where a camera lens would be too close to the object or covered by the item in the task (folding a blanket, etc.), blocking its light.


I think the LIDAR usage in this context is not to identify objects but to detect them. There are many situations where a camera-only vision system falls down, e.g. bad weather, glare from secondary light sources, etc.

The main downside is cost.


> There are many situations where a camera-only vision system falls down, e.g. bad weather, glare from secondary light sources, etc.

A splat of dirt on the LADAR, and it will not be much better.

Millimeter-wave and THz radars are cheaper and better for this application.



