Oyster: Towards Unsupervised Object Detection from Lidar Point Clouds (waabi.ai)
135 points by abrichr on June 13, 2023 | 71 comments



"No need for that, it can all be done using cameras because humans don't have LIDAR but drive with their eyes." /s


When I’m driving I can determine the difference between a cardboard box and a plywood box. One of those objects I am perfectly fine to drive over. That’s just one of an infinity of embodied decisions I am able to make while driving. How does LIDAR help with that? Isn’t this a software problem?


When I'm driving at night I cannot reliably determine the difference between a moose and an empty stretch of road, because unlike deer, moose do not have reflective eyes, and they stand tall enough that headlights will merely show a few easy-to-miss dark lines (the legs). One of these (empty road) I am perfectly fine to drive through. The other will result in a bad day all around (assuming either side survives).

Honestly, I find takes like this mindbogglingly ridiculous. Sure, no single sensor is going to magically solve everything. But we have tens of thousands of deaths and millions of casualties a year with humans at the wheel, and a sizable number of them are due to the limits of human visual spectrum input. The goal here shouldn't be to match human limits but to greatly exceed them. FLIR, lidar, radar, ultrasonics, all-around vision in general: all should be part of the final package. Crashes, injuries and deaths are expensive, to put it mildly.

The idea that cameras are or ever could be good enough is absurd, and that's assuming they were even as good as human eyes, which they aren't. No amount of software or compute power is going to make up for a lack of quality input data, and the visible spectrum alone just can't provide that consistently without costs we have long since determined are unacceptable (mounting ultra-high-output floodlights around all vehicles). Sensors beyond human senses can both cover scenarios we can't and help fill in for individual sensors that aren't yet as high quality as might be ideal.


What I find mindbogglingly ridiculous is how emotionally charged all of the "the system must include LIDAR" proponents are. It seems that suggesting otherwise is perceived as nothing but a coldly calculated cost-cutting measure. None of the arguments I come up against are anything but a surface level "well duh, intuitively more sensors equals more better" along with a litany of "you must like murdering babies".

We do not have autonomous systems that can deal with the kinds of problems that human drivers currently deal with, which, as I illustrated, is an infinity of embodied problems based on the entirety of our lives spent interacting with our physical environment.

My intuition tells me that in order to have a machine that can make human-like decisions, it needs to have that exact same kind of embodied experience, that is, to have interacted with the physical environment in the same manner as a human. That suggests to me that perhaps an android is the way forward. Then, once the autonomous system is extracted from the android and placed in a car, it can be further trained on additional sensors, like how a human driver can take the input of LIDAR.


> What I find mindbogglingly ridiculous is how emotionally charged all of the "the system must include LIDAR" proponents are.

> None of the arguments I come up against are anything but a surface level "well duh, intuitively more sensors equals more better" along with a litany of "you must like murdering babies".

A bit of the pot calling the kettle black there.

The thing is, the 'vision is enough' crowd often manages to pick a single use case, like seeing the difference between cardboard and plywood, that has nothing to do with Lidar, and then ask, rhetorically, how Lidar would help with it. By opening a discussion in such a disingenuous way, you provoke exactly the emotional response you seem to dislike.

If you want to have an actual discussion, maybe start by asking a genuine question.

> in order to have a machine that can make human-like decisions

Honestly, I don't think many people want it to make human-like decisions. We want it to make much, much better decisions. By your standard, we should just put two cameras on top that rotate on a platform that can only move about 180 degrees each way. Also, remove radar and ultrasonics, because humans don't have those either.

I can't give you decent arguments for using Lidar. I'm not in that field, so I can only return my own gut feeling in response to yours and I honestly don't think that has any value.


> If you want to have an actual discussion, maybe start by asking a genuine question.

Go back and look at the comment that I was responding to…

Also, re-read the comment I wrote that you responded to and then re-read your response…


> deaths […] with humans at the wheel, and a sizable number of them are due to the limits of human visual spectrum input.

There will always be situations where the sensors (human or computer) aren’t sufficient. You can add more sensors or blame the existing ones. Until that, a typical driver adapt its velocity depending on his capabilities and the road/weather. In my country that’s also partially enforced by the driving code: freeways are limited to 130 km/h, but reduced to 110 km/h if it’s raining, or 50 km/h if visibility is less than 50 m.

Adding more sensors will let you increase your speed, but if you want safety, watch your speed.


> Until that, a typical driver ?? adapt its velocity depending on his capabilities and the road/weather.

There’s an important missing word here, do you mean “should” or “does” where I’ve added in question marks? Because people hit moose and deer all the time, I think it would be good if everyone would drive safely, but they don’t.

An autonomous car with capabilities similar to a human’s, designed by responsible engineers to drive safely, would not be very popular, I think; it would drive very slowly and annoy the passengers.

And of course an autonomous car that was as dangerous as a human would be the subject of all sorts of news reports and chatter online, and nobody would buy it because it would be seen as too dangerous.

It isn’t fair, but we won’t see widespread adoption until the average person can anthropomorphize a model of a car as, like, a guy they know who drives a lot and has never gotten in an accident. Because the Tesla algorithm is what we hear about, we think of it as being a single entity that drives all Teslas, and it ran into a fire truck or whatever, so it must be kind of an idiot.


>a sizable number of them are due to the limits of human visual spectrum input

But how do you know that? You don't really. The lion's share could well be from just not being consistently careful in the right ways, or reckless high speed driving, or the tyranny of human reaction times.

I do agree that at this stage, it's worth trying to pile on all of the sensors possible to see whether that gets you more reliable odometry / mapping. But you are getting super fixated on the specific problem of knowing what the shape of things around you is, and what people are trying to get across to you is that the problem is way, way harder than that.

Just knowing where stuff is is like 10% of the problem. You also have to know what that stuff is and how it semantically relates to your driving problem. You have to be able to make predictions about how the world is going to play out, and what constitutes a safe action. And that's not even the hardest part: you also have to figure out how to glide through the world and not drive like a grandma, even though there is some inherent unsafety in building up a head of steam in a 2-ton piece of metal when a child could dive in front of you at any time. In other words, you have to be able to push the limits of UNsafe actions so that you're not completely obnoxious to be around on the road.

All of this is going to require a many-levels-up understanding of the world around the car, and that understanding is going to do a ton of heavy lifting in terms of interpreting things that haven't been directly sensed. As soon as you get an occlusion, you have already left the realm of sensing and entered into the higher level realm of reasoning about the world's state based on priors. It might be that the other sensing modalities will help you be better than a human, but by far the more important unsolved problem is getting human-ish to begin with, and we are still very far from that.

The other thing you're brushing past is that high-level reasoning does not have an infinite bandwidth of attention. As you ingest more and more sensors, your sensor fusion problem becomes much more expensive. For a reasonable amount of compute hardware, and with the constraints of latency, you don't have infinite latitude to reason over ever-wider sensor streams. So at a certain point you're going to encounter a squaring effect, having to add both sensors and the compute to handle them, with additional considerations once that data gets so wide that "normal" ways of handling it stop working and you have to get exotic.

For a given compute substrate, it is not a given that the best way to use it is to slurp a massively wide sensor suite into it. Your cycles could be better spent doing deeper reasoning over the data you've got. If you need an evolutionary analogy, ask yourself why two eyes seem to be plenty for the majority of animals. The brainpower is spent on building a complicated internal state of the world, and it can only focus on so much at a time anyway. Bio-inspired robotics is a thing for a reason: we learn a lot of subtle lessons from nature that are not intuitively obvious. This could easily be one of them.
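
To put rough numbers on that squaring effect (my own back-of-the-envelope, not anything from a production stack): if fusion cross-checks every sensor stream against every other, the pairwise checks alone grow quadratically.

    # Rough intuition for the "squaring effect": if fusion cross-checks
    # every sensor stream against every other, pairwise consistency
    # checks grow quadratically with sensor count n.
    for n in [2, 4, 8, 16]:
        print(n, "sensors ->", n * (n - 1) // 2, "pairwise checks")
    # 2 -> 1, 4 -> 6, 8 -> 28, 16 -> 120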


>> When I'm driving at night I cannot reliably determine the difference between a moose and an empty stretch of road

But you are still here, so it looks like you can drive just fine without that distinction.


... no? Just because they've been lucky enough not to crash into a moose so far doesn't mean they'll be lucky tomorrow as well.

Events tomorrow can take different turns than events yesterday.


I do not rely on just my eyes when driving and if I did I'd have been in at least one accident. I also rely on hearing and feel (vibrations, difficulty steering, car movement, etc.). Studies support this by indicating those who have hearing difficulties are more likely to get into accidents. I also rely on an organic computer far more powerful than anything you can shove into a car with modern technology and a pair of cameras with better dynamic range than anything on the market. Plus the ability to dynamically reposition these sensors by a couple feet.

Once you can shove a human brain's worth of compute and memory into a car, along with an amazingly good set of 360-degree-coverage stereo cameras, great stereo microphones, and multiple accelerometers, we can start talking about "LIDAR not mattering because the car can do what a human does and a human doesn't have LIDAR." Of course, then you've got 20+ sensors that may disagree with each other anyway, so LIDAR is probably going to mean fewer sensors overall.


What I'm pointing out is that this is not merely an issue of sensors and that focusing on sensory input is not the way to solve the problems of embodied decision making. The focus should be on creating a system that can make these kinds of embodied decisions. Read my other comment in this thread related to androids.


Nothing says that having a car drive as safely as an optimal human requires a human analogue. You're making a very strong claim without anything to back it up. For example, humans can drive cars in video games fairly well from a third-person perspective, which is clearly not the same as sitting in a car.


Are you offering up any concrete path towards dealing with the kinds of scenarios that I am presenting?

And what does a human controlling a virtual car have to do with a machine controlling an actual car? The human can easily control the virtual car in the same manner as a real car precisely because of their embodied knowledge of cars, roads, and other parts of our experienced reality!


As ChatGPT has shown, you can get very close to human without actually being anything remotely human.

>Are you offering up any concrete path towards dealing with the kinds of scenarios that I am presenting?

They don't matter 99.9999% of the time, and the rest of the time it's actually better to avoid the obstacle every single time. Kids play in cardboard boxes and die because people's "embodied" view doesn't account for that. Plastic bags wind around the wheel and break the car because people's "embodied" view doesn't see the danger of a plastic bag. Etc, etc. People are overall atrociously unsafe drivers and I'd be surprised if even existing driving systems aren't superior to humans in terms of accidents.

A real issue is that the data from existing sensors is less than the data a human gets from our senses, which will leave edge cases that software cannot solve (e.g., many Tesla autopilot accidents seem to be in this category).


As I said in another comment, many years ago I was on I-95 in heavy, fast moving traffic and I saw a box flutter off the back of a truck and I decided to just let it bounce off my windshield. That was the right decision to make.


It was the right decision not because it was made of cardboard, but because it "fluttered".

If it had been full of bowling balls and just thunked to the ground, it would not have been the right decision to just drive over it.


Yes, that is exactly my point, which is why I added that important detail! I saw that it behaved in a way that made it clear that it was better if I let the box hit my car than to slam on the brakes or steer out of the way.

I am not talking about any specific case. The difficulty is precisely in creating an autonomous system that is capable of making such embodied decisions. I learned to recognize how a lightweight object behaves from a lifetime of experience. The type of sensory input does not matter for this, and if we are trying to train a system that is capable of this kind of deeper modeling of the world it is traveling through, then the focus needs to be on building and training such a system.

Fluttering boxes is just one of an infinite number of such embodied decisions I am able to make as a human living on the planet earth!


> having a car drive as safely as an optimal human requires a human analogue.

To be fair, they said:

> not the way to solve the problems of embodied decision making

Having more sensors might make things easier. But, after maybe an hour of watching Tesla FSD Beta videos, you'll find that the deficiency is in the decision-making that goes into actually driving like a human, predicting what other road users are going to do, and following human traffic patterns, not object detection.


>One of those objects I am perfectly fine to drive over.

I recommend you Google "kid in cardboard box run over by car"


Or, less dramatically, there might be something heavy in there that you don't want to hit.


Or nails. Lots of nails.


The LIDAR won't think the moon is a yellow traffic light.

https://www.youtube.com/watch?v=7UF-S2czdCk


Ok… so that doesn’t answer my question. It seems like LIDAR-based systems would be just as apt to make different kinds of mistakes.


You misunderstood. It's not a choice between lidar and vision; you need sensor fusion of both to make more accurate calls. Neither is enough alone.


But there’s a cost to the software system when you add more inputs both in training and prediction, and more so when the inputs are in conflict!

If you think about what really needs to happen to be able to tell if a box is cardboard or plywood, you need a fully embodied system, ideally something similar to “raising” an android in a human environment.


>the inputs are in conflict!

Better for safety to have conflicting inputs that require an arbiter (even human), than a system that's confidently wrong on a single input and drives you at full speed into a 'Wile E. Coyote' wall painted to look like a tunnel. [1]

Since the moon landing, for safety, aeronautics have been built around triple-voting arbitration systems where the correct answer is the one voted for by the majority. This triple-voting scheme has been adopted in many ECUs running safety-critical operations in modern cars, and we should demand it for self-driving tech as well if we wish to have autonomous driving that's actually autonomous and also doesn't confidently try to kill you at times.
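
For the unfamiliar, a rough sketch of what 2-out-of-3 voting looks like (illustrative Python; real voters also handle timing, stuck-at faults and diagnostics, and the tolerance here is a made-up value):

    # Minimal 2-out-of-3 majority voter over redundant sensor readings.
    # Illustrative only: real voters also handle timing, stuck-at faults
    # and diagnostics. The agreement tolerance is an assumed value.
    TOLERANCE = 0.5  # readings within this delta count as "agreeing"

    def vote(a, b, c):
        """Return a trusted value if at least two of three channels agree."""
        for x, y in [(a, b), (a, c), (b, c)]:
            if abs(x - y) <= TOLERANCE:
                return (x + y) / 2  # two channels agree, outvoting the third
        return None  # no majority: flag a fault, fall back to a safe state

    print(vote(10.1, 10.3, 47.0))  # 10.2  (faulty third channel outvoted)
    print(vote(1.0, 5.0, 9.0))     # None  (no two agree -> fault)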

>But there’s a cost to the software system when you add more inputs both in training and prediction

Yes, not getting people who are traveling in fast-moving heavy machinery killed is a very expensive problem to solve (ask the aero industry). Welcome to the real world of actual engineering, where safety must be taken seriously and not brushed aside by overhyped SV fantasies of playing fast and loose with regulations and human lives for the sake of "disruption".

[1] https://www.youtube.com/watch?v=4iWvedIhWjM


> Better for safety to have conflicting inputs that require an arbiter, than a system that's confidently wrong on a single input and drives you at full speed into a 'Wile E. Coyote' wall painted to look like a tunnel.

Reminded of the recent Japanese lunar probe that failed [1]. They relied on a radar ranging sensor to determine altitude, and when they flew over a large crater, the sudden change in distance to ground caused the software to fail the radar sensor. This meant it had to rely only on the IMU, which has drift issues [2], causing it to believe it was just above the ground when it was in fact way above it.

I'm thinking a camera looking down in combination with the radar would have been more resilient, as there would be less need to rely on hardcoded threshold values in order to determine when to fail the sensors.

[1]: https://www.youtube.com/watch?v=2JlUnOAiMm4

[2]: https://en.wikipedia.org/wiki/Inertial_navigation_system#Dri...


Not having read anything on the failure, I'm surprised they wouldn't have other range finding methods. Like LIDAR actually, just not the LIDAR everyone thinks of. They have them on commercial drone packages. It's like a single beam LIDAR setup. Does better at long distance range finding than the mmWave Radar units. The mmW unit handles water better though. The engineers that built the JP lander are likely far smarter than I am and probably thought things through much better than my simple armchair observations.


A single beam LIDAR would suffer the same problem.

That is, the radar was accurate; the problem was that the team hadn't foreseen the probe flying over such a deep crater, so the threshold for failing the sensor in case of a sudden large change in apparent altitude was set too low.

This is why I suggested a visual-based sensor, since it would be using a very different measuring scheme, and one which could also be used to cross-check the IMU data.
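
To make the failure mode concrete, here's a toy version (invented numbers and thresholds, not the lander's actual logic): a bare plausibility gate rejects a correct radar reading the moment the ground genuinely drops away, while cross-checking against an independent sensor first would keep it.

    # Toy model of the described failure: a hardcoded plausibility gate
    # marks the radar "failed" when the ground drops away over a crater,
    # even though the radar is correct. All numbers are made up.
    MAX_JUMP_M = 50.0  # assumed plausibility threshold

    def gate_only(prev_alt, radar_alt):
        """Reject radar on a large jump; keep the previous estimate."""
        if abs(radar_alt - prev_alt) > MAX_JUMP_M:
            return prev_alt, False  # radar flagged failed -> IMU-only drift
        return radar_alt, True

    def gate_with_crosscheck(prev_alt, radar_alt, second_alt):
        """Reject radar only if an independent sensor also disagrees."""
        if (abs(radar_alt - prev_alt) > MAX_JUMP_M
                and abs(radar_alt - second_alt) > MAX_JUMP_M):
            return prev_alt, False
        return radar_alt, True

    # Flying over a crater rim: true altitude jumps from 100 m to 400 m.
    print(gate_only(100.0, 400.0))                    # (100.0, False) wrongly failed
    print(gate_with_crosscheck(100.0, 400.0, 390.0))  # (400.0, True)  radar kept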


> Better ... than a system that's confidently wrong

If ChatGPT has taught us all anything, this should be it. It can be so confidently wrong while sounding believable, that I wouldn't be surprised if some kid has turned in completely hallucinated homework thinking it was a great essay.


> All aeronautics require a triple voting arbitration system where the correct answer is the one voted by the majority.

But those are not generally autonomous systems, and when they are, they operate within much simpler environments.

In the case of multiple sensors and a voting mechanism the pilot is still in control and making the decisions.

These are two very different kinds of systems!

> SV fantasies of playing fast and loose with hopium.

Ah, so that’s it. Your arguments are based mainly on emotion and not engineering.

You still haven’t explained to me how these systems can deal with the original problem I presented.

EDIT: You'll notice that the commenter appealing to the populism of saying snarky things about Silicon Valley is receiving more encouragement from this forum than I am. I am very much trying to keep things even keeled and ChuckNorris is doing very much the opposite. Would everyone prefer that we just shoot emotionally charged hot takes back at each other with a minimum of engineering focused discussion?


(Not the person you were responding to, but)

I don't really see the dilemma here. While driving, humans use sight, sound, touch (like rumble strips), and occasionally smell (engine scents, crashes, wildfires) as inputs. Sometimes these inputs conflict, but we make judgment calls. Sometimes those judgment calls are wrong, but I'd still rather have the multiple inputs than not.

With a box on the street, cardboard or not, I wouldn't want to drive over it mindlessly. What if there was something sharp and heavy inside the cardboard box? Better to detect its presence and either come to a graceful stop if possible, swerve around it gently if possible, or warn the driver to take action and then ungracefully brake as a last resort. I wouldn't want any system to drive right over a possible collision based on an assumption of its material.

If the camera sees an obstacle, the lidar confirms it's not an illusion, the radar senses it too... that doesn't seem like some unresolvable dilemma. That's a triple redundant system telling you there's something potentially dangerous on the street ahead, and a judgment call has to be made. In this case it seems like those systems would more likely agree than not.

A more realistic scenario might be when the camera is obstructed due to glare, or the lidar due to fog or whatever, and they disagree about the state of things. Then yes, some arbiter would have to make a judgment call. Sometimes it'll be wrong. Sometimes deadly wrong. But that's the case whether it had three inputs or two or just one, or one camera in a stereo setup has dirt on it, or bird poop on the windshield is misidentified, etc. Being able to separate input noise and errors is part of the job of these systems, something that's made easier with different sensor types that can cross-check each other.


> I wouldn't want any system to drive right over a possible collision based on an assumption of its material.

Imagine a scenario where it is worse to change lanes or to brake quickly than to just drive over the box... Like the time I was on I-95 in heavy, fast-moving traffic and I saw a box flutter off the back of a truck and I decided to just let it bounce off my windshield.

There are endless amounts of these kinds of problems. It would be useless to attempt to enumerate them, either in writing, or in code, which speaks to the power of neural networks!


It's true, but in that case it's more a matter of prior experience (oh, I bet that cardboard box/squirrel/etc. probably won't hurt me) than resolving conflicting sensors. That probably lives a few levels above the raw sensor inputs anyway. It's an argument for more powerful training, not fewer sensors... those aren't mutually exclusive goals.

Isolating it to just the sensor layer, though, there are often legitimate times when a visual spectrum camera alone isn't ideal. Like not being able to maintain distance to the truck ahead because the glare of the sun is bouncing off its reflective back label at just the right (or wrong) angle, or an odd long object is sticking out the back of a pickup and the object detection can't figure out what it is (but a lidar or radar could sense the presence of something). Or for nighttime driving when deer glow in the infrared far enough ahead but not so much in visual. Or driving by another car in "dealer camouflage". Having another type of sensor could be life saving.

For what it's worth, that's true for non-autonomous systems too. The human sensor suite in drivers is already often augmented by additional sensors, like parking radars, infrared HUDs, or at least an additional pair of stereoscopic visual cameras with their own judgment.

I think these are fundamentally different parts of the overall system. Sure, you can train algorithms on one input type alone. Or multiple. But unless there's evidence showing that training on multiple inputs produces worse final outcomes than training on just one alone, it seems that having multiple sensors should be a net positive.


> But unless there's evidence showing that training on multiple inputs produces worse final outcomes than training on just one alone, it seems that having multiple sensors should be a net positive.

Go train a CNN on 4k images and then go train a CNN on those same images at VGA resolution and you'll see just how problematic a larger set of inputs can be for minimal gains!
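
Back-of-the-envelope, with my own numbers: the early conv layers cost roughly linearly in pixel count, and 4K alone is a 27x multiplier over VGA before you've added a single extra sensor.

    # Conv-layer compute scales roughly linearly with input pixels, so
    # pixel count is a first-order proxy for the added training cost.
    vga = 640 * 480      # 307,200 pixels
    uhd = 3840 * 2160    # 8,294,400 pixels ("4K" UHD)
    print(uhd / vga)     # 27.0x more pixels per frame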

I have never argued that LIDAR couldn't be used by an already trained autonomous system in the same manner that a human driver is presented with LIDAR sensory inputs! An android seems to be the trajectory we're on if we want a machine to have the same embodied decision making that humans are capable of.

Like I said in my original comment, this infinity of embodied experiences seems to be a software problem, not a sensor problem.


Hmm, I think we're just agreeing around each other, here...? lol

Yes, being able to distinguish a cardboard box from a plywood one is primarily a software and training problem. But the bigger problem above that -- which is what the overall thread is about, if not your specific scenario -- is whether multiple sensors can make overall driving (autonomous or not) safer.

> Go train a CNN on 4k images and then go train a CNN on those same images at VGA resolution and you'll see just how problematic a larger set of inputs can be for minimal gains!

Surely there's a difference between incremental resolution gains of the same sensor type vs a completely separate channel of data? For example, the camera-based driver assist systems gradually evolved from black-and-white to color inputs. Maybe "more pixels" isn't as valuable as "sensors able to differentiate color" in terms of being able to easily identify traffic lights, road signs, lane markers etc. Even if the training is harder to do, the outcome for the human passengers is usually better. And it might not even make the training harder, if the additional color channels can help disambiguate the black-and-white data. Same with infrared. Maybe each sensor type has an optimal resolution/sampling rate/etc., but that's not the same question as whether mixing altogether different sensor types can help. One is trying to classify 3D objects from 2D photogrammetry, the other from a 3D point cloud.

To use another contrived example, let's say there's a black round thing on the ground, a few meters ahead, in damp twilight conditions. A single camera might be able to see that it could be a manhole cover or a pothole. Two cameras might be able to gauge depth too, ascertaining that it's indeed covered (and not just a hole you're going to fall into). But if it's really dark out or wet, or the thing is surrounded by a puddle of water that visual cameras have trouble making out, being able to map its topography with a lidar seems like a nicer way to make sure it's safe to drive over.

I think the overall opposition you're seeing here is not really directed at the "cardboard vs plywood" question, but the bleedover from Tesla's decision to forgo lidar and remove radar (for cost reasons) while cynically using safety/software as excuses. People take issue with that because it's a disingenuous argument that's led to preventable fatalities -- and we're hoping other automakers don't make the same mistake.

> I have never argued that LIDAR couldn't be used by an already trained autonomous system in the same manner that a human driver is presented with LIDAR sensory inputs!

That's not the ONLY way lidar can be helpful. Even in existing driver-assist systems using basic object recognition technology from yesteryear, having different sensor types can be helpful. Lidar & radar don't require sentience to be useful.

If having additional sensors makes training ML-based systems harder... okay? Wait a few more years, rather than cutting them out altogether to rush a camera-only system out the door? I used to be really excited about Teslas, back when they actually were trying to do the hard engineering of integrating all these different sensors. When they switched to gaslighting marketing trying to convince the world that cameras were enough (and they weren't)... well, I lost a lot of trust in them -- especially having seen firsthand the success rate of the camera-only system in my own car, and knowing that it typically fails when my own eyes fail (high glare, low light, fog, etc.).

But yes, to your point, being able to distinguish plywood from cardboard is not the strength of lidar.


I suspect you overestimate the computational or logical complexity of sensor fusion.
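
The textbook fusion of two noisy measurements of the same quantity is just inverse-variance weighting, a handful of multiplies per update. A scalar sketch (real systems run Kalman filters over full state vectors, and the values below are invented):

    # Fuse two noisy estimates of the same range (say, camera depth and
    # lidar) by inverse-variance weighting: the scalar core of a Kalman
    # update. Sensor values and variances below are invented.
    def fuse(x1, var1, x2, var2):
        """Minimum-variance combination of two independent estimates."""
        w = var2 / (var1 + var2)           # trust each sensor inversely to its noise
        x = w * x1 + (1 - w) * x2
        var = var1 * var2 / (var1 + var2)  # fused result beats either input alone
        return x, var

    # Camera says 10.0 m (noisy), lidar says 9.2 m (precise):
    print(fuse(10.0, 4.0, 9.2, 0.25))  # ~(9.25, 0.24)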


The point is, if you say Lidar is essential, then how do you deal with fog, etc.? Happy to be proven wrong, but my understanding is that the tech simply can't overcome these problems.

So you either say 'driver takes over' in these cases, or you create software that uses video to figure out where stuff is. Since you need cams to read signs, you'll have that hardware and some software anyway.

So if you have this software that can use video to figure out where things are, why do you need Lidar? The car needs to work safely without the Lidar. So including that in the sensor fusion should not be required.

You can point out cases where software fails, but that does not disprove the fundamental reason for not using Lidar. It perhaps means that the software isn't good enough yet, but that does not seem fundamental for cases where humans wouldn't also fail.


Lidar is capable of seeing through fog, whereas ordinary cameras aren't able to.

https://www.photonics.com/Articles/Lidar_System_Delivers_Vis...


> For bright reflective objects like traffic cones or road signs, the system is able to provide information in under a minute.

A minute...

Even if they could bring that down to 1 ms, the main point still stands that Lidar can't read signs. And you'll need the tech in the car to read that stop sign at high speed. Also, the position of the stop sign in relation to the streets is super important, so the positioning software is also 100% required.

So the main point still stands: if you're going to need that tech anyway, why do you need the Lidar, since you'll need to stop the car the moment the cams fail to work?

As a side note: I also wonder if it would still work if every car at a crossing had this tech (in fog). How would the cars know that the laser light was coming from their own car, and not from any of the other cars? You'd have to morse-code an ID in there or something, perhaps.
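
Re that first side note: one standard trick (a sketch of the general idea, not how any particular automotive lidar actually works) is to stamp the outgoing pulses with a pseudo-random code unique to the unit and correlate returns against it, so other cars' pulses average out as noise:

    # Sketch of code-division interference rejection: each lidar fires a
    # pseudo-random +1/-1 pulse pattern and only accepts returns that
    # correlate strongly with its own code. Code length is arbitrary.
    import random

    def make_code(seed, n=64):
        rng = random.Random(seed)
        return [rng.choice([1, -1]) for _ in range(n)]

    def correlate(received, code):
        return sum(r * c for r, c in zip(received, code)) / len(code)

    my_code, other_car = make_code(1), make_code(2)
    print(correlate(my_code, my_code))    # 1.0   -> our own return, accept
    print(correlate(other_car, my_code))  # ~0.0  -> someone else's, reject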

Second side note: it's a tech discussion, yet it sometimes seems people bring personal views about Elon Musk and Tesla into it. I think this clouds the judgement.


Nothing is essential. Everything is contributory.


And many times less is more!


This argument works for students building self-driving robots for competitions on a budget, not for 2-tonne cars transporting your family.


I am already thinking about this in terms of safety.


How do you know the cardboard box is OK to drive over? There might be something inside it.


Hearing is unable to detect colors, so I won't use it.


What does this have to do with autonomous driving systems? Does a car just start driving itself once you've added enough sensory inputs?


It's re this point:

> When I’m driving I can determine the difference between a cardboard box and a plywood box... How does LIDAR help with that?

Do you see the analogy?


No, I can't, because there's a lot more going on than just having recognized an object. How can we tell that it is cardboard? Or if we see someone throw a ball into the street, how can we get an idea of how much mass the ball has? Or any number of such scenarios... the answer is, we have a lifetime of experience of interacting with our environment. We've seen cardboard boxes, felt them, crushed them up and tossed them around. We've thrown balloons and rolled bowling balls. We have a learned understanding of both ourselves and the objects in our environment and how they behave.


> No, I can't, because there's a lot more going on than just having recognized an object. How can we tell that it is cardboard

How can my ears tell that an object is red? Eyes are fantastic things.


Knowing the color, shape, or sound that something makes doesn't mean anything without a lifetime of experiences to frame those sensory inputs and especially with regards to driving!

You're being obtuse!


I've been dancing around the point in the hope you'd come to your own conclusions. Removing the last trace of subtlety:

Your main point is this:

> How does LIDAR help with [things cameras are good at and LIDAR is not]?

This is a strawman, because no one claims it does. LIDAR's value is as one of multiple sensors fused together to build a coherent model.


That's not my point, and I was explicit about this in my original comment! My point is that sensory input to an autonomous driving system is not the difficult part to solve! Just take a quick look at some of my other comments so I don't have to repeat myself.


I don't think you can easily determine whether the cardboard box is empty or full of concrete, though. If you can, then kudos to you.


This could be problematic if there’s an anvil in that cardboard box.

Just sayin’.


Maybe LiDAR would realize the cardboard box is full of kittens.


People have dug into a weird trench insisting LIDAR is absolutely necessary because Tesla (but really, by association, Elon Musk) is trying to do it without.

The reality is that with or without LIDAR, the entire problem space is very far from solved, and the distance to close has a lot more to do with prediction, decision-making, and abstract representation. By the time these things have come far enough that full self driving cars are a reality, it's very likely vision will have progressed as well.

The whole self-driving race is a leap of faith that there is some mid-point between "current state-of-the-art" and "full human-grade general intelligence" that is good enough to safely drive a car. Nobody knows whether that's actually true. So far, all we know is that general low-level sensorial awareness isn't a sufficient condition for tolerable driving. We all suspected this was the case, and so now comes the long tail of discovering just how much more behavior you have to add before you've sufficiently emulated human driving. These systems are so complicated and the interactions so numerous that the horizon is very foggy. It's POSSIBLE that you don't have an acceptable driver until you're basically doing human-level reasoning. Nobody knows.

The point is that if you need a significantly deeper level of inference, prediction, reasoning, world state estimation, etc. etc., then the LIDAR will end up irrelevant, because the processing chain will be so sophisticated that camera-only input is easily sufficient for the sensing aspect. And if that's the case, camera-only is by far preferable because it is cheaper and simpler.

Saying that you need LIDAR is just as unfounded as saying you don't. And it has the downside that you're doubling down on a more expensive technology that is demonstrably unneeded by a sufficiently sophisticated intelligence (i.e. humans).


> full self driving cars are a reality

Full self-driving cars are a reality. Waymo is giving 10,000 trips per week with nobody behind the steering wheel in SF and Phoenix.


Waymo creates very high resolution scans and maps of the areas in which they operate. They don't operate in novel areas.


Sure, but the point is, they're doing it.


Mercedes-Benz Drive Pilot is the only true self-driving technology available to individual vehicle buyers today, and it uses LIDAR. This is only a limited form of self driving and not human equivalent, but it does work at least in limited circumstances.

https://media.mbusa.com/releases/mercedes-benz-worlds-first-...

So far no one has demonstrated an equivalent system without LIDAR. As you state, it might be possible but no one knows how or even has a clear path to doing it.


Object detection is solved; I'm not sure if everyone advocating for LIDAR is theorycrafting or just hasn't watched any FSD Beta videos, but all of the mistakes the system makes in the course of the drive seem to fall into the realm of unperformant decision-making, like predicting what bikes or pedestrians are going to do, or being too timid to enter a roundabout with another car also entering the roundabout to your left, or stopping in the middle of the road on a divided highway during an unprotected left turn to yield for right traffic (instead of positioning itself parallel to the traffic in the median in the first place).


Or phantom braking because it thinks a weird shadow is actually an object. Or not realizing the difference between a reflection on a guard rail and a road line marking (because of low sun) causing it to directly aim for a guard rail (something I personally experienced).

Sure, vision object detection is 'solved'.


The more nuanced argument is that LIDAR can't be required for driving if you want to drive in inclement weather, so any L5 driving system needs to be able to drive without it.

Of course that's like saying I don't need my glasses to drive since in heavy rain my vision is equally impaired.


> in self-driving scenes


Does object detection, as of now, not rely on both broad translational equivariance and then temporal consistency of tracks?


It should be spelled as OySTeR.


tbh I am so annoyed by people spelling lidar as LiDAR. Nobody spells radar (which stands for Radio Detection And Ranging) as RaDAR...


Can we get a printout of Oyster smiling?



