The only selling point of FSD (Supervised) is that it (can) work "everywhere." This is because it only relies on navigation information and what the car can see.
Waymo and similar companies all use HD Mapping. Ignoring the specifics, it can be thought of as a centimeter-level perfect reconstruction of the environment, including additional metadata such as slopes, exact lane positions, road markings, barriers, traffic signs, and much more.
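To make that kind of metadata concrete, here is a minimal sketch of what a single lane-level record in a hypothetical HD map tile might contain. The field names and structure are purely illustrative, not any vendor's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class LaneRecord:
    """One lane-level entry in a hypothetical HD map tile (illustrative fields only)."""
    lane_id: str
    centerline: list[tuple[float, float, float]]   # (x, y, z) points, ~cm-level accuracy
    left_boundary_type: str                         # e.g. "solid_white", "curb", "barrier"
    right_boundary_type: str
    speed_limit_mps: float
    slope_percent: float                            # longitudinal grade along the lane
    successors: list[str] = field(default_factory=list)  # lane_ids you can legally flow into
    signs: list[str] = field(default_factory=list)        # e.g. ["stop", "no_right_on_red"]

# A planner can query this prior instead of inferring everything from sensors in real time:
lane = LaneRecord(
    lane_id="tile_042/lane_7",
    centerline=[(0.0, 0.0, 0.0), (5.0, 0.02, 0.01)],
    left_boundary_type="solid_white",
    right_boundary_type="curb",
    speed_limit_mps=11.2,
    slope_percent=1.5,
    successors=["tile_042/lane_8"],
    signs=["stop"],
)
```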
HD Mapping is great when it's accurate and available. But it requires a ton of data and constant updating, or the car will get "lost," and realistically will never be implemented in general, at best in certain cities.
Reliance on HD Mapping gets you to "robotaxis" quicker and easier, but it doesn't and likely cannot scale.
It remains to be seen if Tesla can generalize FSD enough to reach the same level as HD Mapping everywhere. Still, they have shown that the current limiting factor is not what the car sees or knows but what it does with that information. It is unclear how or why HD mapping would help them at that point.
> HD Mapping is great when it's accurate and available. But it requires a ton of data and constant updating, or the car will get "lost," and realistically will never be implemented in general, at best in certain cities.
Waymo have said time and again they don’t rely on maps being 100% accurate to be able to drive. It's one of the key assumptions of the system. They use it as prior knowledge to aid in decision making. If they got "lost" whenever there was a road change, they wouldn't be successfully navigating construction zones in San Francisco as we've seen in many videos.
> They can also do constant updates because the cars themselves are able to detect road changes, self update maps and rollout changes to the entire fleet.
Which leads to mapping failures being unchecked, as the system that generated the data is the one checking the data by driving it. See bullet point 1 in their recent recall for an example.
> Prior to the Waymo ADS receiving the remedy described in this report, a collision could occur if the Waymo ADS encountered a pole or pole-like permanent object and all of the following were true:
> 1) the object was within the boundaries of the road and the map did not include a hard road edge between the object and the driveable surface;
> 2) the Waymo ADS’s perception system assigned a low damage score to the object;
> 3) the object was located within the Waymo ADS’s intended path (e.g. when executing a pullover near the object); and
> 4) there were no other objects near the pole that the ADS would react to and avoid.
> the Waymo ADS’s perception system assigned a low damage score to the object;
and Tesla would do better how in this case? It also routinely crashes into stationary objects, presumably because the system assumes it wouldn't cause damage.
> and Tesla would do better how in this case? It also routinely crashes into stationary objects, presumably because the system assumes it wouldn't cause damage.
Are the Teslas in the room with you right now?
Please point out in my comment where I mentioned Tesla. I can wait.
The changes can additionally be checked by humans, although not always.
> We’ve automated most of that process to ensure it’s efficient and scalable. Every time our cars detect changes on the road, they automatically upload the data, which gets shared with the rest of the fleet after, in some cases, being additionally checked by our mapping team.
Doesn’t mean it’s foolproof. But the benefits far outweigh the drawbacks.
Waymo doesn't serve any snowy locales yet. But sure, years and years ago mapping was worse than it is today? The mapping used today is working quite well in warm weather locales.
> Reliance on HD Mapping gets you to "robotaxis" quicker and easier, but it doesn't and likely cannot scale.
If you can make the unit economics work for a large quantity of individual cars, mapping is a small fixed cost.
I agree that it's not economical to map every city and road in the US, since you need to generate revenue from every mapped road and city. So you can think of building HD maps like building roads: they will be built in lucrative places. Cruise and Waymo won't make money from putting taxis in nowhere Arkansas, so they don't need to map it.
> the current limiting factor is not what the car sees or knows but what it does with that information. It is unclear how or why HD mapping would help them at that point.
That's simply untrue. All the hard stuff continues to be reliability and sensor gated. Cruise and Waymo have amazing sensors and even they struggle with sensor range, sensor reliability, model performance on tail cases, etc. For example, at night these cars typically do not have IR or Thermal sensing. They are relying on the limited dynamic range of their cameras + active illumination + hoping laser gets enough points / your object is reflective enough. Laser perception also hits limits when lasers shine on small objects (think: skinny railroad arm). Cars also have limits with regard to interpreting written signs, which is a big part of driving.
Occlusions are still public enemy #1. Waymo killed a dog. Cruise crashed into a fire truck coming out of a blind intersection even though their sensors saw the truck within 100ms.
LiDAR and HD mapping together are supremely useful, even if you don't drive with them, for enabling you to simulate accurately. You cannot simulate reliably while guessing at distances and locations. HD maps let you use visual odometry to localize, and distance measurements grounded in physics backstop the realism of your simulation, at least in terms of the world's shape.
Tesla lacks the ability to resim counterfactuals with confidence since they don't have HD ground truth. There are believers at the company that maybe you could make "good enough" ground truth from imagery alone but that in and of itself is a huge risk, and it's what skipping steps looks like. Most in the industry agree that barring a major change in strategy they just have no way to regression test their software to the level of reliability required for L4 / no human supervision.
The obvious thing to do is to just have every Waymo robotaxi or car with licensed Waymo tech report in its daily mapping/obstacle data to the mothership, so you can get new changes almost immediately.
I dunno if said data would be as high quality as dedicated HD mapping cars, but it's probably at least decent, given the variety of cameras and lidars every Waymo car has.
Further, it seems to me that if you brake hard to avoid a dog, your car should warn me as I’m approaching. I’m not sure why we are trying to teach each car to drive when we could be teaching all the cars and the road to drive.
> Further, it seems to me that if you brake hard to avoid a dog, your car should warn me as I’m approaching.
What does this mean? Electric cars are already required to emit a sound as they drive.
I guess if it has to brake hard for something, honking might be a good idea, but I wouldn't want cars to constantly be beeping at everything in their vicinity if there's no imminent crash.
> I’m not sure why we are trying to teach each car to drive when we could be teaching all the cars and the road to drive.
I'm not sure what you mean. Presumably Waymo's software is the same across its fleet. They're not training one car's model at a time.
Well, if your car brakes hard to avoid a dog, your car should warn me. I’m not sure how to make this concept simpler so I can only repeat it.
> Electric cars are already required to emit a sound as they drive.
I know.
> I guess if it has to brake hard for something, honking might be a good idea, but I wouldn't want cars to constantly be beeping at everything in their vicinity if there's no imminent crash.
If you think that, in a discussion about robot cars that drive themselves being conducted on a hacker website, I'm suggesting that cars communicate their sensor data to each other by honking their horns, I'm really not sure what to tell you other than yes, this would be profoundly dim-witted.
> I'm not sure what you mean.
I believe it.
> Presumably Waymo's software is the same across its fleet. They're not training one car's model at a time.
I believe it. I also believe you’re deeply missing the point, perhaps intentionally.
Agreed, but having the raw data is still useful, especially for less-used routes where it's not economically feasible to send out dedicated mapping cars all the time.
I'm just speculating here, but I can envision a few ways of dealing with the cost problem in scaling an HD mapping-based robotaxi fleet:
1. Robotaxi companies might simply stand to make enough money to cover the cost of routine HD mapping. Anywhere the revenue from putting taxi services in a new city sufficiently outweighs the cost of implementing the necessary updates, won't companies do it? We could think of these companies as having similar economics to Uber, but replacing the cost of paying drivers with the cost of routine HD mapping updates.
2. Smaller towns have less frequent construction, so the update costs might be lower as you target less dense areas.
3. I could see a single company that specializes in providing routinely updated maps to a variety of fleet-operating companies. This could potentially be a utility or somehow subsidized by the government. It would also be possible for government to coordinate construction with HD mapping updates. After all, by lowering the rate of accidents and decreasing square footage devoted to cars, governments have a vested interest in seeing robotaxis replace human-owned and driven cars.
> Tesla lacks the ability to resim counterfactuals with confidence since they don't have HD ground truth.
Tesla does have HD ground truth data for verification generated by their own LIDAR-equipped vehicles. However, according to a recent tweet by Elon Musk [1], they don't need LIDAR for that anymore.
> That's simply untrue. All the hard stuff continues to be reliability and sensor gated.
IR and thermal sensing are unnecessary if the bar is human level, and lidar isn't needed either. The point is overused, but humans rely on two eyes in the driver seat. I don't see any evidence to suggest the modern model that Tesla has developed for their vision system is their limiting factor in the slightest to reach L4/L5.
Dogs jump into the road in front of cars all the time and get killed, and kids get endangered at school bus crossings. That's a reality of life that robotaxis do not need to solve.
That vision-only argument is marketing spin from Tesla. The biggest thing it leaves out is that humans process their vision input with a human brain, which Tesla vehicles very much do not have. If and when we create true AGI they will have a good argument, but a world where that exists will be wildly different from our current one and who knows if Tesla's tech will even be relevant anymore.
Why are you so confident that AGI, or a human brain, is necessary to be able to drive a car with only cameras?
I get annoyed with statements like this because technology changes and advances so quickly, and Tesla has made substantial technical leaps in this field of machine learning. They have the state-of-the-art vision -> voxels/depth models and are only improving.
Tesla, who use cameras only, have not demonstrated full self driving, despite trying for a decade. Elon Musk has stated "It is increasingly clear that all roads lead to AGI. Tesla is building an extremely compute-efficient mini AGI for FSD" [1]
Waymo, who use additional sensors like lidar, have a driverless taxi service which needs no safety drivers.
Waymo does have safety drivers, they're just driving the vehicle remotely when it's in certain areas instead of being in the vehicle. So it isn't "full" self driving either.
> Tesla, who use cameras only, have not demonstrated full self driving
There are entire youtube channels with hours of continuous video showing Teslas driving around SF, but also other parts of California, with no human intervention.
No, Waymo is not driving remotely. Remote operators can only answer simple questions. They're at the point of commercialization so it's all about unit economics. There's no point in driving remotely especially since it does not scale cost-wise.
Waymo is geofenced, but within its geofence it requires zero human intervention. Tesla on the other hand is famous for mistaking the moon for a traffic light. Saying "Tesla has so many miles on YouTube" is hilarious because first of all there are channels with lots of Cruise & Waymo footage too, and more importantly it's not the # of miles that matters, but the # of non-trivial scenarios you can handle.
I don't see why Tesla can't handle those scenarios if they also use remote operators. I wouldn't be surprised if they do.
Btw Waymo is nowhere near achieving unit economics. Their cars cost like 5 times what Teslas cost, and the sensors require a lot of upkeep and maintenance.
Who’s to say what unit economics they’ve achieved, but I’d hazard a guess that their investors wouldn’t support expanding their fleet and service unless the unit economics are at least close to break even. Cost for sensors and overall BOM keeps going down as more suppliers enter the market.
Are you saying that there are times when a Waymo car's ability to respond to events is at the mercy of a random Internet connection? What happens if the safety driver is steering remotely, from another town, and there's packet loss for a couple of seconds in the middle of a curve?
Again, they don’t drive or steer remotely. What sometimes happens is a multiple choice question is presented to the operator in an ambiguous situation:
<photo of construction zone>
Can I drive through here?
[Yes] [No]
When this is happening, the car is stopped and lets the passenger know that it’s reaching out for remote help to figure out what to do. For me this has happened two times across my 125 Waymo rides (571 miles) so far, and was resolved in under 20 seconds. Though I must say, 20 seconds feels like ages when you’re in the car and blocking traffic!
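A minimal sketch of that flow as described above, assuming hypothetical `send_to_ops` / `poll_answer` hooks (nothing here reflects Waymo's actual interfaces): the car holds position, asks a constrained question, and takes the conservative path if no answer arrives in time.

```python
import time

def request_remote_assist(prompt: str, send_to_ops, poll_answer,
                          timeout_s: float = 60.0) -> bool:
    """Toy sketch of constrained remote assistance (names are hypothetical).
    The vehicle sends a yes/no question to a remote operator and waits;
    the operator never receives direct steering control in this model."""
    send_to_ops(prompt)                 # e.g. "Can I drive through here?" plus a camera snapshot
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        answer = poll_answer()          # returns True / False / None (no reply yet)
        if answer is not None:
            return answer
        time.sleep(0.5)
    return False                        # conservative default: stay put / reroute
```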
Eyes which are orders of magnitude more capable than the best cameras, and Teslas come with mediocre cameras, not the best. Eyes which are connected to a brain, and ML is a looooong ways from rivaling that.
> That's a reality of life that robotaxis do not need to solve.
Robotaxis do not need to account for things jumping out unexpectedly in front of them?
I am not sure that the vision in Teslas is adequate with -any- amount of processing to drive a car. Spatial resolution is limited, as is seeing distant vehicles during merges, etc.
Secondarily, there is no guarantee that the amount of processing is enough, because the extant human systems use much more.
“Cheating” by using more sensors to simplify out complexities and to cover for the shortcomings of other sensors in the suite seems wise.
Orders of magnitude more capable than the best cameras? I wish. I need corrective lenses for my eyes to even work at all. With that fixed they feed my brain an image that's upside down, black and white except in the centre, which is covered in blood vessels and which has a blind spot. They also take a long time to adjust to sudden changes in lighting conditions, don't do any true depth sensing, suffer frequent frame drops and can't run for more than about 20 hours at a time before they basically stop working.
My brain tries to hide all this from me, and makes me think that I see the world in glorious 3D technicolor all the time, but that's a lie, as revealed by the many amusing optical illusions that have been discovered over the years.
Meanwhile, today I used ML that knows more than me, can think and type faster than me, which is a much better artist than me and which can read and react far faster than me to visual stimuli. Oh, it can also easily look in every direction simultaneously without pausing or ever getting distracted or bored.
Somehow it doesn't feel like I have a big advantage over computers when it comes to driving.
Are we talking about Tesla's cameras or the "best" cameras? There are smartphone cameras that do depth sensing and HDR, and cameras are cheaper than eyeballs so composing them to get more angular resolution seems OK.
ToF/structured illumination cameras are honestly not that capable.
The maximum dynamic range of the eye is ~130dB. It's very difficult to push an imaging system to work well at the dark end of what the eye will do with any decent frame rate.
It's not as different as it used to be, but even so: the Mk. I eyeball does pretty damn well compared to quite fancy cameras.
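For scale, converting those dynamic-range figures to linear contrast ratios, using the 20*log10 convention common for image sensors; the ~70 dB figure for a plain non-HDR sensor is an assumed ballpark, not a measurement:

```python
def db_to_contrast_ratio(db: float) -> float:
    """Convert a dynamic-range figure in dB to a linear contrast ratio,
    using the 20*log10 convention common for image sensors."""
    return 10 ** (db / 20)

print(f"eye, ~130 dB:                   {db_to_contrast_ratio(130):>12,.0f} : 1")  # ~3,162,278 : 1
print(f"plain sensor, ~70 dB (assumed): {db_to_contrast_ratio(70):>12,.0f} : 1")   # ~3,162 : 1
```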
> There are smartphone cameras that do depth sensing and HDR
Depth sensing is, again, either estimated or done with time-of-flight sensors, which are pretty much short-range lidar. HDR is used already in AV perception, but still loses to your eyeballs in dynamic range and processing time.
Eyeballs have high dynamic range but with high mode switching times. Walk from a bright area to a dark area and it'll take seconds for your eyes to adjust. Cameras are so cheap you can just have a regular day camera and a dedicated night vision camera together, switching between feeds can be done in milliseconds.
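A toy sketch of that hand-off, with a made-up lux threshold standing in for whatever a real system would actually key the switch on:

```python
import numpy as np

def pick_feed(day_frame: np.ndarray, night_frame: np.ndarray,
              lux_estimate: float, threshold_lux: float = 10.0) -> np.ndarray:
    """Toy feed selector: below an (assumed) ambient-light threshold, hand off to the
    dedicated low-light camera; otherwise use the day camera. A real system would
    blend and apply hysteresis rather than hard-switch."""
    return night_frame if lux_estimate < threshold_lux else day_frame

# Example: two synthetic 1080p frames and a dusk-level light reading
day = np.zeros((1080, 1920, 3), dtype=np.uint8)
night = np.ones((1080, 1920, 3), dtype=np.uint8)
frame = pick_feed(day, night, lux_estimate=4.2)   # selects the night feed
```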
Robots aren't humans. You need accurate depth perception to maneuver a robot precisely, and you need ground truth depth measurements to train learned depth perceivers as well as to understand their overall performance. Humans learn it by combining their other senses and integrating over a very long time using very powerful compute hardware (the brain). To date, robots learn it best when you just get the raw supervision signal directly using LiDAR.
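A minimal sketch of what "raw supervision signal directly using LiDAR" can look like, assuming a dense predicted depth map supervised only at pixels where a LiDAR return exists (shapes and names are illustrative, not any particular stack):

```python
import numpy as np

def sparse_lidar_depth_loss(pred_depth: np.ndarray,
                            lidar_depth: np.ndarray,
                            lidar_mask: np.ndarray) -> float:
    """L1 loss between a predicted dense depth map and sparse LiDAR returns,
    evaluated only at pixels where a return exists."""
    valid = lidar_mask.astype(bool)
    return float(np.mean(np.abs(pred_depth[valid] - lidar_depth[valid])))

# Toy example: a 4x4 depth map with LiDAR returns on 5 of the 16 pixels.
pred = np.full((4, 4), 10.0)
lidar = np.zeros((4, 4))
mask = np.zeros((4, 4))
lidar[0, 0], mask[0, 0] = 9.5, 1
lidar[1, 2], mask[1, 2] = 10.4, 1
lidar[2, 3], mask[2, 3] = 10.0, 1
lidar[3, 0], mask[3, 0] = 9.8, 1
lidar[3, 3], mask[3, 3] = 10.2, 1
print(sparse_lidar_depth_loss(pred, lidar, mask))  # mean |error| over the 5 returns
```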
> Walk from a bright area to a dark area and it'll take seconds for your eyes to adjust
You do realize cameras have the same issue, and that HDR isn't free / is very computationally intensive?
Your brain is _really really_ good at surmounting challenges including many that you did not mention. We don't know how to get close to this in terms of reliability when using cameras and ML alone. Cameras and ML alone can go very far, but every roboticist understands the problem of compounding errors and catastrophic failure. Every ML person understands how slow our learning loops are.
Consider that ML models used in the field have to get by with a fixed amount of power and RAM. If you want to process a temporal context of, say, 5 seconds at 10 Hz and 1080p resolution, how much data bandwidth are you looking at? Comparing what you see with your eyes with a series of 1080p photos, which is better? Up it to 4K: how long does it take to even run object detection and tracking with a limited temporal context?
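Rough numbers for that question, under the deliberately naive assumptions of raw, uncompressed 8-bit RGB frames and an assumed 8-camera rig (real pipelines compress and downsample):

```python
# Back-of-the-envelope bandwidth for a 5 s temporal context at 10 Hz, 1080p RGB, 8-bit.
width, height, channels = 1920, 1080, 3
bytes_per_frame = width * height * channels           # ~6.2 MB raw per frame
frames_in_context = 5 * 10                             # 5 seconds at 10 Hz
per_camera = bytes_per_frame * frames_in_context       # ~311 MB of context per camera
full_rig = per_camera * 8                              # ~2.5 GB across an assumed 8-camera rig

print(f"{bytes_per_frame / 1e6:.1f} MB/frame, "
      f"{per_camera / 1e9:.2f} GB per camera of context, "
      f"{full_rig / 1e9:.2f} GB across 8 cameras")
```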
Your brain is working with more temporal context, more world context, and has a much more robust active learning loop than the artificial systems we're composing today. It's really impressive what we can achieve, but to those who've worked on the problem it feels laughable to say you can solve it with just cameras and compute.
There are plenty of well respected researchers who think only data and active learning loops are the bottlenecks. In my experience they're focused on treating the self driving task as a canned research problem and not a robotics problem. There are as many if not more respected researchers who've worked on the self-driving problem and see deeper seated issues -- ones that cannot be surmounted without technologies like high fidelity sensors grounded in physics and HD maps.
Even if breadth of data is the problem and Tesla's approach is supposedly yielding more data -- there is also the question of the fidelity of said data (e.g. the distances and velocities from camera-only systems are estimated and have noisier Gaussians than ones generated with LiDAR). If you make what you measure, and your measurements are noisy, how can you convince yourself, or your loss function for that matter, that it's doing a good job of learning?
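A toy illustration of that last point: if the "ground truth" labels are themselves noisy, even a perfect model cannot drive the measured error below the label noise, so the loss floor hides how well it is really learning. The noise levels below are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_depth = rng.uniform(5.0, 80.0, size=100_000)          # metres, synthetic scene

# Two hypothetical labelling pipelines (noise levels are illustrative, not measured):
lidar_labels  = true_depth + rng.normal(0.0, 0.05, true_depth.shape)   # ~5 cm label noise
camera_labels = true_depth + rng.normal(0.0, 1.50, true_depth.shape)   # ~1.5 m label noise

# Even a *perfect* depth model (predicting true_depth exactly) cannot beat the label noise:
perfect_pred = true_depth
rmse_vs_lidar  = np.sqrt(np.mean((perfect_pred - lidar_labels) ** 2))   # ~0.05 m
rmse_vs_camera = np.sqrt(np.mean((perfect_pred - camera_labels) ** 2))  # ~1.5 m
print(rmse_vs_lidar, rmse_vs_camera)
```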
It's relatively straightforward to build toy systems where subsystems have something on the order of 95% reliability. But robotics requires you to cut the tail much further. https://wheretheroadmapends.com/game-of-9s.html
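To make the "game of 9s" concrete, a quick compounding-reliability calculation under the simplistic assumption that subsystem failures are independent:

```python
# Chance that a pipeline of independent subsystems all work on a given decision.
def pipeline_reliability(per_stage: float, stages: int) -> float:
    return per_stage ** stages

for per_stage in (0.95, 0.999, 0.999999):
    r = pipeline_reliability(per_stage, stages=5)
    print(f"per-stage {per_stage}: 5-stage pipeline works {r:.6f} of the time")
# 0.95^5  ≈ 0.774 -> the whole pipeline fails roughly 1 decision in 4
# 0.999^5 ≈ 0.995 -> still roughly 1 failure in 200 decisions
# Adding 9s per stage is what moves the whole system toward L4-grade reliability.
```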
Agree 100%. And IMO it is worth remembering that a really significant share of collisions are caused by well known risk factors. For those of us who avoid being in those situations to begin with, the robotaxi would need to be a good bit safer than our average.
> I don't see any evidence to suggest the modern model that Tesla has developed for their vision system is their limiting factor in the slightest to reach L4/L5
For one, frame rate and processing rate on human eyes is way higher than cameras. Dynamic range is another. Also, Cruise and Waymo are some of the only companies that have hard internal data / ability to simulate how well their safety drivers do, and in the very same scenario what their software driver will do. Without LiDAR you can't build that simulation, and once you have that data if you continue to use HD Maps and LiDAR there's probably a good reason.
> Dogs jump into the road in front of cars all the time and get killed, and kids get endangered at school bus crossings. That's a reality of life that robotaxis do not need to solve.
Robotaxis need to avoid any accident that a human would be able to avoid.
> IR and thermal sensing are unnecessary if the bar is human level
See, you could say this if you had some data that showed that incidents per X miles (when the vehicle is driving at night) is sufficiently low, + if the software passes some contrived scenarios to gut-check its ability to see in the dark with the necessary reliability. But you don't have that data, do you? Someone has it though :) and I'd argue regulators should have it too.
> For one, frame rate and processing rate on human eyes is way higher than cameras.
I don't think it's exciting to say that you must have theoretical parity with something to use it for this use case. Tesla's solution monitors ~6? cameras at once with accurate depth in each. That's 6x more views than a human can see. I wish people would stop comparing apples to oranges.
> Robotaxis need to avoid any accident that a human would be able to avoid.
I never said anything to the contrary. Animals get hit all the time, not just because a human wasn't paying attention.
> Tesla's solution monitors ~6? cameras at once with accurate depth in each
No, the depth is estimated. It's not accurate, at least not in the way you need for L4.
> I never said anything to the contrary. Animals get hit all the time, not just because a human wasn't paying attention.
I was just clarifying what the bar is. The bar is that avoidable accidents need to be avoided. Nobody will get mad if a plane crashes due to unavoidable circumstances (freak accident where two engines go out due to bird strikes or something). People will stop flying in the plane when it becomes clear that the airline is not doing everything it can to avoid fatalities.
> The only selling point of FSD (Supervised) is that it (can) work "everywhere."
I seem to recall Musk saying in the last couple years that "full self driving will basically require AGI." This appeared to me to be extremely honest and accurate, though I believe that in the moment he was trying to promote the idea that Tesla was an AGI company.
I guess the cars can and will update the mapping in real time?
> at best in certain cities
If mapping a city is possible, so is mapping a highway, which is even easier.
If cars do update the maps themselves, they might require just a couple of human-driven passes of standard Waymo cars on a highway to generate the maps.
The obvious question here is "why not both". Use mapping data where you can, LIDAR and other sensors where you can, and visual cameras when you must. There's no reason to limit yourself to just one input type. Elon claims that, sure, but it doesn't seem like a given at all.