A new software release has driven zero miles. You do not know what kind of perfo...

jessriedel · 2024-06-28T22:23:29 1719613409

> They could have no deaths for years, then suddenly a whole lot after a bad software push.

The software release process itself becomes the thing that is tested statistically. If they release 100 versions, and each is safer than the last, it’s silly to think that one can’t be confident in the safety of the 101st without some gov’t software approval/validation process.

> What happens to Waymo if they kill even 1 child due to a bug or lapse in model quality? …would you continue to let your kid ride Waymo?

If they had driven billions of death-less miles and then a software update killed a child, I would obviously let my kid ride Waymo. It wouldn’t be close.

kajecounterhack · 2024-06-29T04:31:38 1719635498

> If they release 100 versions, and each is safer than the last, it’s silly to think that one can’t be confident in the safety of the 101st without some gov’t software approval/validation process.

That's just the thing. Each release is not always safer than the last. Even ascertaining that with any reasonable level of certainty requires a lot of technology. In a company's hands alone, this "trust me bro" amounts to a ticking time bomb since companies have a tendency to trade safety for profit.

> If they had driven billions of death-less miles and then a software update killed a child, I would obviously let my kid ride Waymo. It wouldn’t be close.

You don't think the outcry at Boeing applies to Waymo?

rsanek · 2024-06-27T23:35:35 1719531335

I feel like this concern is mostly a thing having to do with safe rollouts. Just as with any other software, I'm sure there's a gradual rollout to ensure bugs don't just kill a ton of people at once. Google and other large cos have largely solved the reliability problem with web services. while cars are ofc different i think the same learnings apply

kajecounterhack · 2024-06-29T04:40:05 1719636005

> while cars are ofc different i think the same learnings apply

They are very different. Unit tests and stress tests can be conducted on pure software. You can monitor them and if there are failures you can retry.

Robot software needs all the stuff you use for regular software, but then you have this huge space of counterfactuals that can only be verified using large scale simulation. How do you know the simulator is realistic? That it has a sufficient level of scenario coverage? That the simulation was even run? That the car didn't catastrophically regress in a way that will lead to a crash after 1M miles of driving (which in a scaled service will happen quite frequently)?

Oh and a fault can result in a death for Waymo, but not for Google Search. So that's kind of a major difference.