As I assumed, it was kind of a corner-case bug meets corner-case bug meets corner-case bug.
This is also why I am afraid of self-driving cars and other such life-critical software. There are going to be weird edge cases; what prevents you from reaching them?
Self-driving cars don't have to be perfect. They just have to be safer than driving is today [1].
The real question is whether society can handle the unfairness of death by random software error vs. death by negligent driving. It's easy to blame negligent driving on the driver; we're clearly not negligent, so it really doesn't affect us, right? But a software error might as well be an act of god: it's something that might actually happen to me!
Well, no.
There is an upper limit on the damage a bad driver can do by, say, crashing his car into a bus or something like that. Imagine a bug or piece of malware triggered at the same moment worldwide. It could kill millions. So it's not as simple as 'it just has to be better than a human'.
I've been itching to release this terror movie plot into the wild:
It's 2025 and more than 10% of the cars on the road in the US are self-driving. It's rush hour on a busy Friday afternoon in Washington, DC. Earlier that day, there'd been a handful of odd reports of self-driving Edsels (so as not to impugn an actual model) going haywire, and the NTSB has started its investigation.
But then, at 4:30pm, highway patrol units around the DC beltway notice three separate multi-Edsel phalanxes, drivers obviously trapped inside, each phalanx moving towards the Clara Barton Parkway, which enters DC from the west. Other units notice four more phalanxes, one comprising 20 Edsels, driving into DC from the east side, on Pennsylvania Avenue.
At this point, traffic helicopters see similar car clusters, more than two dozen, all over DC, all converging on a spot that looks to be between the Washington Monument and the White House.
We zoom in on the headquarters of the White House Secret Service. A woman is arguing vociferously that these cars have to be stopped before they get any closer to the White House. A colleague yells back that his wife is in one of those commandeered cars and she, like the rest of the "hackjacked" drivers and passengers, is innocent.
A related scenario, one that theoretically could happen today, is hacking into commercial airliners' autopilot systems and directing dozens of flights onto a target.
Setting aside the fantasy movie plot angle, how realistic is this today? Is it any more or less plausible than the millions-of-cars scenario? If people are truly concerned about the car scenario, shouldn't they be worrying about the aircraft scenario?
I will disagree with the other commenter and say that this is more plausible for the aircraft than for the cars. Modern jetliners and military aircraft (scarier yet) are purely fly-by-wire - there aren't cables running between the yokes and the control surfaces like in a Piper Cub, and if there were, no pilot would be strong enough to move them.
Yes, the autopilots can be turned off, but that's just a button, probably a button on the autopilot itself. Depending where the infection happens, the actual position of the yoke could be entirely ignored by the software. Or the motor controllers for the control surfaces themselves could be driving the plane, though I don't know how they could coordinate their actions and get feedback from an IMU.
Perhaps the pilots could rip out components and cut cables fast enough to prevent the plane from reaching its destination, and maybe they could tear out the affected component and limp back to a runway with what remains, but it's an entirely feasible movie plot.
But should we actually worry about either? No. The software sourcing, deployment and updating protocols at the various manufacturers of aircraft are certain to be secure. Right?
> Yes, the autopilots can be turned off, but that's just a button, probably a button on the autopilot itself.
Airplane components tend to have shitloads of fuses; any trained pilot knows how to pull the fuse for the autopilot system (or, in an extreme case, ALL fuses, to kill the entire airplane).
TL;DR you simulate a bunch of other planes in close proximity and the auto-pilot freaks out and tries to avoid them. As the second talk explains, the pilots would definitely notice and switch autopilot off. This is why IMO it's very important to not take ultimate control away from humans in cars. I would personally never buy one of the Google (or any other) self-driving models with no controls. It already freaks me out that many cars are drive-by-wire (for the accelerator), and now even steer-by-wire: http://www.caranddriver.com/features/electric-feel-nissan-di... #noThankYouPlease
No current airliner will automatically change course in response to a traffic conflict. If TCAS [0] gives an advisory, the pilot takes manual control or reprograms the autopilot. Spoofing transponder returns wouldn't do much to the aircraft except annoy the pilots.
Another reason traffic spoofing wouldn't cause the aircraft to deviate is that airliners fly standard approaches and departures (STAR [1] and SID [2]) and heavy traffic away from the approach paths would definitely get noticed.
Even the fly-by-wire Airbus can be flown manually using differential thrust and/or pitch trim control.
The only time I've heard of an Airbus losing control of a damaged engine is when the electrical cable was physically severed. This was Qantas QF32 [1], after one engine exploded and damaged the cables to another engine.
To "take over" an aircraft with pilots in the cockpit would require compromising multiple systems.
When you are barreling down a highway at 65 miles per hour and are not paying attention (and you wouldn't be, because the car drives itself just fine), giving you controls is much more dangerous (for you and others around you) than not.
If a fuel injection system were to fail via a fried component or even a short, it would trip a fuse and fail safe by cutting fuel and shutting off the car. Throttle cables, however, have definitely become stuck in their sheathing at WOT. It happened to my dad on the highway in a 1992 Isuzu Rodeo.
I've always seen that called EPC, for Electronic Pedal Control, but that is probably a VW-ism.
On the other hand, on an EFI car, having a mechanical throttle cable does not add much hack-safety, as the ECU always has some way to override a closed throttle (either disengaging the throttle pedal mechanically switches control of the throttle to an ECU-operated servo, or there is a completely separate throttle controlled by the ECU).
Sure, but rayval was talking about a scenario that could happen today.
Although looking at the other comments, I think I'm significantly underestimating just how much of modern airliners is dependent on software. The pilots might be able to see that they're heading for disaster, but may not be able to do anything about it.
I know for a fact that there are 3 separate computer systems from 3 separate manufacturers on each Boeing airplane. Auto-pilot always uses the consensus of the 3 machines. It's a pretty far-fetched scenario in real life so I thought we were talking fiction.
Former Boeing software engineer, worked on engineering simulators (where real hardware was in the loop):
There is an idea of triple channel autolanding, wherein the plane uses the consensus of the three autolanding systems. Should no consensus be available, then the pilot is advised that autolanding is not available.
Other than that, any sourcing from different manufacturers is happenstance. 737 avionics are sourced from a different vendor than 747/757/767/777. And different functions can come from different vendors, although vendor consolidation has cut down on that.
I'm not across what happened post 777, as I left Boeing in 1999.
Maybe it'd be waves of vehicles, the first few waves crashing through traffic to make way for the rest of the swarm... I'm surprised I hadn't heard of this doomsday scenario yet haha
You don't need self-driving cars for such a scenario to happen -- cars are increasingly drive-by-wire, and the driver-assistance features being added to cars (automatic lane keeping, automatic braking, smart cruise control, etc.) mean computers are already capable of taking over cars.
I came to that realization while driving my Leaf.
You see, with an internal combustion engine, there are several ways that you can stall the engine, even if the computer is controlling it. As long as you can stop it from rotating, it will stall.
Now, take a Leaf. The engine can't physically stall. It is completely controlled by electronics – in contrast, even an ICE with an engine control unit still has parts that are driven mechanically (the valves and driveshaft are all mechanical). This is also what causes cars to "creep" when you release the brakes, as the engine has to keep rotating. In the Leaf, the "creep" exists, but it is entirely simulated.
Similarly, the steering is also electric and controlled by algorithms (more assist in a parking lot, less on the highway).
Braking is also software-controlled. The first ones had what people called "grabby brakes" (the car would use regenerative braking under light pedal force; if you pressed harder, the brakes would suddenly "grab" the wheel). This was fixed in a software update.
Turning on and off is also a button. Can't yank the keys either.
So yeah, presumably, a Leaf could turn on, engage "drive" and start driving around, all with on-board software. It lacks sensors to do anything interesting, but the basic driving controls are there.
It's Will Smith, he's the president, and he just got finished dealing with one of his super-cute children acting up, but actually doing something noble (and getting in trouble at school for it). Upon seeing the swarm: "Aww helllll naw"
There are actually already physical barriers throughout the governmental parts of Washington DC to prevent this sort of thing. There are permanent walls, blocks, and poles along the edges of the roads (often well-integrated into the architecture), and raise-able barriers built into the road surface at intersections. Ain't no cars running over our president!
Fun. Johnny555 is right of course, and it was the Jeep hacked on the Interstate last year that first inspired me. The airplane scenario may be more likely, but I liked the additive nature of so many cars, cars a bit like insects. (You could go further in this direction by conjuring up multiple drone swarms.)
As to movie points: of course Will Smith is the hero, and we'll handle DC rush hour stasis through special effects. ;-)
Car manufacturers conduct recalls all the time. There is the possibility that a million self-driving cars could be held hostage from a remote control tower simultaneously, leading to injury or death for millions. In practice, however, as soon as an issue is discovered, there will be the equivalent of recalls (remote updates) and things like this will be fixed. People who are uncomfortable with self-driving cars will always be able to drive manually or override the automated controls. At some point, technology will progress enough that the benefits will outweigh the risks and people will adopt it.
In some cases, people are having to wait months to get new airbags because they just don't have them in stock. In the computer case, would you want to keep driving until they can get you scheduled for a software update? Remember that many cars can't update critical software OTA.
>In some cases, people are having to wait months to get new airbags because they just don't have them in stock. In the computer case, would you want to keep driving until they can get you scheduled for a software update? Remember that many cars can't update critical software OTA.
So I assume you don't own a car and you avoid them at all costs? Otherwise your paranoia becomes hypocrisy. If you cannot trust the car company to deliver software updates, you can't trust them to write the software in the first place, and modern cars are full of safety-critical software.
I also don't know why you're equating a manufacturing capacity limitation with a software update limitation. It's not as if Toyota is going to have trouble shipping bits a million times vs. a thousand times once the software update is written.
I think we can also safely assume that self-driving cars will generally be updatable OTA. But yes, you could drive it to the dealer if needed, and worst case the dealer could send people on-site to do the update.
A friend of mine works for VW's engine computer division. Yes, those engine computers. After all I've heard of their development methods (or lack thereof), I'm surprised the engines even start more often than one time out of ten.
My VW Golf has a bug where the driver's side door will be completely unresponsive after starting the ignition, with all the lights on the door being off too. After 5 to 10 seconds it will become responsive, which is a bit annoying if you're trying to open the windows to clear the damp mist on them, as you can't....
Also, if during normal routine you run through all four electric windows to close them (so passenger, driver, passenger rear, driver rear) in that order, you hear the solenoids click in a COMPLETELY different order. I am not sure if it is prioritising the messages in some way but the order that the windows "click" is not the order I press the buttons.
Also, I can get the CD player to crash.
Such minor noticeable issues make me think about the quality of the more important bits somewhat.
The breakdown on Toyota's safety code was interesting; and frightening really.
But they do know what a VCS is, and they don't re-invent lint because they want to ship broken code and need to only check for 2-3 minor issues while leaving the rest alone, as they need to rely on "magic" code exploiting undefined behaviour in certain hardware+compiler combinations.
People won't be able to override buggy software: they can't even do that now; just look at the remote Audi and BMW hacks that can brake the car on the highway.
Auto mfgs seem to be about 20-30 years behind when it comes to computers. Not really surprising that Tesla is whomping them on this front, given how SV people are scrambling to work there. You don't see that with the Big 3 or really any other car mfg.
There is something that's called failing safe(ly). In case of any error inside the car, it should slowly decelerate and pull over. Sure, it will cause a lot of traffic problems if, say, 10% of all cars did that at the same time, but the damage would not be as severe as a ghost driver entering the freeway at 140 mph.
There are a few instances where some bad Tesla batteries (the standard 12-volt batteries, ironically) failed, and the cars handled it perfectly: they slowed down so that the driver could safely pull over. Sure, it did not happen to all cars at once, and autonomous cars might not be able to do that by themselves, but we have a long way to go to reach 100% autonomous driving (i.e. without a steering wheel, and a car that drives everywhere humans drive, not only on San Francisco's perfect sunny roads where it's been thoroughly tested).
Yes, I'm eager to risk my life on self-driving car failures; it would be a tremendous step up from risking my life on human drivers (including myself!) as I do on a daily basis.
That assumes that failures are uncorrelated. My personal concern is with correlated failures, like those that occurred in GCE. What if cars from some manufacturer all fail simultaneously in the same way (say, because of a software push that rolled out more aggressively than it should have, just as in the GCE case)? That's the sort of scenario I'm really concerned about.
At least with human drivers, the failures are generally uncorrelated.
> The real question is whether society can handle the unfairness of death by random software error vs. death by negligent driving.
Most people have a greater fear of flying than driving by car although statistically you're far more at risk in a car. One cause of that fear of flying is loss of control; you have to accept placing your life in someone else's hands.
With self-driving cars, I suspect lack of control will also be a problem. Either we need to provide passengers with some vestige of control to keep them busy, or we just wait a generation until people get used to it.
> Most people have a greater fear of flying than driving by car although statistically you're far more at risk in a car. One cause of that fear of flying is loss of control; you have to accept placing your life in someone else's hands.
Really? That sounds counter intuitive. You'd think the reason people are afraid of flying is, because, you know, it's flying. Thirty thousand feet between you and the cold, hard ground. That's a long fall of agony to utmost certain death, and some magic turbo voodoo keeping you from it.
Would people with fear of flying really rather be the pilot?
Can't find it with a quick google now, but there was a study on this. Giving people fake controls, even when they knew they were fake, reduced anxiety.
Give ambulatory meat some onboard decision-making ability, and it will want to use it.
Maybe not the pilot, but simply being in the cockpit is usually enough to control the fear. Why? First, you can see what is going on, whereas in the cabin you can imagine what might be going on, and think the worst. Second, you can see the pilot is calm. You could watch the flight attendants when in the cabin but that may not be convincing. Third, if you hear or feel something unusual, you can simply ask the captain.
But, since flying in the cockpit isn't available, then what? Get a copy of "SOAR: The Breakthrough Treatment for Fear of Flying" (Amazon editors' 2014 favorite book).
Yeah, and it's funny how people seem to have some level of tolerance for death-by-design-flaws-in-hardware, but somehow software is a different kind of engineering endeavor. My guess is this will persist for some time, but eventually even out.
As you say, it can be pretty difficult to accept the unfairness of death by random software error. However, this is a situation that already exists today, just one understood and acknowledged by a very small portion of people in our societies. For a (non-life-threatening) example, thanks to the post-mortem above, we have a pretty deep understanding of what happened and why during this outage; but for some people it could have been a mere "Spotify is down, again, no more music :(".
I'm very curious to see if our understanding (as a society) of our own technology will improve over time or if people will continue to blame the internet for "not working" 20 years from now.
Well, this bug took down the entire system. What happens when self-driving software hits a similar bug? I don't think that there is any precedent for that sort of thing with manually driven cars. The scale could easily be larger than 100-car pile-ups due to poor weather conditions.
A stroke or heart attack while driving has (possibly) nationwide or worldwide reach? I somehow don't think you understand what I'm getting at.
A "Perfect Storm" of bugs could cause a systemic failure that has the possibility to affect all cars everywhere (well, probably limited to a single car-maker / model / etc). This has the possibility to affect millions. Claiming that a stroke or heart attack while driving has the possibility of a similar scope / reach doesn't make sense.
AFAIK, because most new cars today are pretty close to drive-by-wire anyway, there's little if anything that precludes such a perfect-storm scenario from occurring with the ECUs in human-driven vehicles, other than whatever internal processes manufacturers happen to have in place to avoid ECU bugs.
So in that respect, the introduction of self-driving-cars won't necessarily make such events more likely.
That would only affect the person who had a stroke or heart attack and their close vicinity. The parent is suggesting a scenario where the manual driving equivalent would be every single driver getting a stroke or heart attack at the same time.
I think the bigger worry is not the perfection part: It is the uniformity part. Considering software will be replicated (ignoring the ML ways of driving for now), updated and refreshed en masse, the impact is going to be very severe. A single nut case shoots up one school or his neighbors. The whole world turning into nut cases is going to be a walking dead scenario.
We see this ALL the time with ALL the big companies including the ones I have worked for in the past. I am very interested in possible solutions people are cooking up here.
There's also the ethical problem of life-and-death choices that will have to be programmed in advance.
When an accident is inevitable, software will decide whether private or public property should be prioritized, which action is more likely to protect driver/passenger A to the detriment of driver/passenger B, etc.
Most people wouldn't blame the outcome of a split-second decision made in the heat of the moment, but would take issue when the action is deliberate.
Is there actually some way to formally verify software that is driven mostly by machine learning? I imagine there's a small core of code that runs some trained models, but how are they being formally verified? How do we know there's not a blind spot in that model that turns out to be fatal under certain conditions?
I'd imagine that you'd have some hybrid of ML and traditional code, and may be able to reason statistically about the ML sections and use traditional (verified) code to cut the tail off the distribution.
All pipe dreams of mine, but the research potential here could be worth flaming truckloads of grant money :-)
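To make that hybrid concrete, here is a minimal sketch, assuming a hypothetical Action type, a made-up two-second rule and made-up thresholds: the ML policy proposes an action, and a small hand-written guard (simple enough to unit-test exhaustively, or even verify) overrides the proposal whenever a hard invariant is violated. This is an illustration of the idea, not anyone's actual system.

    from dataclasses import dataclass

    @dataclass
    class Action:
        steering_deg: float   # requested steering angle
        throttle: float       # 0.0 .. 1.0
        brake: float          # 0.0 .. 1.0

    MAX_STEERING_DEG = 30.0   # hard limit the guard enforces (illustrative)

    def guard(proposed: Action, speed_mps: float, gap_m: float) -> Action:
        """Clamp whatever the ML policy proposes to a region the small,
        conventionally testable guard code considers safe."""
        steering = max(-MAX_STEERING_DEG, min(MAX_STEERING_DEG, proposed.steering_deg))
        throttle, brake = proposed.throttle, proposed.brake
        # Crude two-second rule: if the gap ahead is too small for the
        # current speed, ignore the policy and command braking instead.
        if gap_m < 2.0 * speed_mps:
            throttle, brake = 0.0, max(brake, 0.6)
        return Action(steering_deg=steering, throttle=throttle, brake=brake)

    # command = guard(ml_policy(sensor_frame), speed_mps, gap_m)

The point being that only the guard, not the learned policy, needs to be formally reasoned about.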
To other replies, since we've reached max depth: yes, and that's why it makes no sense to me to equip such badly-spec'd vehicles with the self-driving bits first (Tesla S excluded). I drive a higher-than-normal performance vehicle for the exact reason that it gives me more options when I need to get out of a bad situation. I can out-brake, out-swerve (more lateral grip), and out-accelerate most other cars on the road. The stopping distance (a factor of tire grip and brake power) has definitely helped me avoid accidents. Lower body roll = more grip during more extreme maneuvers + more control during them.
Still would be nice if the software could go "shit I'm about to get into an accident" and deploy air bags etc. that much faster. Plus now that I'm thinking about it, most accidents probably occur with some option that, had it been utilized, would've prevented the incident. Imagine the insane accident-avoiding swerve maneuvers a software program could potentially pull off.
> It's not quite possible to write a car that avoids ALL accidents because a car has a speed and a turning radius and brakes only work so fast.
It's not possible to have a self-driving car avoid all accidents, but one can presumably get pretty close. The realtime data from sensors give you enough information about the car itself, its surroundings and other objects around it to continuously compute a safety envelope - a subspace of the phase space of controllable parameters (like input, steering) within which the car can stop safely - and then make one of the goals to aggressively steer the car to remain in that envelope. This approach should be able to automagically handle things like safe driving distances or pedestrians suddenly running into the street.
Of course there will be a lot of details to account for when implementing this software, but it's important to realize that we have enough computing power to let the car continuously have every possible backup plan for almost any contingency in its electronic brain.
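As a toy illustration of one axis of such an envelope (longitudinal only, straight-line braking, made-up reaction time and deceleration figures):

    def stopping_distance_m(speed_mps, reaction_s=0.2, max_decel_mps2=6.0):
        # Distance covered during the reaction delay plus braking
        # distance v^2 / (2a) under constant deceleration.
        return speed_mps * reaction_s + speed_mps ** 2 / (2.0 * max_decel_mps2)

    def inside_safety_envelope(speed_mps, clear_distance_m, margin_m=5.0):
        # True if the car could still come to a stop before the nearest
        # obstacle, with some margin to spare.
        return stopping_distance_m(speed_mps) + margin_m <= clear_distance_m

    # At 65 mph (~29 m/s) with 6 m/s^2 of braking, the car needs roughly
    # 76 m of clear road, so the controller should already be shedding
    # speed if the sensors report anything closer than that.

A real envelope would sweep steering options too, but even this one-dimensional check captures the "always keep a way to stop" idea.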
The problem with formal verification of systems like this is that checking whether software matches the specification or whether the specification is self-consistent is not the interesting problem. Whether specification matches the real world is the interesting problem and there is no formal way to verify that.
For this reason, most self-driving car development will only happen at large companies like Google, Tesla, Ford, etc. because they are the only ones who will be able to afford to purchase a massive general liability insurance policy.
I think that the other part with insurance is that insurers have no idea what the risk involved actually is (since it's not been around long) so they aim way high to cover themselves.
> Self-driving cars don't have to be perfect. They just have to be safer than driving is today
But how is Google or any other manufacturer going to test their software updates? Are they going to test-drive their cars for tens of thousands of miles over and over again for every little update?
Google's cars drive millions of miles daily in simulation already. They take sensor data (including LIDAR) from past drives and essentially redrive their entire dataset with the updated software.
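In outline, and only in outline, that kind of log-replay regression test is conceptually simple; every name below (sensor_frames, plan, the divergence thresholds) is a hypothetical stand-in for whatever the real pipeline uses:

    def replay_regression(recorded_drives, old_stack, new_stack):
        """Re-drive recorded sensor logs through both software versions
        and flag frames where the new stack's decision diverges enough
        to need human or simulator review before rollout."""
        diffs = []
        for drive in recorded_drives:
            for frame in drive.sensor_frames():      # camera/LIDAR/radar snapshot
                old_cmd = old_stack.plan(frame)
                new_cmd = new_stack.plan(frame)
                if (abs(old_cmd.brake - new_cmd.brake) > 0.2
                        or abs(old_cmd.steering_deg - new_cmd.steering_deg) > 5.0):
                    diffs.append((drive.id, frame.timestamp, old_cmd, new_cmd))
        return diffs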
But if a new update changes how the car drives, wouldn't the data that would've been captured by the updated car be different than what the old outdated car recorded?
For example if the new update makes the car more aggressive, then other real drivers might be more careful, slow down more, etc compared to the original runs?
As long as there are human drivers, Google will need to continue updating software. We'll find exploitable behavior (e.g. how to make the self driving car yield) just like we find weaknesses in their search algorithms, and they'll need to adapt.
> I'd assume the more severe the change the more they'd want to test in the real world.
I'm sorry but the important point is: how are we going to agree what needs more testing, and what can be updated without testing? If we let those big companies decide about those issues, then I'm afraid we will soon see another scandal like VW, except possibly with deadly consequences.
I can already predict the reasoning of those companies: last quarter our cars were safer than average, so we can afford some failures now.
They don't have to be safer than driving is today. They can be significantly less safe while still being an improvement for society because drivers will be able to focus on other activities while travelling instead of wasting that time focusing on driving the car.
Actually they have to be significantly safer than driving today. People would rather be unsafe and in control than not in control and a tiny bit safer. I know personally if a self driving car could only drive as well as I could then I'd still want to be the one driving.
People only have the illusion of safety when in control, and are also demonstrably incapable of judging their own ability to perform tasks. Your criterion won't be taken seriously by anyone involved in policy, because this is already well understood.
While I agree with your first sentence, I think you're ignoring the fact that when media get wind of a case like this, it's almost only the irrational opinions of masses that matter. In western democracies, politics - and thus policies - is driven by pandering to the population (and bribes^Wlobbying).
At a certain point the policy question will inevitably be: why should any regular person even be allowed to drive, given the superior abilities of the machines? There are certain ideological assumptions that will then have to be debated. Making your own mistakes is a consequence of freedom. Limiting the freedom to make mistakes for the overall benefit to society is not uncontroversial (see the gun control debate), and contributes to alienation in the Marxist sense of the term. There is more than utility involved here. Just because the trade-off doesn't matter to techno-determinists doesn't mean it doesn't exist.
Cars are already extremely regulated. You can only drive at speeds dictated by the government in directions dictated by the government, turn in ways prescribed by the government. Your car has to be identifiable in specific ways by the government. You have a lowered expectation of privacy in a car.
I hardly think the argument to just prohibit cars will be difficult to make.
You can only drive at speeds dictated by the government in directions dictated by the government, turn in ways prescribed by the government
Sure... so you're saying that your steering wheel blocks when you're trying to make an uncharted turn? Or that your throttle has a variable hard limit, depending on the road you're on?
That's a popular theory of how people behave but it doesn't play out that way. People overwhelmingly choose convenience and low price over safety all the time.
There's psychological and game theoretic factors that the safety has to overcome in order to be acceptable. Part of why human drivers are allowed today is because the people who bear the cost of driving decisions are directly involved in making those decisions. Once you give up control to a third party, they need to be significantly better to make it an acceptable choice on the individual level.
In other words, I agree that it's better for society, but that "better for society" isn't the metric that gets used for making decisions within the system.
"Here’s a different way of thinking about [the trolley] problem: if you wanted to design a car that intentionally murdered its driver under certain circumstances, how would you make sure that the driver never altered its programming so that they could be assured that their property would never intentionally murder them?"
"If self-driving cars can only be safe if we are sure no one can reconfigure them without manufacturer approval, then they will never be safe."
"Your relationship to the car you ride in, but do not own, makes all the problems mentioned even harder."
> Part of why human drivers are allowed today is because the people who bear the cost of driving decisions are directly involved in making those decisions.
This gives me weird visions of Google engineers with a necklace that explodes in the event that one of their cars causes an accident :S
I wonder if you've read Fallen Dragon by Peter F. Hamilton?
Unremovable, remotely controllable lethal necklaces are a central plot device in the mercenary invasion. They are put on randomly chosen civilians as "collateral" to ensure cooperation and disincentivize insurgency.
People in general don't willingly prioritize their own safety at the expense of convenience. That's why cars beep when you don't buckle your seatbelt. That's why people drink soda and eat fast food.
Google's take on postmortems is really nice. As the SRE book points out, they are seen as a learning tool for others. Most internal postmortems are available for anyone within the company to see and learn from. As well, they are always blameless. No fingers are pointed at the person who caused the issue in the postmortem. They explain the issue, what happened, and how it can be prevented in the future.
Pair this with the outage tracking tools and you can find all the outages that have happened across Google and what caused them.
Then there is DiRT [0] testing to try to catch problems in a controlled manner. Having things break randomly throughout Google's infrastructure so you can see whether your service's setup and oncall people handle it properly is a really awesome exercise.
It also showcases the great thing about self-driving cars. Even though accidents will happen, when it does there will be plenty of sensor data and logs which can be examined to find the exact cause in a post-mortem. An improvement to the software can then be made, and millions of cars deployed can all effectively learn from a single accident.
With humans, the amount of knowledge gained and the collective improvement of driving behavior from a single accident is low, and each accident mostly provides some data points to tracked statistics. With machines, great systematic improvements are made possible over time such that the remaining edge cases will become increasingly improbable.
What does post-mortem mean in this context? The software one (after an accident) or the human one (after a death)? I think it's crazy that the word is getting back its original meaning.
I wouldn't worry so much. I'm sure self driving cars are going to save a lot more lives than they are going to end. Humans are terrible drivers, and the software will only get better.
I know this logically. But emotionally I know how many bugs I have written in my life. I know software devs are human.. aka I know how the sausage is made.
Yes, but have you been part of the development & testing effort for mission-critical software (e.g. a class 1 or 2 medical device?). It's not true in all cases, but for the most part the level of QA that goes into the devices before release is significantly higher than that of your average product. This is why regulation is required.
GCE downtime just means people lose money, it's not life-or-death. Skimping on QA in order to reduce costs and get to market faster is a perfectly reasonable decision when the consequences are so mundane.
I understand what you mean, but that generalises too much about what people use GCE, public clouds, and self-hosted servers for, especially going forward. It is not all convenience applications, game backends, etc. What people use AWS/GCE for these days is so varied; even the public sector uses the AWS GovCloud region, for example. Downtime consequences are not just money lost; they can be life-and-death, and some applications need solid QA even if hosted in a public cloud.
It may (emphasise 'may') be how they share medical data via GCE/AWS that gets delayed just before a surgery (ok, edge case), or how they push bug fixes to a critical GPS model that happens to be used by an ambulance, or even a taxi used by a pregnant lady who is about to drop, etc. Or a simple general medical self-diagnosis information site that by chance could have saved someone in that time slot. Or any other random non-medical usage which involves a server and data of some kind that happen to be in GCE.
Yes, critical real-time systems are often on-premise or in self-hosted data centres, but more and more are not, especially those viewed as non-critical that are in some cases indirectly critical.
You make a good point, but in the end the responsibility is on the life-critical application (e.g. medical software, device, self-driving car) to ensure that it has been properly QA'd and that all of its dependencies (including any cloud services or frameworks that it is built upon) meet its safety requirements. The event of an app server or cloud service experiencing downtime would very much have to be planned for as part of a Risk Management exercise. Ignoring that possibility would be negligent.
I had the same feeling just before Y2K. Too much knowledge of the process, it all must work the first time in production, etc., etc. I was pleasantly surprised at the non-event of Y2K.
Especially since "driver-error" is the cause of 94% of motor vehicle crashes in the U.S.[1], with 32,675 people killed and 2.3 million injured in 2014.[2] Worldwide, motor-vehicle crashes cause over 1.2 million deaths each year and are the leading cause of death for people between the ages of 15-29 years old.[3]
It's estimated that self-driving cars could reduce vehicle crashes by approximately 90%! [4]
This is true for a single car, but self-driving cars introduce something that did not exist before. Imagine the majority of cars are self-driving and they all malfunction due to a bug in a software update.
> and they all malfunction due to a bug in a software update.
That's assuming everyone with a self driving car is driving the exact same model and they all updated at the exact same time. Chances are there will be many different models and manufactures so an OTA update with a bug will only affect a much smaller percentage of the self driving cars.
Yeah, remember, auto-pilot in a plane needs to be 100% reliable, or everyone dies. A car needs to be, I dunno, 80%? Compared to a bad human driver, who still drives every damn day, a computer need only be about 60% reliable to be better.
People suck at driving. Even a shitty self-driving car will save a ton of lives simply by obeying traffic laws.
An auto-pilot for an airplane is a considerably easier problem to solve. No lanes; no pedestrians; very little other traffic; three spatial degrees of freedom. That's why auto-pilots for airplanes have existed for almost a century but we're just now beginning to get self-driving cars. Humans are still better at dealing with the full panoply of crap that road driving throws at us.
Aircraft autopilots also rely on experienced and licensed pilots to operate them and be responsible for the aircraft at all times. Self-driving cars have to assume the operator is not particularly capable, nor paying attention to anything happening on the road.
Google Self-Driving Cars currently fail approximately every 1,500 miles, according to their own report. We are a LONG way away from being able to separate the operator's attention from driving.
Current generation of autopilots doesn't handle traffic avoidance or make any routing decisions - they just follow pre-programmed routes at pre-programmed speed and altitude (or climb/descent profile).
But even this relatively simple level of automation causes problems - pilots start to rely on automation too much, and when things go south they are not capable of dealing with it.
Airlines recognize it, and put more emphasis on hand-flying during training and routine operations, so pilots don't lose their basic piloting skills.
This video also inspired an excellent podcast from 99 Percent Invisible about the challenges and dangers of automation. I would highly recommend listening.
> Yeah, remember, auto-pilot in a plane needs to be 100% reliable, or everyone dies.
First, they're not anywhere near 100% reliable. They can fail on their own, and they'll also intentionally shut themselves off if the instruments they rely on fail. https://en.wikipedia.org/wiki/Air_France_Flight_447
Second, an autopilot failure shouldn't lead to death if the pilots are competent and paying attention.
What does a failed autopilot look like? Would a pilot do any better with an "aerodynamic stall"? I know little about planes and it seems like that'd be a big problem with or without a pilot driving.
A failed autopilot could look like all sorts of things, from just automatically disconnecting itself (usually with a loud warning alert) to issuing incorrect instructions (which is why the pilots are supposed to be awake and alert while it's engaged, watching the instruments).
Actually a key finding in AF447 was that pilots were not trained on how to recognize and recover from a high altitude stall. It is not like flying a Cessna 150. The junior first officer didn't realize the aircraft had stalled.
Pilots were trained on the procedure for recovering from a low altitude stall; 100% or TOGA thrust and power out of it while minimizing altitude loss. Training has now changed for both low altitude and high altitude stall recovery.
AF447 was a PILOT failure, not AUTOPILOT failure. The pilot didn't use proper stall recovery procedures, and put the aircraft into an unrecoverable stall.
But autopilot for planes is actually much easier than negotiating traffic with irrational humans with road rage. You can coordinate with air traffic for takeoffs and landings, and there is very little to run into at tens of thousands of feet.
Autopilot in a plane usually doesn't involve autonavigation, whereas autodrive in a car generally requires navigation. Autodrive in a car without navigation is basically 'cruise control'.
'Cruise control' does not (at least it didn't until very recently) even attempt to avoid collisions with neighboring cars or keep the car in its lane. Even without navigation, autodrive in a car is a considerably more difficult problem than either cruise control or autopilot in a plane.
Not sure low-probability/high-damage events are comparable to high-probability/"low"-damage ones in the first place, and that's not a trivial question to handle in real time.
A self-driving car could be better than a human on average. But as long as there are human drivers who drive better than the self-driving software, it would be a disaster for those drivers. We definitely do not want some technology that does good for the majority but does horrible things to a minority, right?
A single driver's ability isn't the only risk factor... if I'm a great driver but everyone else sucks (that's how it is for everyone already, right :D), then an overall increase in the population's driving ability helps me, right?
The overall increase helps you, indeed. But do you want to use self-driving software if you are a great driver (or you think you are)? I do not, because I want to be safer by driving myself.
If great drivers prefer to drive themselves, others will want to as well, because they do not trust those great drivers.
In every driver's eyes, there are only two kinds of drivers:
1) bad drivers, slower than me.
2) mad drivers, faster than me.
This would be true only if your driving ability only affected your chance to die, but your driving ability has an effect on everyone else's safety on the road as well!
Consider that you are a damn good driver, better than the self-driving software. Self-driving software can reduce your risk by giving you a safer environment (by replacing lots of bad drivers), but it increases your risk in emergencies (because it's not as good as you). Would you choose the self-driving software?
The point here is, no matter how good the driving environment gets, I do not want to lose any chance to survive (if I'm a good driver).
My consolation in that fact is that weird edge cases happen with human-driven cars as well. Someone has a seizure and crashes, or more commonly reaches for a cigarette, the radio, their phone. People hit ice or water and overcorrect their spin. People drive too fast. Etc., etc. Not even all edge cases; many are common modes of failure. I expect self-driving cars that kill people will be a huge emotional issue for a lot of people in accepting them, but for me, I just want them to be safer than human drivers, which isn't THAT high of a bar to cross.
Yup. People seem to be overly critical with automated car failures.
Personally, I think automated cars are going to easily be better than humans in the working cases (when both the human and the AI are conscious). Next, I expect to see fully operational backup systems.
Eg, if a monitoring system decides that the primary system is failing for whatever reason, be it bug or unhandled road condition (tree/etc), the backup system takes over and solely attempts to get the driver off the road, and into a safe location.
Humans often fail, but can often attempt to recover. And, as bad as we may be at recovering, we know to try and avoid oncoming traffic. Computers (currently) are very bad at recovering when they fail. I feel like having a computer driving, in the event of failure, is akin to a narcoleptic driver - when it goes wrong, it goes really wrong. Hence why I hope to see a backup system, completely isolated, and fully intent on safely changing lanes or finding a suitable area to pull over.
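A toy version of that watchdog/fallback split, with all class and method names invented for illustration (plan, is_sane, pull_over are not any vendor's real API):

    import time

    class Watchdog:
        """Hand control to a minimal fallback planner, whose only goal is
        to get off the road, if the primary stops producing sane output."""
        def __init__(self, primary, fallback, deadline_s=0.2):
            self.primary, self.fallback = primary, fallback
            self.deadline_s = deadline_s
            self.failed = False

        def command(self, frame):
            if not self.failed:
                start = time.monotonic()
                try:
                    cmd = self.primary.plan(frame)
                    if cmd.is_sane() and time.monotonic() - start < self.deadline_s:
                        return cmd
                except Exception:
                    pass
                self.failed = True   # latch: never hand back to a bad primary
            return self.fallback.pull_over(frame)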
Sounds good in theory. Until the bug that causes failure is also present in the monitoring system, and as such doesn't fail over to the backup system. AKA exactly what happened here to Google.
Currently, the self-driving software fails on a Google Self-Driving Car approximately every 1,500 miles. If the car suddenly stops trying to drive on the road, and the driver isn't attentive (or worse, if Google gets its way and convinces lawmakers to change the rules so they don't have to have steering wheels), that's a lot of deaths.
I'm not saying it won't get better, but pretending self-driving cars is a cure-all right now is hilarious and insane.
272 times the car had a 'system failure' and immediately returned control to the driver with only a couple seconds of warning. (Approx. every 1,558 miles.) A car mid-traffic spontaneously dropping control of the vehicle would likely create a large number of accidents.
13 car accidents prevented via human intervention (approx. every 32,615 miles), 10 of which would've been the self-driving car's fault (approx. every 42,400 miles). These virtual accidents were tested with the telemetry recorded during the incident, and it was determined that had the human test driver not intervened, an accident would've occurred.
Total of these events is 285, which is approximately every 1,487 miles driven.
For useful comparison, a rough human average (when you add a large margin to account for unreported accidents) is somewhere around one accident every 150,000 miles driven. (Insurance companies see them every 250,000 miles approximately, I believe.)
Your comment is extremely misleading. Firstly, the "couple of seconds warning" does not seem accurate. The actual report states:
"Our test drivers are trained and prepared for these events and the average driver response time of all measurable events was 0.84 seconds."
Secondly, you fail to mention:
"“Immediate manual control” disengage thresholds are set conservatively. Our objective is not to minimize disengages; rather, it is to gather as much data as possible to enable us to improve our self-driving system."
Thirdly, you fail to mention that the rate has dropped significantly:
"The rate of this type of disengagement has dropped significantly from 785 miles per disengagement in the fourth quarter of 2014 to 5318 miles per disengagement in the fourth quarter of 2015."
On the contact events, you fail to mention:
"From April 2015 to November 2015, our cars self-drove more than 230,000 miles without a single such event."
Lastly, your comparison with human drivers fails to take into account the environment:
"The setting in which our SDCs and our drivers operate most frequently is important. Mastering autonomous driving on city streets -- rather than freeways, interstates or highways -- requires us to navigate complex road environments such as multi-lane intersections or unprotected left-hand turns, a larger variety of road users including cyclists and pedestrians, and more unpredictable behavior from other road users. This differs from the driving undertaken by an average American driver who will spend a larger proportion of their driving miles on less complex roads such as freeways. Not surprisingly, 89 percent of our reportable disengagements have occurred in this complex street environment"
I don't think self-driving cars are quite ready yet, but you are not representing the state of the art accurately by making out it is as bad as you say.
> As I assumed, it was kind of a corner-case bug meets corner-case bug meets corner-case bug.
I wouldn't really agree with that. There were two pieces of code designed to perform checks on new configs and cancel them. They both failed. Neither of those checks is a corner case. If you had a spec sheet for the system that manages IP blocks, that functionality would be listed as a feature right up front.
This wasn't an edge case. It was two bugs in two sections of code both designed to recover from a serious problem. It sounds like both sections of code were not tested properly at the very least.
Sounds to me like someone just didn't bother to test the failsafe part of the code.
First bug: in a failure case, it should remove the failing config, not all of them.
Pretty hard thing to miss if you test for it with any level of basic unit test or similar.
Second bug:
A canary failure should prevent further propagation of the bad config.
A little more difficult to test with automated tests due to requiring a connection. It sounds like this was in fact tested, but the interaction between the two bits of software was not. A good integration test would have caught this, though I wouldn't call that required. I would, however, think it required that the use case of that particular code be at least manually checked, because, you know, it's a feature for disaster prevention / recovery.
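For what it's worth, the two missing tests being described would look roughly like this; ConfigStore and ConfigPusher are hypothetical stand-ins for whatever Google's config pipeline actually calls these components, not real APIs:

    def test_failure_removes_only_the_failing_config():
        # First bug: a rejected new config must be dropped on its own,
        # leaving the existing, known-good configurations untouched.
        store = ConfigStore(existing=["config-a", "config-b"])   # hypothetical component
        store.apply(new_config="bad-config", validator=lambda cfg: False)
        assert store.active() == ["config-a", "config-b"]

    def test_canary_failure_blocks_propagation():
        # Second bug: if the canary site rejects the config, it must not
        # be pushed to any of the remaining sites.
        pusher = ConfigPusher(sites=["canary", "site-1", "site-2"])  # hypothetical component
        pusher.push("bad-config", canary_result="reject")
        assert pusher.deployed_sites() == []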
There was enough information to deduce this pretty easily, although they did tend to gloss over it in the write-up, almost purposefully.
For all those spouting that this was a good postmortem: not really. It's a good covering of one's ass, a good spin, sidestepping the real root cause.
What have SLAs and "here, take credits" got to do with a postmortem?
I'm not really sure why I got downvoted for this. The post mortem was good but it wasn't something I'd aim to strive for. I like gcloud and I'll keep using it but I find the response to this thing a little bit hard to swallow.
What makes you think software that produces and rolls out configuration files is something complicated?
I don't doubt that Google's infrastructure is as complicated and nuanced as it can get. Configuration software just simply isn't.
I still don't really see the point you're trying to make here. There isn't enough detail in the two sentences they gave us on the actual cause of the problem to really say much more in any further detail than I did.
But I guess that just proves my other point. Postmortem was 90% fluff.
Yet in Google's defence, the information they gave was thorough enough for me. My only gripe was how it was being treated here. It just wasn't a very interesting situation and turned out to be something quite mundane.
Yes but how many people drive stoned, drunk, or distracted?
How many people drive aggressively, speeding, or erratically? How many people do dumb things on the road?
As a software engineer I know that there will be bugs and some will likely kill people. But as a driver who has driven many years in less civilized countries, I know that human beings are terrible drivers.
Who would you rather share the road with, computer drivers that drive like your grandma, or a bunch of humans? It's a no-brainer right?
Agreed, I think a good postmortem distinguishes great companies from just good companies. The depth on philosophy, reasoning and then action is very digestible.
> This is also why I am afraid of self-driving cars and other such life-critical software. There are going to be weird edge cases; what prevents you from reaching them?
Yes. However, the current failure rate of human drivers being improved on is the standard I care about.
> After crunching the data, Schoettle and Sivak concluded there's an average of 9.1 crashes involving self-driving vehicles per million miles traveled. That's more than double the rate of 4.1 crashes per one million miles involving conventional vehicles.
That is the only number that matters to me. Google gets that to 4.0 per million miles and I'd say they are good to go.
So crashes like that include drunk drivers, drugged drivers, and texting drivers.
What is the crashes per miles for a paying attention driver? If it is 1 per million miles, the self driving car would need to be a lot lower. Now if it was 4am and I am falling asleep at the wheel, I bet any self driving car would beat me. So cool to turn on, but maybe not for a daytime cruise...
It's not just crashes per mile - it's the severity of the crashes that needs to be considered too. 5 fender benders is a better outcome than 1 horrific wreck that kills someone.
Is crashes/mile the best metric, or mistakes/mile that could lead to a crash? I certainly make a TON of mistakes that I can correct before they lead to problems--most of them are mild, like having to brake a half second later than I'd like, but I bet you are UNDERestimating the improvement due to self-driving cars.
It's definitely crashes per mile. Unexpected stuff happens all the time on real roads -- black ice, debris, animals, flat tires -- and safety depends upon the driver's or software's ability to deal with it. It's easy for a self-driving car not to wander into another lane because it's fiddling with the radio, but it's hard for it to deal with situations its designers didn't anticipate.
Why are you comparing self-driving cars to exclusively a "paying attention driver"?
For self-driving cars to be safer than human drivers, there is no requirement that the self-driving cars should be better/safer than the best human driver... the self-driving car simply needs to be safer than the majority of humans.
> For self-driving cars to be safer than human drivers, there is no requirement that the self-driving cars should be better/safer than the best human driver... the self-driving car simply needs to be safer than the majority of humans.
That is true on the whole, but not true for ME. It needs to be safer than ME, not some hypothetical average person.
Further compounding it:
> For driving skills, 93% of the U.S. sample and 69% of the Swedish sample put themselves in the top 50% [0]
Consider that most driving is done feet away from another vehicle. If those cars start being replaced by self-driving cars, then you are safer (system safety x personal safety). The car might make more dubious decisions than you would yourself, but now it has fewer opportunities for failure due to others driving erratically.
Are BGP updates for Google's own router configurations really so frequent that they can't pay an engineer to at least monitor the propagation of configuration changes? In this case, a human would have instantly seen that the update was a) rejected (as explained in the postmortem), and b) holy shit, WHY DID THE ROUTER CHANGE ITS OWN CONFIGURATION TO BLOW AWAY ALL OF THE GCE ROUTES!?!
I'm all for automation, but WTF? Insert even a semi-competent engineer in the loop to monitor the configuration change as it propagates around and the entire problem could have been addressed almost trivially, as the human engineers eventually decided to do.
First of all, BGP is core to Google's load balancing architecture. So within a single datacenter you probably have at least a few dozen devices downstream from each edge router.
Secondly, I'm seeing just shy of 500 individual prefixes, 282 directly connected peers (other networks), and a presence at over 100 physical internet exchanges, just for one of Google's four ASes.
Would you be able to read over that configuration and tell me if it has errors?
Any sufficiently large system quickly reaches a point where a human has difficulty tracking what the system should look like.
Google has at least tens of data center locations, each of which will have multiple physical failure domains.
There are also many discontiguous routes being announced at all of their network PoPs. They have substantially more PoPs than data centers.
It very quickly gets too much to reasonably expect people to be able to keep track of what the system should look like, let alone grasping what it does look like.
This is also why I am afraid of self-driving cars and other such life-critical software. There are going to be weird edge cases; what prevents you from reaching them?
Formal systems are still built on a model of the outside world, not on the world itself, AFAIK. Even if your formal coverage is 100%, you cannot anticipate all the weird edge cases the real world can come up with.
The key thing to remember is that a bug in autonomous driving doesn't mean the car swerves off the road at 100mph. If the software crashes or fails, the car can come to a stop quite quickly without harm, allowing for human intervention.
Having said that I am still scared, I'm not sure how well Tesla auto pilot will handle a tire blowout at 70mph. Perhaps better than I would, but I would much rather I was in control.
Assuming bugs are never intentional and mostly random... maybe instead of one autopilot, self-driving cars of the future will have several, developed by completely different teams. Then a self-driving car can take some sort of average or most common output instruction (thus minimizing the risk of random bugs/edge cases, etc.).
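That is essentially N-version programming. A toy majority vote over three independently developed stacks could look like this (the Command type and the plan() interface are invented for the example):

    from collections import Counter
    from dataclasses import dataclass
    from statistics import median

    @dataclass
    class Command:
        maneuver: str        # e.g. "keep_lane", "change_left", "stop"
        steering_deg: float
        brake: float

    EMERGENCY_STOP = Command("stop", 0.0, 1.0)

    def vote(controllers, frame):
        """Majority vote on the discrete maneuver, median on the continuous
        values, so one wildly wrong stack out of three gets outvoted."""
        decisions = [c.plan(frame) for c in controllers]
        maneuver, votes = Counter(d.maneuver for d in decisions).most_common(1)[0]
        if votes <= len(decisions) // 2:
            return EMERGENCY_STOP        # no majority: fail safe
        return Command(maneuver,
                       median(d.steering_deg for d in decisions),
                       median(d.brake for d in decisions))

The well-known caveat from N-version experiments is that independently written versions still tend to fail on the same hard inputs, so the independence you actually get is weaker than the arithmetic suggests.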
When you say they "fail", I think you are referring to the disengagements, right? The report you link to there says:
“Immediate manual control” disengage thresholds are set conservatively. Our objective is not to minimize disengages; rather, it is to gather as much data as possible to enable us to improve our self-driving system.
Also, table 4 reports the number of disengagements (for any reason) each month, as well as the miles driven each month. In the most recent month in that table, it's actually 16 disengagements over 43275.9 miles. That's approximately one disengagement every 2705 miles; about the distance from Sacramento, CA to Washington, DC. At the start of 2015 it was only 343 miles per disengagement; 53 disengagements over 18192.1 miles. The pace of improvement is incredible, especially considering disengagements are set conservatively.
Can a human drive from Sacramento CA to Washington DC without a single close call or mistake along the way? I really doubt it. This technology will be saving lives soon.
It's hard to get an exact figure, particularly because of unreported accidents, and various sources. But I believe insurance companies have previously stated it's about one in every 250,000 miles. For the sake of giving a wide berth for unreported accidents, and to not give humans the benefit of the doubt, I've been using the rough figure of 150,000 miles between accidents.
I don't have a great source for it though, and if anyone finds a good source, it'd be fantastic.
Airplanes are equipped with software, and many pilots turn on the autopilot after takeoff. Bugs are everywhere, and it's just a matter of time before one is critical enough to kill people. So our best bet is better quality assurance through proof and over-testing (done incrementally!).
> There are a number of lessons to be learned from this event -- for example, that the safeguard of a progressive rollout can be undone by a system designed to mask partial failures -- ...
This is a really important point that should be more generally known. To quote Google's own "Paxos Made Live" paper, from 2007:
> In closing we point out a challenge that we faced in testing our system for which we have no systematic solution. By their very nature, fault-tolerant systems try to mask problems. Thus they can mask bugs or configuration problems while insidiously lowering their own fault-tolerance.
As developers we can try to bear this principle in mind, but as Monday's incident demonstrated, mistakes can still happen. So, has anyone managed to make progress toward a "systematic solution" in the last 9 years?
Google actually does have a systematic solution: fault injection. Google's systems are designed so that you can (manually, if you have the right privileges) tell an RPC to fail regardless of whether it would otherwise have succeeded, and then test the response of the system as a whole.
The problem is that these failure cases are exercised much less frequently than the "normal execution" code paths are. For example, every year Google does DiRT [1] exercises which test system responses to a large calamity, eg. a California earthquake that kills everyone in Mountain View and SF including the senior leadership, and also knocks out all west coast datacenters. The half-life of code at Google (in my observation) is roughly 1 year, which means that half of all code has never gone through a DiRT exercise. The same applies to other, less serious fault injection mechanisms: they may get executed once every year or two, and serious bugs can crop up in the meantime. Automated testing of fault injection isn't really feasible, because the number of potential faults grows combinatorially with the number of independent RPCs in the system.
I'd be willing to bet that the two bugs that caused this outage were less than 6 months old. In my tenure at Google, the vast majority of bugs that showed up in postmortems were introduced < 3 months before the outage.
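For concreteness, a rough sketch of the fault-injection idea described above; the names and the injection mechanism are invented for illustration and are not Google's actual API:

    # Hedged sketch of manual fault injection: wrap an RPC call site so that a
    # privileged test can force a failure even when the backend would succeed.

    FORCED_FAILURES = set()   # e.g. populated by a test harness / admin tool

    class RpcError(Exception):
        pass

    def call(method, real_call):
        if method in FORCED_FAILURES:
            raise RpcError(f"injected failure for {method}")
        return real_call()

    def fetch_config():
        return {"version": 42}

    # A test with the right "privileges" injects a failure and checks the
    # system-level response (fallback, alert, rollback) instead of the happy path.
    FORCED_FAILURES.add("ConfigService.Fetch")
    try:
        call("ConfigService.Fetch", fetch_config)
    except RpcError as e:
        print("system should now fall back AND report unhealthy:", e)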
There was a relevant section in the Google SRE book notes posted here the other day about injecting faults into Chubby, their distributed lock service, which was too reliable.
Ex: Chubby planned outages
Google found that Chubby was consistently over its SLO, and that global Chubby outages would cause unusually bad outages at Google
Chubby was so reliable that teams were incorrectly assuming that it would never be down and failing to design systems that account for failures in Chubby
Solution: take Chubby down globally when it’s too far above its SLO for a quarter to “show” teams that Chubby can go down
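The mechanics of that are just error-budget math. A toy version, with a made-up SLO and durations:

    # Illustrative error-budget check in the spirit of the Chubby story: if a
    # service has been "too reliable" this quarter, burn the remaining budget
    # with a planned outage so dependents can't assume 100% availability.

    SLO = 0.9995                      # hypothetical quarterly availability target
    QUARTER_MINUTES = 90 * 24 * 60

    def planned_outage_minutes(observed_downtime_min):
        budget = (1 - SLO) * QUARTER_MINUTES      # downtime the SLO permits
        remaining = budget - observed_downtime_min
        return max(0.0, remaining)

    print(planned_outage_minutes(observed_downtime_min=5))   # burn the rest deliberately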
Testing doesn't detect failure, it only detects the failure of a test. Real failures happen more often than test failures, for the same test on the same code with the same input and output. The best systematic solution would detect real failures, not see what happens when you fail a test.
That's monitoring, then. As Steve Yegge's Platforms Rant [1] mentioned, testing and monitoring are two sides of the same coin. Google does both, but the original thread-starter here was asking about how to detect failures when the system itself is designed to mask & recover from failures. (FWIW, most such systems do log when they've encountered a failure condition and recovered from it, and this stat is available to the monitoring system.)
Basically, yes. But we don't have to make a traditional monitor, or have it be an extra component. Monitoring all the facets of, say, a code deployment, or a software build, or performance testing, is a dynamic thing. It may fail, or it may succeed, or it might be suspicious.
Normally we design systems for humans to determine that third part; in this case, there should have been a system where humans could see the one or two pieces of unusual activity and investigate. But there wasn't, or it didn't work right. So a "fix" would be to develop software that adapts to nondeterministic behavior the way a human does. I wouldn't exactly call that monitoring, though.
This is an interesting question, and seems to get to the core of Nassim Taleb's ideas [1] about fragility and the limits of what we can understand, and how many of our attempts to create artificial stability ultimately bring about the opposite.
That said, based on this post-mortem, I think Google, and our industry as a whole, is doing a pretty good job. Periodic failures like this are inevitable, and if they serve to make it less likely that a similar failure occurs in the future, then that is a system as a whole that could be described as "anti-fragile".
> So, has anyone managed to make progress toward a "systematic solution" in the last 9 years?
That depends on how you define "solution". If development time isn't a concern, then formal verification is a pretty solid solution. AWS has used TLA+ on a subset of its systems. [0]
The standard solution in realtime safety-critical systems is to perform health monitoring in addition to robust fallbacks, such that when the system is falling back, it is reported as unhealthy.
For example, the CAN bus normally has an automatic retry feature on a variety of errors. A properly functioning CAN bus should have a bit error rate that is nearly zero. Lightly loaded, it can tolerate a very high error rate (say, due to noise, poor termination, etc). In that situation, the product would report a specific warning message to higher-level SCADA systems, such that it gets bubbled up all the way to the operators.
The approach at Google is to report the actual error rate up to the monitoring system, and then let the monitoring system decide at what threshold to alert with a warning message. This lets you catch a wide variety of errors, eg. if a single replica has a high error rate, that's probably a wildly different problem from if a whole rack of machines has a high error rate, which is different from every machine in the service having a high error rate, which is different from only the set of machines that were fed a specific query having a high error rate.
One of the bugs in this postmortem was that the process in question didn't do this, instead masking the error. Somewhat understandable, as I found the whole "execute a fallback, report the failure, and let the monitoring rules deal with it" philosophy one of the most confusing parts of being a Noogler. If you've never worked on distributed systems before, the idea that there is a monitoring system is a strange concept.
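The "execute the fallback, export the counter, let monitoring decide" pattern, as a toy sketch (metric names and the threshold are invented):

    # The serving code only counts; separate alerting rules interpret the counts.

    from collections import defaultdict

    COUNTERS = defaultdict(int)   # stand-in for an exported metrics page

    def handle_request(do_work, do_fallback):
        COUNTERS["requests"] += 1
        try:
            return do_work()
        except Exception:
            COUNTERS["fallback_used"] += 1     # don't hide it -- export it
            return do_fallback()

    def alert_rules():
        rate = COUNTERS["fallback_used"] / max(1, COUNTERS["requests"])
        if rate > 0.01:
            print(f"WARN: fallback rate {rate:.1%} above threshold")

    def flaky():
        raise IOError("backend unreachable")

    def cached():
        return "stale-but-usable"

    for _ in range(100):
        handle_request(flaky, cached)
    alert_rules()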
CAN and its automatic retransmit is actually a pretty good example of how simple transient problems can quickly grow into global system failures. On a typical CAN bus the bandwidth headroom is small enough that if all colliding/failed telegrams were blindly retransmitted, the collision rate would skyrocket and only high-priority traffic would make any progress; and since on CAN priority and purpose are intrinsically linked, from the global point of view nothing would make progress. That's why most CAN controllers have configurable per-packet retransmit behavior (drop/retry/raise an error and let the application deal with it), and partially why today's cars have multiple CAN buses.
Degraded modes of operation is one example of how to visualize masked errors. Another is to trigger an alarm on fallbacks.
As a general reflection, many distributed systems leave out the cause of their changes and only log actions. Instead of logging "new membership, new members are b,c,d" you are better off logging "node a has not responded to heartbeat in the last 30 seconds, considering it faulty". Following such a principle makes it much easier to spot masked bugs, since you can reason about the behaviour much better.
Aggregating logs to a central location and being able to analyze global behaviour in retrospect is also a great feature.
Great point! More precisely, each state transition in a system should report the old state, the new state, and the triggering event (cause) to a monitoring system (possibly just a log).
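Something as simple as this, as a toy example (the membership set and cause string are made up):

    # Minimal sketch of logging cause alongside effect: every membership change
    # records the old state, the new state, and the triggering event.

    import logging
    logging.basicConfig(level=logging.INFO, format="%(message)s")

    members = {"a", "b", "c", "d"}

    def remove_member(node, cause):
        global members
        old = sorted(members)
        members = members - {node}
        logging.info("membership %s -> %s (cause: %s)", old, sorted(members), cause)

    remove_member("a", "no heartbeat from node a for 30s; marking it faulty")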
It looks like there were at least three catastrophic bugs present:
1. Evaluated a configuration change before the change had finished syncing across all configuration files, resulting in rejecting the change.
2. When it tried to reject the change, it actually just deleted everything instead.
3. Something was supposed to catch changes that break everything, and it did detect that everything was broken, but its attempt to fix the problem failed.
It is hard to imagine that this system has good test coverage.
I'm attempting to even imagine how one would build a useful way to test this. Would they have to have a secondary, world-wide datacenter network with all their various services behind it?
Yes, in a manner of speaking; a physical or virtual lab. At Google's scale it wouldn't be unreasonable to have a completely parallel, but scaled-back, network where they test their automation and code for the happy and sad paths.
That doesn't mean that bugs can't creep in. Who knows, maybe these were all extremely unlikely bugs and Google hit an astronomically unlikely bad-luck streak. Happens.
You could have it send messages to the actual servers, but with an added flag that says "fake", which makes the servers ignore the message/send back a message saying pass/fail/whatever (testing the flag could happen first, one server at a time manually). Then check whether the program continued to push updates.
You may be able to build an elaborate system of dummy network operations to test with, but this system may wind up with bugs that mask what would be errors in the real system. And how do you test against that? A dummy network to test the dummy network operations on? What if the dummy network contains bugs that make it behave significantly differently from the real network in error cases? How do you test for that?
You can test it against the actual network; if something goes wrong, you'll have downtime, but you'll be prepared to get it all back up.
Or, to test whether the "prevent errors from going to new places" works, temporarily configure the new places to ignore new configs; if the system works, no messages will be sent there; if the system doesn't work, they ignore the message and you learn about a bug.
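One way to cash out that "fake" flag idea is a dry-run mode where receivers validate and acknowledge but never apply. An entirely hypothetical sketch:

    # Hypothetical dry_run mode: servers validate the config and report a verdict
    # without changing any state. The validation rule here is a toy.

    def apply_config(server, config, dry_run=True):
        ok = bool(config.get("gce_ip_blocks"))     # toy validation rule
        if dry_run:
            return "would-accept" if ok else "would-reject"
        # ... real apply path would go here ...
        return "applied" if ok else "rejected"

    servers = ["edge-1", "edge-2", "edge-3"]
    verdicts = [apply_config(s, {"gce_ip_blocks": []}) for s in servers]
    # An empty IP block list should never survive a dry run of the real push.
    assert all(v == "would-reject" for v in verdicts), verdicts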
You mean an ICMP request? The IPs were anycast and did not become unreachable until all edge routers had stopped announcing BGP routes. At that point the failure was global. Check out the postmortem, it's a good read.
The servers became unreachable, but the IPs weren't unreachable until all the servers were reconfigured.
My test should have caught this bug:
> In this event, the canary step correctly identified that the new configuration was unsafe. Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process, and thus the push system concluded that the new configuration was valid and began its progressive rollout.
You fake out the connection with a faker object and give that to the code that wants to communicate to the network, and it returns streamed, deterministic data that would have been expected from the actual network, given deterministic inputs. The test uses the fake; the production code gets given the real object.
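A minimal example of that style, with a made-up protocol, just to show the shape of such a test:

    # Production code takes a connection-like object; the unit test passes a
    # deterministic fake instead of touching the real network.

    class FakeConnection:
        """Returns a scripted sequence of replies and records what was sent."""
        def __init__(self, canned_replies):
            self.canned = list(canned_replies)
            self.sent = []
        def send(self, msg):
            self.sent.append(msg)
            return self.canned.pop(0)

    def fetch_route_count(conn):
        reply = conn.send("GET /routes/count")
        if reply == "TIMEOUT":
            reply = conn.send("GET /routes/count")   # one retry on timeout
        return int(reply)

    fake = FakeConnection(["TIMEOUT", "512"])
    assert fetch_route_count(fake) == 512
    assert fake.sent == ["GET /routes/count", "GET /routes/count"]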
While testing would have been quite difficult, any simple canary release or timed release mechanism would have prevented this / limited the damage. For such mission-critical systems, applying any global change in such a manner is asking for it. DevOps can also be a SPOF, and this seems to be one such case.
They had a canary release mechanism in place. This is described in the post mortem.
> These safeguards include a canary step where the configuration is deployed at a single site and that site is verified to still be working correctly, and a progressive rollout which makes changes to only a fraction of sites at a time, so that a novel failure can be caught at an early stage before it becomes widespread. In this event, the canary step correctly identified that the new configuration was unsafe. Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process, and thus the push system concluded that the new configuration was valid and began its progressive rollout.
Taking no confirmation from the canary testing process as a signal to go ahead, though, is not just a bug but a design flaw IMO.
If you read the actual report, it mentions that they did a canary step but its effectiveness was undermined.
> In this event, the canary step correctly identified that the new configuration was unsafe. Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process, and thus the push system concluded that the new configuration was valid and began its progressive rollout.
Seriously. This is a good postmortem, but these are hardly edge case bugs. In this case, major critical functionality just plain didn't work. Kind of shocking.
They explained the issues in layman's terms that most likely mask the true complexity of what happened. It's easy to read the final result: "tried to reject but then deleted everything" and think "Well duh, that's bad, who would build a system that does that?", but I think you're fooling yourself if you think that edge cases couldn't cause that.
> Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process, and thus the push system concluded that the new configuration was valid and began its progressive rollout.
It seems obvious to me that the push system should not proceed without confirmation from the management software, and the management software should not confirm the change is OK if it detects failure.
I see a straightforward defect here, not a confluence of edge cases.
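The fail-closed version is tiny: anything other than an explicit PASS, including a verdict that never arrives, should block the push. A sketch with invented verdict names:

    # Sketch of a fail-closed rollout gate: proceed only on an explicit,
    # well-formed PASS from the canary; treat silence or errors as failure.

    from enum import Enum

    class Verdict(Enum):
        PASS = "pass"
        FAIL = "fail"

    def should_continue_rollout(canary_result):
        # Anything other than an explicit PASS -- including None because the
        # result never propagated back -- blocks the push.
        return canary_result is Verdict.PASS

    assert should_continue_rollout(Verdict.PASS) is True
    assert should_continue_rollout(Verdict.FAIL) is False
    assert should_continue_rollout(None) is False          # the bug in this outage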
The take-home here is: Unit-test your failure states as well, people. Not just your happy paths!
I mean, this problem was a result of MULTIPLE untested failure states.
And yes, it IS possible to unit-test this sort of thing. You can fake out network connections and responses. I haven't yet found something that's impossible to unit-test if you think about how to do it properly.
EDIT: Why downvotes without a typewritten rebuttal? That's just not what I expect from HN (as opposed to, say, Reddit)
For progressive rollouts, what if config changes were pulled instead of pushed?
Each system would be responsible for updating itself, verifying (canary, smoke test, make sure other systems successfully updated, etc.), bouncing, and then rolling back as needed.
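Roughly this shape, as a sketch of the pull model (every name here is invented):

    # Each site polls for a new config, applies it to itself, runs its own
    # verification, and rolls back locally on failure, while reporting what happened.

    import time

    def poll_and_apply(site, fetch_config, verify, apply, rollback, interval_s=300):
        current = None
        while True:
            candidate = fetch_config()
            if candidate != current:
                apply(site, candidate)
                if verify(site):
                    current = candidate                 # accept the new config
                else:
                    rollback(site, current)             # this site only
                    report_unhealthy(site, candidate)   # let humans/monitoring see it
            time.sleep(interval_s)

    def report_unhealthy(site, candidate):
        print(f"{site}: rejected config {candidate!r}, kept previous one")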
A bunch of that's in place already, eg. all Google servers have health checks that run basic smoke tests on a configuration, and if a large number of replicas become unhealthy after a config change, the rollout process automatically aborts and rolls back to the last known good configuration.
The problem here was that there was a bug in the health check that masked the problem by assigning the last-good configuration, and then there was a bug in that code that had saved "nothing" as the last-good configuration. So rather than failing and having the error caught at the top level, it failed and buggy failure-recovery code made the problem worse.
> In this event, the canary step correctly identified that the new configuration was unsafe. Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process, and thus the push system concluded that the new configuration was valid and began its progressive rollout.
Classic Two Generals. "No news is good news," generally isn't a good design philosophy for systems designed to detect trouble. How do we know that stealthy ninjas haven't assassinated our sentries? Well, we haven't heard anything wrong...
It may not be good design, but it might be necessary / practical design. If you have enough machines that some percentage of them are down or unreachable at any given time, you can't wait for full go-ahead before proceeding; you'll never get full go-ahead. So you're left with probabilistic solutions, and as T approaches infinity the expectation of more than zero false-positives approaches 1.
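So in practice you end up with a quorum rule rather than unanimity. A toy version, with made-up thresholds:

    # Proceed when enough canary sites answered positively and none answered
    # with an explicit failure.

    def quorum_go_ahead(acks, required_fraction=0.9):
        passes = sum(1 for v in acks.values() if v == "pass")
        fails = sum(1 for v in acks.values() if v == "fail")
        # Explicit failures veto; missing/late replies merely count against quorum.
        return fails == 0 and passes >= required_fraction * len(acks)

    print(quorum_go_ahead({"s1": "pass", "s2": "pass", "s3": "pass"}))  # True
    print(quorum_go_ahead({"s1": "pass", "s2": "pass", "s3": None}))    # False: quorum not met
    print(quorum_go_ahead({"s1": "pass", "s2": "fail", "s3": "pass"}))  # False: explicit veto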
The whole point of the canary sub-population, though, is that 1) it's not your whole population, and 2) you want to find out empirically if something's wrong.
I'm waiting for the time when they push over the air updates to airplanes in flight.
"You can fly safely, we have canaries and staged deployment"
A year forward:
"Unfortunately because the canary verification as well as the staged deployment code was broken, instead of one crash and 300 dead, an update was pushed to all aircraft, which subsequently caused them to crash, killing 70,000 people."
I'm not 100% sure why they don't do the staged deployment for Google-scale server networking over a few days (or even weeks in some cases) instead of a few hours, but I don't know the details here...
It's good that they had a manually triggerable configuration rollback and a pre-set policy, so it was resolved quickly.
The answer, of course, is that slower and less-frequent deployments mean slower progress building a better platform and delivering new features. If breakages could lead to plane crashes then, obviously, we'd want them to slow down. But if it mainly means no one can listen to Spotify for 15 minutes then that calls for a different trade-off.
As a founder of a startup that hosts services on GCE I'm happy with the trade-off they've chosen.
There are businesses that fit somewhere between Boeing and Spotify where failures still have some kind of steeper than casual cost.
On Hacker News the "move fast and break things" ethos is probably making sense for many of the people submitting and commenting, since their business is closer to casual usage anyway. But that's not the whole audience.
Shit happens. When it comes to engineering, I'd trust Google to manage systemic risk more than I'd likely trust even Boeing.
As for cars, it's a real risk, but not the same as the bugs Google experienced; I personally have experienced a "bug" driving a car at high speeds, which resulted in a number of major electronic systems failing due to custom systems installed by a well known US startup.
Depends. You design and operate for a certain "shit happens" probability and price.
That's why I brought up airliners. You can't set low reliability goals and just say "shit happens". You would have fewer buyers, and it's not even legal anymore. So the bullet was bitten and more reliable aircraft were developed. In the software world we're still more or less in the nineteen-twenties; in aviation, that changed.
Let me phrase it in a perhaps less confrontational way. I see that there could be some business value in more reliable cloud platforms.
There might be some business value in more nines in the availability percentage, that is, less downtime per year. Or maybe just fewer global outages, even if that means more cases where some of a given customer's containers or VMs or what have you might be unavailable some of the time. That can be handled by running multiple units in the same cloud and with other techniques.
But at the moment, since there seem to be single points of failure (or policies that are single points of failure, like to update everything at once), if you, as a customer, would like to have more safety, you would have to run services in two different providers' cloud platforms. That could get slightly more complicated - and expensive as well. I guess some parts of these technologies are quite new so someone will come up with easy and good solutions.
As it relates to airplanes, "shit happens" still applies. I was flying into NYC one time and air traffic control mistakenly allowed the plane I was on to attempt a landing while another plane was taking off; my pilot didn't even notice the other plane until we were over the runway. I later found out that NYC in a number of cases depends on pilots avoiding collisions by literally looking out the window for traffic in their flight path.
>> "I see that there could be some business value in more reliable cloud platforms."
Likely, though I have no idea how much Google is making with cloud services, but Amazon I believe is making tens of billions alone with its cloud services. That said, Amazon as far as I'm able to recall has had far worse issues and appears to be doing fine as a business.
Think of the canary as the last line of defense, not the first. You always aspire to deploy zero bugs into production, through good testing and other QA. But if a problem happens, you want to limit the impact as much as possible. Affecting one site isn't great, but there is enough redundancy that overall service should be unaffected.
At Google, they do these really awesome post-mortems when there's a major failure. They provide a point of reflection and are usually well-written, entertaining reads. Didn't know they made (some?) public.
Writing one is a good learning exercise, and it's more of a learning exercise than a punishment.
It's worth noting that the publicly posted postmortem is not the same as the internal postmortems (which include much more detail, specific action items, timelines etc). The SRE book (https://landing.google.com/sre/book.html) has a whole chapter on our internal postmortems, which is probably a better learning exercise in how to write one.
Source: I work on the team that writes these external postmortems.
Google publishes a public incident report for all service outages (code red) in the Cloud status dashboard. You can see some in the History page: https://status.cloud.google.com/summary
Note that the length of the report tends to correlate with the severity of the outage, and that disruptions (code orange) do not get reports.
Disclaimer: I work in Cloud Support and write some of these.
Completely off topic, but this thread is an example of why I (and a lot of people) want collapsible comments native to HN. I'm on my phone, in Safari, and I had to scroll for over 20 seconds just to reach the second comment. The first comment was a tangent about self-driving cars, which while relevant, I didn't want to read about.
>However, in this instance a previously-unseen software bug was triggered, and instead of retaining the previous known good configuration, the management software instead removed all GCE IP blocks from the new configuration and began to push this new, incomplete configuration to the network.
>Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process, and thus the push system concluded that the new configuration was valid and began its progressive rollout.
I assume the software was originally tested to make sure it works in case of failure. It would be interesting to know exactly what the bug was and why it didn't show in tests.
Network management software complexity is supposed to be one of things that SDN was built to solve (by introducing more modularity and defined interfaces). But in this case the fault was at the edge with BGP route updates, which the internet has been doing for decades. I share your curiosity in the specific bug.
However, this is a great detailed post-mortem from a service provider. Your Telco or ISP will never provide this much detail...
This is very interesting. From the little I understand (sorry for using AWS terms as I am more versed in AWS than GCE), this could happen to AWS as well, right? Even if your software is deployed to multiple AZs / multiple regions, if bad routing / network configuration makes it through the various protection mechanisms, then basically no amount of redundancy can help if your service is part of the non-functional IP block. It seems that no matter how redundant you are, there will always be a single point of failure somewhere along the line; even if it has multiple mechanisms to prevent it from failing, if all of those mechanisms fail, it's still a single point. What prevents this from happening at Azure / AWS? Is there anything that general internet routing protocols need to change to prevent it from happening?
e.g. I'm sure we will never hear that Bank of X transferred a billion dollars to an account but, because of propagation errors, published only the credit and didn't finish the debit, so now we have two billionaires. This two-or-more-phase commit is pretty much bulletproof in banking as far as I know, and banks are not known to be technologically more advanced than Google, so how come internet routing is so prone to errors that can make an entire cloud service unavailable for even a small period of time?
I'm far from knowing much about networking (although I took some graduate networking courses, I still feel I know practically nothing about it...)
So I would appreciate if someone versed in this ELI5 whether it can happen in AWS and Azure regardless of how redundant you are, (which leads to a notion of cross cloud provider redundancy which I'm sure is used in some places) and whether the banking analogy is fair and relevant, and if there are any RFCs to make world-blackout routing nightmares less likely to happen.
Of course things such as mismatched account balances (i.e. the account balance does not equal all credits - debits) or erroneous postings due to bugs happen in the banking IT world. It's just that they are not that visible because only the people affected and that are quick enough in checking their balances will learn about it and after a few hours or days, when they notice the mistake, they will just fix the entries. (And if you thought you were clever and transferred all the funny money away, they are going to sue you to get it back; see e.g. this Quora thread for bank errors and legality: https://www.quora.com/If-my-bank-mistakenly-deposits-1-000-0...)
EDIT: Also, to answer the question: I think distributed computing is hard. The bank will usually have all their account balances on one huge central mainframe in one location, so you do not need to rely on computers talking to each other. And also, a bank does not really need to publish credits and debits at the same time - they just have to make sure your account is debited at or before the other account is credited (in fact, with most money transfers between banks there will be days between these two). So they can just debit your account, check whether this has worked and then send the money on its journey afterwards and be done with it. If a bug happens and the money does not show up at the recipient, they will complain, the bank can look into it and fix it - no (or not much to the bank, anyways) harm done.
I'm not sure the AWS network follows the same setup, AWS has very distinct blocks between the US/EU/APAC compared to GCP where you can inherit the same IP if you quickly delete/recreate instances in different regions?
> . Internal monitors generated dozens of alerts in the seconds after the traffic loss became visible at 19:08 ... revert the most recent configuration changes ... the time from detection to decision to revert to the end of the outage was thus just 18 minutes.
It's certainly good that they detected it as fast as they did. But I wonder if the fix time could be improved upon? Was the majority of that time spent discussing the corrective action to be taken? Or does it take that much time to replicate the fix?
Having worked in ISP operations on BGP stuff (admittedly more than 10 years ago), it was both too slow and too fast.
If the rollout took 12 hours instead of 4 or the VPN failure to total failure was multiple hours instead of minutes, they'd have had enough time to noodle it out. Eventually at a slow enough deploy rate they'd have figured it out. It only took 18 hours to make the final report after all, so an even slower 24 hour deploy would have been slow enough, if enough resources were allocated.
On the opposite side, most of the time when you screw up routing the punishment is extremely brutal and fast. If the whole thing croaked in five minutes, "OK, who hit enter within the last ten minutes..." and five minutes later it's all undone. What happened instead was: dude hit enter, all was well hours later, although average latency was increasing very slowly as anycast sites shut down. Maybe there's even a shift change in the middle. Finally, hours later, it all hit the fan, meanwhile the guy who hit enter is thinking "it can't be me, I hit enter over four hours ago followed by three hours of normal operation... must be someone else's change or a memory leak or a novel cyberattack or ..."
Theoretically if you're going to deploy anycast you could deploy a monitoring tool that traceroutes to see that each site is up; however, you deploy anycast precisely so that it never drops... It's the Titanic effect: this thing is unsinkable, so why would you bother checking to see if it's sinking? And just like the Titanic, if you break them all in the same accident, that sucker is eventually going down, even if it takes hours to sink.
Hmm. Seems like this begs for a different way to solve the problem, like alarming on major changes to configuration files, or better recognition of invalid configs, i.e. Google should be able to make a rule that says "if I ever blackhole x% of my network then alarm"...
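e.g. something like this, where the threshold and the way reachability is measured are both made up:

    # Sketch of the "if I ever blackhole x% of my network, alarm" rule.

    def blackhole_alert(total_prefixes, withdrawn_prefixes, threshold=0.05):
        fraction = withdrawn_prefixes / total_prefixes
        return fraction >= threshold

    # A single boring prefix disappearing stays below the threshold;
    # losing most of your announced space should page someone immediately.
    print(blackhole_alert(total_prefixes=500, withdrawn_prefixes=1))    # False
    print(blackhole_alert(total_prefixes=500, withdrawn_prefixes=350))  # True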
The first one is alarm fatigue. Like the "Terror Thermometer" or whatever it's called, where we're in eternal mauve alert, meaning nothing to anyone. All our changes are color-coded as magenta now. Or it's turned down such that one boring little IP block isn't a major change. After all, it isn't. Of course you (us) developers could run crazy important multinational systems on what to us networking guys was one boring little IP block; who cares about such a small block of space.
The second one is covered in the article, their system for that purpose crashed and then the system that babysits that crashed and then whatever they use to monitor the monitors monitor system didn't notice. Probably showed up in some dude's nightly syslog file dump the next day. Oh well. If your monitor tool breaks due to complexity (as they often do) it needs to simplicate and add lightness not slather more complexity on. Usually monitoring is more complicated and less reliable than operating, its harder computationally and procedurally to decide right from wrong than to just do it.
The odds of cascaded failure happening are very low. Given fancy enough backup systems that means all problems will be weird cascaded failure modes. That might be useful in training.
When I was doing this kind of stuff I was doing higher-level support, so (see above) at least some of my stories are of the weird cascaded "impossible" variety. A slower rollout would have saved them. Working all by myself, I like to think I could have figured it out by comparing BGP looking glass results and traceroute outputs from multiple very slowly arriving latency reports to router configs, with papers all over my desk and multiple monitors, in at most maybe two days. Huh, it's almost like anycast isn't working at more sites every couple hours, huh. Of course their automated deployment completes in only 4 hours, which means all problems that take "your average dude" more than 4 hours of BAU time to fix are going to totally explode the system and result in headlines instead of a weird bump on a graph somewhere. Given that computers are infinitely patient, slowing down the rollout of automated deployments from 4 hours to 4 days would have saved them for sure. Don't forget that normal troubleshooting shops will blow the first couple hours on written procedures and scripts, because honestly most of the time those DO work. So my ability to figure it out all by myself in 24 hours is useless if the time from escalation to hitting the fan was only an hour, because they roll out so fast. Once it hit the fan, a total company effort fixed it a lot faster than I could have fixed it as an individual.
Or the strategy I proposed where computers are also infinitely fast: roll out in five minutes, one minute to say WTF, five minutes to roll back; an 11-minute outage is better than what they actually got. It's not like Google is hurting for computational power. Or money.
I'm sure there are valid justifications for the awkward four-hour rollout that's both too fast and too slow. I have no idea what they are, but the Google guys probably put some time into thinking about it.
From the rest of the post, it sounds like replication time. Datacenters started dropping an hour beforehand one by one, and they had all fallen over by 19:08. Given that you have to push the rollback to routers around the world, and that peer routers have to propagate the changes from there, 18 minutes for a change like this sounds about right.
... although once the first datacenter once again announced the prefixes into BGP, those networks would have been reachable again, from everywhere. I imagine this is what happened at 19:27 -- the first datacenter came back online.
Of course, the traffic load might have overwhelmed that single datacenter but that would be alleviated as soon as additional datacenters came back online ("announced the prefixes"). A portion of the traffic load would shift to each new datacenter as it came back online.
It could have been hours later before they were all operational again but, as far as the users were concerned, the service was up and running and back to normal as soon as the first one or two datacenters came back up.
Well, perhaps, but -- to be clear, I'm not suggesting that there is a problem, rather gathering more information in order to determine whether there's a more optimal solution.
e.g. if the detection mechanism latency is ~60s but the time-to-resolve is 18 mins, then I wonder: "how good could the best possible recovery system be?" Implicit in this question is that I think the answer to my question could just as easily be "19 minutes" as it could "5 minutes."
It's not a bias if I'm asking questions in order to improve the system. Could this fault have been predicted? Yes, IMO it could have. I believe that the fault in this case is grossly summarized as "rollback fails to rollback."
What if the major driver of the 18 minute latency was getting the right humans to agree that "execute recovery plan Q" was the right move? If that were the case then perhaps another item to learn could be "recovery policy item 23: when 'rollback fails to rollback', summon at least 3 of 5 Team Z participants and get consensus on recovery plan." And then maybe there could be a corresponding "change policy item 54: changes shall be barred until/unless 5 participants of Team Z are 'available'"
But that's all moot, if "fastest possible recovery [given XYZ constraints of BGP or whatever system] is ~16 minutes." Which it sounds like may indeed be the case.
> Finally, to underscore how seriously we are taking this event, we are offering GCE and VPN service credits to all impacted GCP applications equal to (respectively) 10% and 25% of their monthly charges for GCE and VPN.
These credits exceed what is promised by Google Cloud in their SLA's for Compute Engine and VPN service!
... which is precisely (almost word-for-word) what the post-mortem goes on to say. Is there something specific you're trying to call attention to here?
Traynor was quoted in a networkworld article last year saying they aim for three and a half nines (99.95%). But you need to read into the incidents more carefully -- figuring out actual "uptime" is quite hard. Consider the longest-lasting incident:
"On Tuesday 23 February 2016, for a duration of
10 hours and 6 minutes, 7.8% of Google Compute Engine
projects had reduced quotas. ... Any resources that
were already created were unaffected by this issue."
I'm not sure off the top of my head how I'd try to compute the overall availability #s from that one. One can possibly try to determine and sum the effects on the individual customers, but we can't from the information provided. But it's certainly less overall downtime than just counting it as a 10 hour failure.
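One way to approximate it, though certainly not how Google actually computes availability, is to weight the duration by the fraction of projects affected:

    # Assumed approximation: weighted downtime = duration * fraction affected.

    def weighted_downtime_minutes(duration_min, fraction_affected):
        return duration_min * fraction_affected

    incident = weighted_downtime_minutes(duration_min=10 * 60 + 6, fraction_affected=0.078)
    month_minutes = 30 * 24 * 60
    print(f"{incident:.0f} weighted minutes, availability ~{1 - incident / month_minutes:.5f}")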
Agreed. It is difficult to tell. But if the bug is preventing you from processing (because you can't save the existing results) then it's essentially down time for new processing. There are also connectivity issues by region and DNS issues. It is difficult to get exact downtime considering partial failures.
That said, this is the second major asia-east1 downtime in 90 days:
April's incident is unique. It was the only case (listed) that was a service outage impacting all of GCE.
The other incidents (as far as I can tell), were service disruptions at the AZ/regional level. Those disruptions don't impact the 9's, as GCE was available for other regions.
> However, in this instance a previously-unseen software bug was triggered, and instead of retaining the previous known good configuration, the management software instead removed all GCE IP blocks from the new configuration
> Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process
I'm sure the devil is in the details, but generally speaking, these are 2 instances of critical code that gets exercised infrequently, which is a good place for bugs to hide.
But once there are strong consequences for downtime the service provider is going to set up training, monitoring, oncall, etc to make sure things stay within the SLA limits. So you are effectively negotiating uptime.
The only SLAs that matter are the ones where the service provider will suffer serious $ penalties on breaking the SLA. Which rules out basically all major cloud providers, since they will simply issue credit for the downtime.
(As background, the author, MIT Prof. Nancy Leveson, summarizes decades of work in the field, offers groundbreaking new theoretical tools that scale up to some of the world's most complex accidents, and has the experience and evidence to back up their relevance e.g. via work on Therac-25, the Columbia Space Shuttle, and Deepwater Horizon to name just a few...)
> However, in this instance a previously-unseen software bug was triggered, and instead of retaining the previous known good configuration, the management software instead removed all GCE IP blocks from the new configuration and began to push this new, incomplete configuration to the network.
Always test your crash / exception handling / special case termination+recovery code in production.
I have seen this too often. Most often in "every day" cases where a service has a "nice" catch-based way of stopping and recovering, and then a separate "if killed by SIGKILL/immediate power failure" crash-and-recover path. That last bit never gets tested, yet it runs in production.
One day a power failure happens, the service restarts and tries to recover. Code that almost never runs now runs, and the whole thing goes into an unknown broken state.
> Configuration
>
> Configuration bugs, not code bugs, are the most common cause I’ve seen of really bad outages. When I looked at publicly available postmortems, searching for “global outage postmortem” returned about 50% outages caused by configuration changes. Publicly available postmortems aren’t a representative sample of all outages, but a random sampling of postmortem databases also reveals that config changes are responsible for a disproportionate fraction of extremely bad outages. As with error handling, I’m often told that it’s obvious that config changes are scary, but it’s not so obvious that most companies test and stage config changes like they do code changes.
It's a shame it's not easier or more common for people to create clones of (most|all) of their infrastructure for testing purposes.
Something like half of outages are caused by configuration oopsies.
If you accept that configuration is code, then you also come to the following disturbing conclusion: the usual test environment for critical network-related code in most environments is the production environment.
The main issue there is that "environments" are defined by configuration, so if you try to set up a configuration test environment, you run into a direct logical impasse: either your configs are production configs, and thus not a separate environment, or they're different from production configs, and thus may provide different test results from production.
While I agree with you, I think we could get closer to "production" than is common right now.
In an AWS environment, imagine a setup where all that differs is the API keys used (the API keys of the production vs test environment). What gets tricky is dealing with external dependencies, user data, and simulating traffic.
For an example more relevant to today's issue: imagine a second simulated "internet" in a globally distributed lab environment. With BGP configs, fake external BGP sessions, etc, servers receiving production traffic, etc.
I get that it's a lot of work to set up and would require ongoing work to maintain - and that it's hard/impossible to have it correctly simulate the many nuances of real-world traffic - and yet I also think in many cases it would be sufficient to prevent issues from making it into production.
For the amount this cost them, they should have bought CloudFlare. If you play with [global BGP anycast] you are bound to get burned. This is not the first time that BGP took out your entire routing. This is probably not the last time that BGP will take out your entire routing. Whoever's job it was to watch the routing, I am sorry.
Pulling your own worldwide routes because you have too much automation; it will make a good story once it's filtered down a bit! Icarus was barely up in the air, too early for a fall.
(Usual disclaimer: I speak for myself, not for my employer, etc.)
The team in charge of solving this particular problem is located in two sites in two different timezones. This is true of most critical SRE teams at Google, and it is precisely to be able to have 24h coverage in these time sensitive situations.
In the 2+ years I have spent in SRE I have never heard of a single instance of an SRE being asked or even encouraged to stay after hours (let alone overnight) for incident remediation. There is quite a lot of emphasis being put on work/life balance.
Wow, that's amazing to read, having served as a de-facto SRE (like every other SDE) at an unnamed competitor to GCE, where I was expected to stay up all night if necessary to resolve an issue (relatively few teams had follow-the-sun coverage). I swore I would never carry a pager again after that, but maybe Google really is different.
How important for redundancy/quality of service is the feature of advertising each region's IP blocks from multiple points in Google's network? It seems like region isolation is the most important quality that Google's network could provide, and their current design is what made something like this possible, not just the bugs in the configuration propagation. They mention the ability of the internet to route around failures, so why not rely on that instead?
As devops Borat was saying all along, automated propagation of an error is the main root cause here. An error (new configuration) should be rolled out site by site - ok us-east1, move onto us-west1... ok, move onto... . A canary site may be the first in the sequence, yet success ("no failure reported") can't be a big "ok" for an automated push to all sites at the same time.
I hope that one of their solutions is the obvious one; make change control testing a closed loop instead of an open loop. (Watch for /success/ reported instead of failure notification.)
Google has contributed ISIS and BGP code to Quagga in the past, as well as funding some testing at the OSRF. Presumably they use it in at least some parts of their operations.
> Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process, and thus the push system concluded that the new configuration was valid and began its progressive rollout.
Perhaps the progressive rollout should wait for an affirmative conclusion instead of assuming no news is good news? I'm not being snarky, there may be some reason they don't do this.
Presumably it received a false positive (or it was interpreted as such). This really seems like the root cause, and I suspect a case of happy path engineering striking again.
TLDR; they simply didn't test their (global!) custom route announcement management software. An edge case was triggered in production, and they gee-whiz-automatically went offline. Epic fail.
"In other words, they simply didn't test their (global!) custom route announcement management software. An edge case was triggered in production, and unsurprisingly they automatically went offline."
Upvoted. I think they should put a soft version of this right on the first line, instead of burying it in an ocean of "harmless", "previously unseen" text dances.
DRY
"The inconsistency was triggered by a timing quirk in the IP block removal - the IP block had been removed from one configuration file, but this change had not yet propagated to a second configuration file also used in network configuration management."
I think most people are missing the main failure point: Why does one change propagate automatically to all regions?
All this could have been contained if they deployed changes to different regions at different times. That would also help with screwing your overseas users less by running maintenance at 10am their local time :-)
> These safeguards include a canary step where the configuration is deployed at a single site and that site is verified to still be working correctly, and a progressive rollout which makes changes to only a fraction of sites at a time, so that a novel failure can be caught at an early stage before it becomes widespread. In this event, the canary step correctly identified that the new configuration was unsafe. Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process, and thus the push system concluded that the new configuration was valid and began its progressive rollout.
The system does do progressive rollouts, which are essentially what you are referring to (albeit perhaps at a different pace). The number of changes being rolled out means that it's not really feasible to hand roll out configurations to different regions, so the checks are automated. In this case, the automated checks failed as well.
Waiting a longer time between regional rollouts (so monitoring systems would have time to detect serious failures) would sacrifice deployment latency, but not deployment throughput (assuming deployments can be made in parallel). For continuous deployment, throughput really matters more than latency.
I'm not sure you really understand what I've tried to say, but it's probably my fault because of my poor grasp of the English language.
You are just confirming my previous comment. Your rollouts are automated, so pushing a change automatically configures every region, instead of configuring just one and maybe waiting a prudent amount of time, on a human scale, before the next one, because, surprise!, shit happens.
I understand your colleagues probably make lots of changes, but if that introduces risks of global outages IMHO you should reconsider your strategy.
And I'm not sure why you downvoted my previous comment. It's a perfectly valid observation, based on the published information.
Making software is hard....