Pilot Who Hitched a Ride Saved Lion Air 737 on the Day Before Deadly Crash

gok · on March 20, 2019

> That extra pilot, who was seated in the cockpit jumpseat, correctly diagnosed the problem and told the crew how to disable a malfunctioning flight-control system and save the plane, according to two people familiar with Indonesia’s investigation.

> The presence of a third pilot in the cockpit wasn’t contained in Indonesia’s National Transportation Safety Committee’s Nov. 28 report on the crash and hasn’t previously been reported.

So the NTSC explicitly chose to exclude this and then two whistleblowers went to Bloomberg? That is fucking wild.

Someone1234 · on March 20, 2019

It didn't need to be included in the preliminary report since it is contextual/background information.

They did include in the report that on a previous flight the same sensor error occurred and the pilots resolved it by disabling auto-trim. The fact there was a third pilot there is definitely interesting, but they didn't make anyone less safe by not including it in the report.

There's no bombshell here. Previous problems were well known/reported before today.

zaphirplane · on March 20, 2019

This is very important, as it shows the ratio of pilots aware of the mitigation is low, and/or the stress of fighting with the computer makes you forget your training amendments.

Edited to add it’s a training amendment

averros · on March 20, 2019

Actual instrument-rated pilot here.

What people (including pilots) do under stress is resort to simple actions drilled in by training. That's why emergency procedure training is a major part of pilot training curriculum.

What goes away first under stress is the capability for complex reasoning, which is what one would need to figure out what the heck is going on and take appropriate action. In this case an a/c doing an uncommanded dive can be the result of three things: an elevator controls malfunction (unlikely in modern jets; the mechanisms for transmitting control forces from the yoke to the elevator are multiply redundant and well understood to be critical), structural breakdown (the empennage falling off would do that), or runaway elevator or stabilizer trim.

In most a/c runaway trim can only be caused by a malfunctioning autopilot (that's why APs have multiple redundant ways to disable them), so pilots faced with this situation react by trying to disable the AP in all available ways and then immediately jumping to the conclusion that there's something catastrophically wrong with the elevator system or even airframe integrity. The thought that there could be yet another system which can make major trim adjustments doesn't even enter their consciousness.

Training (and/or being in a position where there's no immediate stress of wrangling misbehaving a/c) would help to get to the correct solution (disabling trim motor by pulling the circuit breaker).

That said, flying an a/c seriously out of trim is hard, and can cause an accident on its own (the fine pitch control is gone, and you can even be so far out of trim that you cannot do landing flare no matter how hard you pull up, which in heavy jets would result in structural breakdown when the plane slams into runway).

stavros · on March 20, 2019

My question is why doesn't the "omg something is seriously fucked up give me manual control NOW" button disable the MCAS as well? That seems like a UX failure.

unionemployee · on March 20, 2019

That would be the autopilot disconnect button or, in this case, the trim switch, the pressing of which is the correct immediate response to this occurrence, followed by the disabling of the system using switches on the center pedestal. While it's certainly hazardous for Boeing to knowingly expose pilots to this occurrence, I'm also conflicted by the fact that they failed to respond properly. The procedure is similar on Boeing, Airbus, Embraer and Bombardier aircraft. Standardized, in a way.

inferiorhuman · on March 20, 2019

MCAS is designed to work only with autopilot disengaged. With autopilot and autothrottle engaged there would be similar logic at play because the general goal would be to not have the automation induce a stall. With everything disabled there is a risk that applying full throttle will result in the airplane pitching up dramatically beyond the ability of the elevator to pitch the airplane back down and potentially inducing a stall. This is an inherent problem with putting big engines under the wing (and is a problem in the 737 Classic[0], NG, and MAX).

You could almost certainly disable just about anything by pulling the appropriate circuit breaker. You don't want to do that because the plane may be left in a state where you can't control it manually and you only have a few thousand feet of altitude to recover.

There are, in fact, two switches to turn off the electronic control of the horizontal stabilizer. The previous Lion Air crew used them after the pilot riding in the jumpseat went back and grabbed his copy of what I'd assume was the flight manual and had an ah-hah moment[1]. The flight 610 crew were poring over something (quick reference handbook?) but didn't express any awareness of the stabilizer being trimmed[2].

Depending on how bad things got there are a few problems with this. First, electrically moving the stabilizer is much faster than moving it manually. MCAS can operate at two speeds, IIRC both are slower than the speed dictated by the buttons on the yoke (plus the buttons on the yoke will pause MCAS). But even MCAS will move the stabilizer much faster than a pilot could by cranking the manual wheels. I believe that moving the stabilizer from one extreme at the fastest speed to the other takes about a minute. You can find videos of the stabilizer mechanism on a 777 on youtube, it's just not something that's designed to move quickly.

The next problem is that if the stabilizer is pitched down sufficiently (perhaps by MCAS), your first instinct may be to pull back on the yoke (to move the elevator) to regain pitch. Well, once you've done that the aerodynamic forces may be such that you can't move the stabilizer into a nose up position until you let go of the yoke and let the plane pitch down further (Boeing references needing to unload the stabilizer in some documents). That's something you've gotta have cat-like reflexes for when you're already flying so low.

The really sad part is that the pilots could have disabled MCAS and retained electric control over the stabilizer by extending the flaps. But how are the pilots going to know this when Boeing refused to document anything?

0: http://avherald.com/h?article=419f2f9e

1: https://www.grid.id/read/04966850/deretan-kejanggalan-yang-d...

2: https://www.straitstimes.com/asia/se-asia/cockpit-voice-reco...

recursosamazoni · on April 1, 2019

Thank you for your post. I've read that Boeing removed the "yoke jerk" function on the 737 MAX. The flight 302 PIC might have been trying to use it and I have not read whether that functional change was clearly explained to pilots. Many commentators don't seem to realize the extreme low altitude at which this particular struggle was occurring.

tim333 · on March 20, 2019

>The pilots of a doomed Lion Air Boeing 737 Max scoured a handbook as they struggled to understand why the jet was lurching downwards, but ran out of time before it hit the water, three people with knowledge of the cockpit voice recorder contents said. (straitstimes)

I can't help thinking things would be safer and the Ethiopian crash may not have happened if they just published all the info rather than leaving this kind of thing to anonymous leaks to the press.

magduf · on March 20, 2019

If they had published this info, then they couldn't have sold the jet to airlines as just another 737 that any other 737-rated pilot could fly without additional training.

magduf · on March 20, 2019

Boeing was right not to document anything, because then they might have to get a new certification (which is very expensive) for the plane, and pilots would need more training, which would make it hard or impossible for them to sell this plane to airlines as "just another 737".

Remember, a few fatal crashes are worth it as long as it means higher corporate profits.

daveslash · on March 20, 2019

I think I saw that movie. "Fight Club"? Or was this the less well known sequel, "FLIGHT Club"?

stavros · on March 20, 2019

That's very informative, thank you.

tim333 · on March 20, 2019

Well, there is no such button. Perhaps there should be, though a better solution might be to fix the automation. Airbus planes seem to work OK without such a button, but they don't have a "crash the plane if one sensor fails" system.

mbreese · on March 20, 2019

There was an Air France (Airbus) flight a few years back that crashed over the Atlantic, partially as the result of a stuck pitot tube. I remember at the time one of the debates was how automated Airbus planes were compared to Boeing. Pilots seemed split as to which system they preferred, but generally wanted more control, not less.

https://en.m.wikipedia.org/wiki/Air_France_Flight_447

dpe82 · on March 20, 2019

That's an excellent question, and will likely be one of the conclusions of the accident investigations.

varjag · on March 20, 2019

Could be related to the issue that MAX is unstable in flight with that system off. Unlike previous 737s.

_ph_ · on March 20, 2019

No, under normal conditions the MAX isn't unstable. The instability comes only at high angles of attack (close to stalling) and only there the MCAS system should kick in. It was meant to be the equivalent to ESP systems in cars.

mpweiher · on March 20, 2019

Yep. Except that the "exceptional" conditions under which the MCAS kicks in can also "occur" (=seem to occur) due to sensor failure.

gomijacogeo · on March 20, 2019

Do we know, exactly, how much of the 737 Max 8's flight envelope was lost compared to earlier models? I'm getting the feeling that the instability happens far closer to the regular maneuvering envelope than is being alluded to in the press.

arcticbull · on March 20, 2019

It's a single data point / anecdote. These investigations are incredibly thorough, precise, and authoritative so it makes sense they'd seek to exclude that kind of information until they knew for sure.

tomnm · on March 20, 2019

> It's a single data point / anecdote.

A single data point is still a data point.

yason · on March 20, 2019

And a very relevant one, in this case: trim runaway with pilots not understanding it's a runaway. The visiting pilot does recognise the situation and restores expected flight dynamics. What went wrong, why did not two pilots realise something that is traditionally a known case and well understood? Because it didn't look like a traditional case.

emteycz · on March 20, 2019

An irrelevant one in this case. This report needs to find out the technical/training/... reason, not why the other pilot needed a ride. I'm sure they'd have included it if they wanted to recommend having three pilots on every flight.

hencq · on March 20, 2019

I'm not sure that thorough and precise implies they would seek to leave out things. In fact, one would expect the opposite.

arcticbull · on March 20, 2019

One data point may lead you to bad conclusions, like in this case "one third of pilots don't know how to disengage the MCAS" -- the fact is we don't know what ratio. It might be higher, it might be lower, but one data point isn't valuable or thorough. In the preliminary report they mentioned someone else disabled it. In their final, I'm sure we'll get the details.

cryptonector · on March 20, 2019

It also shows that pilot awareness (and therefore training) should have been sufficient. The key deficiency here was Boeing's. The agency not disclosing this detail is a problem too.

cjbprime · on March 20, 2019

That.. sounds like the literal opposite of what it shows, no?

cryptonector · on March 20, 2019

I think you misunderstood. Had the pilots been trained, they could have dealt with it -- that the off-duty pilot was able to shows it was possible to respond correctly, while that the others weren't shows the lack of training is a real problem.

cjbprime · on March 20, 2019

Thanks, I see. I don't think it does much good to notice that a small percentage of pilots can be capable of figuring it out, if they have no other duties at the time.

That doesn't tell us that it's reasonable to expect pilots in command under stress to always figure it out. And they have to figure it out 100% of the time to avoid statistically unacceptable crashes. So I don't agree that pilot training is an acceptable answer given what we know so far.

cryptonector · on March 20, 2019

Yes it does: it shows that there is a procedure that would have worked had the pilots recognized the issue and known about the procedure.

The pilots were stressed in part because they didn't know -- they had plenty of time to recognize the issue and take action had they known, but they didn't, so no amount of time would have helped them. We don't know if one more thing to know about would be too much, but I rather doubt that.

gok · on March 20, 2019

It suggests that no LionAir crews knew how to handle this.

Someone1234 · on March 20, 2019

US pilots say they didn't either. The training wasn't out there.

js2 · on March 20, 2019

The runaway stabilizer trim procedure is how it was expected to be handled:

https://www.flightglobal.com/news/articles/faa-order-tells-h...

https://www.youtube.com/watch?v=3pPRuFHR1co

The specifics of MCAS weren't out there because Boeing expected the pre-existing procedure to have been sufficient. That procedure wasn't sufficient, or the Lion and Ethiopian Air pilots were somehow not aware of it, or they forgot their training in a stressful situation.

To be clear: the planes should not have put these pilots into the situation they were in, but unless the recovery procedure doesn't work, these planes should not have crashed.

[One thing I've been wondering: somehow the pilots were able to bring the nose back up... it was only (apparently) after multiple nose downs that they eventually failed to recover the plane. What was going on there? It's as if they understood how to correct the stabilizer trim, but weren't aware of the cut-off switches.]

gmoot · on March 20, 2019

> somehow the pilots were able to bring the nose back up... it was only (apparently) after multiple nose downs that they eventually failed to recover the plane. What was going on there?

The MCAS activated several times, each time ratcheting the stabilizer down by 2.5 degrees. Every time the pilots tried to correct for it, it would engage again. That is what caused the characteristic up/down flight pattern.

There are a lot of details in this story: https://www.seattletimes.com/business/boeing-aerospace/faile...

js2 · on March 20, 2019

I've read that but it still leaves me confused. Say trim is at 0°. MCAS sets it to nose down pitch of 2.5°. Pilot pulls back yoke. Why doesn't pilot then also look over at the stab indicator, see that it's at 2.5° nose down and reset it with the electric trim switch back to 0°? If the pilot had done that, when MCAS runs again, doesn't it set the stab back to 2.5° nose down? So the pilot and MCAS should be fighting over 2.5° of trim.

What it seems like happened is MCAS ran the stabilizer to 2.5°, but the pilot didn't reset it back to 0, but just back enough he could counter with the elevator. So MCAS keeps cranking the stab each time a bit further nose down and the pilot keeps only partially countering MCAS, instead of running the stab fully back to 0.
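The arithmetic of the two scenarios above (full reset versus partial counter) can be sketched in a few lines. This is a deliberately simplified model, not MCAS's actual control law: the function name `simulate`, the fixed 2.5° step, and the idea that the pilot trims back a constant fraction of each step are all assumptions made purely for illustration.

```python
def simulate(activations, mcas_step_deg, pilot_counter_fraction):
    """Return nose-down trim (degrees) after each MCAS/pilot cycle.

    Hypothetical model: in every cycle MCAS adds mcas_step_deg of
    nose-down trim, then the pilot trims back a fixed fraction of that step.
    """
    trim = 0.0
    history = []
    for _ in range(activations):
        trim += mcas_step_deg                           # MCAS activation
        trim -= pilot_counter_fraction * mcas_step_deg  # pilot's electric trim response
        history.append(trim)
    return history


# Pilot fully resets the trim each time: the fight stays within one step.
print(simulate(5, 2.5, 1.0))  # [0.0, 0.0, 0.0, 0.0, 0.0]

# Pilot only trims back half of each step: the residual ratchets upward.
print(simulate(5, 2.5, 0.5))  # [1.25, 2.5, 3.75, 5.0, 6.25]
```

Under these assumptions, anything short of a full counter leaves a residual that accumulates with every activation, which is the pattern described in the paragraph above.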

The pilot was clearly aware stab trim had something to do with the situation because there were stab adjustments from the pilot. What I can't make sense of is the pilot only partially countering MCAS. I realize the Lion pilots didn't know about MCAS and so their mental model must have been flawed. I'm even more confused about the Ethiopian Air pilots who should have known about MCAS.

edit: also this - https://news.ycombinator.com/item?id=19442160

CamperBob2 · on March 20, 2019

> One thing I've been wondering: somehow the pilots were able to bring the nose back up... it was only (apparently) after multiple nose downs that they eventually failed to recover the plane. What was going on there? It's as if they understood how to correct the stabilizer trim, but weren't aware of the cut-off switches.

My understanding is that every time the MCAS activated, its effect was incremental. It could command only a limited deflection of the relevant control surface each time -- less than 1 degree? -- but the cumulative effect of multiple activations would eventually exceed the ability of the manual controls to override it.

piva00 · on March 20, 2019

From the reports I've read it was 2.5 degrees, even though Boeing had produced safety docs to the FAA only mentioning their initial design when they were only applying 0.6 degrees per activation.

inferiorhuman · on March 20, 2019

> That procedure wasn't sufficient

Winner, winner, chicken dinner. The existing runaway stabilizer trim checklist does not fit the MCAS failure to a T. The procedure dictates that you stop if the trimming action stops when you counter it with the switches (MCAS pauses when you do this). Unfortunately when you're taught to only think inside the box, you'll go as far down that checklist as you need to and no further.

Beyond that, just after takeoff is one of the busiest parts of the flight. In fact, if you look at the graphs you'll see that the stick shaker on the left side was going crazy almost immediately after takeoff (and before MCAS kicked in as a result of retracting the flaps). The pilots of flight 610 were able to rein in MCAS with the trim buttons and almost certainly continued to do so in order to focus on the other failures and their relevant checklists (stick shaker, unreliable airspeed, elevator feel system non-op).

> [One thing I've been wondering: somehow the pilots were able to bring the nose back up... it was only (apparently) after multiple nose downs that they eventually failed to recover the plane. What was going on there? It's as if they understood how to correct the stabilizer trim, but weren't aware of the cut-off switches.]

Yep, to me, that looks an awful lot like some sort of hardware failure. Something to look for in the subsequent and final reports.

Someone1234 · on March 20, 2019

Runaway stabilizer feels nothing like MCAS. They likely didn't associate the training they received with the problem at hand.

js2 · on March 20, 2019

Can you explain how they are different?

Both cause the stabilizer to tilt. Both cause the stabilizer wheels next to the pilot/co-pilot to turn. If the plane is pitching down, and you pull back the yoke and that still doesn't recover, don't you at some point check the stabilizer angle? If it's wrong, you use the stabilizer trim switch on the yoke under your thumb to correct it. Won't you see the stabilizer wheel turn when you do that? If you let go and the wheel immediately starts turning again the wrong way (pitching the plane down), doesn't that look like runaway stabilizer?

Also, how did the off-duty pilot figure it out if they are nothing alike?

BTW, when you write "runaway stabilizer feels nothing like MCAS" are you speaking from personal experience?

Someone1234 · on March 20, 2019

Runaway stabilizer causes a more dramatic sudden movement than MCAS. MCAS's start, wait 5 seconds, re-start, motion is more gradual. For example you could look at the wheel and it has stopped, only to re-start after you look away (add noise cancelling Bose headphones and you may not hear it)

Pilots are trained well for runaway stabilizer, MCAS was essentially the same thing, but it didn't FEEL like the same thing, so a stressed pilot's mind may not immediately go to that solution/memory item.

phire · on March 20, 2019

Basically, the pilots would get more and more confused as their plane went more and more out of trim.

Now things are starting to make sense.

Boeing claimed (and F.A.A agreed) that pilots didn't need training because they already had training for a memory procedure which would have solved the issue.

What they both appear to have overlooked is that the MCAS symptoms were sufficiently different from the runaway stabiliser scenarios pilots trained on that pilots are having problems knowing which procedure to apply.

_ph_ · on March 20, 2019

That might be the ultimate answer to why the accidents did happen. That while pilots should be well trained to deal with runaway stabilizers, they didn't recognize the situation in time (which probably was extremely short with the second crash). Which can only be explained that the way the events unfolded, distracted the pilots from dealing with a runaway stabilizer. The third man might just have not been "distracted" by piloting the airplane and thus could see it.

Besides obviously making the sensors used by MCAS truly redundant and limiting its extreme behavior of moving the trim up to the stops, a large part of fixing the MAX might be just having a big warning light for MCAS operating.

cameldrv · on March 20, 2019

My guess would be perspective, two kinds.

First he’s gonna be behind the throttle quadrant. The stabilizer wheels are roughly in the center of his vision, so he will notice them moving and when they start and stop. The pilots have them down near their thighs, so they’re not even necessarily in their peripheral vision. They will be less likely to notice every time they start and stop.

Second the jumpseat guy has no flight responsibilities. He can focus on thinking about what’s wrong without dealing with all of the other stuff you’re doing on departure. Radio calls, navigation, performance monitoring, configuration changes or being agitated by the stick shaker.

adontz · on March 20, 2019

Pilot priorities are: aviate, navigate, communicate. If you listen to incident recordings often pilots tell tower to wait and do not respond immediately. So radio calls are always low priority compared to actual flying. But I agree with point of view argument.

inferiorhuman · on March 20, 2019

> Also, how did the off-duty pilot figure it out if they are nothing alike?

My assumption is that the jumpseat guy took a very generous interpretation of the checklist. But there may have been more panic about the trim on the earlier flight. If you compare trim inputs on flight 610 to the previous flight, the pilots on the earlier flight were making much shorter, more rapid inputs than the pilots on flight 610. The plane also pitched down far more dramatically than on flight 610.

The graphs for flight 610 seem to indicate that the pilots thought they had the trim situation under control (and they did right up until they didn't).

https://reports.aviation-safety.net/2018/20181029-0_B38M_PK-...

js2 · on March 20, 2019

Thank you for the preliminary report link. It’s the first I’ve seen it.

sandworm101 · on March 20, 2019

A runaway stabilizer is a single motion. The control surface moves, you react by shutting down the motor or fighting it by hand, and the situation should calm down. A malfunctioning MCAS will feel like multiple different runaways, in various directions. Once you react to the first, the system could start pulling things the other way. Rather than a steady pull one way or another, you get into a back-and-forth fight.

rocqua · on March 20, 2019

> The specifics of MCAS weren't out there because Boeing expected the pre-existing procedure to have been sufficient.

This was argued by Boeing to the FAA, who accepted the reasoning.

It should be noted that, if it had been determined the pre-existing procedure wasn't sufficient, then all pilots on the 737 Max would've had to undergo re-training. This would have been expensive, and probably hurt adoption of the new airframe.

js2 · on March 20, 2019

> This would have been expensive, and probably hurt adoption of the new airframe.

What alternative did the airlines have though? The A320neo? Cause that would've required re-training too.

Probably Boeing should've pushed back against the airlines: we've made you a more fuel efficient 737, but it's going to require re-training.

Someone or someones may have thought: "MCAS is the only thing that requires retraining? Can't we fudge that?"

And the thing is, it still seems like MCAS is a reasonable solution, if poorly implemented.

tremon · on March 20, 2019

> all pilots on the 737 Max would've had to undergo re-training. This would have been expensive, and probably hurt adoption of the new airframe.

Am I the only one hoping that in the end, the costs for Boeing will be more expensive than the global re-training would have been?

magduf · on March 20, 2019

It's not the cost of the training; that would have been borne by the airlines, not Boeing, I think. The problem is that if this plane required extra training, this would have been an additional cost to the airlines, meaning they might not have bought the plane in the first place, and would have bought the competing plane from Airbus. Also, not only is it an additional cost, it means they would have had to have a different type-rating for this plane (a pilot rated for a regular 737 wouldn't be allowed to fly this version).

masonic · on March 20, 2019

The article contradicts itself on this. Early on, it says the rescuing pilot referred to "part of a checklist that all pilots are required to memorize".

Later, it quotes claims that it "isn't in the documentation".

I would consider that published, official checklists are the most imminently critical form of documentation.

spricket · on March 20, 2019

The fact that "AoA disagree" light and logic was an optional feature seems criminal enough to me. A sensor with no failover unless you pay for an option. Who the hell thought this was a good idea or approved it? WTF!

Nearly 350 people are already dead. Boeing should be raked over the coals. It probably takes longer to go through the checklist than it does for everyone to die.

I promise if the audio recordings are ever released from CVR they will be absolutely damning. Pilots trying to make it through a loss of control checklist as they dive to their doom. A lot of those checklists have 50+ steps. Imagine trying to make it through that while fighting the plane and descending at over 3x "maximum design descent rate".

I'm sure the fucking alarms were blaring and pilots cursing the system carrying them toward certain death.

tntn · on March 20, 2019

The checklist in question has three steps. Step two is "move stab trim to cutout," which disables automatic systems that adjust the stabilizer. The pilots in lion air had ~10 minutes to do this.

It is extremely unlikely that the pilots were trying to work through the checklist. More likely they simply did not know what to do.

cjbprime · on March 20, 2019

This one's actually a memory item, not a checklist. But it's the memory item for "runaway trim", which is a very different qualitative experience than the slow march of an MCAS system that you didn't know existed.

spricket · on March 20, 2019

This doesn't jibe with disaster being averted by a third pilot. Assuming the third was totally dedicated to the checklist vs preventing the plane from diving, it was only his insight that stopped the plane from going down.

The MCAS system apparently increased downward trim without any speed considerations, to over 2.5 degrees in 10 seconds. I don't have the full flight control details but it sure sounds like pilots would lose control within minutes at most. In the LionAir crash the pilot reported control problems and asked to return to airport within 3 minutes, and they slammed into the ocean in 12.

Not sure where you're getting this info but I'm more than sure they knew something was wrong in the last 2 minutes (while they were heading into the earth at almost the speed of sound).

You really think they have ten minutes to react when by then everyone on LionAir was doomed to die?

fegul · on March 20, 2019

This "human" factor was also characterized in the movie Sully when it was apparent that just a few extra seconds for pilots to process the issue made all the difference between landing safely at a nearby airport vs. landing in the Hudson River.

magduf · on March 20, 2019

Hopefully whatever engineer signed his name to this goes to prison, and gets personally sued by the families of the victims.

ashelmire · on March 20, 2019

Engineers don't make decisions like this. Not providing documentation of the feature or providing training to pilots was the problem, and it's caused by multiple management failures. The lack of sensor redundancy likely also is due to management.

magduf · on March 20, 2019

The lack of sensor redundancy HAD to be signed off by a professional engineer. It doesn't matter if management pushed it; it's the engineer's (PE's) responsibility to refuse to sign off on it.

ashelmire · on March 20, 2019

I’m not convinced sensor redundancy is the issue. The plane has two sensors; for some reason, however, the package that includes the AoA disagreement light costs extra.

It seems that lack of training and documentation was the reason these planes crashed though. The pilots didn’t know how to recognize the issue or resolve it.

Moral_ · on March 20, 2019

You seriously think an engineer didn't raise this concern? This reeks of high management telling engineers to know their place.

magduf · on March 20, 2019

A PE's job is to refuse to sign off on things that aren't safe.

flashman · on March 20, 2019

So a single point of failure (malfunctioning sensor) can engage the horizontal stabilizer without notifying pilots, in a way that the control yoke can't override.

What the hell did Boeing think was going to happen?

Diesel555 · on March 20, 2019

All fly by wire jets have movements based on sensors. In the F-16 a faulty AOA probe is okay. 2x Faulty AOA probes are okay as long as they disagree. 2x faulty AOA probes that agree (frozen) are a very bad thing. But the loss is just a jet with an ejected seat if it can't be fixed, not nearly 200 people which is terribly sad. The worst part is from a previous Hacker News post listed below. In particular, the "Economic Problem" which states "Boeing sells an option package that includes an extra AoA vane, and an AoA disagree light, which lets pilots know that this problem was happening. Both 737MAXes that crashed were delivered without this option. No 737MAX with this option has ever crashed."

https://twitter.com/trevorsumner/status/1106934369158078470?...
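The distinction drawn above between one bad probe, two bad probes that disagree, and two bad probes that agree can be illustrated with a generic cross-check and median voter. This is a minimal sketch of the general technique only; it is not the F-16's or Boeing's actual logic, and the function names and the 5° threshold are hypothetical.

```python
from statistics import median


def aoa_disagree(left_deg, right_deg, threshold_deg=5.0):
    """True when two AoA readings differ by more than threshold_deg.

    The threshold is a made-up illustrative value.
    """
    return abs(left_deg - right_deg) > threshold_deg


def voted_aoa(readings_deg):
    """Median of three or more readings, so one wild value is outvoted."""
    if len(readings_deg) < 3:
        raise ValueError("median voting needs at least three readings")
    return median(readings_deg)


# One faulty probe out of three is outvoted.
print(voted_aoa([4.0, 4.5, 24.0]))    # 4.5

# Two probes frozen at the same wrong value win the vote.
print(voted_aoa([20.0, 20.0, 4.0]))   # 20.0

# With only two probes you can detect a disagreement, but not tell which one is right.
print(aoa_disagree(4.0, 24.0))        # True
```

The second case is the "very bad thing" from the comment: faults that agree with each other are indistinguishable from good data, no matter how the voting is done.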

dataflow · on March 20, 2019

> No 737MAX with this option has ever crashed.

I mean there are only two 737MAXes that have crashed in total, right? It's not like we have such a huge sample to work with for this to carry that much weight.

cjbprime · on March 20, 2019

And the only airlines that bought the option are North American, so we are immediately confounding with pilot training differences.

tim333 · on March 20, 2019

I guess when ordering a plane or a car you assume you don't need to order optional extras to stop the thing killing you.

skykooler · on March 20, 2019

I mean, some safety features such as automatic emergency braking are available but optional on many cars.

tim333 · on March 20, 2019

True but this is more like it may steer into a tree if you don't have the optional steering plus package.

shitgoose · on March 20, 2019

Would you personally board the plane whose crew didn't have this training option?

dataflow · on March 20, 2019

I'm not sure I'd board any 737-MAX's at this point.

magduf · on March 20, 2019

>But the loss is just a jet with an ejected seat if it can't be fixed, not nearly 200 people which is terribly sad.

Not necessarily true. If that F-16 is flying over a populated area, it can kill numerous people on the ground when it crashes.

jimktrains2 · on March 20, 2019

Do we have a confirmed source that it can't be overridden by pilot input? My understanding was that using the stick could overcome it with other control surfaces and that the controls for trim can be set/reset by the pilot.

I think Boeing has handled the aftermath (and much of the lead up since the release of the plane) very, very poorly. I, as a layman to aviation, am not willing to bet that Boeing knew the true likelihood of a problem and didn't tell anyone or had a whistle blower over it.

However, if the issue with a non-redundant hydraulic valve in the original 737 didn't teach us the lesson, this should: no matter the likelihood of failure, safety critical systems should always be redundant.

(Also, Boeing didn't handle that original issue very well either.)

kayfox · on March 20, 2019

The trim adjustments by MCAS can be disabled by setting the two stabilizer trim cutout switches[1] to cut-out. Recognizing that trim is being automatically adjusted beyond the pilots' ability to control the aircraft, and knowing how to correct it, is considered important enough to be a "memory item": something pilots should know how to do without referring to the quick reference handbook.

1. https://aviation.stackexchange.com/questions/58798/why-doesn...

crooked-v · on March 20, 2019

> Do we have a confirmed source that it can't be overridden by pilot input?

Pilot input works... then MCAS silently does it again with up to +2.5 degrees additional adjustment, until after enough times it's maxed out the full rotation of the tail flap. See: https://www.seattletimes.com/business/boeing-aerospace/faile...
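
A rough sketch of why repeated activations matter, assuming each MCAS cycle adds a fixed nose-down increment and the pilot trims back some smaller amount in between. The 2.5 degree step comes from the comment above; the pilot counter-trim, the 10 degree travel limit, and the function name are illustrative assumptions, not figures from Boeing or the report.

```python
def trim_after_cycles(mcas_step_deg, pilot_counter_deg, limit_deg, cycles):
    """Toy model: nose-down stabilizer trim (degrees) at the end of each cycle.

    Each cycle MCAS adds mcas_step_deg of nose-down trim (clamped at the
    assumed travel limit), then the pilot removes pilot_counter_deg.
    """
    trim = 0.0
    history = []
    for _ in range(cycles):
        trim = min(trim + mcas_step_deg, limit_deg)  # MCAS trims nose down
        trim = max(trim - pilot_counter_deg, 0.0)    # pilot trims nose up
        history.append(trim)
    return history


# Pilot recovers most, but not all, of each activation: trim still drifts.
print(trim_after_cycles(2.5, 2.0, 10.0, 4))
# No counter-trim at all: the assumed limit is reached on the fourth cycle.
print(trim_after_cycles(2.5, 0.0, 10.0, 4))
```

In this toy model any counter-trim smaller than the MCAS step leaves a net nose-down drift every cycle, which is the "does it again" behaviour described above.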

dsfyu404ed · on March 20, 2019

>safety critical systems should always be redundant.

That's easy to say from an armchair but on an aircraft everything is "safety critical" to some extent and you have to choose what gets redundancy. Something where without it you can't fly the plane, sure, that makes sense. The argument for considering MCAS, a system which is not necessary to fly the plane safely, to be "safety critical" is much weaker. The Lion Air crash wouldn't have happened had the pilots disabled MCAS instead of fighting it into the drink.

ajxs · on March 20, 2019

I'm fairly certain that the characteristic "If this component malfunctions loss of life is one potential outcome" is a solid case for the component actually being safety critical. The pilots would never have needed to disable MCAS had it not malfunctioned in this manner. I'm not sure what redundancy has to do with this, but clearly there was a failure in a safety-critical system.

dsfyu404ed · on March 20, 2019

>I'm fairly certain that the characteristic "If this component malfunctions loss of life is one potential outcome" is a solid case for the component actually being safety critical.

The same thing can be said for literally tons of components on a plane. You can't have them all be redundant. At some point you have to pick and choose. A sensor for an obviously supplemental feature seems like a pretty obvious one to choose not to be redundant.

>I'm not sure what redundancy has to do with this,

I think the part in the GP comment where he/she states that "no matter the likelihood of failure, safety critical systems should always be redundant" has something to do with it.

chrisseaton · on March 20, 2019

> You can't have them all be redundant.

Why not?

RandomTisk · on March 20, 2019

Redundant wings? Fuselage? Just 2 obvious examples, back to my armchair.

comex · on March 20, 2019

True, there are some parts which are both safety critical and fundamentally unable to be made redundant, at least not without drastically compromising the overall design. But that doesn’t justify leaving out redundancy in parts which are able.

In any case, the actual regulations in play here would have required redundancy if the consequences of failure had been properly categorized:

> [I]n normal flight, an activation of MCAS to the maximum assumed authority of 0.6 degrees was classified as only a “major failure,” meaning that it could cause physical distress to people on the plane, but not death.

[..]

> He said virtually all equipment on any commercial airplane, including the various sensors, is reliable enough to meet the “major failure” requirement, which is that the probability of a failure must be less than one in 100,000. Such systems are therefore typically allowed to rely on a single input sensor.

> But when the consequences are assessed to be more severe, with a “hazardous failure” requirement demanding a more stringent probability of one in 10 million, then a system typically must have at least two separate input channels in case one goes wrong.

https://www.seattletimes.com/business/boeing-aerospace/faile...

chrisseaton · on March 20, 2019

Those are silly examples and clearly not what anyone meant in terms of components.

tim333 · on March 20, 2019

Also there is a kind of redundancy in the wings say by making the internal structure much stronger than generally needed. You can have a fracture in one part without the wings falling off.

logfromblammo · on March 20, 2019

There is also much more wing than is strictly needed to keep the plane aloft and stable. They're optimized for low fuel consumption at a design-specified cruising altitude and speed, with variable geometry, so they can still take off and land at lower speeds near ground level.

From a certain perspective, the flaps give the plane redundant wings. One pair for low and slow, and another pair for high and fast.

gtirloni · on March 20, 2019

The usual, cost. But I wonder how much the redundant sensor costs for airlines to ditch it.

cjbprime · on March 20, 2019

Every 737 has two AoA vanes. They are redundant and the redundancy is not optional. MCAS just doesn't take advantage of there being two sensors.

gtirloni · on March 20, 2019

I think you're right. I misread it. My point about cost still stands.

"Cockpit displays and a warning light intended to flag problems with angle-of-attack sensors in flight were optional on the Lion Air jet that crashed, according to people familiar with the matter. The carrier, like some others, chose not to purchase the feature, people familiar with the matter said, so pilots didn’t receive any such alerts."

https://www.wsj.com/articles/maintenance-lapse-identified-as...

jimktrains2 · on March 20, 2019

> That's easy to say from an armchair but on an aircraft everything is "safety critical" to some extent and you have to choose what gets redundancy.

In flight entertainment, non-emergency lighting, food prep, &c aren't "safety critical" any more than the TVs in hospital rooms are.

As a layman, I'd've imagined that all avionics and control surface control was redundant. Don't most large planes have redundant hydraulic systems and even a deployable wind turbine to run avionics and hydraulic systems under total power loss scenarios? (And hasn't that turbine been used a few times?)

Yes, weight is always a factor, but that doesn't mean that there aren't already multiple redundant, heavy, systems on aircraft.

What puzzles me is that there are 2 angle of attack sensors, but they're only connected to one of the flight computers each, with the other flight computer being the redundant one. What's more, Southwest ordered the optional disagree alert, so there is some way to tie and compare these sensors.

Edit:

> The argument for considering MCAS, a system which is not necessary to fly the plane safely, to be "safety critical" is much weaker.

It controls the control surfaces; I'm not sure how it wouldn't count.

dsfyu404ed · on March 20, 2019

But this isn't really "avionics and surface control" more than automatic lane keeping is a critical control system for the car. MCAS is a convenience feature that counteracts the plane's tendency to pitch up more than you want when climbing. Sure it can improve safety, so can lane keeping in a car. Neither are critical to operation. You can fly/drive perfectly safely without them so they need not be super hardened against failure because you can just switch them off.

jimktrains2 · on March 20, 2019

> MCAS is a convenience feature that counteracts the plane's tendency to pitch up more than you want when climbing.

It's still controlling control surfaces.

> Sure it can improve safety, so can lane keeping in a car. Neither are critical to operation. You can fly/drive perfectly safely without them so they need not be super hardened against failure because you can just switch them off.

I'm not quite sure where to begin. Just because the absence of something would make the plane flight worthy doesn't mean the addition of it keeps the plane flight worthy.

Take your lane following example. Sure, they can be turned off, but that didn't help the person whose Tesla ran straight into a jersey barrier because it got confused by it and where the lane was. (https://www.popularmechanics.com/technology/infrastructure/a...)

If a mechanism can control the vehicle, it is safety critical. It needs to fail safe under all conditions. The requirement to fail safe is part of what makes it a safety critical system.

In the most extreme of examples, adding a lane following module to a car that randomly swerves into jersey barriers once at speed and if the barrier is close enough would be a clear example of a situation where the automatic controls can cause a situation that a human could not possibly react to; hence, the system itself needs to be held to much higher standards.

inferiorhuman · on March 20, 2019

> MCAS is a convenience feature that counteracts the plane's tendency to pitch up more than you want when climbing.

Nope, MCAS is required to meet the requirements set forth by the FAA. That's not a convenience thing. Various nannies may seem like convenience things in a well balanced car, but they become far more important in a powerful, poorly balanced car like a Porsche 911 or Dodge Viper — cars that have earned reputations as widowmakers.

euyyn · on March 20, 2019

Whatever is a car nanny?

joecool1029 · on March 20, 2019

Traction and stability control in modern cars.

cerebellum42 · on March 20, 2019

>But this isn't really "avionics and surface control" more than automatic lane keeping is a critical control system for the car.

Even if we suppose that is true and a fair comparison (which I wouldn't), the way failure modes are handled is key. If there is uncertainty about the sensors that control the feature which controls the avionics the system needs to halt. This is like keeping the lane control active when the computer vision algorithm used to detect the lanes is uncertain about where the lane is. Chances are it'll steer you into the next available tree and kill you.

cjbprime · on March 20, 2019

I'm afraid everything you're saying is just very wrong:

> [Boeing self-assessed] a failure of the [MCAS] system as one level below “catastrophic.” But even that “hazardous” danger level should have precluded activation of the system based on input from a single sensor — and yet that’s how it was designed.

-- https://www.seattletimes.com/business/boeing-aerospace/faile...

Even at Boeing's understated safety risk, redundancy is required. And the system actually has much more risk than they stated, since it will eventually totally deflect the stabilizer -- as happened to both fatal flights, with the jackscrews found in their farthest position in the wrecks.

cjbprime · on March 20, 2019

But there are already two AoA sensors on the plane. They already made it redundant. They just didn't write code to perform an agreement check as an input to MCAS.

I don't think any aviation expert would agree that the case for requiring redundancy in the MCAS system is weak.
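
A minimal sketch of what an agreement check feeding an MCAS-like function could look like. This is not Boeing's code: the function names, the 14 degree activation threshold, and the use of a simple average are all hypothetical, and the 10 degree disagreement tolerance is only the figure guessed at elsewhere in this thread for the optional disagree warning.

```python
from typing import Optional


def validated_aoa(left_deg: float, right_deg: float,
                  max_disagree_deg: float = 10.0) -> Optional[float]:
    """Return a usable angle of attack, or None if the two vanes disagree.

    max_disagree_deg is an assumed tolerance, not a certified value.
    """
    if abs(left_deg - right_deg) > max_disagree_deg:
        return None  # sensors disagree: no trustworthy reading
    return (left_deg + right_deg) / 2.0


def may_activate(left_deg: float, right_deg: float,
                 activation_aoa_deg: float = 14.0) -> bool:
    """Allow nose-down trim only on a validated, high angle of attack.

    activation_aoa_deg is a hypothetical threshold for illustration.
    """
    aoa = validated_aoa(left_deg, right_deg)
    return aoa is not None and aoa > activation_aoa_deg


# Vanes twenty degrees apart, as reported for Lion Air: no activation.
print(may_activate(25.0, 5.0))
# Vanes agree on a high angle of attack: activation allowed.
print(may_activate(16.0, 15.0))
```

With only two sensors the check can detect a disagreement but cannot tell which vane is wrong, so the conservative choice in this sketch is to inhibit the function rather than pick a side.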

dralley · on March 20, 2019

The pilots know that the trim is moving, it's connected to a wheel in the cockpit that's super obvious.

But the normal way of stopping it (the yoke) doesn't necessarily work, and the pilots wouldn't necessarily think to physically stop the wheel with their hands.

segmondy · on March 20, 2019

This 3rd pilot was probably able to figure out the issue because he had the clarity of mind that comes from not having to fly the plane, watch all the instrument gauges while things are going wrong, and try to diagnose and correct the issue.

sterlind · on March 20, 2019

Seems like pilots are overburdened by so much automation, as alluded to by the famous "Children of the Magenta" video[0].

I don't get why the plane doesn't just say what it's doing, and why it's doing it, and have a big red button to put the plane into a safe-mode alternate law. 737s already say warnings like "BANK ANGLE"; couldn't it just say "DANGEROUS CLIMB DETECTED, TRIMMING NOSE DOWN. PRESS RED BUTTON TO CANCEL"?

0. https://vimeo.com/159496346

Someone1234 · on March 20, 2019

Boeing offered that, but it was a paid extra that few airlines purchased.

https://theaircurrent.com/aviation-safety/southwest-airlines...

cjbprime · on March 20, 2019

This is grossly overstating the optional feature. As I understand it, "AOA disagree" is a single light near the AoA indicator itself. It's not an explanation, or even an alert. It's not where anyone would reasonably look when trying to diagnose an uncommanded flight control problem.

snowwindwaves · on March 20, 2019

When lights are normally not lit and one lights up a pilot or other machine or plant operator should take notice.

A detailed explanation should not be required since they should know what the different warning lights mean and what can cause them to be lit.

I personally would like to see a list of the airlines that bought the more expensive 737 max that included the additional safety features.

cjbprime · on March 20, 2019

The North American airlines bought it and no others did. It was not described as a safety feature -- if it was it wouldn't be permissible to make it optional.

Even the current training does not tell pilots to know that an AoA disagree light is an emergency because it can cause the plane to enter an uncommanded nosedive via MCAS. I really don't think it's reasonable to expect the pilots to know things that Boeing is not even trying to tell them.

dingaling · on March 20, 2019

> When lights are normally not lit and one lights up a pilot or other machine or plant operator should take notice.

The crew monitor the central EICAS display for fault indications, not random locations around the cockpit.

If you start bolting-on additional check locations you increase crew workload, particularly if it's not a 'dark cockpit' like an Airbus type.

cryptonector · on March 20, 2019

Yeah, maybe it shouldn't be an extra.

briandear · on March 20, 2019

> The Indonesia safety committee report said the plane had had multiple failures on previous flights and hadn’t been properly repaired.

It probably wouldn’t have mattered. Lion Air couldn’t even maintain their planes properly; assuming an extra indicator would help would be charitable at best.

AsyncAwait · on March 20, 2019

The Ethiopian plane was new and this was still a problem.

inferiorhuman · on March 20, 2019

> But the normal way of stopping it (the yoke) doesn't necessarily work, and the pilots wouldn't necessarily think to physically stop the wheel with their hands.

Well, no. Pulling on the yoke traditionally moves the elevator. MCAS adjusts the horizontal stabilizer. You can adjust the stabilizer with switches on the yoke or with the trim wheels by your knee. MCAS will pause for five seconds if the pilot hits one of the switches on the yoke (and the Lion Air pilots did this until that stopped working).

Someone1234 · on March 20, 2019

I believe you may have misunderstood the comment you were replying to.

They were saying you cannot counteract MCAS's control inputs with the elevator alone. Which is a somewhat unconventional design, in a lot of aircraft the elevator can overpower the horizontal stabilizer, whereas with a bad sensor MCAS will continue to move the stabilizer until you cannot overcome it.

To use a bad analogy, in a car the brake is stronger than the accelerator, so if the pedal sticks you can still stop. In other aircraft the elevator is more powerful than the horizontal stabilizer.

inferiorhuman · on March 20, 2019

> They were saying you cannot counteract MCAS's control inputs with the elevator alone.

Pulling on the yoke is not how you counteract a runaway stabilizer on the 737. I've pasted the relevant part of the QRH in a few previous comments. Yes, the stabilizer ultimately has more pitch authority under some circumstances. That may be what happened here, but if I'm interpreting the graphs on the preliminary report correctly I wonder about mechanical failure of some sort.

This gets a bit more complex with the 737 because moving the yoke WILL actually stop one of the stabilizer trim algorithms, but not MCAS.

Someone1234 · on March 20, 2019

> Pulling on the yoke is not how you counteract a runaway stabilizer on the 737. I've pasted the relevant part of the QRH in a few previous comments.

It is how pilots learn to counteract nose down day one of pilot training. In many aircraft hard elevation will overpower even a faulty horizontal stabilizer. If the QRH was a panacea we would have 348 fewer losses today.

> That may be what happened here, but if I'm interpreting the graphs on the preliminary report correctly I wonder about mechanical failure of some sort.

There was a mechanical failure, the AoA sensor. I'm skeptical there needs to be more going on than MCAS due to the "repeated correction" unauthorized change Boeing made.

> “The FAA believed the airplane was designed to the 0.6 limit, and that’s what the foreign regulatory authorities thought, too,” said an FAA engineer. “It makes a difference in your assessment of the hazard involved.”

inferiorhuman · on March 20, 2019

> In many aircraft hard elevation will overpower even a faulty horizontal stabilizer

In the 737 you can get into situations where the elevator has insufficient authority to overcome a stabilizer. Excessive pitch up (leading to a potential stall) that you can't counter by pushing on the yoke is exactly what MCAS is designed to prevent.

> There was a mechanical failure, the AoA sensor.

A fixed offset from reality is an interesting failure mode, especially in two separate sensors (Lion Air replaced the alpha vane before flight 610), and even more interesting as it's the same alpha vane used in the 737 NG. The left alpha vane was being interpreted as almost exactly twenty degrees higher than the right.

6nf · on March 20, 2019

> The left alpha vane was being interpreted as almost exactly twenty degrees higher than the right.

Is that because the plane was in a banking maneuver at the time maybe? I don't know anything about planes but I heard that when you're turning the two sensors will disagree by some amount.

cjbprime · on March 20, 2019

I think it's been disclosed that in Lion Air the sensors were twenty degrees apart even when sitting on the runway before the flight. It is shocking that nothing checked for disagreement or communicated it to the pilots.

inferiorhuman · on March 20, 2019

> Is that because the plane was in a banking maneuver at the time maybe? I don't know anything about planes but I heard that when you're turning the two sensors will disagree by some amount

The difference in angle of attack was consistent throughout the entire flight (well up until the crash where the values began to converge). The threshold for the optional 'angle-of-attack disagree' warning is, I think, ten degrees. It seems very unlikely that the plane had a twenty degree bank angle for two entire flights.

eli · on March 20, 2019

I don’t think it’s clear if it is a single point of failure or if a faulty sensor is just one precondition.

WalterBright · on March 20, 2019

The trim switches on the control column can override it.

cameldrv · on March 20, 2019

They will override it for 10 seconds, then MCAS gets back up like a zombie and tries to kill you again. If you want it dead you need to flip two cutout switches behind the throttle quadrant.

jquery · on March 20, 2019

Runaway trim is supposed to be part of the "memory checklist" for pilots. The symptoms of MCAS are the same as runaway trim and the fix is the same (which is why Boeing didn't feel like extra pilot training was needed), so I'm curious to see the most recent investigation and hear the black box voice recorder. Did they not know they were dealing with runaway trim? Did they think it was something else? Did they forget the memory checklist? Was there not enough height to deal with runaway trim regardless? Were the symptoms different than runaway trim, confusing the pilots?

The black boxes will be very illuminating in this respect, especially since we never recovered the Lion Air black box voice recorder.

salex89 · on March 20, 2019

True... Now I'm wondering whether the third pilot was able to help because he was better trained, didn't forget the memory items, or was just in that position. He was likely sitting far back, and he wasn't wrestling the plane (I'm not sure how composed the cabin is during those maneuvers). He might have been in a better position to see whatever was happening.

nas · on March 20, 2019

This Seattle Times article is quite informative:

https://www.seattletimes.com/business/boeing-aerospace/faile...

It seems the MCAS had a flawed design where it would move the trim further than expected and would also trigger multiple times. Based on my understanding, the MCAS would have been quite safe if it didn't have these flaws (pilots deal with a failed AoA sensor as another case of "runaway trim").

ricardobeat · on March 20, 2019

> [MCAS] it’s not stopped by the pilot pulling the yoke, which for normal trim from the autopilot or runaway manual trim triggers trim hold sensors

This implies that 'normal' runaway trim can be stopped by pulling the control yoke. Maybe pilots simply have no idea what is going on once they realise that action has no effect with the MCAS?

inferiorhuman · on March 20, 2019

> This implies that 'normal' runaway trim can be stopped by pulling the control yoke.

Pulling on the yoke will only stop the 737's computers from moving the stabilizer IF it's the speed trim system (STS) that's moving the stabilizer. Otherwise the yoke is intended to adjust the elevator not the stabilizer

argd678 · on March 20, 2019

> “After this horrific Lion Air accident, you’d think that everyone flying this airplane would know that’s how you turn this off,” said Steve Wallace, the former director of the U.S. Federal Aviation Administration’s accident investigation branch.

I suspect the other factors will be answered when this question is answered too.

cybrjoe · on March 20, 2019

A quick search indicates the CVR for Lion Air 610 was recovered. Did I miss something?

jquery · on March 20, 2019

You're right, looks like it was recovered earlier this year, and they haven't released the transcript yet.

https://www.reuters.com/article/us-indonesia-crash/no-public...

qyv · on March 20, 2019

You are correct. And if two crews in 5 months had the same issue either identifying or dealing with the same problem, then perhaps there is a design or training problem here.

ineedasername · on March 20, 2019

No need to choose; it seems like design and training can share the spotlight with their third friend, sensor failure.

inferiorhuman · on March 20, 2019

> The symptoms of MCAS are the same as runaway trim

No, they're not.

kayfox · on March 20, 2019

So, how are they not?

inferiorhuman · on March 20, 2019

> So, how are they not?

When you hit the trim buttons on the yoke MCAS stops for five seconds. What Boeing considers runaway trim would not stop when you counter with the trim buttons.

blackflame7000 · on March 20, 2019

Yea the interval is actually 20 seconds which is a really long time to a pilot thinking he is in an emergency. If you look at the period of the ups and downs of the Ethiopian flight they roughly correspond to that 20-second interval. Just when he thinks he's fixed it, it strikes again and again, taking a more aggressive nose down attitude each time.

kayfox · on March 20, 2019

So, if the trim were running away based on an intermittent fault, the same situation could be encountered because the pilots don't need to execute the runaway trim checklist?

Someone1234 · on March 20, 2019

It feels different from their runaway stabilizer training. MCAS has a delay between each motion, whereas a runaway stabilizer is much more aggressive.

raihansaputra · on March 20, 2019

There's a thread on twitter with a pretty good analysis of what's happening with 737MAX. The 'Swiss Cheese' model here starts from its redesign by Boeing.

https://twitter.com/trevorsumner/status/1106934362531155974

tluyben2 · on March 20, 2019

From all I've read, including about the optional disagree indicator, I still say Boeing should be held responsible for this: everything points to economic reasons, which means they decided these things and fully knew the potential consequences.

torqueTorrent · on March 20, 2019

The saddest thing, as many HN users should know all too well, is that there can be no excuse for automated systems like airliners to experience catastrophic failure and loss of life, if only due to the availability and application of modern SDLC principles and CI/CD etc.

Smoke testing could have been performed such that all possible combinations of transducer input could be considered and evaluated thoroughly for closed-loop effect at runtime.

These types of integration tests should have been performed repeatedly, seemingly endlessly in the quest for bugs and analysis of the full spectrum of runtime results and effects.

In my experience in the software industry, I've always done this for applications that have infinitely more trivial effect and results than an airliner at altitude containing hundreds of souls.

One potential counterpart to the seemingly infinite greed we see exponentially increasing could be the old adage that karma is a bitch.
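
The exhaustive input sweep described in this comment could be sketched roughly as follows. Everything here is a toy: the two control laws, the 14 degree activation threshold, the 10 degree disagreement tolerance, and the coarse 5 degree grid are assumptions chosen for illustration, and a real sweep would cover far more inputs against the real closed-loop system.

```python
from itertools import product

MAX_DISAGREE_DEG = 10.0    # assumed disagreement tolerance
ACTIVATION_AOA_DEG = 14.0  # hypothetical activation threshold
TRIM_STEP_DEG = 0.6        # nose-down increment used by both toy laws


def single_sensor_law(left_aoa, right_aoa):
    """Toy law that trusts only the left vane (the flaw under test)."""
    return TRIM_STEP_DEG if left_aoa > ACTIVATION_AOA_DEG else 0.0


def cross_checked_law(left_aoa, right_aoa):
    """Toy law that refuses to act when the two vanes disagree."""
    if abs(left_aoa - right_aoa) > MAX_DISAGREE_DEG:
        return 0.0
    if min(left_aoa, right_aoa) > ACTIVATION_AOA_DEG:
        return TRIM_STEP_DEG
    return 0.0


def find_violations(law, aoa_values):
    """Return every (left, right) pair where the law commands nose-down
    trim even though the sensors disagree beyond the tolerance."""
    return [
        (left, right)
        for left, right in product(aoa_values, repeat=2)
        if abs(left - right) > MAX_DISAGREE_DEG and law(left, right) > 0.0
    ]


grid = [float(a) for a in range(-10, 45, 5)]  # -10 to 40 degrees
print(len(find_violations(single_sensor_law, grid)), "violations, single sensor")
print(len(find_violations(cross_checked_law, grid)), "violations, cross-checked")
```

The sweep only finds a flaw if someone wrote down the invariant to check, which is a requirements question rather than a tooling one.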

philpem · on March 20, 2019

Speaking as someone who's done this (though not on something as big as an airliner!):

Yes, you can test control loops -- you can even turn it into a unit test. At least in theory.

The problem is that to do the test you need either a working, physical system or a good model. So if you're making a shutdown valve for a chemical plant, you need a physical build of that control valve. Even on that scale, you're talking about something that could potentially fill an engineering lab, be quite noisy and have a considerable amount of stored pneumatic or hydraulic energy. It's possible, but not exactly practical.

The alternative is to model the system, but now the question changes: how can you be certain that your model is accurate and models all the variables? Say your valve is slower when it's cold and you don't model that -- now you have a false positive result ("it works" -- but nobody realised that "temperature" was a dependent variable).

So you take the middle ground - you can have the test jig for a week, so you record the inputs and outputs for a week under varying software conditions. But those recordings are only valid for that specific timing -- if you change the software and change the timing (maybe you move the trim motor slower), you get a model change and a false positive or negative.

It's certainly possible, but it's only possible with a good sized team, and supportive management who realise that the test is absolutely necessary.
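
The modelling trap described above can be shown with a deliberately tiny example. This is a sketch under made-up numbers: the closing rates, the freezing-point cutoff, and the five-step deadline are all hypothetical, and the "valve" is just a position that decreases by a fixed amount per step.

```python
WARM_RATE = 0.25   # fraction of travel closed per step when warm (assumed)
COLD_RATE = 0.125  # the real valve is slower when cold (assumed)


def closing_rate(temp_c, models_temperature):
    """Closing rate used by the simulation.

    A model that ignores temperature always returns the warm rate.
    """
    if models_temperature and temp_c < 0.0:
        return COLD_RATE
    return WARM_RATE


def closes_in_time(temp_c, models_temperature, deadline_steps=5):
    """Simulate a shutdown command; True if the valve is fully closed
    within deadline_steps simulation steps."""
    position = 1.0  # 1.0 = fully open, 0.0 = fully closed
    rate = closing_rate(temp_c, models_temperature)
    for _ in range(deadline_steps):
        position -= rate
        if position <= 0.0:
            return True
    return False


# The incomplete model says the cold valve is fine; the better model disagrees.
print(closes_in_time(-20.0, models_temperature=False))
print(closes_in_time(-20.0, models_temperature=True))
```

The test code is identical in both runs; only the fidelity of the model changes the verdict, which is why a passing test against a model says little unless the model itself has been validated.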

SamuelAdams · on March 20, 2019

> It's certainly possible, but it's only possible with a good sized team, and supportive management who realise that the test is absolutely necessary.

Agreed. According to other sources [1], management rushed the development work so they could come out ahead of one of their competitors.

"But several FAA technical experts said in interviews that as certification proceeded, managers prodded them to speed the process. Development of the MAX was lagging nine months behind the rival Airbus A320neo. Time was of the essence for Boeing."

[1]: https://www.seattletimes.com/business/boeing-aerospace/faile...

torqueTorrent · on March 20, 2019

I agree wholeheartedly and routinely run concurrent intensive smoke tests on real-world HW as well as smoke tests on finely-modeled virtualized environments.

Even with the best modeling and virtualization, a true and thorough, 100% 1:1 approximation with the real world at runtime can likely never be attained for a myriad of reasons.

However, when lives are on the line, this gap must be closed in some manner so as to provide a greater degree of confidence.

Even the most thirsty organizations with lesser consequences for their failures are usually conservative enough and risk-averse enough to know better than to release without thorough (and relatively inexpensive) testing.

My old boss used to tell stories about back in the mainframe days whereby he would send customers fancy, branded and shrink-wrapped finished-product but containing blank tapes for the latest release in order to buy a couple of weeks of extra dev time if he thought the software wasn't ready to escape.

xvf22 · on March 20, 2019

An extra set of eyes saved them; it's a shame that there wasn't any way for them to include a reminder in the next crew's flight plans.

sundvor · on March 20, 2019

It is pretty shocking this wasn't noted as a serious incident needing investigation before more flights were undertaken.

amanzi · on March 20, 2019

There's a really good "The Daily" podcast (~20 minutes) about these crashes that answers a lot of the questions on this page.

https://www.nytimes.com/2019/03/19/podcasts/the-daily/boeing...

logfromblammo · on March 20, 2019

A machine that relies on sensors has two ways to detect when a sensor has failed: another sensor, or human observer input.

I don't know how avionics hardware engineers do it, but in this neighborhood of the Internet, we don't trust inputs, and especially human user inputs. Because every unverified, unsanitized input is an attack vector for bringing down our software and the system it runs on.

From what I have seen, the MCAS in the crashed planes relies on a single sensor--the AOA vane in the nose--and was almost solely responsible for catastrophic loss of altitude. This model of passenger jet has a paid upgrade option to add a second sensor, with disagreement detection.

My question is why don't the yoke inputs from the pilots count as disagreement with the AOA sensor? If the yoke is consistently counteracting the action of the MCAS, why can't it disable itself automatically and illuminate a light to indicate it has failed?

I'm guessing the pilots would have more time to search through manuals in-flight to clear the fault and re-enable the system than they would trying to disable it while it's stubbornly trying to crash the plane due to a single point of failure.

It's not hard to adopt the defensive mindset that your users (or your professional testers) are maliciously trying to destroy your beautiful program with a combination of stupidity and cleverly designed unanticipated inputs. When hardware gets involved, one can personify Entropy as a being that is trying to destroy everything you love and kill you.

How would Entropy take down a plane and kill all passengers? How about it freezes the AOA sensor in the "nose is at +90 degrees pitch" position? How do we defend against that attack vector? Pilot training? Oops! Entropy also made them forget that page out of thousands of possible pages of procedural training during the critical seconds they needed to remember it. The only way to fight Entropy is by making random events more independent, rather than causally linked in a failure cascade.

I don't think this course towards blaming Boeing's lack of documentation and/or pilot training is helpful. I don't think there's any option for Boeing but to immediately recall and retrofit all aircraft to the multiple AOA-sensor option, at their expense, and refund every airline that actually paid extra for it.
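
The self-disable idea raised in this comment, treating sustained pilot opposition as evidence of a fault, might be sketched like this. The class name, the three-cycle limit, and the latching behaviour are assumptions for illustration only. Note that, as others in the thread point out, the system is meant to act precisely when the nose is too high, so opposition alone may not be a clean fault signal.

```python
class OpposedInputMonitor:
    """Latch an automatic trim function off after the pilot has opposed
    it for several consecutive cycles (hypothetical design)."""

    def __init__(self, max_opposed_cycles=3):
        self.max_opposed_cycles = max_opposed_cycles  # assumed limit
        self.opposed_cycles = 0
        self.disabled = False  # once True, stays True until reset by crew

    def update(self, system_wants_nose_down, pilot_pulling_up):
        """Return True if the automatic function may act this cycle."""
        if self.disabled:
            return False
        if system_wants_nose_down and pilot_pulling_up:
            self.opposed_cycles += 1
        else:
            self.opposed_cycles = 0  # opposition must be consecutive
        if self.opposed_cycles >= self.max_opposed_cycles:
            self.disabled = True  # would also light a "failed" indicator
            return False
        return True


monitor = OpposedInputMonitor(max_opposed_cycles=3)
# Pilot fights the system on every cycle: allowed twice, then latched off.
print([monitor.update(True, True) for _ in range(4)])
print(monitor.disabled)
```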

fjfaase · on March 20, 2019

I am almost sure that some engineer of Boeing has noticed that there was a major design flaw with the function of the MCAS, but that he was overruled by a less technical (and probably younger) superior.

kkarakk · on March 20, 2019

more likely the flaw was noted but it would be more expensive to redesign than to eat the cost of whatever lawsuit they'd be hit with

CivBase · on March 20, 2019

> A malfunctioning sensor is believed to have tricked the Lion Air plane’s computers into thinking it needed to automatically bring the nose down to avoid a stall.

That is ridiculous logic to implement in a "safety" system. An automated system should never cause a plane to dive unless it also knows that it has enough altitude to safely do so - much less in a way that makes it difficult for pilots to override.

proaralyst · on March 20, 2019

Given the system assumed it was in stall, which means loss of altitude anyway, surely it's safer in general to go nose-down to avoid the stall? At least then you have a chance of recovery, which you don't in a stall. (Except of course, going nose-down.)

CivBase · on March 20, 2019

If you don't have enough altitude to afford going nose-down, let the pilot handle it. If your system can't come up with a safe solution, do not override the pilot's controls in favor of a dangerous solution.

A computer should never assume it knows better than a pilot. A computer is only as good as the data it gets and the software it runs. Sensors fail. Data gets corrupted. In the current state of the industry, software bugs are inevitable.
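
As a minimal sketch of "defer to the pilot unless the data is trustworthy and there is room to dive": the function name, the plausibility range, the altitude floor, and the stall threshold below are all made-up placeholders, not real 737 values.

```python
from typing import Optional

MIN_SAFE_ALTITUDE_FT = 10_000.0           # assumed placeholder
AOA_PLAUSIBLE_RANGE_DEG = (-20.0, 40.0)   # assumed placeholder


def nose_down_command(
    aoa_deg: Optional[float],
    altitude_ft: Optional[float],
    stall_aoa_deg: float = 15.0,  # assumed placeholder
) -> float:
    """Return a nose-down trim increment (-1.0), or 0.0 to leave control with the pilot."""
    # Missing data: do nothing rather than guess.
    if aoa_deg is None or altitude_ft is None:
        return 0.0

    # Implausible reading (e.g. a vane stuck at +90 degrees): treat as a sensor fault.
    low, high = AOA_PLAUSIBLE_RANGE_DEG
    if not (low <= aoa_deg <= high):
        return 0.0

    # Not enough altitude to afford a dive: let the pilot handle it.
    if altitude_ft < MIN_SAFE_ALTITUDE_FT:
        return 0.0

    return -1.0 if aoa_deg > stall_aoa_deg else 0.0
```

The point of the ordering is that every doubtful case falls through to "do nothing", so the automation only acts when all of its preconditions hold.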

Airplane software is supposed to help pilots, not hinder them. In light of these events, I'm thinking twice about wanting a self-driving car in the near future.

cmurf · on March 20, 2019

Elsewhere I've read MCAS does take altitude into account, as well as flaps, i.e. it's only active above a certain altitude, and only when flaps are retracted. So... yeah, we don't have the full story. And also in another thread, it's reported from the flight prior to Lion Air 610 (same plane) there were airspeed and altitude disagreements. I'm not at all clear from available reporting whether airspeed, altitude, and angle of attack were all inconsistent, whether that was a source of autopilot confusion and then pilot confusion, or whether the pilots did set stabilizer trim to cutoff and, if so, when and whether it was too late.

I'm a pilot (former CFII) and the whole automation fail danger strikes me as terrible. John Q Public says "I want the automation to override the pilot's mistakes!" What? OK fine. What about Asiana Airlines Flight 214 where the pilot advanced throttles, an explicit intent input, and yet autothrottles were set so the automation said nope. And then John Q Public are all, well the pilot should have KNOWN!

It's like it's a game where the pilot is only there as the last resort to be blamed if they too fail, even after a sequence of automation failures. Automation betraying pilots at low altitude is in my view functionally equivalent to an in-flight breakup. And the automation-in-the-cockpit mentality in the face of failures has been, for 20+ years, "add another button, add another feature, add another routine" to tack onto all the others.

And yes this absolutely makes me think of autonomous driving as total b.s. Airplanes are in a standardized system, with far bigger budgets for automation and yet we still have to fall back to human pilots for routine procedures like parking, taxiing, VFR approaches and landings, and telling the automation literally every detail it needs to do, ATC communication. It's ripe for end to end automation and yet we still don't do that. Driving cars is wildly more complicated for automation: non-standard streets, paint, signage, laws, pedestrian behavior, bicycles, cars still driven by humans, weather - haha. Sounds nice, great idea, keep trying, but it's complete bullshit.

cmurf · on March 20, 2019

Jump pilot would have had a natural line of sight to the trim wheel, and may have seen it move "unscheduled" at the same time as the nose down. This might have given him a unique suspicion of auto trim.

I expect this will be included in the accident report. Hopefully NTSB will conduct their own first hand interview with this pilot. (I can't think of why they wouldn't.)

reasonablemann · on March 20, 2019

Is it possible for Boeing engineers to lose their professional status as a result of this situation?

LeifCarrotson · on March 20, 2019

The fault doesn't lie with the engineers who built the system...not to mention I would be very surprised if they were professionally certified.

It lies with the managers who wrote the specification that said that for business reasons the new plane must not require any additional training or type certifications, and cut costs by implementing the required systems with a non-redundant sensor.

kayfox · on March 20, 2019

I think this is one of those situations where you may not be able to assign fault to any one set of people. Remember that everyone here has the clear advantage of knowing what went wrong and how; the people who designed this may not have foreseen any such situation. Also, any one of these systems, even something as "simple" as MCAS, would have involved dozens if not hundreds of engineers in all the design decisions that led to this issue, including many people who have already retired or did so long ago (remember, the 737 was originally designed in the 1960s).

The desire to assign fault to one individual and punish them is a very emotional response to this situation; it means people will not accept that it was an unforeseen or systemic issue (people familiar with air crashes have seen in the past some systemic issues, where everyone did everything how they were supposed to but things still went south) without some individuals to blame. Typically crash investigations try to ferret out all points of failure; you might read "pilot error" in the news, but rarely is that the only cause in the crash report.

For example, this crash: https://en.wikipedia.org/wiki/American_Airlines_Flight_965

It identifies some training issues (automation dependence, speedbrakes still being applied while executing terrain avoidance maneuver) but also identifies issues with the FMS in how it manages waypoints and how they are named.

So, for the current situation there are obviously many aspects to be addressed:

1. MCAS software appears to do more than specified, this is apparently what the software update (delayed by the government shutdown apparently) is to fix.

2. Pilots need to be trained or retrained on stabilizer trim runaway.

3. The 3 AoA sensor option might need to be mandatory on the 737 MAX.

4. The FAA may need to review their effective supervision of both Boeing and the air carriers.

magduf · on March 20, 2019

My understanding (from other recent articles and discussions like this one) is that Boeing is self-certifying (thanks FAA!), and because of this, they have at least one engineer, probably a few, who are on staff and who do their certification and are professionally certified themselves to do this. These engineers would therefore be personally liable for this plane's problems, because they signed off on it.

8note · on March 20, 2019

that sounds like the engineers are at fault for not refusing to stamp the thing.

dentemple · on March 20, 2019

The ones at fault are the people at the FAA who let Boeing certify their own planes.

darkpuma · on March 20, 2019

Both are at fault. The existence of the FAA does not absolve Boeing, their executives or their engineers, of blame.

TheSpiceIsLife · on March 20, 2019

Everyone involved can share the fault.

Unfortunately, correctly apportioning all of the blame won’t bring back the souls lost.

magduf · on March 20, 2019

No, but the whole point in assigning blame and severely punishing people found to be at-fault is to prevent things from happening again.

If we just let these people off scot-free, then you can count on more similar things happening.

jimktrains2 · on March 20, 2019

Wouldn't a better point of view be: shouldn't we reëvaluate the items that allow self-certification and fund the FAA properly so that they can certify medium/large changes themselves instead of leaving it to the manufacturer?

Self-certification has a place, but it should always be accompanied by random checks and shouldn't be for anything large, critical, or first time through.

Edit: Better in that it helps solve what is largely a political and not an engineering problem.

cjbprime · on March 20, 2019

Are you outside the US? Engineers here do not typically have professional status.

philpem · on March 20, 2019

Same generally applies in the UK.

I can only think of one CEng I know, and he works in civil engineering. I've never met a Chartered/Certified software or electronics engineer (that I know of).

cloakandswagger · on March 20, 2019

[flagged]

sokoloff · on March 20, 2019

I agree that the pilots likely shoulder some of the blame (and in an NTSB-investigated crash, I'd expect their failure to follow the non-normal checklist memory items to be the primary cause), but it's not enough to say that this was simple pilot error and poor maintenance.

Boeing's going to wear some of the blame here, as is proper, IMO.

_lqaf · on March 20, 2019

You seem very certain of the cause(s) of the crash. Care to share the source of your knowledge?

rootusrootus · on March 20, 2019

[flagged]

_lqaf · on March 20, 2019

Partly my point. Plenty of internet experts to go around.

ceejayoz · on March 20, 2019

From the article:

"There have been no reports of maintenance issues with the Ethiopian Airlines plane before its crash."

samfisher83 · on March 20, 2019

Surprised they don't have a lessons learned portal. Would have saved some lives.

zkms · on March 20, 2019

That's the sort of thing the NASA report system is for, though IDK the latencies involved in the system: https://en.wikipedia.org/wiki/Aviation_Safety_Reporting_Syst...

> The Aviation Safety Reporting System, or ASRS, is the US Federal Aviation Administration's (FAA) voluntary confidential reporting system that allows pilots and other aircraft crew members to confidentially report near misses and close calls in the interest of improving air safety.

WrtCdEvrydy · on March 20, 2019

Can you imagine if the public saw how often planes were close to disaster?

inamberclad · on March 20, 2019

Don't confuse apathy with unavailability. The public (and it sounds like you as well) just doesn't look. Here's NASA's safety database: https://asrs.arc.nasa.gov/

userbinator · on March 20, 2019

They already can: http://avherald.com/

JMTQp8lwXL · on March 20, 2019

Only as useful as people actually reading from such a portal.

ratsimihah · on March 20, 2019

It's one thing to memorize things, it's another thing to be able to use that knowledge in the right context and situation, particularly when under panic.

abbadadda · on March 20, 2019

What are the odds this is completely fabricated by Boeing? Not saying I think this, but if it was a movie and this was a cover-up, this would be a great plot twist. I suppose I'm maybe just a little jaded from all the fake news these days.

jagthebeetle · on March 20, 2019

There's a difference between jaded and cynical :) Not that I know the answer to your question any better. We'll have to wait for the HBO documentary.

To play devil's advocate anyway, as someone who has not been following this actively, I find this article to cement the idea in this reader's mind that a Boeing malfunction is involved in all three incidents. Is this even conclusively established? Would Boeing want this spin at this point?

The suggestion that an extra brain might randomly have averted two multi-fatal crashes and that this error mode has occurred at least three times seems like it would be a bit pyrrhic for the PR people at this juncture, no?

fxfan · on March 20, 2019

The MIC is too powerful and influential over both parties.

While I can sometimes like Trump for not submitting to anybody - even he bows to the MIC

shiven · on March 20, 2019

From my point-of-view, two opinion points:

1. I am glad that 737 MAX has been grounded. May it stay that way, globally, until this issue is provably resolved.

2. The entire Boeing chain of management that resulted in these crashes should be publicly flogged, their remuneration & benefits clawed back & subject to a mandatory minimum prison sentence.

Who the hell am I kidding! Neither is very likely to happen in the present day US. Carry on then, I guess. Just make sure to sign your Last Will & Testament before taking that next flight.

rvolkan · on March 20, 2019

What happens if your #2 is applied to doctors, car/ship manufacturers, food producers, grocery stores, house builders, taxis, restaurants, software engineers, medical device producers and so on? Every profession has caused accidental deaths.

"Legal action" against bad decisions is a must. However, mandatory prison sentence for accidents is a terrible idea.

kunkurus · on March 20, 2019

If Boeing knowingly exposed the passengers to the risk of injury it's criminal negligence and usually the punishment is imprisonment: https://en.wikipedia.org/wiki/Criminal_negligence

rvolkan · on March 20, 2019

Exactly my point. Imprisonment should come into play when the accidents are proven to be caused by Boeing's negligence.

ibejoeb · on March 20, 2019

Is it negligence or just a bad design? Who decides? The thing is starting to look like Boeing thought that MCAS failure was similar to and corrected by the same procedure as runaway trim. Time will tell if that is the case, but if it does, should the pilots be posthumously tried for negligence?

Wowfunhappy · on March 20, 2019

Especially in light of the current size of the US's prison population. We should be very careful in general about advocating for more prison sentences. It's an easy thing to do, but the societal outcome is a lot more complicated.

magduf · on March 20, 2019

Oh please. If there's one demographic we don't have much of in our prison system, it's upper-class corporate executives. We could stand to let some non-violent drug offenders out early to make room for them.

veryworried · on March 20, 2019

One thing that bothers me about this generation is this thirst for infinite punishment.

People hunger for someone to blame, rattling off a long list of maladies that should befall that person, until they have been thoroughly satisfied, but they are never satisfied. They always feel there should be someone else, something more, something deserved.

The truth is, there is no point to such a punishment here. It is unlikely that any individual plotted to kill people by pushing some faulty code out of malice. These were people simply doing their best and they failed.

nine_k · on March 20, 2019

While I agree that thirst for punishment is counter-productive, I'm not sure that people did their best, or rather that the criteria of the "best" were right.

I remember that the aircraft in question was tweaked beyond stability in order to reuse the existing type certificate. This procedure needs scrutiny, likely on both Boeing's and the FAA's sides.

jyounker · on March 20, 2019

The news that I'm hearing now is that Boeing has been working on a software fix for this problem since at least January.

Where were the glaring safety warnings to the airlines, their customers?

veryworried · on March 20, 2019

The thing about a software fix is that you never know when the solution is near. It could be fixed next week, or it may require an entire rewrite of critical systems. You just don't know until it's fully diagnosed. So why sound the alarm when plenty of flights have gone without problems and a software fix might be around the corner, especially if you have all your best men working on the problem?

JMTQp8lwXL · on March 20, 2019

Even if the US doesn't choose to do much (though I find it embarrassing the FAA was one of the last regulatory bodies to respond), Boeing will face a reckoning globally from other regulatory agencies.

Stock is down 15% since March 1. Hard to know what the executives there are thinking, but I hope some folks in the organization genuinely feel some sort of empathy for the families of the deceased on these flights.

markdown · on March 20, 2019

> though I find it embarrassing the FAA was one of the last regulatory bodies to respond

The top 3 officials at FAA are unfilled, with seat-warmers there in an "acting" capacity. I wonder if that's related. https://www.faa.gov/about/key_officials/

jki275 · on March 20, 2019

Honestly, those top positions in almost any organization are often political appointments that have little to do with day to day operations. The current "actings" are generally the ones who "advise" the political appointees on how to handle things. Obviously there are some exceptions, but most bureaucracies tend to run that way.

Not to comment specifically on this as FAA isn't my area, but if the secdef doesn't come to work tomorrow the undersecretary is going to take the same actions he would have. I'd imagine most of those orgs trend that way.

matthewdgreen · on March 20, 2019

The value of having confirmed appointees in those positions is not necessarily their native technical expertise. As you’ve noted, that expertise can be provided by career employees. The value of political appointees is the political clout they carry. Given that they’ve been appointed directly by the President and confirmed by the Senate, it is much harder (or politically fraught) to simply threaten or replace them when they take an unpopular stand like “let’s ground an airplane.”

The US government is a complex system, and like most complex systems it works best when you, a non-expert, don’t randomly yank out pieces and declare them unnecessary.

jki275 · on March 20, 2019

Well, that's not what I said. I've not claimed they're unnecessary, just that they have a slightly different and perhaps less important role than the post I replied to was giving them. The undersecretaries can make the same decisions, and in fact the career personnel in the organization have far more ability to take action without fear of replacement by anyone as they have more protection than a secretary who serves at the pleasure of the appointing authority.

That said, generally the undersecretaries are appointed and confirmed as well, as their role is to step in and backfill if the primary is not available, so that answers that issue.

matthewdgreen · on March 20, 2019

I can’t find any evidence that the current acting FAA administrator was Senate confirmed. Is the Internet just being unreliable here?

I think your notion that career personnel have as much DC political clout as unconfirmed career officials is one of those things that sounds good if one is trying to win a debate, but is unlikely to represent the actual facts on the ground.

jki275 · on March 20, 2019

Just a quick google search turns up this article that says he has been.

https://www.americanshipper.com/news/?autonumber=67974&sourc...

Political clout in DC isn't what runs organizations and gets the day to day business done. It's what plays well on the hill in pointless back and forth BS sessions that a totally ineffective congress likes to have, it probably helps to some extent in budgeting discussions, but it's more posturing than anything else in a lot of ways.

Anywhere you go in the military you'll find all the GOs who are in charge, and powerful, and senate confirmed, and all that good stuff, and they've got a Chief of Staff and an aide who actually run everything, and can continue to do so if their GO walks in front of a bus. It's no different in any huge bureaucracy -- sure, CEOs make decisions, but the day to day business doesn't stop if they don't answer the phone for a while. I would actually posit that if it did the whole organization is dysfunctional to the point of ineffectivity. But I'm being redundant in describing DC that way perhaps.