Pilot Who Hitched a Ride Saved Lion Air 737 on the Day Before Deadly Crash (bloomberg.com)
327 points by erict15 on March 20, 2019 | 260 comments



> That extra pilot, who was seated in the cockpit jumpseat, correctly diagnosed the problem and told the crew how to disable a malfunctioning flight-control system and save the plane, according to two people familiar with Indonesia’s investigation.

> The presence of a third pilot in the cockpit wasn’t contained in Indonesia’s National Transportation Safety Committee’s Nov. 28 report on the crash and hasn’t previously been reported.

So the NTSC explicitly chose to exclude this and then two whistleblowers went to Bloomberg? That is fucking wild.


It didn't need to be included in the preliminary report since it is contextual/background information.

They did include in the report that on a previous flight the same sensor error occurred and the pilots resolved it by disabling auto-trim. The fact there was a third pilot there is definitely interesting, but they didn't make anyone less safe by not including it in the report.

There's no bombshell here. Previous problems were well known/reported before today.


This is very important, as it shows the ratio of pilots aware of the mitigation is low, and/or the stress of fighting with the computer makes you forget your training amendments.

Edited to add it’s a training amendment


Actual instrument-rated pilot here.

What people (including pilots) do under stress is resort to simple actions drilled in by training. That's why emergency procedure training is a major part of pilot training curriculum.

What goes away first under stress is the capability for complex reasoning, which is what one would need to figure out what the heck is going on and take appropriate action. In this case an a/c in an uncommanded dive can be the result of three things: an elevator control malfunction (unlikely in modern jets; the mechanisms for transmitting control forces from the yoke to the elevator are multiply redundant and well understood to be critical), structural breakdown (the empennage falling off would do that), or runaway elevator or stabilizer trim.

In most a/c runaway trim can only be caused by a malfunctioning autopilot (that's why APs have multiple redundant ways to disable them), so pilots facing this situation react by trying to disable the AP in all available ways and then immediately jumping to the conclusion that there's something catastrophically wrong with the elevator system or even airframe integrity. The thought that there could be yet another system which can make major trim adjustments doesn't even enter their consciousness.

Training (and/or being in a position where there's no immediate stress of wrangling misbehaving a/c) would help to get to the correct solution (disabling trim motor by pulling the circuit breaker).

That said, flying an a/c seriously out of trim is hard, and can cause an accident on its own (the fine pitch control is gone, and you can even be so far out of trim that you cannot do landing flare no matter how hard you pull up, which in heavy jets would result in structural breakdown when the plane slams into runway).


My question is why doesn't the "omg something is seriously fucked up give me manual control NOW" button disable the MCAS as well? That seems like a UX failure.


That would be the autopilot disconnect button or, in this case, the trim switch, the pressing of which is the correct immediate response to this occurrence, followed by the disabling of the system using switches on the center pedestal. While it's certainly hazardous for Boeing to knowingly expose pilots to this occurrence, I'm also conflicted by the fact that they failed to respond properly. The procedure is similar on Boeing, Airbus, Embraer and Bombardier aircraft. Standardized, in a way.


MCAS is designed to work only with autopilot disengaged. With autopilot and autothrottle engaged there would be similar logic at play because the general goal would be to not have the automation induce a stall. With everything disabled there is a risk that applying full throttle will result in the airplane pitching up dramatically beyond the ability of the elevator to pitch the airplane back down and potentially inducing a stall. This is an inherent problem with putting big engines under the wing (and is a problem in the 737 Classic[0], NG, and MAX).

You could almost certainly disable just about anything by pulling the appropriate circuit breaker. You don't want to do that because the plane may be left in a state where you can't control it manually and you only have a few thousand feet of altitude to recover.

There are, in fact, two switches to turn off the electronic control of the horizontal stabilizer. The previous Lion Air crew used them after the pilot riding in the jumpseat went back and grabbed his copy of what I'd assume was the flight manual and had an ah-hah moment[1]. The flight 610 crew were poring over something (quick reference handbook?) but didn't express any awareness of the stabilizer being trimmed[2].

Depending on how bad things got there are a few problems with this. First, electrically moving the stabilizer is much faster than moving it manually. MCAS can operate at two speeds, IIRC both are slower than the speed dictated by the buttons on the yoke (plus the buttons on the yoke will pause MCAS). But even MCAS will move the stabilizer much faster than a pilot could by cranking the manual wheels. I believe that moving the stabilizer from one extreme at the fastest speed to the other takes about a minute. You can find videos of the stabilizer mechanism on a 777 on youtube, it's just not something that's designed to move quickly.

The next problem is that if the stabilizer is pitched down sufficiently (perhaps by MCAS), your first instinct may be to pull back on the yoke (to move the elevator) to regain pitch. Well, once you've done that the aerodynamic forces may be such that you can't move the stabilizer into a nose up position until you let go of the yoke and let the plane pitch down further (Boeing references needing to unload the stabilizer in some documents). That's something you've gotta have cat-like reflexes for when you're already flying so low.

The really sad part is that the pilots could have disabled MCAS and retained electric control over the stabilizer by extending the flaps. But how are the pilots going to know this when Boeing refused to document anything?

0: http://avherald.com/h?article=419f2f9e

1: https://www.grid.id/read/04966850/deretan-kejanggalan-yang-d...

2: https://www.straitstimes.com/asia/se-asia/cockpit-voice-reco...


Thank you for your post. I've read that Boeing removed the "yoke jerk" function on the 737 MAX. The flight 302 PIC might have been trying to use it and I have not read whether that functional change was clearly explained to pilots. Many commentators don't seem to realize the extreme low altitude at which this particular struggle was occurring.


>The pilots of a doomed Lion Air Boeing 737 Max scoured a handbook as they struggled to understand why the jet was lurching downwards, but ran out of time before it hit the water, three people with knowledge of the cockpit voice recorder contents said. (straitstimes)

I can't help thinking things would be safer and the Ethiopian crash may not have happened if they just published all the info rather than leaving this kind of thing to anonymous leaks to the press.


If they had published this info, then they couldn't have sold the jet to airlines as just another 737 that any other 737-rated pilot could fly without additional training.


Boeing was right not to document anything, because then they might have to get a new certification (which is very expensive) for the plane, and pilots would need more training, which would make it hard or impossible for them to sell this plane to airlines as "just another 737".

Remember, a few fatal crashes are worth it as long as it means higher corporate profits.


I think I saw that movie. "Fight Club"? Or was this the less well known sequel, "FLIGHT Club"?


That's very informative, thank you.


Well, there is no such button. Perhaps there should be, though a better solution might be to fix the automation. Airbus planes seem to work OK without such a button, but they don't have a "crash the plane if one sensor fails" system.


There was an Air France (Airbus) flight a few years back that crashed over the Atlantic, partially as the result of a stuck pitot tube. I remember at the time one of the debates was how automated Airbus planes were compared with Boeing. Pilots seemed split as to which system they preferred, but generally wanted more control, not less.

https://en.m.wikipedia.org/wiki/Air_France_Flight_447


That's an excellent question, and will likely be one of the conclusions of the accident investigations.


Could be related to the issue that MAX is unstable in flight with that system off. Unlike previous 737s.


No, under normal conditions the MAX isn't unstable. The instability comes only at high angles of attack (close to stalling) and only there the MCAS system should kick in. It was meant to be the equivalent to ESP systems in cars.


Yep. Except that the "exceptional" conditions under which the MCAS kicks in can also "occur" (=seem to occur) due to sensor failure.


Do we know, exactly, how much of the 737 Max 8's flight envelope was lost compared to earlier models? I'm getting the feeling that the instability happens far closer to the regular maneuvering envelope than is being alluded to in the press.


It's a single data point / anecdote. These investigations are incredibly thorough, precise, and authoritative so it makes sense they'd seek to exclude that kind of information until they knew for sure.


> It's a single data point / anecdote.

A single data point is still a data point.


And a very relevant one, in this case: trim runaway with pilots not understanding it's a runaway. The visiting pilot does recognise the situation and restores expected flight dynamics. What went wrong? Why did two pilots not realise something that is traditionally a known case and well understood? Because it didn't look like a traditional case.


An irrelevant one in this case. This report needs to find out the technical/training/... reason, not why the other pilot needed a ride. I'm sure they'd have included it if they wanted to recommend having three pilots on every flight.


I'm not sure that thorough and precise implies they would seek to leave out things. In fact, one would expect the opposite.


One data point may lead you to bad conclusions, like in this case "one third of pilots don't know how to disengage the MCAS" -- the fact is we don't know what ratio. It might be higher, it might be lower, but one data point isn't valuable or thorough. In the preliminary report they mentioned someone else disabled it. In their final, I'm sure we'll get the details.


It also shows that pilot awareness (and therefore training) should have been sufficient. The key deficiency here was Boeing's. The agency not disclosing this detail is a problem too.


That.. sounds like the literal opposite of what it shows, no?


I think you misunderstood. Had the pilots been trained, they could have dealt with it. The fact that the off-duty pilot was able to shows it was possible to respond correctly, while the fact that the others weren't shows the lack of training is a real problem.


Thanks, I see. I don't think it does much good to notice that a small percentage of pilots can be capable of figuring it out, if they have no other duties at the time.

That doesn't tell us that it's reasonable to expect pilots in command under stress to always figure it out. And they have to figure it out 100% of the time to avoid statistically unacceptable crashes. So I don't agree that pilot training is an acceptable answer given what we know so far.


Yes it does: it shows that there is a procedure that would have worked had the pilots recognized the issue and known about the procedure.

The pilots were stressed in part because they didn't know -- they had plenty of time to recognize the issue and take action had they known, but they didn't, so no amount of time would have helped them. We don't know if one more thing to know about would be too much, but I rather doubt that.


It suggests that no LionAir crews knew how to handle this.


US pilots say they didn't either. The training wasn't out there.


The runaway stabilizer trim procedure is how it was expected to be handled:

https://www.flightglobal.com/news/articles/faa-order-tells-h...

https://www.youtube.com/watch?v=3pPRuFHR1co

The specifics of MCAS weren't out there because Boeing expected the pre-existing procedure to have been sufficient. That procedure wasn't sufficient, or the Lion and Ethiopian Air pilots were somehow not aware of it, or they forgot their training in a stressful situation.

To be clear: the planes should not have put these pilots into the situation they were in, but unless the recovery procedure doesn't work, these planes should not have crashed.

[One thing I've been wondering: somehow the pilots were able to bring the nose back up... it was only (apparently) after multiple nose downs that they eventually failed to recover the plane. What was going on there? It's as if they understood how to correct the stabilizer trim, but weren't aware of the cut-off switches.]


> somehow the pilots were able to bring the nose back up... it was only (apparently) after multiple nose downs that they eventually failed to recover the plane. What was going on there?

The MCAS activated several times, each time ratcheting the stabilizer down by 2.5 degrees. Every time the pilots tried to correct for it, it would engage again. That is what caused the characteristic up/down flight pattern.

There are a lot of details in this story: https://www.seattletimes.com/business/boeing-aerospace/faile...


I've read that but it still leaves me confused. Say trim is at 0°. MCAS sets it to nose down pitch of 2.5°. Pilot pulls back yoke. Why doesn't pilot then also look over at the stab indicator, see that it's at 2.5° nose down and reset it with the electric trim switch back to 0°? If the pilot had done that, when MCAS runs again, doesn't it set the stab back to 2.5° nose down? So the pilot and MCAS should be fighting over 2.5° of trim.

What it seems like happened is MCAS ran the stabilizer to 2.5°, but the pilot didn't reset it back to 0, but just back enough he could counter with the elevator. So MCAS keeps cranking the stab each time a bit further nose down and the pilot keeps only partially countering MCAS, instead of running the stab fully back to 0.

The pilot was clearly aware stab trim had something to do with the situation because there were stab adjustments from the pilot. What I can't make sense of is the pilot only partially countering MCAS. I realize the Lion pilots didn't know about MCAS and so their mental model must have been flawed. I'm even more confused about the Ethiopian Air pilots who should have known about MCAS.

edit: also this - https://news.ycombinator.com/item?id=19442160
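
The "partial counter" scenario above is easy to sanity-check with a toy model. This is only a sketch: the 2.5° step is the per-activation figure quoted in this thread, while `simulate_trim`, the recovery fraction, and the number of activations are made up for illustration and have nothing to do with real flight dynamics or the actual MCAS logic.

```python
def simulate_trim(mcas_step_deg, pilot_recovery_fraction, activations):
    """Return nose-down stabilizer trim (degrees) after each MCAS cycle.

    Toy model, all parameters hypothetical: every cycle the automation adds
    a fixed nose-down step, then the pilot trims back only a fraction of it.
    """
    trim = 0.0
    history = []
    for _ in range(activations):
        trim += mcas_step_deg                            # automation trims nose down
        trim -= pilot_recovery_fraction * mcas_step_deg  # pilot undoes part of that step
        history.append(round(trim, 2))
    return history


# Pilot fully resets the trim after every activation: no drift.
full = simulate_trim(2.5, 1.0, 5)

# Pilot only takes out half of each step: the error ratchets up every cycle.
partial = simulate_trim(2.5, 0.5, 5)

print(full)
print(partial)
```

In this model a full reset leaves the two sides fighting over the same 2.5°, while any recovery fraction below 1.0 lets the nose-down trim grow linearly with the number of activations.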


> One thing I've been wondering: somehow the pilots were able to bring the nose back up... it was only (apparently) after multiple nose downs that they eventually failed to recover the plane. What was going on there? It's as if they understood how to correct the stabilizer trim, but weren't aware of the cut-off switches.

My understanding is that every time the MCAS activated, its effect was incremental. It could command only a limited deflection of the relevant control surface each time -- less than 1 degree? -- but the cumulative effect of multiple activations would eventually exceed the ability of the manual controls to override it.


From the reports I've read it was 2.5 degrees, even though Boeing had produced safety docs to the FAA only mentioning their initial design when they were only applying 0.6 degrees per activation.


> That procedure wasn't sufficient

Winner, winner, chicken dinner. The existing runaway stabilizer trim checklist does not fit the MCAS failure to a T. The procedure dictates that you stop if the trimming action stops when you counter it with the switches (MCAS pauses when you do this). Unfortunately when you're taught to only think inside the box, you'll go as far down that checklist as you need to and no further.

Beyond that, just after takeoff is one of the busiest parts of the flight. In fact, if you look at the graphs you'll see that the stick shaker on the left side was going crazy almost immediately after takeoff (and before MCAS kicked in as a result of retracting the flaps). The pilots of flight 610 were able to rein in MCAS with the trim buttons and almost certainly continued to do so in order to focus on the other failures and their relevant checklists (stick shaker, unreliable airspeed, elevator feel system non-op).

[One thing I've been wondering: somehow the pilots were able to bring the nose back up... it was only (apparently) after multiple nose downs that they eventually failed to recover the plane. What was going on there? It's as if they understood how to correct the stabilizer trim, but weren't aware of the cut-off switches.]

Yep, to me, that looks an awful lot like some sort of hardware failure. Something to look for in the subsequent and final reports.


Runaway stabilizer feels nothing like MCAS. They likely didn't associate the training they received with the problem at hand.


Can you explain how they are different?

Both cause the stabilizer to tilt. Both cause the stabilizer wheels next to the pilot/co-pilot to turn. If the plane is pitching down, and you pull back the yoke and that still doesn't recover, don't you at some point check the stabilizer angle? If it's wrong, you use the stabilizer trim switch on the yoke under your thumb to correct it. Won't you see the stabilizer wheel turn when you do that? If you let go and the wheel immediately starts turning again the wrong way (pitching the plane down), doesn't that look like runaway stabilizer?

Also, how did the off-duty pilot figure it out if they are nothing alike?

BTW, when you write "runaway stabilizer feels nothing like MCAS" are you speaking from personal experience?


Runaway stabilizer causes a more dramatic, sudden movement than MCAS. MCAS's start, wait 5 seconds, re-start motion is more gradual. For example, you could look at the wheel and see that it has stopped, only for it to re-start after you look away (add noise-cancelling Bose headphones and you may not hear it).

Pilots are trained well for runaway stabilizer, MCAS was essentially the same thing, but it didn't FEEL like the same thing, so a stressed pilot's mind may not immediately go to that solution/memory item.


Basically, the pilots would get more and more confused as their plane went more and more out of trim.

Now things are starting to make sense.

Boeing claimed (and the FAA agreed) that pilots didn't need training because they already had training for a memory procedure which would have solved the issue.

What they both appear to have overlooked is that the MCAS symptoms were sufficiently different from the runaway stabiliser scenarios pilots trained on that pilots are having problems knowing which procedure to apply.


That might be the ultimate answer to why the accidents happened: while pilots should be well trained to deal with runaway stabilizers, they didn't recognize the situation in time (and the time available was probably extremely short in the second crash). That can only be explained if the way the events unfolded distracted the pilots from treating it as a runaway stabilizer. The third man might simply not have been "distracted" by piloting the airplane and thus could see it.

Besides obviously making the sensors used by MCAS truly redundant and limiting its extreme behavior of moving the trim up to the stops, a large part of fixing the MAX might be just having a big warning light for MCAS operating.


My guess would be perspective, two kinds.

First he’s gonna be behind the throttle quadrant. The stabilizer wheels are roughly in the center of his vision, so he will notice them moving and when they start and stop. The pilots have them down near their thighs, so they’re not even necessarily in their peripheral vision. They will be less likely to notice every time they start and stop.

Second the jumpseat guy has no flight responsibilities. He can focus on thinking about what’s wrong without dealing with all of the other stuff you’re doing on departure. Radio calls, navigation, performance monitoring, configuration changes or being agitated by the stick shaker.


Pilot priorities are: aviate, navigate, communicate. If you listen to incident recordings often pilots tell tower to wait and do not respond immediately. So radio calls are always low priority compared to actual flying. But I agree with point of view argument.


> Also, how did the off-duty pilot figure it out if they are nothing alike?

My assumption is that the jumpseat guy took a very generous interpretation of the checklist. But there may have been more panic about the trim on the earlier flight. If you compare trim inputs on flight 610 to the previous flight, the pilots on the earlier flight were making much shorter, more rapid inputs than the pilots on flight 610. The plane also pitched down far more dramatically than on flight 610.

The graphs for flight 610 seem to indicate that the pilots thought they had the trim situation under control (and they did right up until they didn't).

https://reports.aviation-safety.net/2018/20181029-0_B38M_PK-...


Thank you for the preliminary report link. It’s the first I’ve seen it.


A runaway stabilizer is a single motion. The control surface moves, you react by shutting down the motor or fighting it by hand, and the situation should calm down. A malfunctioning MCAS will feel like multiple different runaways, in various directions. Once you react to the first, the system could start pulling things the other way. Rather than a steady pull one way or another, you get into a back-and-forth fight.


> The specifics of MCAS weren't out there because Boeing expected the pre-existing procedure to have been sufficient.

This was argued by Boeing to the FAA, who accepted the reasoning.

It should be noted that, if it had been determined the pre-existing procedure wasn't sufficient, then all pilots on the 737 Max would've had to undergo re-training. This would have been expensive, and probably hurt adoption of the new airframe.


> This would have been expensive, and probably hurt adoption of the new airframe.

What alternative did the airlines have though? The A320neo? Cause that would've required re-training too.

Probably Boeing should've pushed back against the airlines: we've made you a more fuel efficient 737, but it's going to require re-training.

Someone or someones may have thought: "MCAS is the only thing that requires retraining? Can't we fudge that?"

And the thing is, it still seems like MCAS is a reasonable solution, if poorly implemented.


> all pilots on the 737 Max would've had to undergo re-training. This would have been expensive, and probably hurt adoption of the new airframe.

Am I the only one hoping that in the end, the costs for Boeing will be more expensive than the global re-training would have been?


It's not the cost of the training; that would have been borne by the airlines, not Boeing, I think. The problem is that if this plane required extra training, this would have been an additional cost to the airlines, meaning they might not have bought the plane in the first place, and would have bought the competing plane from Airbus. Also, not only is it an additional cost, it means they would have had to have a different type-rating for this plane (a pilot rated for a regular 737 wouldn't be allowed to fly this version).


The article contradicts itself on this. Early on, it says the rescuing pilot referred to "part of a checklist that all pilots are required to memorize".

Later, it quotes claims that it "isn't in the documentation".

I would consider that published, official checklists are the most imminently critical form of documentation.


The fact that "AoA disagree" light and logic was an optional feature seems criminal enough to me. A sensor with no failover unless you pay for an option. Who the hell thought this was a good idea or approved it? WTF!

346 people are already dead. Boeing should be raked over the coals. It probably takes longer to go through the checklist than it does for everyone to die.

I promise if the audio recordings are ever released from CVR they will be absolutely damning. Pilots trying to make it through a loss of control checklist as they dive to their doom. A lot of those checklists have 50+ steps. Imagine trying to make it through that while fighting the plane and descending at over 3x "maximum design descent rate".

I'm sure the fucking alarms were blaring and the pilots were cursing the system carrying them toward certain death.


The checklist in question has three steps. Step two is "move stab trim to cutout," which disables automatic systems that adjust the stabilizer. The pilots on the Lion Air flight had ~10 minutes to do this.

It is extremely unlikely that the pilots were trying to work through the checklist. More likely they simply did not know what to do.


This one's actually a memory item, not a checklist. But it's the memory item for "runaway trim", which is a very different qualitative experience than the slow march of an MCAS system that you didn't know existed.


This doesn't jibe with disaster being averted by a third pilot. Assuming the third was totally dedicated to the checklist rather than to preventing the plane from diving, it was only his insight that stopped the plane from going down.

The MCAS system apparently increased downward trim without any speed considerations, to over 2.5 degrees in 10 seconds. I don't have the full flight control details but it sure sounds like pilots would lose control within minutes at most. In the LionAir crash the pilot reported control problems and asked to return to airport within 3 minutes, and they slammed into the ocean in 12.

Not sure where you're getting this info but I'm more than sure they knew something was wrong in the last 2 minutes (while they were heading into the earth at almost the speed of sound).

You really think they have ten minutes to react when by then everyone on LionAir was doomed to die?


This "human" factor was also characterized in the movie Sully when it was apparent that just a few extra seconds for pilots to process the issue made all the difference between landing safely at a nearby airport vs. landing in the Hudson River.


Hopefully whatever engineer signed his name to this goes to prison, and gets personally sued by the families of the victims.


Engineers don't make decisions like this. Not providing documentation of the feature or providing training to pilots was the problem, and it's caused by multiple management failures. The lack of sensor redundancy likely also is due to management.


The lack of sensor redundancy HAD to be signed off by a professional engineer. It doesn't matter if management pushed it; it's the engineer's (PE's) responsibility to refuse to sign off on it.


I’m not convinced sensor redundancy is the issue. The plane has two sensors; for some reason, however, the package that includes the AoA disagreement light costs extra.

It seems that lack of training and documentation was the reason these planes crashed though. The pilots didn’t know how to recognize the issue or resolve it.


You seriously think an engineer didn't raise this concern? This reeks of high management telling engineers to know their place.


A PE's job is to refuse to sign off on things that aren't safe.


So a single point of failure (malfunctioning sensor) can engage the horizontal stabilizer without notifying pilots, in a way that the control yoke can't override.

What the hell did Boeing think was going to happen?


All fly by wire jets have movements based on sensors. In the F-16 a faulty AOA probe is okay. 2x Faulty AOA probes are okay as long as they disagree. 2x faulty AOA probes that agree (frozen) are a very bad thing. But the loss is just a jet with an ejected seat if it can't be fixed, not nearly 200 people which is terribly sad. The worst part is from a previous Hacker News post listed below. In particular, the "Economic Problem" which states "Boeing sells an option package that includes an extra AoA vane, and an AoA disagree light, which lets pilots know that this problem was happening. Both 737MAXes that crashed were delivered without this option. No 737MAX with this option has ever crashed."

https://twitter.com/trevorsumner/status/1106934369158078470?...
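
The "one bad probe is fine, two that agree is very bad" behaviour falls out of plain mid-value voting. A minimal sketch follows, assuming three independent AoA readings and a simple median. Both the sensor count and the voting rule are assumptions made for illustration here, not a description of how the F-16 or any Boeing system actually votes, and `voted_aoa` and the sample values are hypothetical.

```python
from statistics import median


def voted_aoa(readings_deg):
    """Return the middle value of three angle-of-attack readings (degrees).

    Hypothetical mid-value select: a single wild reading, high or low,
    can never become the output.
    """
    if len(readings_deg) != 3:
        raise ValueError("this sketch assumes exactly three sensors")
    return median(readings_deg)


TRUE_AOA = 5.0  # made-up "real" angle of attack

# One faulty probe is outvoted by the two good ones.
one_bad = voted_aoa([TRUE_AOA, TRUE_AOA, 40.0])

# Two faulty probes that fail in opposite directions still leave the good one in the middle.
two_bad_disagree = voted_aoa([TRUE_AOA, -20.0, 40.0])

# Two faulty probes that agree (e.g. both frozen) outvote the good one.
two_bad_agree = voted_aoa([TRUE_AOA, 40.0, 40.0])

print(one_bad, two_bad_disagree, two_bad_agree)
```

Note the limit of the scheme: two faults that err in the same direction, even by different amounts, also defeat a plain median, so real systems typically add disagreement monitoring on top. With only one sensor feeding the logic there is nothing to vote against at all.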


> No 737MAX with this option has ever crashed.

I mean there are only two 737MAXes that have crashed in total, right? It's not like we have such a huge sample to work with for this to carry that much weight.


And the only airlines that bought the option are North American, so we are immediately confounding with pilot training differences.


I guess when ordering a plane or a car you assume you don't need to order optional extras to stop the thing killing you.


I mean, some safety features such as automatic emergency braking are available but optional on many cars.


True but this is more like it may steer into a tree if you don't have the optional steering plus package.


Would you personally board the plane whose crew didn't have this training option?


I'm not sure I'd board any 737-MAX's at this point.


>But the loss is just a jet with an ejected seat if it can't be fixed, not nearly 200 people which is terribly sad.

Not necessarily true. If that F-16 is flying over a populated area, it can kill numerous people on the ground when it crashes.


Do we have a confirmed source that it can't be overridden by pilot input? My understanding was that using the stick could overcome it with other control surfaces and that the controls for trim can be set/reset by the pilot.

I think Boeing has handled the aftermath (and much of the lead up since the release of the plane) very, very poorly. I, as a layman to aviation, am not willing to bet that Boeing knew the true likelihood of a problem and didn't tell anyone or had a whistle blower over it.

However, if the issue with a non-redundant hydraulic valve in the original 737 didn't teach us the lesson, this should: no matter the likelihood of failure, safety critical systems should always be redundant.

(Also, Boeing didn't handle that original issue very well either.)


The trim adjustments by MCAS can be disabled by setting the two stabilizer trim cutout switches[1] to cut-out. The recognition that trim is being automatically adjusted outside of their ability to control the aircraft and how to correct it is considered so important to be a "memory item" or something pilots should know how to do without referring to the quick reference handbook.

1. https://aviation.stackexchange.com/questions/58798/why-doesn...


> Do we have a confirmed source that it can't be overridden by pilot input?

Pilot input works... then MCAS silently does it again with up to +2.5 degrees additional adjustment, until after enough times it's maxed out the full rotation of the tail flap. See: https://www.seattletimes.com/business/boeing-aerospace/faile...


>safety critical systems should always be redundant.

That's easy to say from an armchair, but on an aircraft everything is "safety critical" to some extent and you have to choose what gets redundancy. Something without which you can't fly the plane, sure, that makes sense. The argument for considering MCAS, a system which is not necessary to fly the plane safely, to be "safety critical" is much weaker. The Lion Air crash wouldn't have happened had the pilots disabled MCAS instead of fighting it into the drink.


I'm fairly certain that the characteristic "If this component malfunctions loss of life is one potential outcome" is a solid case for the component actually being safety critical. The pilots would never have needed to disable MCAS had it not malfunctioned in this manner. I'm not sure what redundancy has to do with this, but clearly there was a failure in a safety-critical system.


>I'm fairly certain that the characteristic "If this component malfunctions loss of life is one potential outcome" is a solid case for the component actually being safety critical.

The same thing can be said for literally tons of components on a plane. You can't have them all be redundant. At some point you have to pick and choose. A sensor for an obviously supplemental feature seems like a pretty obvious one to choose not to be redundant.

>I'm not sure what redundancy has to do with this,

I think the part in the GP comment where he/she states that "no matter the likelihood of failure, safety critical systems should always be redundant" has something to do with it.


> You can't have them all be redundant.

Why not?


Redundant wings? Fuselage? Just 2 obvious examples, back to my armchair.


True, there are some parts which are both safety critical and fundamentally unable to be made redundant, at least not without drastically compromising the overall design. But that doesn’t justify leaving out redundancy in parts which are able.

In any case, the actual regulations in play here would have required redundancy if the consequences of failure had been properly categorized:

> [I]n normal flight, an activation of MCAS to the maximum assumed authority of 0.6 degrees was classified as only a “major failure,” meaning that it could cause physical distress to people on the plane, but not death.

[..]

> He said virtually all equipment on any commercial airplane, including the various sensors, is reliable enough to meet the “major failure” requirement, which is that the probability of a failure must be less than one in 100,000. Such systems are therefore typically allowed to rely on a single input sensor.

> But when the consequences are assessed to be more severe, with a “hazardous failure” requirement demanding a more stringent probability of one in 10 million, then a system typically must have at least two separate input channels in case one goes wrong.

https://www.seattletimes.com/business/boeing-aerospace/faile...
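
A back-of-the-envelope sketch of why the stricter category pushes toward two input channels. It assumes the channels fail independently, which is an idealization (a common cause such as icing or a bird strike breaks it), and it treats the quoted limits as plain probabilities without saying per what, since the quote doesn't either. The single-sensor failure rate is a hypothetical number picked to sit between the two limits.

```python
MAJOR_LIMIT = 1e-5      # "less than one in 100,000" from the quote
HAZARDOUS_LIMIT = 1e-7  # "one in 10 million" from the quote


def all_channels_fail(p_single, channels=2):
    """Probability that every channel fails, assuming independence."""
    return p_single ** channels


def meets(limit, p_failure):
    """True if the failure probability is below the required limit."""
    return p_failure < limit


p_sensor = 5e-6  # hypothetical single-sensor failure probability

print(meets(MAJOR_LIMIT, p_sensor))                         # one sensor, "major"
print(meets(HAZARDOUS_LIMIT, p_sensor))                     # one sensor, "hazardous"
print(meets(HAZARDOUS_LIMIT, all_channels_fail(p_sensor)))  # two sensors, "hazardous"
```

Under those assumptions a single sensor clears the "major" bar but not the "hazardous" one, and a second independent channel closes the gap with a lot of room to spare.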


Those are silly examples and clearly not what anyone meant in terms of components.


Also there is a kind of redundancy in the wings say by making the internal structure much stronger than generally needed. You can have a fracture in one part without the wings falling off.


There is also much more wing than is strictly needed to keep the plane aloft and stable. They're optimized for low fuel consumption at a design-specified cruising altitude and speed, with variable geometry, so they can still take off and land at lower speeds near ground level.

From a certain perspective, the flaps give the plane redundant wings. One pair for low and slow, and another pair for high and fast.


The usual, cost. But I wonder how much the redundant sensor costs for airlines to ditch it.


Every 737 has two AoA vanes. They are redundant and the redundancy is not optional. MCAS just doesn't take advantage of there being two sensors.


I think you're right. I misread it. My point about cost still stands.

"Cockpit displays and a warning light intended to flag problems with angle-of-attack sensors in flight were optional on the Lion Air jet that crashed, according to people familiar with the matter. The carrier, like some others, chose not to purchase the feature, people familiar with the matter said, so pilots didn’t receive any such alerts."

https://www.wsj.com/articles/maintenance-lapse-identified-as...


> That's easy to say from an armchair but on an aircraft everything is "safety critical" to some extent and you have to choose what gets redundancy.

In flight entertainment, non-emergency lighting, food prep, &c aren't "safety critical" any more than the TVs in hospital rooms are.

As a layman, I'd've imagined that all avionics and control surface control was redundant. Don't most large planes have redundant hydraulic systems and even a deployable wind turbine to run avionics and hydraulic systems under total power loss scenarios? (And hasn't that turbine been used a few times?)

Yes, weight is always a factor, but that doesn't mean that there aren't already multiple redundant, heavy, systems on aircraft.

What puzzles me is that there are 2 angle of attack sensors, but they're only connected to one of the flight computers each, with the other flight computer being the redundant one. What's more, Southwest ordered the optional disagree alert, so there is some way to tie and compare these sensors.

Edit:

> The argument for considering MCAS, a system which is not necessary to fly the plane safely, to be "safety critical" is much weaker.

It controls the control surfaces; I'm not sure how it wouldn't count.


But this isn't really "avionics and surface control" more than automatic lane keeping is a critical control system for the car. MCAS is a convenience feature that counteracts the plane's tendency to pitch up more than you want when climbing. Sure it can improve safety, so can lane keeping in a car. Neither are critical to operation. You can fly/drive perfectly safely without them so they need not be super hardened against failure because you can just switch them off.


> MCAS is a convenience feature that counteracts the plane's tendency to pitch up more than you want when climbing.

It's still controlling control surfaces.

> Sure it can improve safety, so can lane keeping in a car. Neither are critical to operation. You can fly/drive perfectly safely without them so they need not be super hardened against failure because you can just switch them off.

I'm not quite sure where to begin. Just because the absence of something would make the plane flight worthy doesn't mean the addition of it keeps the plane flight worthy.

Take your lane following example. Sure, they can be turned off, but that didn't help the person whose Tesla ran straight into a jersey barrier because it got confused by it and where the lane was. (https://www.popularmechanics.com/technology/infrastructure/a...)

If a mechanism can control the vehicle, it is safety critical. It needs to fail safe under all conditions. The requirement to fail safe is part of what makes it a safety critical system.

In the most extreme example, a lane-following module that randomly swerves the car into a jersey barrier at speed, when the barrier is close enough, creates a situation that a human could not possibly react to; hence, the system itself needs to be held to much higher standards.


> MCAS is a convenience feature that counteracts the plane's tendency to pitch up more than you want when climbing.

Nope, MCAS is required to meet the requirements set forth by the FAA. That's not a convenience thing. Various nannies may seem like convenience things in a well balanced car, but they become far more important in a powerful, poorly balanced car like a Porsche 911 or Dodge Viper — cars that have earned reputations as widowmakers.


Whatever is a car nanny?


Traction and stability control in modern cars.


>But this isn't really "avionics and surface control" more than automatic lane keeping is a critical control system for the car.

Even if we suppose that is true and a fair comparison (which I wouldn't), the way failure modes are handled is key. If there is uncertainty about the sensors that control the feature which controls the avionics the system needs to halt. This is like keeping the lane control active when the computer vision algorithm used to detect the lanes is uncertain about where the lane is. Chances are it'll steer you into the next available tree and kill you.


I'm afraid everything you're saying is just very wrong:

> [Boeing self-assessed] a failure of the [MCAS] system as one level below “catastrophic.” But even that “hazardous” danger level should have precluded activation of the system based on input from a single sensor — and yet that’s how it was designed.

-- https://www.seattletimes.com/business/boeing-aerospace/faile...

Even at Boeing's understated safety risk, redundancy is required. And the system actually has much more risk than they stated, since it will eventually totally deflect the stabilizer -- as happened to both fatal flights, with the jackscrews found in their farthest position in the wrecks.


But there are already two AoA sensors on the plane. They already made it redundant. They just didn't write code to perform an agreement check as an input to MCAS.

I don't think any aviation expert would agree that the case for requiring redundancy in the MCAS system is weak.
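
For what it's worth, the kind of agreement check being talked about is tiny in code terms. A minimal sketch, assuming both vane readings are available to the same function; the function name and the 10 degree threshold are placeholders, not Boeing's logic.

```python
def aoa_for_automation(left_aoa_deg, right_aoa_deg, max_disagree_deg=10.0):
    """Return an angle of attack the automation may act on, or None.

    max_disagree_deg is a hypothetical threshold. None means the vanes
    disagree and the caller should not trim on this data at all.
    """
    if abs(left_aoa_deg - right_aoa_deg) > max_disagree_deg:
        return None
    # Vanes agree closely enough: use their average.
    return (left_aoa_deg + right_aoa_deg) / 2.0


print(aoa_for_automation(5.0, 6.0))   # vanes agree
print(aoa_for_automation(25.0, 5.0))  # vanes disagree
```

The hard part is obviously not this function but deciding what the airplane should do when it returns None, and certifying that behaviour.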


The pilots know that the trim is moving, it's connected to a wheel in the cockpit that's super obvious.

But the normal way of stopping it (the yoke) doesn't necessarily work, and the pilots wouldn't necessarily think to physically stop the wheel with their hands.


This 3rd pilot was probably able to figure out the issue thanks to the clarity of mind that comes from not having to fly the plane, watch all the instrument gauges while things are going wrong, and try to diagnose and correct the issue.


Seems like pilots are overburdened by so much automation, as alluded to by the famous "Children of the Magenta" video[0].

I don't get why the plane doesn't just say what it's doing, and why it's doing it, and have a big red button to put the plane into a safe-mode alternate law. 737s already say warnings like "BANK ANGLE", couldn't it just say "DANGEROUS CLIMB DETECTED, TRIMMING NOSE DOWN. PRESS RED BUTTON TO CANCEL."

0. https://vimeo.com/159496346


Boeing offered that, but it was a paid extra that few airlines purchased.

https://theaircurrent.com/aviation-safety/southwest-airlines...


This is grossly overstating the optional feature. As I understand it, "AOA disagree" is a single light near the AoA indicator itself. It's not an explanation, or even an alert. It's not where anyone would reasonably look when trying to diagnose an uncommanded flight control problem.


When lights are normally not lit and one lights up a pilot or other machine or plant operator should take notice.

A detailed explanation should not be required since they should know what the different warning lights mean and what can cause them to be lit.

I personally would like to see a list of the airlines that bought the more expensive 737 max that included the additional safety features.


The North American airlines bought it and no others did. It was not described as a safety feature -- if it was it wouldn't be permissible to make it optional.

Even the current training does not tell pilots to know that an AoA disagree light is an emergency because it can cause the plane to enter an uncommanded nosedive via MCAS. I really don't think it's reasonable to expect the pilots to know things that Boeing is not even trying to tell them.


> When lights are normally not lit and one lights up a pilot or other machine or plant operator should take notice.

The crew monitor the central EICAS display for fault indications, not random locations around the cockpit.

If you start bolting-on additional check locations you increase crew workload, particularly if it's not a 'dark cockpit' like an Airbus type.


Yeah, maybe it shouldn't be an extra.


> The Indonesia safety committee report said the plane had had multiple failures on previous flights and hadn’t been properly repaired.

It probably wouldn’t have mattered. Lion Air couldn’t even maintain their planes properly; assuming an extra indicator would help would be charitable at best.


The Ethiopian plane was new and this was still a problem.


> But the normal way of stopping it (the yoke) doesn't necessarily work, and the pilots wouldn't necessarily think to physically stop the wheel with their hands.

Well, no. Pulling on the yoke traditionally moves the elevator. MCAS adjusts the horizontal stabilizer. You can adjust the stabilizer with switches on the yoke or with the trim wheels by your knee. MCAS will pause for five seconds if the pilot hits one of the switches on the yoke (and the Lion Air pilots did this until that stopped working).


I believe you may have misunderstood the comment you were replying to.

They were saying you cannot counteract MCAS's control inputs with the elevator alone. Which is a somewhat unconventional design, in a lot of aircraft the elevator can overpower the horizontal stabilizer, whereas with a bad sensor MCAS will continue to move the stabilizer until you cannot overcome it.

To use a bad analogy, in a car the brake is stronger than the accelerator, so if the pedal sticks you can still stop. In other aircraft the elevator is more powerful than the horizontal stabilizer.


> They were saying you cannot counteract MCAS's control inputs with the elevator alone.

Pulling on the yoke is not how you counteract a runaway stabilizer on the 737. I've pasted the relevant part of the QRH in a few previous comments. Yes, the stabilizer ultimately has more pitch authority under some circumstances. That may be what happened here, but if I'm interpreting the graphs on the preliminary report correctly I wonder about mechanical failure of some sort.

This gets a bit more complex with the 737 because moving the yoke WILL actually stop one of the stabilizer trim algorithms, but not MCAS.


> Pulling on the yoke is not how you counteract a runaway stabilizer on the 737. I've pasted the relevant part of the QRH in a few previous comments.

It is how pilots learn to counteract nose down on day one of pilot training. In many aircraft hard elevation will overpower even a faulty horizontal stabilizer. If the QRH was a panacea we would have 348 fewer losses today.

> That may be what happened here, but if I'm interpreting the graphs on the preliminary report correctly I wonder about mechanical failure of some sort.

There was a mechanical failure, the AoA sensor. I'm skeptical there needs to be more going on than MCAS due to the "repeated correction" unauthorized change Boeing made.

> “The FAA believed the airplane was designed to the 0.6 limit, and that’s what the foreign regulatory authorities thought, too,” said an FAA engineer. “It makes a difference in your assessment of the hazard involved.”


> In many aircraft hard elevation will overpower even a faulty horizontal stabilizer

In the 737 you can get into situations where the elevator has insufficient authority to overcome a stabilizer. Excessive pitch up (leading to a potential stall) that you can't counter by pushing on the yoke is exactly what MCAS is designed to prevent.

> There was a mechanical failure, the AoA sensor.

A fixed offset from reality is an interesting failure mode, especially in two separate sensors (Lion Air replaced the alpha vane before flight 610), and even more interesting as it's the same alpha vane used in the 737 NG. The left alpha vane was being interpreted as almost exactly twenty degrees higher than the right.


> The left alpha vane was being interpreted as almost exactly twenty degrees higher than the right.

Is that because the plane was in a banking maneuver at the time maybe? I don't know anything about planes but I heard that when you're turning the two sensors will disagree by some amount.


I think it's been disclosed that in Lion Air the sensors were twenty degrees apart even when sitting on the runway before the flight. It is shocking that nothing checked for disagreement or communicated it to the pilots.


> Is that because the plane was in a banking maneuver at the time maybe? I don't know anything about planes but I heard that when you're turning the two sensors will disagree by some amount

The difference in angle of attack was consistent throughout the entire flight (well up until the crash where the values began to converge). The threshold for the optional 'angle-of-attack disagree' warning is, I think, ten degrees. It seems very unlikely that the plane had a twenty degree bank angle for two entire flights.
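
To make the "consistent throughout the flight" argument concrete, here is a small sketch that separates a brief disagreement (which a turn might plausibly produce) from one that persists across nearly every sample. The traces are invented, the 10 degree threshold is the unconfirmed figure mentioned above, and the 90% cutoff is an arbitrary assumption.

```python
def persistent_disagreement(left_trace, right_trace,
                            threshold_deg=10.0, min_fraction=0.9):
    """True if the vanes differ by more than threshold_deg in at least
    min_fraction of the paired samples (both parameters are assumptions)."""
    pairs = list(zip(left_trace, right_trace))
    if not pairs:
        return False
    over = sum(1 for left, right in pairs if abs(left - right) > threshold_deg)
    return over / len(pairs) >= min_fraction


# Invented traces: a fixed ~20 degree offset vs. a single transient spike.
right = [2.0, 5.0, 4.0, 1.0]
print(persistent_disagreement([22.0, 25.0, 24.0, 21.0], right))  # fixed offset
print(persistent_disagreement([2.0, 25.0, 4.0, 1.0], right))     # one spike
```

A fixed offset trips this on essentially every sample, which is what makes a bank angle an unlikely explanation.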


I don’t think it’s clear if it is a single point of failure or if a faulty sensor is just one precondition.


The trim switches on the control column can override it.


They will override it for 10 seconds, then MCAS gets back up like a zombie and tries to kill you again. If you want it dead you need to flip two cutout switches behind the throttle quadrant.


Runaway trim is supposed to be part of the "memory checklist" for pilots. The symptoms of MCAS are the same as runaway trim and the fix is the same (which is why Boeing didn't feel like extra pilot training was needed), so I'm curious to see the most recent investigation and hear the black box voice recorder. Did they not know they were dealing with runaway trim? Did they think it was something else? Did they forget the memory checklist? Was there not enough height to deal with runaway trim regardless? Were the symptoms different than runaway trim, confusing the pilots?

The black boxes will be very illuminating in this respect, especially since we never recovered the Lion Air black box voice recorder.


True... Now I'm wondering whether the third pilot was able to help because he was trained better, didn't forget the memory items, or was just in that position. He was likely sitting far back, and he wasn't wrestling the plane (I'm not sure how composed the cabin is during those maneuvers). He might have been in a better position to see whatever was happening.


This Seattle Times article is quite informative:

https://www.seattletimes.com/business/boeing-aerospace/faile...

It seems the MCAS had a flawed design where it would move the trim further than expected and would also trigger multiple times. Based on my understanding, the MCAS would have been quite safe if it didn't have these flaws (pilots deal with a failed AoA sensor as another case of "runaway trim").


> [MCAS] it’s not stopped by the pilot pulling the yoke, which for normal trim from the autopilot or runaway manual trim triggers trim hold sensors

This implies that 'normal' runaway trim can be stopped by pulling the control yoke. Maybe pilots simply have no idea what is going on once they realise that action has no effect with the MCAS?


> This implies that 'normal' runaway trim can be stopped by pulling the control yoke.

Pulling on the yoke will only stop the 737's computers from moving the stabilizer IF it's the speed trim system (STS) that's moving the stabilizer. Otherwise the yoke is intended to adjust the elevator, not the stabilizer.


> “After this horrific Lion Air accident, you’d think that everyone flying this airplane would know that’s how you turn this off,” said Steve Wallace, the former director of the U.S. Federal Aviation Administration’s accident investigation branch.

I suspect the other factors will be answered when this question is answered too.


A quick search indicates the CVR for Lion Air 610 was recovered. Did I miss something?


You're right, looks like it was recovered earlier this year, and they haven't released the transcript yet.

https://www.reuters.com/article/us-indonesia-crash/no-public...


You are correct. And if two crews in 5 months had the same issue either identifying or dealing with the same problem, then perhaps there is a design or training problem here.


No need to choose, it seems like design and training can share the spotlight, with their third friend sensor failure.


> The symptoms of MCAS are the same as runaway trim

No, they're not.


So, how are they not?


> So, how are they not?

When you hit the trim buttons on the yoke MCAS stops for five seconds. What Boeing considers runaway trim would not stop when you counter with the trim buttons.


Yea the interval is actually 20 seconds which is a really long time to a pilot thinking he is in an emergency. If you look at the period of the ups and downs of the Ethiopian flight they roughly correspond to that 20-second interval. Just when he thinks he's fixed it, it strikes again and again, taking a more aggressive nose down attitude each time.


So, if the trim were running away based on an intermittent fault, the same situation could be encountered because the pilots don't need to execute the runaway trim checklist?


It feels different from their runaway stabilizer training. MCAS has a delay between each motion, whereas a runaway stabilizer is much more aggressive.


There's a thread on twitter with a pretty good analysis of what's happening with 737MAX. The 'Swiss Cheese' model here starts from its redesign by Boeing.

https://twitter.com/trevorsumner/status/1106934362531155974


From all I've read, including the optional disagree indicator, I still say Boeing should be held responsible for this: it all points to economic reasons, which means they decided these things and fully knew the potential consequences.


The saddest thing, as many HN users should know all too well, is that there can be no excuse for automated systems like airliners to experience catastrophic failure and loss of life, if only due to the availability and application of modern SDLC principles and CI/CD etc.

Smoke testing could have been performed such that all possible combinations of transducer input could be considered and evaluated thoroughly for closed-loop effect at runtime.

These types of integration tests should have been performed repeatedly, seemingly endlessly in the quest for bugs and analysis of the full spectrum of runtime results and effects.
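
As a sketch of what such a sweep could look like, the snippet below runs a toy trim-command function over every combination on a coarse input grid and collects the inputs that violate one invariant. `trim_command` is a stand-in written for this example, not actual flight-control logic; the 14 degree trigger and 10 degree disagree limit are invented, and 0.6 is only borrowed from the figure quoted elsewhere in this thread.

```python
import itertools


def trim_command(left_aoa, right_aoa, flaps_up,
                 aoa_limit=14.0, max_disagree=10.0):
    """Toy stand-in: nose-down trim (degrees) requested for one set of inputs."""
    if not flaps_up:
        return 0.0
    if abs(left_aoa - right_aoa) > max_disagree:
        return 0.0  # sensors disagree: do nothing
    if min(left_aoa, right_aoa) > aoa_limit:
        return 0.6
    return 0.0


def sweep(command):
    """Try every grid combination; return the inputs that break the invariant."""
    aoa_grid = range(-10, 41, 5)  # -10, -5, ..., 40 degrees
    failures = []
    for left, right, flaps_up in itertools.product(aoa_grid, aoa_grid, (True, False)):
        out = command(left, right, flaps_up)
        # Invariant: never trim nose-down while either vane reads a benign angle.
        if out > 0 and min(left, right) <= 10:
            failures.append((left, right, flaps_up))
    return failures


def single_sensor(left_aoa, right_aoa, flaps_up):
    """Variant that trusts only the left vane, for comparison."""
    return 0.6 if flaps_up and left_aoa > 14 else 0.0


print(len(sweep(trim_command)))   # two-vane toy logic
print(len(sweep(single_sensor)))  # left-vane-only variant
```

The two-vane version produces no violations on this grid, while the single-sensor variant is flagged for cases such as left = 20, right = 0. A grid sweep like this only finds what the invariant and the grid happen to cover.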

In my experience in the software industry, I've always done this for applications that have infinitely more trivial effect and results than an airliner at altitude containing hundreds of souls.

One potential counterpart to the seemingly infinite greed we see exponentially increasing could be the old adage that karma is a bitch.


Speaking as someone who's done this (though not on something as big as an airliner!)

Yes, you can test control loops -- you can even turn it into a unit test. At least in theory.

The problem is that to do the test you need either a working, physical system or a good model. So if you're making a shutdown valve for a chemical plant, you need a physical build of that control valve. Even on that scale, you're talking about something that could potentially fill an engineering lab, be quite noisy and have a considerable amount of stored pneumatic or hydraulic energy. It's possible, but not exactly practical.

The alternative is to model the system, but now the question changes: how can you be certain that your model is accurate and models all the variables? Say your valve is slower when it's cold and you don't model that -- now you have a false positive result ("it works" -- but nobody realised that "temperature" was a dependent variable).

So you take the middle ground - you can have the test jig for a week, so you record the inputs and outputs for a week under varying software conditions. But those recordings are only valid for that specific timing -- if you change the software and change the timing (maybe you move the trim motor slower), you get a model change and a false positive or negative.

It's certainly possible, but it's only possible with a good sized team, and supportive management who realise that the test is absolutely necessary.
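
A toy version of the cold-valve trap, to show how a test against a model can pass for the wrong reason. Everything here is invented for illustration: the 2 second base closing time, the 50% slowdown below freezing, and the 2.5 second deadline.

```python
def valve_close_time(temp_c, models_temperature):
    """Toy plant model: seconds for a shutdown valve to close.

    The cold-weather slowdown is a made-up illustration value.
    """
    base_s = 2.0
    if models_temperature and temp_c < 0:
        return base_s * 1.5  # hypothetical: 50% slower below freezing
    return base_s


def shutdown_meets_deadline(temp_c, deadline_s, models_temperature):
    """The 'unit test': does the modelled valve close within the deadline?"""
    return valve_close_time(temp_c, models_temperature) <= deadline_s


# Same test, same inputs, two models of the same valve at -20 C.
print(shutdown_meets_deadline(-20, 2.5, models_temperature=False))
print(shutdown_meets_deadline(-20, 2.5, models_temperature=True))
```

The first call passes only because the model ignores temperature; the second fails once the dependency is modelled. The test code is identical in both cases, so the green result says nothing about whether the model was complete.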


> It's certainly possible, but it's only possible with a good sized team, and supportive management who realise that the test is absolutely necessary.

Agreed. According to other sources [1], management rushed the development work so they could come out ahead of one of their competitors.

"But several FAA technical experts said in interviews that as certification proceeded, managers prodded them to speed the process. Development of the MAX was lagging nine months behind the rival Airbus A320neo. Time was of the essence for Boeing."

[1]: https://www.seattletimes.com/business/boeing-aerospace/faile...


I agree wholeheartedly and routinely run concurrent intensive smoke tests on real-world HW as well as smoke tests on finely-modeled virtualized environments.

Even with the best modeling and virtualization, a true and thorough, 100% 1:1 approximation with the real world at runtime can likely never be attained for a myriad of reasons.

However, when lives are on the line, this gap must be closed in some manner so as to provide a greater degree of confidence.

Even the most thirsty organizations with lesser consequences for their failures are usually conservative enough and risk-averse enough to know better than to release without thorough (and relatively inexpensive) testing.

My old boss used to tell stories about back in the mainframe days whereby he would send customers fancy, branded and shrink-wrapped finished-product but containing blank tapes for the latest release in order to buy a couple of weeks of extra dev time if he thought the software wasn't ready to escape.


An extra set of eyes saved them, it's a shame that there wasn't any way for them to include a reminder in the next crew's flight plans.


It is pretty shocking this wasn't noted as a serious incident needing investigation before more flights were undertaken.


There's a really good "The Daily" podcast (~20 minutes) about these crashes that answers a lot of the questions on this page.

https://www.nytimes.com/2019/03/19/podcasts/the-daily/boeing...


A machine that relies on sensors has two ways to detect when a sensor has failed: another sensor, or human observer input.

I don't know how avionics hardware engineers do it, but in this neighborhood of the Internet, we don't trust inputs, and especially human user inputs. Because every unverified, unsanitized input is an attack vector for bringing down our software and the system it runs on.

From what I have seen, the MCAS in the crashed planes relies on a single sensor--the AOA vane in the nose--and was almost solely responsible for catastrophic loss of altitude. This model of passenger jet has a paid upgrade option to add a second sensor, with disagreement detection.

My question is why don't the yoke inputs from the pilots count as disagreement with the AOA sensor? If the yoke is consistently counteracting the action of the MCAS, why can't it disable itself automatically and illuminate a light to indicate it has failed?

I'm guessing the pilots would have more time to search through manuals in-flight to clear the fault and re-enable the system than they would trying to disable it while it's stubbornly trying to crash the plane due to a single point of failure.
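
Something like the following is what I have in mind: a monitor that latches the automation off once the pilot has opposed it continuously for some time. This is a sketch of the idea only; the class, the sign convention, and the 5 second threshold are all assumptions.

```python
class OppositionMonitor:
    """Latch the automation off after sustained pilot opposition."""

    def __init__(self, max_opposed_s=5.0):
        self.max_opposed_s = max_opposed_s  # hypothetical threshold
        self.opposed_s = 0.0
        self.disabled = False

    def update(self, auto_trim_cmd, yoke_cmd, dt_s):
        """Commands are signed: negative = nose down, positive = nose up.

        Returns the trim command that should actually be applied.
        """
        if self.disabled:
            return 0.0
        if auto_trim_cmd * yoke_cmd < 0:
            self.opposed_s += dt_s  # pilot is pushing the other way
        else:
            self.opposed_s = 0.0    # agreement or no input resets the timer
        if self.opposed_s >= self.max_opposed_s:
            self.disabled = True    # a real system would also light a fault lamp
            return 0.0
        return auto_trim_cmd


monitor = OppositionMonitor(max_opposed_s=3.0)
# Automation commands nose down while the pilot pulls up, one second per step.
print([monitor.update(-1.0, 1.0, 1.0) for _ in range(4)])
```

With a 3 second threshold the automation gets two more steps and is then cut off for good. The obvious catch is a genuine approach to stall where the pilot is the one pulling the wrong way, which is exactly the case the system exists for, so the threshold choice is not trivial.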

It's not hard to adopt the defensive mindset that your users (or your professional testers) are maliciously trying to destroy your beautiful program with a combination of stupidity and cleverly designed unanticipated inputs. When hardware gets involved, one can personify Entropy as a being that is trying to destroy everything you love and kill you.

How would Entropy take down a plane and kill all passengers? How about it freezes the AOA sensor in the "nose is at +90 degrees pitch" position? How do we defend against that attack vector? Pilot training? Oops! Entropy also made them forget that page out of thousands of possible pages of procedural training during the critical seconds they needed to remember it. The only way to fight Entropy is by making random events more independent, rather than causally linked in a failure cascade.

I don't think this course towards blaming Boeing's lack of documentation and/or pilot training is helpful. I don't think there's any option for Boeing but to immediately recall and retrofit all aircraft to the multiple AOA-sensor option, at their expense, and refund every airline that actually paid extra for it.


I am almost sure that some engineer of Boeing has noticed that there was a major design flaw with the function of the MCAS, but that he was overruled by a less technical (and probably younger) superior.


more likely the flaw was noted but it would be more expensive to redesign than to eat the cost of whatever lawsuit they'd be hit with


> A malfunctioning sensor is believed to have tricked the Lion Air plane’s computers into thinking it needed to automatically bring the nose down to avoid a stall.

That is ridiculous logic to implement in a "safety" system. An automated system should never cause a plane to dive unless it also knows that it has enough altitude to safely do so - much less in a way that makes it difficult for pilots to override.


Given the system assumed it was in stall, which means loss of altitude anyway, surely it's safer in general to go nose-down to avoid the stall? At least then you have a chance of recovery, which you don't in a stall. (Except of course, going nose-down.)


If you don't have enough altitude to afford going nose-down, let the pilot handle it. If your system can't come up with a safe solution, do not override the pilot's controls in favor of a dangerous solution.

A computer should never assume it knows better than a pilot. A computer is only as good as the data it gets and the software it runs. Sensors fail. Data gets corrupted. In the current state of the industry, software bugs are inevitable.

Airplane software is supposed to help pilots, not hinder them. In light of these events, I'm thinking twice about wanting a self-driving car in the near future.


Elsewhere I've read MCAS does take altitude into account, as well as flaps, i.e. it's only active above a certain altitude, and only when flaps are retracted. So... yeah, we don't have the full story. And also in another thread, it's reported from the flight prior to Lion Air 610 (same plane) there were airspeed and altitude disagreements. I'm not at all clear from available reporting whether airspeed, altitude, and angle of attack were inconsistent, if that was a source of either autopilot confusion, and then pilot confusion, whether pilots did set stabilizer trim to cutoff and when and whether it was too late.

I'm a pilot (former CFII) and the whole automation fail danger strikes me as terrible. John Q Public says "I want the automation to override the pilot's mistakes!" What? OK fine. What about Asiana Airlines Flight 214 where the pilot advanced throttles, an explicit intent input, and yet autothrottles were set so the automation said nope. And then John Q Public are all, well the pilot should have KNOWN!

It's like it's a game where the pilot is only there as the last resort to be blamed if they too fail, even after a sequence of automation failures. Automation betraying pilots at low altitude is in my view functionally equivalent to an in-flight breakup. And the automation-in-the-cockpit mentality in the face of failures has been, for 20+ years, "add another button, add another feature, add another routine" to tack onto all the others.

And yes this absolutely makes me think of autonomous driving as total b.s. Airplanes are in a standardized system, with far bigger budgets for automation and yet we still have to fall back to human pilots for routine procedures like parking, taxiing, VFR approaches and landings, and telling the automation literally every detail it needs to do, ATC communication. It's ripe for end to end automation and yet we still don't do that. Driving cars is wildly more complicated for automation: non-standard streets, paint, signage, laws, pedestrian behavior, bicycles, cars still driven by humans, weather - haha. Sounds nice, great idea, keep trying, but it's complete bullshit.


Jump pilot would have had a natural line of sight to the trim wheel, and may have seen it move "unscheduled" at the same time as the nose down. This might have given him a unique suspicion of auto trim.

I expect this will be included in the accident report. Hopefully NTSB will conduct their own first hand interview with this pilot. (I can't think of why they wouldn't.)


Is it possible for Boeing engineers to lose their professional status as a result of this situation?


The fault doesn't lie with the engineers who built the system...not to mention I would be very surprised if they were professionally certified.

It lies with the managers who wrote the specification that said that for business reasons the new plane must not require any additional training or type certifications, and cut costs by implementing the required systems with a non-redundant sensor.


I think this is one of those situations where you may not be able to assign fault to any one set of people. Remember that everyone here has the clear advantage of knowing what went wrong and how; the people who designed this may not have foreseen any such situation. Also, one of these systems, even something as "simple" as MCAS, would have involved dozens if not hundreds of engineers in all the design decisions that led to this issue, including many people who have already retired, some of them long ago (remember, the 737 was originally designed in the '60s).

The desire to assign fault to one individual and punish them is a very emotional response to this situation. It means people will not accept that it was an unforeseen or systemic issue (people familiar with air crashes have seen systemic issues in the past, where everyone did everything how they were supposed to but things still went south) without some individuals to blame. Typically crash investigations try to ferret out all points of failure. You might read "pilot error" in the news, but rarely is that the only cause in the crash report.

For example, this crash: https://en.wikipedia.org/wiki/American_Airlines_Flight_965

It identifies some training issues (automation dependence, speedbrakes still being applied while executing terrain avoidance maneuver) but also identifies issues with the FMS in how it manages waypoints and how they are named.

So, for the current situation there are obviously many aspects to be addressed:

1. MCAS software appears to do more than specified; this is apparently what the software update (delayed by the government shutdown) is meant to fix.

2. Pilots need to be trained or retrained on stabilizer trim runaway.

3. The 3 AoA sensor option might need to be mandatory on the 737 MAX.

4. The FAA may need to review their effective supervision of both Boeing and the air carriers.


My understanding (from other recent articles and discussions like this one) is that Boeing is self-certifying (thanks FAA!), and because of this, they have at least one engineer, probably a few, who are on staff and who do their certification and are professionally certified themselves to do this. These engineers would therefore be personally liable for this plane's problems, because they signed off on it.


that sounds like the engineers are at fault for not refusing to stamp the thing.


The ones at fault are the people at the FAA who let Boeing certify their own planes.


Both are at fault. The existence of the FAA does not absolve Boeing, their executives or their engineers, of blame.


Everyone involved can share the fault.

Unfortunately, correctly apportioning all of the blame won’t bring back the souls lost.


No, but the whole point in assigning blame and severely punishing people found to be at-fault is to prevent things from happening again.

If we just let these people off scot-free, then you can count on more similar things happening.


Wouldn't a better point of view be: shouldn't we reëvaluate the items that allow self-certification and fund the FAA properly so that they can certify medium/large changes themselves instead of letting the manufacturer do it?

Self-certification has a place, but it should always be accompanied by random checks and shouldn't be for anything large, critical, or first time through.

Edit: Better in that it helps solve what is largely a political and not an engineering problem.


Are you outside the US? Engineers here do not typically have professional status.


Same generally applies in the UK.

I can only think of one CEng I know, and he works in civil engineering. I've never met a Chartered/Certified software or electronics engineer (that I know of).


[flagged]


While I agree that the pilots likely shoulder some of the blame (and in an NTSB-investigated crash, I'd expect their failure to follow the non-normal checklist memory items to be the primary cause), it's not enough to say that this was simple pilot error and poor maintenance.

Boeing's going to wear some of the blame here, as is proper, IMO.


You seem very certain of the cause(s) of the crash. Care to share the source of your knowledge?


[flagged]


Partly my point. Plenty of internet experts to go around.


From the article:

"There have been no reports of maintenance issues with the Ethiopian Airlines plane before its crash."


Surprised they don't have a lessons learned portal. Would have saved some lives.


That's the sort of thing the NASA report system is for, though IDK the latencies involved in the system: https://en.wikipedia.org/wiki/Aviation_Safety_Reporting_Syst...

> The Aviation Safety Reporting System, or ASRS, is the US Federal Aviation Administration's (FAA) voluntary confidential reporting system that allows pilots and other aircraft crew members to confidentially report near misses and close calls in the interest of improving air safety.


Can you imagine if the public saw how often planes were close to disaster?


Don't confuse apathy with unavailability. The public (and it sounds like you as well) just doesn't look. Here's NASA's safety database: https://asrs.arc.nasa.gov/


They already can: http://avherald.com/


Only as useful as people actually reading from such a portal.


It's one thing to memorize things, it's another thing to be able to use that knowledge in the right context and situation, particularly when under panic.


What are the odds this is completely fabricated by Boeing? Not saying I think this, but if it was a movie and this was a cover-up, this would be a great plot twist. I suppose I'm maybe just a little jaded from all the fake news these days.


There's a difference between jaded and cynical :) Not that I know the answer to your question any better. We'll have to wait for the HBO documentary.

To play devil's advocate anyway, as someone who has not been following this actively, I find this article to cement the idea in this reader's mind that a Boeing malfunction is involved in all three incidents. Is this even conclusively established? Would Boeing want this spin at this point?

The suggestion that an extra brain might randomly have averted two multi-fatal crashes and that this error mode has occurred at least three times seems like it would be a bit pyrrhic for the PR people at this juncture, no?


The MIC is too powerful and influential over both parties.

While I can sometimes like Trump for not submitting to anybody, even he bows to the MIC.


From my point-of-view, two opinion points:

1. I am glad that 737 MAX has been grounded. May it stay that way, globally, until this issue is provably resolved.

2. The entire Boeing chain of management that resulted in these crashes should be publicly flogged, their remuneration & benefits clawed back & subject to a mandatory minimum prison sentence.

Who the hell am I kidding! Neither is very likely to happen in the present day US. Carry on then, I guess. Just make sure to sign your Last Will & Testament before taking that next flight.


What happens if your #2 is applied to doctors, car/ship manufacturers, food producers, grocery stores, house builders, taxis, restaurants, software engineers, medical device producers and so on? Every profession has caused accidental deaths.

"Legal action" against bad decisions is a must. However, mandatory prison sentence for accidents is a terrible idea.


If Boeing knowingly exposed the passengers to the risk of injury it's criminal negligence and usually the punishment is imprisonment: https://en.wikipedia.org/wiki/Criminal_negligence


Exactly my point. Imprisonment should come into play when the accidents are proven to be caused by Boeing's negligence.


Is it negligence or just a bad design? Who decides? It is starting to look like Boeing thought that an MCAS failure was similar to, and corrected by the same procedure as, runaway trim. Time will tell if that is the case, but if it is, should the pilots be posthumously tried for negligence?


Especially in light of the current size of the US's prison population. We should be very careful in general about advocating for more prison sentences. It's an easy thing to do, but the societal outcome is a lot more complicated.


Oh please. If there's one demographic we don't have much of in our prison system, it's upper-class corporate executives. We could stand to let some non-violent drug offenders out early to make room for them.


One thing that bothers me about this generation is this thirst for infinite punishment.

People hunger for someone to blame, rattling off a long list of maladies that should befall that person, until they have been thoroughly satisfied, but they are never satisfied. They always feel there should be someone else, something more, something deserved.

The truth is, there is no point to such a punishment here. It is unlikely that any individual plotted to kill people by pushing some faulty code out of malice. These were people simply doing their best and they failed.


While I agree that thirst for punishment is counter-productive, I'm not sure that people did their best, or rather that the criteria of the "best" were right.

I remember that the aircraft in question was tweaked beyond stability in order to reuse the existing type certificate. This procedure needs scrutiny, likely on both Boeing's and the FAA's sides.


The news that I'm hearing now is that Boeing has been working on a software fix for this problem since at least January.

Where were the glaring safety warnings to the airlines, their customers?


The thing about a software fix is that you never know when the solution is near. It could be fixed next week, or it may require an entire rewrite of critical systems. You just don't know until it's fully diagnosed. So why sound the alarm when plenty of flights have gone without problems and a software fix might be around the corner, especially if you have all your best men working on the problem?


Even if the US doesn't choose to do much (though I find it embarrassing the FAA was one of the last regulatory bodies to respond), Boeing will face a reckoning globally from other regulatory agencies.

Stock is down 15% since March 1. Hard to know what the executives there are thinking, but I hope some folks in the organization genuinely feel some sort of empathy for the families of the deceased on these flights.


> though I find it embarrassing the FAA was one of the last regulatory bodies to respond

The top 3 officials at FAA are unfilled, with seat-warmers there in an "acting" capacity. I wonder if that's related. https://www.faa.gov/about/key_officials/


Honestly, those top positions in almost any organization are often political appointments that have little to do with day to day operations. The current "actings" are generally the ones who "advise" the political appointees on how to handle things. Obviously there are some exceptions, but most bureaucracies tend to run that way.

Not to comment specifically on this as FAA isn't my area, but if the secdef doesn't come to work tomorrow the undersecretary is going to take the same actions he would have. I'd imagine most of those orgs trend that way.


The value of having confirmed appointees in those positions is not necessarily their native technical expertise. As you’ve noted, that expertise can be provided by career employees. The value of political appointees is the political clout they carry. Given that they’ve been appointed directly by the President and confirmed by the Senate, it is much harder (or politically fraught) to simply threaten or replace them when they take an unpopular stand like “let’s ground an airplane.”

The US government is a complex system, and like most complex systems it works best when you, a non-expert, don’t randomly yank out pieces and declare them unnecessary.


Well, that's not what I said. I've not claimed they're unnecessary, just that they have a slightly different and perhaps less important role than the post I replied to was giving them. The undersecretaries can make the same decisions, and in fact the career personnel in the organization have far more ability to take action without fear of replacement by anyone as they have more protection than a secretary who serves at the pleasure of the appointing authority.

That said, generally the undersecretaries are appointed and confirmed as well, as their role is to step in and backfill if the primary is not available, so that answers that issue.


I can’t find any evidence that the current acting FAA administrator was Senate confirmed. Is the Internet just being unreliable here?

I think your notion that career personnel have as much DC political clout as unconfirmed career officials is one of those things that sounds good if one is trying to win a debate, but is unlikely to represent the actual facts on the ground.


Just a quick google search turns up this article that says he has been.

https://www.americanshipper.com/news/?autonumber=67974&sourc...

Political clout in DC isn't what runs organizations and gets the day to day business done. It's what plays well on the hill in pointless back and forth BS sessions that a totally ineffective congress likes to have, it probably helps to some extent in budgeting discussions, but it's more posturing than anything else in a lot of ways.

Anywhere you go in the military you'll find all the GOs who are in charge, and powerful, and senate confirmed, and all that good stuff, and they've got a Chief of Staff and an aide who actually run everything, and can continue to do so if their GO walks in front of a bus. It's no different in any huge bureaucracy -- sure, CEOs make decisions, but the day to day business doesn't stop if they don't answer the phone for a while. I would actually posit that if it did the whole organization is dysfunctional to the point of ineffectivity. But I'm being redundant in describing DC that way perhaps.


Amusing example, considering the Secretary of Defense has been vacant since January.


Well, Shanahan is acting and will be confirmed when the senate gets around to it... that's kind of the point though. DOD is still running and will continue to run, exactly as it has, with only minor political differences from the top. They have a lot of theoretical power, but little ability to actually change day to day operations of anything.


The purpose of confirmed political appointees is to create a layer of empowered leaders who can do more than simply steer the ship in a straight line, or react slavishly to orders from above. The confirmation process serves two purposes: (1) it ensures that relatively independent thinkers with high political capital are in those spots, and they see their allegiance to the entire system and not just one man, (2) it provides a safety valve (via resignation) when the confirmed appointee does not agree with orders from above. The danger of the DoD is that it’s an agency that can give the appearance of running itself when there’s no crisis, but may need expert leadership when there is one. Since the entire purpose of the DoD is to manage crises, lack of high-level leadership is a serious concern. Ditto the FAA.


As I posted in response to your other comment, you're ignoring the fact that the undersecretaries who backfill those positions in the absence of the primary are also appointed and confirmed, and the career personnel have less incentive to bow to political pressure.

DOD has far more "expert leadership" in the form of the FO/GO community than the other gov agencies as well -- the secretaries exist to implement policy, not run day to day operations.


Oh, sure. I just enjoyed the combination of the example being a bit off, because it sounded like a hypothetical, but also a perfect real-world demonstration of the idea.


I'd fly on a 737 MAX tomorrow. But I might ask the crew during boarding if they're familiar with the stab trim cutoff switches.


I've flown on a 737 MAX a couple times this year. Smooth, comfortable, quiet flight. Although the failure of the MCAS system has been catastrophic, fortunately for Boeing the fix doesn't seem difficult... make an extra AoA vane or two mandatory and add a warning if they disagree, and require MAX pilots to sim train an MCAS failure.

The planes seem eminently airworthy, so far it appears they weren't brought down by anything that's terribly difficult to engineer out of. Unfortunately for Boeing and the FAA, nothing is more costly than an accident, it will take years to earn back the public's trust. Even if it's found the pilots were downright negligent in their handling of the MCAS failure, that won't make the general flying public feel any better about it, and it won't bring back the dead, may they rest in peace.


There's already two AoA vanes mandatory on every plane (even non-MAX planes). There is a warning if they disagree in the optional package that the North American airlines bought, but not any other airlines. The warning is not much of a warning, just a disagree light. It would have been impossible for the Lion Air pilots to benefit from this disagree light, because they didn't know that it was hooked up to a control surface and neither did any other pilots, it seems.

> The planes seem eminently airworthy

An uncontrolled nosedive caused by a single sensor failure is not in anyone's definition of airworthy. It must be fixed. This lack of airworthiness was not the pilots' fault.
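To make the single-sensor point concrete, here is a minimal sketch of a two-vane cross-check that refuses to feed automation when the vanes disagree. It is not Boeing's logic: the function names and the 10-degree threshold are hypothetical, chosen only for illustration.

```python
from typing import Optional

# Hypothetical threshold, for illustration only; not a real certification value.
DISAGREE_THRESHOLD_DEG = 10.0


def aoa_disagree(left_deg: float, right_deg: float,
                 threshold_deg: float = DISAGREE_THRESHOLD_DEG) -> bool:
    """Return True when the two vane readings differ by more than the threshold."""
    return abs(left_deg - right_deg) > threshold_deg


def aoa_for_automation(left_deg: float, right_deg: float) -> Optional[float]:
    """Return the averaged AoA, or None to signal that automatic trim
    should be inhibited because the sensors disagree."""
    if aoa_disagree(left_deg, right_deg):
        return None
    return (left_deg + right_deg) / 2.0


print(aoa_for_automation(4.0, 5.0))   # vanes agree: averaged value is used
print(aoa_for_automation(4.0, 25.0))  # vanes disagree: None, automation inhibited
```

With only two vanes the check can tell that something is wrong but not which vane is wrong, so the safe response in this sketch is to inhibit the automation rather than pick a side.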


> There's already two AoA vanes mandatory on every plane

And all Airbus types have three or four. Boeings only have two, even on the 787 (on which one is vulnerable to damage from jetbridges).

It is disappointing that a manufacturer would cut corners on sensors for a $100 million aircraft.


Airbus needs more because there's no mechanical backup to the flight control computers.


I'd bet more than a few of them are pretty familiar now


>1. I am glad that 737 MAX has been grounded. May it stay that way, globally, until this issue is provably resolved.

No, it should stay grounded permanently. Who wants to risk their lives in one of these things now, with the reputation that Boeing has now earned? The airlines should be able to return these things to Boeing and get their money back. If that means Boeing goes under, then so be it.


IMO, I don't want the MAX to be resurrected. If the design is flawed, let it be and shut it down. But this ain't gonna happen because of how expensive aircraft are.

Oh well.


The headline is really confusing; I thought I was perhaps reading it wrong, but it's sort of impossible to read correctly until you realize they are talking about a separate incident than the well-known crash.

It would have been clearer if they had written something simpler, like "Lion Air 737 Nearly Crashed One Day Before Deadly Accident"


We added "on the" to try to make that clearer above.


Agreed. I ended up reading a second article on the topic to ensure I was understanding it correctly (the prior day's flight involving the same identical aircraft, not a different one of the same model).


I still can't tell if it says whether it was actually the same individual aircraft.


Same aircraft


I read it 5 times and I am glad I wasn't the only one. I thought it was 737 days before.


> [T]hey got help from an unexpected source: an off-duty pilot who happened to be riding in the cockpit. That extra pilot, who was seated in the cockpit jumpseat [...]

Am I missing something here? Isn't it normal for off-duty pilots to ride in the jump-seat?


Riding dead head as a pilot is normal. They do it all the time to get back home or wherever their next flight is from.

Being in the right place, and happening to know exactly how to deal with what would otherwise kill the pilot and all the passengers, is incredibly fortunate.


I suppose similar in terms of the right place at the right time would be the QF32 incident[0], where by chance there were two additional pilots in the cockpit: a check captain, and a supervising check captain who was training that check captain.

[0]: https://en.wikipedia.org/wiki/Qantas_Flight_32


More to the point, this reflects positively on Lion Air because it means that the pilots in the forward facing seats properly took advice into account. This is good crew resource management.

Check out the Asiana crash at SFO for an example of where CRM failed hard. The pilot flying in that case was told by the other two pilots in the cockpit how to avoid the crash and he STILL flew a perfectly serviceable plane into the ground in perfect weather on an extremely easy visual approach.


> More to the point, this reflects positively on Lion Air because it means that the pilots in the forward facing seats properly took advice into account. This is good crew resource management.

It's good the pilots took advice -- but they should not have needed that advice because they should have been trained to operate the system properly.

The whole thing reflects very poorly overall:

- The pilots were not trained properly on the new system, and a third pilot who happened to be hitching a ride had to tell them how to operate the plane.

- The plane was allowed to fly again the next day despite the malfunction, none of the pilots flying it were properly trained, and everybody died.

- The incident was not reported properly at the time, nor was it reported after the same plane had crashed the next day.


> The pilots were not trained properly on the new system

It's hard to train for something that doesn't exist according to Boeing

> The plane was allowed to fly again the next day despite the malfunction

The plane was serviced between flights, and assuming that Lion Air actually did what they claimed, Lion Air maintenance followed the Boeing instructions by the book.

> The incident was not reported properly at the time

The malfunction was written up in the maintenance log and the issues were addressed. Unfortunately because Boeing refused to disclose the existence of MCAS the pilots wrote it up both as what they saw (EFS non-op, IAS disagree), and if memory serves, they also suspected the one algorithm they knew about (STS)[1].

1: https://www.pprune.org/showthread.php?p=10295557


> It's hard to train for something that doesn't exist according to Boeing

From the article:

> The so-called dead-head pilot on the earlier flight from Bali to Jakarta told the crew to cut power to the motor driving the nose down, according to the people familiar, part of a checklist that all pilots are required to memorize.

Clearly only one of the pilots memorized the checklist, but they were all required to do so. Presumably the pilots who crashed the plane the next day didn't memorize it either.


> Clearly only one of the pilots memorized the checklist

I've already posted the checklist. If you've memorized the checklist you'd know that it says to stop if the trimming stops once you hit the push buttons on the yoke. MCAS stops trimming when you manually input opposite trim (and you can see this on the black box graphs from the preliminary report).


Possibly a better example, with a DC-10 instructor on the plane as a passenger being pressed into service:

https://en.wikipedia.org/wiki/United_Airlines_Flight_232


Sure. I don't think they're implying otherwise. The lucky coincidence is that he happened to be aware of the issue and the fix.


Yes. If the seat is unoccupied, airlines will usually let off duty pilots use it to get between different airports. If I'm remembering correctly, it's done for pilots from other airlines as well.


Well, you're missing the point of the story which has nothing to do with the frequency of off-duty pilots hitching a ride.


> Isn't it normal for off-duty pilots to ride in the jump-seat?

Absolutely not, I know several airline pilots and they deadhead in economy seats.


If it's a full flight. Otherwise, they'd be given a seat with passengers.


Here again. Downvotes for someone asking a bona fide question. Place has changed. It's a shame.

Downvote away...


Would you please not break the site guidelines like this? Assuming you're right, making HN worse still is no way to go.

https://news.ycombinator.com/newsguidelines.html


Ironically, I almost never saw your request above because... your comment was collapsed from the downvotes, even as I perused my own "threads". For some masochistic reason, I clicked the "+2" link from a comment almost two weeks ago, and ... there was your comment and request. It's fairly lucky I saw it at all.

So yes, I will try not to break the site guidelines again (I agree this is a clear case of breaking site guidelines). It just makes me sad how HN used to be a place where people could ask genuine questions and not be downvoted to invisibility, for whatever reason.


[flagged]


>In fact, I thought these crash recovery systems had to be written by 3 different teams in 3 different languages or architectures, and only activate when 2 of the 3 signal true. Or there's some manufacturer that does avionic software like this, I forget who.

Redundant systems with majority voting are common, and this can include multiple implementations of the same spec, but I'm not aware of any manufacturer that intentionally uses three different languages or architectures.
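For anyone unfamiliar with the idea, here is a minimal sketch of a majority voter over redundant channels. The function name and the "return None when there is no majority" convention are my own assumptions for illustration, not any manufacturer's design.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value that a strict majority of channels agree on.

    Returns None when no value has a strict majority, which a real
    system would treat as a fault to be handled, not a value to act on.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


# Three independently implemented channels; one of them is faulty.
print(majority_vote([True, True, False]))
# All three disagree, so there is no majority to act on.
print(majority_vote(["up", "down", "hold"]))
```

The voter only masks a fault if the channels fail independently, which is why separate implementations of the same spec are sometimes used.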


I don't know about 3 languages, but https://www.fastcompany.com/28121/they-write-right-stuff implies that there is software written by at least two different groups running on the shuttle.

I've also heard anecdotally that some train signalling and on-board control systems run software simultaneously in Linux and FreeBSD to lessen the chance of simultaneous memory faults. (I didn't get a chance to ask for more details or clarification.)


Do we know if the issue is bad software or bad specs? Even formal methods won't save you from a bad spec.


Yes, and I mentioned this in my second paragraph. And no, we the public know very little on this point about the current alleged MCAS problem, but we know that many problems with safety-critical software come from implementations, not specs.


I don't think we've seen anything to indicate that the problem is not a bad spec.


We haven't seen anything either way. Even the allegation that it's the MCAS system is totally unconfirmed to the public.


> The Indonesia safety committee report said the plane had had multiple failures on previous flights and hadn’t been properly repaired.

Lots of blame for Boeing, but the real criminals are Lion Air who apparently don’t know how to maintain airplanes. Compare their safety record with Southwest Airlines. Lion Air shouldn’t be allowed to fly.


It also seems like this was not properly reported to safety agencies at the time, nor was it reported when that plane crashed in a subsequent flight.

Say what you will about Boeing, but this could have been avoided if the airline had better safety practices. Every pilot on that plane should have been trained on the new system. And the malfunction, if reported properly, should have caused that particular plane to be grounded for a mechanical inspection. Instead it flew again and hundreds of people died because they weren’t lucky enough to have one of the pilots who was trained properly.


> Every pilot on that plane should have been trained on the new system.

How is that supposed to work when Boeing didn't inform any of the airlines of this system?


Boeing did inform them. Otherwise how would the third pilot have known what to do when he saved the plane?


Various pilot unions aren't mincing words. Here's one quote:

"This is the first description you, as 737 pilots, have seen,” the message from the pilots association at American reads. “It is not in the American Airlines 737 Flight Manual … nor is there a description in the Boeing FCOM (Flight Crew Operations Manual). It will be soon.”

It doesn't seem unreasonable to me that the jumpseat rider made an educated guess based on some prior experience. Aided by the fact that he didn't have other things to do besides observing.


> It doesn't seem unreasonable to me that the jumpseat rider made an educated guess based on some prior experience. Aided by the fact that he didn't have other things to do besides observing.

Or he went back into the cabin and pulled out a copy of the FCOM to do some emergency diagnostic work.

https://www.grid.id/read/04966850/deretan-kejanggalan-yang-d...


The "runaway stabilizer trim" section. Which didn't mention MCAS at all at the time. So, if you guess correctly, you can find the right procedure.


"New plane, but flies the same as old plane! No retraining necessary!"

2 pilots crash the plane...

"Why didn't you learn how to fly with the new system?!?!"


[flagged]


Please don't break the site guidelines regardless of how wrong other commenters or voters may be. That only makes this place worse.

https://news.ycombinator.com/newsguidelines.html


As opposed to posting alternative facts and shallow dismissals? HN brings out a lot of knowledgeable folks, but on these 737 related posts the difference between HN and aviation centered forums is pretty stark.


Sure, it's not surprising that a forum of specialists would do better. If you know more, it's great if you share what you know so people can learn. Please just follow the site guidelines also.


> The deadheading pilot recognized a problem with something controlling the stabilizer and went off script.

It's not like he was pushing random buttons. He knew what the system was, he knew that it was likely to be the cause of the problem, and he knew how to disable it. He must have been informed about the system.


Yay he solved the puzzle. The reason that pilots train extensively is so they don't have to solve complex puzzles with terminal penalties while in the air. Most airline passengers prefer that.


> He knew what the system was

How would he know something that was undisclosed? It's not conjecture here, no airlines and no pilots knew of the presence of MCAS before the Lion Air 610 crash.


The article says that all of the pilots were supposed to know what to do in this situation:

> The so-called dead-head pilot on the earlier flight from Bali to Jakarta told the crew to cut power to the motor driving the nose down, according to the people familiar, part of a checklist that all pilots are required to memorize.

This guy didn't go off script. It seems like he was the only one of the three who knew the script.


> This guy didn't go off script. It seems like he was the only one of the three who knew the script.

The script says to stop if the trim adjustments stop when you counter with manual trim input. MCAS, by design, stops. You can see where MCAS stops trimming nose down on the preliminary Indonesian report (they graph both the fatal flight and the previous flight).
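Here is the branch I mean, as a toy model. It encodes only the checklist logic as I described it above (counter with trim, stop if the uncommanded trim stops, otherwise go to cutout); it is not the actual Boeing checklist text, and the names are hypothetical.

```python
from enum import Enum, auto


class Outcome(Enum):
    STOPPED_EARLY = auto()  # checklist ended before the cutout step
    CUTOUT = auto()         # crew reached the stab trim cutout step


def run_runaway_trim_checklist(trim_stops_when_countered: bool) -> Outcome:
    """Simplified model of the runaway trim checklist as described above."""
    # Step 1: counter the uncommanded trim with the yoke trim buttons.
    # Step 2: if the uncommanded trim stops, the checklist ends here.
    if trim_stops_when_countered:
        return Outcome.STOPPED_EARLY
    # Step 3: otherwise, move the stab trim cutout switches to cutout.
    return Outcome.CUTOUT


# A classic runaway keeps trimming against the pilot's input.
classic_runaway = run_runaway_trim_checklist(trim_stops_when_countered=False)
# MCAS, as described above, stops when countered with manual trim.
mcas_like = run_runaway_trim_checklist(trim_stops_when_countered=True)

print(classic_runaway.name)
print(mcas_like.name)
```

In this model a crew following the script never reaches the cutout switches in the MCAS-like case, which is the point: knowing the script is not the same as the script covering this failure.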


> and went off script

From the article:

"told the crew to cut power to the motor driving the nose down, according to the people familiar, part of a checklist that all pilots are required to memorize."

Emphasis mine.


Yeah, that's part of the checklist for runaway trim (which I've posted in a previous comment). That same checklist says to stop if the stabilizer stops moving after you counter with trim button input. MCAS stops after you counter with the trim buttons.


It's in the type certificate and the maintenance manuals.


Nope. This has been the subject of a ton of teeth gnashing by the American pilot unions (the head of American Airlines' APA union has been among the most vocal critics of Boeing in this case). In fact if you look at the ANAC's (Brazil) version of the "OPERATIONAL EVALUATION REPORT" and compare it to the rest of the world you'll find only the Brazilian version references MCAS.



