737 Max Explanation by a Software Engineer

JorgeGT · on March 17, 2019

I'm an assistant professor of aerospace engineering and I find this analysis quite spot on, in which this is representative of a much larger issue of economic and regulatory negative incentives, rather than just a "software issue" as some news outlets have reported. What I find downright criminal is this:

> Boeing sells an option package that includes an extra AoA vane, and an AoA disagree light

The fact that the redundancy of a sensor on which a system capable of sudden, large control inputs relies is an optional package to be purchased separately... I simply have no words.

How was this package advertised in the brochure? Pay extra and when the airplane nosedives at high speed, this useful indicator will helpfully warn you it's because AoA reading disagreement?

Animats · on March 17, 2019

I was amazed at that. Boeing used to be known for overdesign for safety. The B-747 had four redundant hydraulic systems. Here's a 787 doing aerobatics at the Farnborough air show to show that it can operate way outside the normal passenger aircraft flight envelope.[1]

Boeing used to be an engineering-first company. HQ was at Boeing's own airport near Seattle. Then they got new management and moved corporate HQ to Chicago.

[1] https://www.youtube.com/watch?v=vzr313wSY_Y

xyzzy123 · on March 17, 2019

I have noticed “safety fade” can happen in projects and organisations.

In particular when you are doing something for the first time, uncertainty creates a space for a safety culture to operate.

Design, process and safety emphasis lead to operational success.

Over time the original designers and engineers leave, new staff join. Field experience shows that most of the safety problems that were anticipated in design never happen. The organisation starts to become complacent.

All the safety features and procedures start to be considered, psychologically, as a “safety margin” or a kind of budget that can now be spent by taking more risks. Triple or quadruple redundancy just starts looking like extra weight.

Within the org, pressures to lower costs and deliver faster start to win out...

mannykannot · on March 17, 2019

You have independently discovered a phenomenon that is known as the normalization of deviance. It has been seen in many engineering failures, such as both Space Shuttle crashes, the Deepwater Horizon explosion, etc.

https://flightsafety.org/asw-article/normalization-of-devian...

walterbell · on March 18, 2019

> Over time the original designers and engineers leave, new staff join. Field experience shows that most of the safety problems that were anticipated in design never happen. The organisation starts to become complacent.

This lifecycle pattern can be found at other organizational scales, including countries, e.g. soldier -> law -> artist -> merchant -> warlord -> soldier ..

knowledgepowers · on March 19, 2019

Where does this soldier -> lawyer etc formulation come from? Interesting idea!

walterbell · on March 19, 2019

Inspired by this John Adams quote from 1780, which covers the first three phases, https://www.masshist.org/digitaladams/archive/doc?id=L178005...

> I must study Politicks and War that my sons may have liberty to study Mathematicks and Philosophy. My sons ought to study Mathematicks and Philosophy, Geography, natural History, Naval Architecture, navigation, Commerce and Agriculture, in order to give their Children a right to study Painting, Poetry, Musick, Architecture, Statuary, Tapestry and Porcelaine.

xxpor · on March 17, 2019

FWIW "Boeing Field" is known as that because it just happens to be named after William Boeing, not because it was owned or operated by Boeing. It's owned by King County. In fact, the real name is King County International Airport.

pyman · on March 18, 2019

[flagged]

epse · on March 18, 2019

That has nothing to do with the comment above or the conversation in general

Erwin · on March 17, 2019

I was at Farnbourough in 2014 and the A380 was even more amazing: https://www.youtube.com/watch?v=Ew_kzgqCD2k

And the military A400M -- 4 engine turbprop -- https://www.youtube.com/watch?v=voK2o62KpqI -- quite amazing to see when you are used to those tiny regional turboprops.

jacquesm · on March 17, 2019

That A380 take off is very impressive. Pity in 10 years time we will probably feel similar to it the way we feel about Concorde today. Thanks for posting.

stephen_g · on March 18, 2019

Not quite - they're still manufacturing A380s until 2021. Those will probably still be flying for at least 15 years. It won't be a distant memory like the Concorde for 25-30 years.

the_duke · on March 17, 2019

Yeah that take off roll is crazy short for such a huge aircraft.

Impressive performance with the plane empty and probably only an hour or two of fuel.

JauntyHatAngle · on March 18, 2019

Ignorant here, why will we feel about A380s the same way as the Concorde? What is happening to it?

obtino · on March 18, 2019

Airbus have scrapped it.

adwww · on March 18, 2019

I was driving past on the motorway and remember nearly crashing in shock at the aggresive maneuvering of something so big - first time seeing either an airliner do a display, or the A380.

tyingq · on March 17, 2019

Similar video with a test pilot pushing a 707 pretty hard. Includes a barrel roll, which isn't hard on the aircraft, but unusual to see with a passenger aircraft.

https://youtu.be/Ra_khhzuFlE

The 707 was pretty similar to the 737...same main fuselage dimensions, similar pax capacity, etc.

BuildTheRobots · on March 19, 2019

Tex Johnston, the test pilot on that flight, did that off his own back and left president of Boeing fuming as it really wasn't part of the plan.

I believe they fired him for it - though then hired him back after 24 hours and the realisation that it was a stunning piece of advertising.

https://www.seattletimes.com/seattle-news/60-years-ago-the-f...

nickgrosvenor · on March 18, 2019

Love the 4k footage.

tyingq · on March 18, 2019

Ah, well, it is from 1955.

scottie_m · on March 17, 2019

All of that changed when McDonnel Douglas merged with Boeing.

Aloha · on March 18, 2019

Why do you pin it there?

CaptainZapp · on March 18, 2019

When Boeing merged with McDonnell Douglas in 1997 essential McDonnell's management moved into Boeing's c-suite.

What essentially was a very engineering driven culture was replaced by finance and marketing geeks that very much moved away from this approach (and moved Boeing's headquarters to Chicago in 2001 and in the process management away from engineering and construction).

Sadly (and arguably) culminating in the two crashes we read about recently.

WalterBright · on March 18, 2019

> quite spot on

> Nowhere in here is there a software problem.

I worked on the design of the 757 stab trim system, am trained in aerospace engineering, and am a programmer. I'm not a pilot. I am not explicitly familiar with the 737 stab trim system.

It is indeed at least partially a software problem. I had thought of two possible improvements: one is to limit the authority of the MCAS's commands to the trim, and the other is to not issue further commands if the pilot fights the trim commands. I read today that Boeing's proposed software fix includes both of those.

JorgeGT · on March 18, 2019

In that sense, I agree with you. I meant to say that this wasn't a software bug or programmer's fault like "oh, someone wrote max_authority = 6º instead of 0.5º in line 8745679", but something bigger: the very design and specs of the system and a certification process that didn't challenge those aspects.

sgustard · on March 18, 2019

I have to agree. If you're saying "the software is working to spec, but the specs are bad," then you're still describing a software problem.

speedplane · on March 18, 2019

> If you're saying "the software is working to spec, but the specs are bad," then you're still describing a software problem.

This is a pretty bold statement. The spec includes many non-software related features.

cashsterling · on March 18, 2019

In aerospace, lack of proper requirements in the flight software would be a system engineering problem... a semantic difference really, but important nonetheless. System engineering and validation has a metric ton of 'tools' to catch bad system design and improper algorithm implementation...

I really question what the hell happened here: - Did they just pencil whip the FMEA on this or what? (failure mode and effects analysis)

- What happened with the Hardware-in-the-loop flight simulation when they tested the scenario where the AoA sensor givesg spurious data, both high and low (but especially high)? I mean... they did test this, right?

m463 · on March 19, 2019

I would imagine having not one, not two, but three sensors would help (it would have redundancy and software could determine which one was malfunctioning)

bitminer · on March 17, 2019

What concerns me is the evidence of regulatory capture. Boeing was telling the FAA what the approval schedule was going to be. And the FAA sucked it up. Instead of saying "our review capacity is x pages per day. You will receive our reviews on such-and-such a date."

(And yes I am oversimplifying.)

bgorman · on March 17, 2019

Boeing has spent more money lobbying the US federal government over the last 20 years than any other company. Last year they had the government kill Bombardier's attempt to break the Airbus/Boeing duopoly. Boeing also prevented the US military from buying an Airbus tanker a few years back. Boeing has definitely captured the US government.

spenczar5 · on March 18, 2019

> Boeing has spent more money lobbying the US federal government over the last 20 years than any other company.

This is simply false. Boeing has spent a lot, but less than General Electric, or the National Association of Realtors, or many (many!) healthcare and pharmaceutical companies.

Source: https://www.opensecrets.org/lobby/top.php?indexType=s

bgorman · on March 18, 2019

While your source seems to suggest that my statement is false, a closer looks shows that the only corporation (I.e. not an industry special interest group like the national association of realtors) that spent more money than Boeing was General Electric. So I would say your statement is even more misleading than mine. General Electric has fallen off a cliff since 1998. I suppose my "20" years statement is false according to "OpenSecrets", but if you change my statement to "15" years, it is probably correct, since General Electric did most of their lobbying closer to 1998 than 2018.

spenczar5 · on March 18, 2019

> I would say your statement is even more misleading than mine.

Fair enough, I guess. I think of groups like Blue Cross/Blue Shield or the NAR as representing corporations, but I can see how you'd disagree on a strict reading. I didn't mean to mislead.

> General Electric has fallen off a cliff since 1998.

In 1998, GE spent $7.28m on lobbying. Nearly every year since then, they have spent more than that, generally staying north of $16m per year, with the exception of the last three years.

Your statement is true only for the last three years.

Same source: https://www.opensecrets.org/lobby/clientsum.php?id=D00000012... for GE, https://www.opensecrets.org/lobby/clientsum.php?id=D00000010... for Boeing.

bgorman · on March 19, 2019

I appreciate your fact-checking. I do not mean to mislead anyone.

barney54 · on March 18, 2019

What is the evidence for this claim? It certainly may be true, but I’d like to see evidence.

bitminer · on March 18, 2019

You are not clearly questioning me or the other responder.

bitminer · on March 18, 2019

Additional comment:

After regulatory capture, it is not clear to me that any subsystem reviews are valid.

MCAS is not the only subsystem revised (or introduced) for the MAX series.

All "FAA" so-called reviews need to be inspected for validity. (Quotes because Boeing not the FAA did the reviews). And recertification will take months.

cashsterling · on March 18, 2019

I, too, have no words to communicate my astonishment.

A phrase comes to mind: "To screw up this bad you need an MBA." Which is not really intended as a dig on MBA's... but rather to indicate that this debacle has the feeling of business decisions overruling sound engineering design.

This seems poor system engineering on multiple levels:

- the whole airframe / engine thrust composite flight characteristic requiring an agressive AoA limiting safety... on a commercial airliner? what?

- The default MCAS relying on a single sensor, which seems to be occasionally quite unreliable, without sensor truth'ing based on other measurements (i.e. dividing ground speed by rate of altitude change to estimate AoA, then also figuring for control surface position feedback.

- The MCAS not having proper situational behavior: i.e. "I'm only 500 feet of the deck... maybe this not a great time to try to tip the nose down too hard... you know, in case that stupid sensor is wrong. Oh... and the pilot is also pulling back on the yoke trying to correct something I'm doing. yeah... my bad, MCAS will chill right now and assume the pilot is NOT trying to fly this thing like a stunt plane 500 feet off the deck".

chipsa · on March 19, 2019

Angle of Attack cannot be inferred from ground speed and vertical speed. It's quite possible to have a massively negative vertical speed (10000 feet per minute descent), and have an Angle of Attack greater than 45 degrees. Angle of Attack is strictly the angle of the airflow compared to the aircraft.

cashsterling · on March 20, 2019

We could argue about this (technically, you are not wrong) but the point is, the aircraft MCAS should be able to do this robustly with available sensor data.*

Without going into a lengthy analysis... you can infer AoA from groundspeed, rate of climb/descent, pitch/yaw/roll, and position feedback of control surfaces. Recent information on these data points can be used to calculate present and near future flight dynamics. The fields of Kalman filtering, adaptive control, and optimal control as applied to aerospace engineering are decades old.

There are many examples where adaptive flight controls can fly an aircraft that has inherently unstable flight dynamics... In fact most modern jet fighter require computer controlled flight controls to maintain stable flight.

Applying these algorithms to an airliner for sensor truth'ing is a cake walk.

* I think you actually can infer situational AoA from ground speed and altitude data: An example you could build a regression model of standard take-off scenarios and using recent groundspeed, altitude change, and the rate of change of (groundspeed/altitude). You could build this off of flight test data and simulation. In practice, there may be no point to doing this because a wealth of additional flight sensor information is available.

ams6110 · on March 17, 2019

This isn't exactly right. The aircraft has two sensors, but the system in question (MCAS) is only ever looking at one of them. That is part of what the software fix is reportedly going to address.

im_down_w_otp · on March 17, 2019

For a system with the responsibilities and impact on the safety-case that MCAS has, not even a 1oo2D design would be sufficient, since there's no clear way to determine which input is correct or incorrect if they diverge, but worse still is that MCAS doesn't seem to leverage even that basic, yet still insufficient, safety design from what I've been able to find on it.

At the end of the day the fail-safe is to throw control back to the human. And, then of course pin all the liability onto the human operator when things go wrong.

This was the problem I had with the Lion Air conclusion. Sure, it may be the case that a pilot can potentially override this system with situational awareness and training. However, the pilot didn't create this band-aid pile in the first place.

tzs · on March 17, 2019

> For a system with the responsibilities and impact on the safety-case that MCAS has, not even a 1oo2D design would be sufficient, since there's no clear way to determine which input is correct or incorrect if they diverge

Wouldn't that depend on how often MCAS is actually needed, and how it handles divergent sensors?

My understanding as a layman is that there are situations where a 737MAX is more likely to stall than a "regular" 737 would, but I haven't seen anything on how often those occur. Are they expected to routinely occur, so a fully functioning MCAS is important for safe operation of the plane?

Or are they supposed to be rare, one in several million events, and the MCAS is there to make handling those events the same as they would be handled in a "regular" 737, to minimize the need for retraining?

If the later, then I'd expect two sensors would be fine, or even just one sensor IF there was a reliable way to tell if it had failed, as long as the MCAS response to a sensor problem is to disable itself and let the pilots know they have to stay out of those "more likely to stall" situations for the rest of the flight.

salawat · on March 17, 2019

The problem comes in where the MCAS use should be infrequent, unless the sensor is borked.

In computing, we have a principle stated Garbage in, Garbage out. Circuits and electronics don't think. They compute. There is no error checking except that which is specifically designed and implemented into the system.

If you're getting data from a biased sensor with +/- 10 degrees AoA, a 10 degree actual AoA (well within the safe operating envelope) suddenly appears to MCAS as a 20 degree AoA (oh shit territory).

The system therefore engages doing exactly what it was designed to do.

Thus is the crux of the matter. The pilot was flying safely while his AoA sensor was telling a safety system he knew nothing about that he was flying dangerously.

The AoA system on earlier models of aircraft that the MAX was based on were a functional luxury/situational awareness aid. The flyability was not impacted by a horked sensor.

That changed once they had to add a software driven mechanism to keep the flight characteristics similar enough with the old airframe to be able to release the aircraft to 737 trained pilots, and not have to worry about retraining. Maintenance and pilot alike both needed to be aware that the AoA sensor became a safety critical component due to a failure or miscalibration jeopardizing the controllability of the airframe.

If they had gone through a full recert of this airframe, and not an expedited self-certify/grandfathering, these tragedies would have had a much smaller likelihood of occurring due to the increased scrutiny. Props on Brazil for doing their own footwork, and not blindly trusting the FAA's delegation to the manufacturer.

Personally, as a software engineer and quality assurance specialist, if I'd seen anything remotely like this come across my desk, I'd be raising hell, even if it meant yanking someone into a VP/C's office and giving them a dressing down for skimping on a cross-cutting safety critical concern, deadlines be damned.

Johnny555 · on March 18, 2019

his AoA sensor was telling a safety system he knew nothing about that he was flying dangerously.

Is that true with the Ethiopian Airlines crash? I can believe it was true with Lion Air, but that crash was so widely publicized that I'm surprised that any passenger airline captain had not heard of the MCAS system.

chopin · on March 18, 2019

You need to realize this problem before it is too late. If there is full trim down when you realize it there might not be enough room left to offload the stabilizer and trim back (manually, no less). Afaik if you cut off stab trim you can only trim back manually by moving the trim wheels.

I'd not be surprised if there will be another problem with MCAS detected with the ET flight where the pilots had effectively no chance despite knowing of the Lion air crash.

dilap · on March 17, 2019

yeah, I think you're right -- since you don't normally need the MCAS, you could probably get away w/ 2 AoA sensors and just disengage MCAS when they don't agree. you'd only get into trouble when you need MCAS and the AoA sensors are busted, which is a (rare) * (rare) event, so probably low enough odds to tolerate (i.e., probably never over lifetime of plane).

(just speculation of course.)

lutorm · on March 17, 2019

There are other ways of estimating the AoA. If you know the airspeed and the load factor, you can back out AoA to a reasonable accuracy. That's less precise than a dedicated sensor, but it should be good enough to tell which sensor is the good one if they diverge.

taneq · on March 18, 2019

> since there's no clear way to determine which input is correct or incorrect if they diverge

In systems with dual redundant channels you don't try to determine which one is correct (unless it's obvious, like lost comms to one sensor), you just throw an error and stop trusting both sensors.

im_down_w_otp · on March 18, 2019

It depends on whether the system needs to be Fail-Safe or fault-tolerant.

But, yes. Most of these systems are not designed to be resilient to that class of issues, and just send control back to the operator.

JorgeGT · on March 17, 2019

That... makes it even worse? Why would someone on Earth disregard an available extra sensor for a system that has unlimited authority to bring the horizontal stabilizer to full deflection?

ams6110 · on March 17, 2019

Yeah I don't disagree. I don't fly, but have sort of been following this out of curiosity. One comment I've seen from some pilots is basically "ehh, runaway trim isn't a new thing, we train for it in the sim, and there's a standard way to deal with it (disengage the automatic system and trim manually)."

So perhaps Boeing felt that this didn't really change anything in that regard. They seem to have been proven wrong.

inferiorhuman · on March 17, 2019

Technically MCAS activation isn't runaway stabilizer trim, and I'm sure that tripped up the Lion Air pilots. Check out the quick reference for a runaway stabilizer[1]. Note the steps:

* Control airplane pitch attitude manually with control column and main electric trim as needed

* If the runaway stops, stop.

Well, electric trim input on the yoke will stop MCAS temporarily. Of course if these guys had any experience on an NG they're already used to the computers trimming the plane in a counterintuitive manner via the speed trim system (STS). So not only is MCAS not a runaway trim situation, but pilots flying the NG will get used to the computer trimming the stabilizer "at random".

1: http://www.737ng.co.uk/737-800%20Quick%20Reference%20Handboo...

mannykannot · on March 17, 2019

This sort of MCAS failure presents itself to the crew as a trim runaway, and the instructions in the document continue:

  4 If the runaway continues:

      STAB TRIM CUTOUT switches (both)   . . CUTOUT

      If the runaway continues:

        Stabilizer trim wheel    . . . . .   Grasp and hold

  - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  5  Stabilizer    . . . . . . . . . . . .   Trim manually

(That bit about grasping and holding the trim wheel was a surprise to me.)

This is what the Lion Air crew of the flight before the one that crashed did, in response to the same sort of MCAS failure, and completed the flight without further trouble. While Boeing made a serious error in hiding the differences between this variant and its predecessors, it seems that the prior trim runaway procedure does work for this sort of MCAS failure (unless evidence to the contrary comes out of the Ethiopian Airways investigation.)

erichocean · on March 17, 2019

Boeing is "right" in the sense that, if you follow the normal 737 flight manual, you'll be fine, even with MCAS enable or even MCAS misbehaving due to the sensor readings being bad or whatever.

I think the problem is that the symptoms are different from other 737s, and if you don't KNOW about the MCAS stuff, that's very surprising. So as humans, even though logically the same procedure applies, instinctively you don't think to do it. And given the very short time frames involved, and the margin of error, it's not too surprising that these events have occurred.

Perhaps the need-to-retrain criteria needs to include the following amendment:

If the craft exhibits new symptoms, even if the response to those symptoms are covered correctly be the old operations manual, pilots will need to be retrained to account for the new symptoms.

inferiorhuman · on March 18, 2019

This sort of MCAS failure presents itself to the crew as a trim runaway

No, it doesn't and that's the problem. A runaway won't stop when you hit the trim buttons.

WalterBright · on March 18, 2019

> A runaway won't stop when you hit the trim buttons.

Depends on what causes the runaway. If the autopilot goes berserk, the trim buttons will override it (if I recall correctly, it's been a long time).

salawat · on March 17, 2019

No simulator accounted for a borked AoA sensor being able to nosedive the plane, or the altered aerodynamics from the different engine configuration.

So while yes, runaway trim is a thing, the circumstances under which an error could happen are substantially different.

For instance, a pilot could check the maintenance log book and see some work or inspection was done on the auto-trim subsystem. This would prime the pilot to be more on the alert for the possibility of trim misbehavior during the flight. Crisis averted, right?

But no pilot would look at an entry for an AoA sensor being off or worked on, and would think, "Boy, better look out for that safety system I never knew existed that could cause a trim runaway because my previously airworthyness agnostic AoA sensor is unreliable."

Flying and complex system diagnosis in time sensitive conditions requires extensive ahead of time mental model creation ahead of time.

lutorm · on March 17, 2019

It is weird, though, that this was not caught in simulations (not training simulations, but in design simulations.) I would assume that running simulations to see how the design behaves when sensors fail would be an absolutely standard thing to do. I don't work on airplanes, but we routinely simulate sensor failures and their effect on software systems and you'd think this would not be optional for passenger aircraft certification.

salawat · on March 17, 2019

When all of your testing is in house, and you don't have people who are willing to put the project at jeopardy for the sake of doing it right, you'll be amazed what can get overlooked.

If anything, the process of engineering is most difficult in that answering most questions tends to be straightforward (not easy, but straightforward once you know the right methods to apply), while figuring out if you've asked all the right questions is the thing that keeps me awake at night. If you don't ask and dig in until you've answered it fully, it's easy to get blind sided.

I have difficulty believing it could have gone so horribly wrong now, but I can't deny that just from the information available, even if MCAS isn't the root cause of the Ethiopian Air crash, there is some egregious failures in sound practice going on with Boeing to have slipped up this badly.

That's the thing with sound Engineering, when you do it, it just verks. When you don't...

CaptainZapp · on March 17, 2019

Key quote is training on a simulator

That's a tall order when the manufacturer doesn't inform their customers out of commercial expediency.

wahern · on March 18, 2019

If you only have two sensors and you're taking input from both sensors, assuming each sensor has the same probability of malfunction then you're doubling the chance of processing bad input.

You could choose to select the input that results in the least-worst bad outcome. But that may not bring the total risk below the increased risk of utilizing both sensors. The sensors are doing something purposeful, after all.

So it may be logical what they're doing. OTOH, the logic may make sense only in a path dependent sort of way, after a series of bad decisions that put them in a corner with poor options. There probably really should be 3 or more sensors.

chopin · on March 18, 2019

You take input from both sensors only when they don't disagree (by a margin). There may be failure modes where both have the same bad input but I don't think that this doubles your chances of processing bad input.

wahern · on March 18, 2019

You have double the chance that one of the sensors is bad. Which one do you choose? The one that says AoA is normal? Maybe that's the one that is bad. Worse, maybe such a bad input is more likely in a stall scenario.

Without any ability to differentiate bad from good input, you may be better off simply relying on a single sensor at any one time and completely ignoring the other, at least as far as controlling MCAS is concerned. Tossing up a warning that says the sensors disagree is another matter.

chopin · on March 20, 2019

I meant disabling the system if they disagree.

I agree that one can argue that this doubles the chance of system shutdown compared to properly detecting that one sensor fails. That's obviously only ok if the system is expendable to a certain degree.

inferiorhuman · on March 17, 2019

Because there was no available extra sensor. Trevor gets a few other things wrong too.

The elevator feel system (EFS) uses multiple sensors, in fact it has its own pitot probes[1] (two of them) on the elevator. It's not clear if it uses both, but I suspect it also uses both alpha vanes up front like MCAS should be doing. Input from the alpha vane (AoA sensor) is new on the MAX, IIRC. His tweets were a bit ambiguous, but the elevator feel system does not change the trim at all (also EFS was non-op on the Lion Air bird).

It's almost certainly NOT a sensor problem. Look at the graphs from the black box on the Lion Air flight. The angle-of-attack was almost exactly offset by twenty degrees left to right[2]. That sensor was working just fine, but something else on the path was fucked up (bad wiring? bad ADIRU? both have happened before). The fact that the alpha vane was replaced and that didn't fix the angle-of-attack readings supports this. Also, the 737 NG and 737 MAX use identical alpha vanes. While the angle-of-attack data is less important on an NG, we haven't seen people come out of the woodwork talking about how common alpha vane failures are on an NG. I'm inclined to believe that they're not all that common.

The Lion Air flight crew did write up the sensor problems[3].

I saw another one of these "I'm a software engineer so I've got great insight" posts[4] and chuckled. It's a bit of swing and a miss as well. MCAS is hardly silent or subtle. The same trim wheels his Cessna have are present on the 737 and they're noisy[5]. MCAS doesn't quite override the pilot either, and you can see that on the black box graphs from the Lion Air flight[2]. Twenty times the pilot flying entered opposite trim to counteract MCAS and twenty times MCAS paused for five seconds. On the twenty-first time something changed and the plane crashed. It's not quite that simple and that's why there are active investigations.

1: https://4.bp.blogspot.com/-ZsL1asAUBRU/W-pSdWs2dYI/AAAAAAAAF...

2: https://reports.aviation-safety.net/2018/20181029-0_B38M_PK-...

3: https://theaircurrent.com/wp-content/uploads/2018/10/jt43-jt...

4: https://medium.com/@jpaulreed/the-737max-and-why-software-en...

5: https://www.youtube.com/watch?v=3pPRuFHR1co

cameldrv · on March 18, 2019

I’m the OP Dave BTW. Trevor copied a Facebook post I made a couple of days ago into those tweets. I don’t know all of the details about the EFS system but I’d be pretty sure it doesn’t use data from both sides for anything, because that would be against the brick wall isolation concept in the air data and flight control computers.

You say that the MCAS is neither silent or subtle, but there is no light or master caution or anything, just the trim wheel turning and clicking. I’m not sure you’d notice it if you were wearing a headset and were being distracted by the stick shaker.

The reason they died on activation #21 and not #1 is that the effects are cumulative. Every time it activates it dials in another 2.5 degrees of nose down trim. Eventually you can’t pull on the yoke hard enough to hold altitude.

I agree that the jury is still out as to whether it’s a sensor issue per se or some sort of ADC or data transmission issue.

inferiorhuman · on March 18, 2019

The reason they died on activation #21 and not #1 is that the effects are cumulative.

The stabilizer position was graphed on that preliminary report (see: pitch trim position). There was a slight trend towards nose down pitch, but no, at the time they lost control there's a sharp decrease in pitch that doesn't correlate with MCAS. If you look at the "trim manual" line you'll see that the pilot flying was continuing to input nose up pitch on the switches to no effect. Something changed and it wasn't simply that the pilot was overridden by MCAS. After 21 even MCAS continued to counter the pitch up inputs as the stabilizer continued to contribute additional nose down pitch.

I agree that the jury is still out as to whether it’s a sensor issue per se or some sort of ADC or data transmission issue.

To be clear I highly doubt the alpha vane was defective. It's a possibility, but look at that graph. The left "angle of attack indicated" mirrors the right one almost exactly except that there's a near constant twenty degree offset. If that's a sensor problem in two separate sensors I'll pull a Werner Herzog and eat my freaking shoe.

engire · on March 23, 2019

I totally agree here. Im an AME with Max Licence and i would have to say. Im quite sure this wasnt an AOA sensor. The FCC thats active with MCAS as its function (both have this Function) would have been looking at the right vane on one flight and then the left vane on the next flight, however on the last 3 legs the fault was the same. And trust me on this one. May have been different indications bbut all pointed at the same fault. I suspect the FCC or ADIRU was at fault or something it aircraft installation that was inducing the 20 degree difference between the 2.

autopilotsw · on March 25, 2019

I don't believe there is a brick wall design here since the FCC outputs are not compared so why would you isolate inputs. I believe it's just a bad design to use the single AOA input.

inferiorhuman · on March 19, 2019

You say that the MCAS is neither silent or subtle, but there is no light or master caution or anything, just the trim wheel turning and clicking. I’m not sure you’d notice it if you were wearing a headset and were being distracted by the stick shaker.

BTW the trim wheel has little stripes on it, moves quickly when being operated automatically, and is fairly noisy. It will get your attention. However, my suspicion is that pilots experienced on the NG will mistake the trim wheel doing counterintuitive things for normal STS operation (especially after takeoff). I don't know if the Cessna trim wheel(s) are any quieter or more subtle, but the Cessna certainly doesn't have all the band-aids that a 737 NG or MAX would.

To this point here's a post from a 737 NG pilot[1]:

As a long-time 737 driver I'll just chime-in a few points

[...]

(1) Just after takeoff there is a lot going on with trim, power, configuration changes, and as noted above, the darn speed trim is always moving that trim wheel in seemingly random directions to the point that experienced NG pilots would treat its movement as background noise and normal ops. Movement of the trim wheel in awkward amounts and directions would not immediately trigger a memory item response of disconnecting the servos. No way.

(2) The pilots could very reasonably not have noticed the stab trim movement. Movement of the stab trim on the 737 is indicated by very loud clacking as the wheel rotates. On the -200 it was almost shockingly loud. On the NG, much less so. HOWEVER, the 737 cockpit is NOISY. It's one reason I am happy to not be flying it any more. The ergonomics are ridiculous. Especially at high speeds at low altitudes. With the wind noise, they may not have heard the trim wheel moving. The only other way to know it was moving would be yoke feel and to actually look at the trim setting on the center pedestal, which requires looking down and away from the windows and the instruments in a 'leans'-inducing head move. On the 717, for example, Ms. Douglas chimes in with an audible "Stablizer Motion" warning. There is no such indication on the 737.

[...]

Finally, runaway stab trim is a very, very rare occurence up until now. We trained it about once every other year in the sim because it is so rare. And when we did it was obvious. The nose was getting steadily heavier or steadily lighter with continuous movement of the trim wheel. That is a VERY different scenario than what these pilots faced.

We also trained for jammed stabilizer, the remedy for which is overcoming it with force. The information they were faced with could very reasonably have been interpreted that way, too.

1: https://www.pprune.org/rumours-news/619272-ethiopian-airline...

bjowen · on March 18, 2019

One thing I’ve seen kicking around is that MCAS alternates input between the left FCC and right FCC, which might help understand why the JT43 only had to fight the stick shaker the whole flight (!) but JT610 had the automated trim input as well.

m463 · on March 19, 2019

serious question: which sensor has failed?

you need at least three sensors. (although the "sensors disagree" option you can purchase is better than nothing)

clavoie · on March 17, 2019

So, a modern aircraft probably have a few thousands sensors (if not way, way more), not strictly redundant but overlapping in some fashion -- I have to believe there's been some modelling trying to check that sensor input is consistent; if only to have some kind of way for the computer to figure out that a sensor has gone crazy (say your compass suddenly detects a significant change in heading, but no sensor has detected even a degree of roll, a change in speed, heading computed by GPS over time is stable... compass might be borked, whaddya say?)

Why isn't such modelling used to let sensors be disabled when they're clearly sending bogus input?

rocqua · on March 17, 2019

My guess is that such systems are complex, hard to model for safety assurances, and hard to reason about in the moment.

Yes, it will probably make the system safer, but your guarantees become a lot more fuzzy. This is not the field for fuzzy guarantees.

de_watcher · on March 18, 2019

> Why isn't such modelling used to let sensors be disabled when they're clearly sending bogus input?

You need to crash a lot of planes to train that NN.

m463 · on March 19, 2019

there should be at least 3 AOA sensors - so you can not only detect a mismatch, but figure out which sensor is most probably incorrect.

cameldrv · on March 17, 2019

We will see. Part of the avionics design is that there is a “Brick Wall” between the two sets of sensors, air data computers, and flight control computers. If the FCC uses both for MCAS, that safety concept doesn’t exist anymore. That is certainly the reason they didn’t compare the AoA sensors in the MCAS logic. I think they could probably use pitot/static as a cross check on AoA, but it has some tricky edge cases.

cmurf · on March 17, 2019

Reports say it alternates each flight between them; if that's correct, in the case of one good sensor and one bad, that you wouldn't see back to back flights with anomalous behavior related to a bad sensor. Otherwise, perhaps the problem isn't with the sensor itself, but with some part of how sensor information is communicated to the interpretive computer.

blattimwind · on March 17, 2019

AIUI the entire FCC only uses one set of sensors and which set of sensors is used is switched together with the FCC?

michaelfeathers · on March 17, 2019

I'd go further.

Use of software to make a plane airworthy when it isn't natively airworthy is a problem. Civilian aircraft should have a different standard than fighter aircraft: one that allows graceful degradation under failure.

clavoie · on March 17, 2019

Being a total layman in terms of aircraft design -- I've always wondered why they weren't built more akin to distributed systems. Instead of 1-3 large fuel tanks, have hundreds of small ones, so losing one is not a disaster. Similar with hydraulics and electronic wiring -- instead of a single cable running along wings, why not a redundant network that can survive everything short of a wing being sheared off?

BillinghamJ · on March 17, 2019

The things you've mentioned are likely to have significant downsides specifically in terms of weight and complexity

inetsee · on March 17, 2019

The A-10 Close Air Support aircraft manages to survive in exceptionally stressful circumstances. The flight control systems are triply redundant: two independent hydraulic systems, and a set of cables if both hydraulic systems fail.

Having a second angle of attack sensor on the Boeing 737 MAX (which is critical to aircraft safety) be an extra cost option is unconscionable.

nutjob2 · on March 18, 2019

They are built with extensive redundancy. Airbus A320s have 5 separate computer systems, programmed by 4 different teams, for aircraft control, of which you only need one to fly the plane.

See https://pdfs.semanticscholar.org/d4ac/17fcc0db3396a3219785bd...

It's a question of what design choices you make. For instance Boeing only has three in its comparable 777 fly by wire system.

kenneth · on March 18, 2019

That's pretty much the case. Fuel tanks are split into multiple smaller tanks in the wings, and most systems have one or multiple redundancy (multiple hydraulics systems, multiple flight computers, multiple black boxes, graceful failure protocols, etc.)

sangnoir · on March 18, 2019

The answer is economics - complex systems are expensive to build and maintain, and the air travel industry is notoriously low-margin. I don't see travelers accepting higher ticket prices because the airline bought (costlier) airframes with more redundancies. Any airline that attempts this will be undercut by those that do not.

Aloha · on March 18, 2019

How do you fill hundreds of small fuel tanks?

sudhirj · on March 18, 2019

You can link them with one-way seals, make them fireproof, etc. But I don't think you need hundreds. The A-10 example is great, that's an example of a plan that's designed to be shot at from close range and still fly.

AstralStorm · on March 18, 2019

Same way, with more pressure. They have to be connected anyway. (Honeycomb pattern, probably.) The real question is how to seal them on failure without huge weight cost.

hn_throwaway_99 · on March 17, 2019

Yes, was just about to post that exact section before I saw your comment. I've been following this story fairly closely, but this was news to me. I've seen other discussions that were along the lines of "We're not 100% sure why the planes crashed, so now that they've been grounded without a clear root-cause understanding it may take a long time before they are certified to fly again because we don't know exactly what to fix."

I call BS on that line of thinking. At the very least it seems like an easy decision to require redundancy in the AoA vane before letting the planes fly again.

neurotech1 · on March 18, 2019

Airbus A320 used to have GPWS (Ground Proximity Warning System) as an optional extra. A budget airline didn't order it, and quite possibly encouraged pilots to make excessively fast approaches, and led to a CFIT (Controlled Flight Into Terrain) crash of a perfectly good jet.[0]

Another crash related to the 737 MAX crash, is Scandinavian 751 [1]. On a MD-81 airliner, with "Automatic Thrust Restoration", A system the pilots were not trained on, caused the thrust to be increased, when the pilot tried to throttle back the damaged engines and clear the compressor stall. "Pilot Betrayed" was the Air Crash Investigations episode.

[0] https://en.wikipedia.org/wiki/Air_Inter_Flight_148

[1] https://en.wikipedia.org/wiki/Scandinavian_Airlines_Flight_7...

[2] https://www.imdb.com/title/tt1862850/

sitkack · on March 17, 2019

Not quite as egregious, but Amazon will sell you an Amazon Store Credit Card and then continually pressure you into buying the account protection.

Giving away the liability and then up selling the cure is immoral.

quickthrower2 · on March 17, 2019

Wait what? Aren’t there consumer credit laws that offer the protection you need?

ficklepickle · on March 18, 2019

It's probably insurance that pays your outstanding balance if you are badly injured or get a terminal disease or something. It's usually nearly impossible to claim and therefore a scam.

cameldrv · on March 17, 2019

(I originally wrote what my Brother in law posted on Twitter) I believe that the package is for Cat III hand flown landings. I think that to do that, you need AoA on the HUD, and in the event of an AoA failure you have to go around. It is strange for sure that you need AoA consistency checks for that, but not for MCAS.

jacurtis · on March 18, 2019

For anyone who is old enough to remember when airbags first came onto the scene in cars, they were sold as optional packages.

Most car manufacturers called these "safety packages". These optional packages were sold basically in the same way as you jokingly mentioned above. "Buy this optional package and when you hit a tree at 50mph, this will keep you from flying through the windshield and will probably save your life."

I actually remember a conversation that my parents had with eachother when I was young about whether it was "worth it" to buy airbags in their new car. Yes, that was a REAL conversation that Millions of people had and a decision they had to make for nearly 25 years before airbags became standard in cars in the late 90s.

Lots of people felt that the optional saftey package weren't worth the money and many people died through the '80s and '90s from car crashes that they would have survived had they splurged for the "safety package" when they purchased their car from the dealer.

It turns out that when people are buying cars, they aren't planning on crashing them. So spending extra money that only helps them when they crash felt unnecessary. These packages could cost in the neighborhood of $2,500 extra. Which on a $20,000 car is over 10% extra! Would you pay 10% extra for a safety package if cars today didn't come with standard safety equipment?

This is what led to the USA making airbags mandatory in 1998, forcing this safety feature on manufacturers in order to save lives.

I imagine the 737 AoA sensors were sold as the same thing. Boeing sold this as a "additional redundancy package" or "safety package" and probably included many other features as part of a larger package.

Just like thrifty car buyers of the 70s, 80s, and 90s who figured they didn't need airbags in their cars and opted out of the safety package. The budget airlines of Lion Air and Ethiopian Airlines had a good history with the normal 737 planes and figured this additional safety package wasn't worth the money.

heavenlyblue · on March 18, 2019

You do understand that a single light in an aircraft doesn’t cost 10% of it’s total budget?

That’s like saying that car manufacturers decided to add a light saying their ABS is disengaged depending on whether you paid for it.

That’s more like it.

Reason077 · on March 18, 2019

> "The fact that the redundancy of a sensor on which a system capable of sudden, large control inputs relies is an optional package to be purchased separately... I simply have no words."

As far as I'm aware, all Boeing 737 aircraft have 2 AoA sensors. There isn't an option to add a 3rd redundant one like the Twitter poster suggests.

From what I understand, what is optional is the display of the AoA on the main screen and/or HUD, and the "AOA DISAGREE" warning which cross-checks the data from the 2 sensors.

(Airbus A320s have 3 AoA sensors. A380, A350, and Bombardier C-Series, aka A220, have 4.)

gaius_baltar · on March 18, 2019

I got this in the other way: there is no actual redundancy even with the optional package, the new sensor is only used to compare inputs with the one used by the control system and warn the pilots if they disagree.

Pretty sure I'm misinterpreting things (I hope!).

In the other hand, if there are two sensors, what the control system does when they disagree and there is no hardware to show this to the pilots?

Reason077 · on March 18, 2019

Right. The optional packages don't add additional sensors and don't add redundancy. But they may improve situational awareness by drawing pilot's attention to a sensor problem faster.

The MCAS system, apparently, always takes its input from one of the two AoA sensors - it alternates each flight. Lots of details here: http://www.b737.org.uk/mcas.htm

Gibbon1 · on March 18, 2019

I'll also point this out "Unlike the EFS system, MCAS can make huge nose down trim changes."

That's a potential reason why an unaware pilot wouldn't disengage the auto trim system. Because he wouldn't think the trim system as being capable of causing an extreme pitch down.

Also this response.

> Interesting. So how did Boeing get a plane without redundancy in a critical sub system approved?

Answered own question, by hiding it in what is normally a non-critical subsystem.

engire · on March 23, 2019

Can I first correct you on two major errors in your statement. TWO AOA vanes are NOT an option on any aircraft. Every aircraft have two vanes as standard. the option is for a small AOA position indicator icon on the top corner on the main screen for the captains side. also there is no AOA disagree light. This is an amber AOA disagree text that appears on the screen. I can only say that on my airline this is standard as it was on the 737 NG in our fleet.

phkahler · on March 17, 2019

One would hope MCAS would not do anything if the AoA input was known bad. This looks really bad for the company. I'd be asking for FMEAs and/or FTAs on this, what the conclusions were, and how they intended to mitigate this exact failure - because it's got to be in there.

zaphar · on March 18, 2019

If you've only got 2 sensors and you aren't even reading from one of them then "bad" input is hard to know. You can do nothing if they both disagree but you can't actually know which sensor is bad and which is good without a tie breaker.

ddalex · on March 18, 2019

You don't need to automatically fix the disagreement. You just need to stop taking nose-down action and inform the human in charge .

Problem is, if you need to inform the human, you also need to train the human, which leads to extra costs.

Gibbon1 · on March 18, 2019

Who knows but it the chatter makes it sound like the system doesn't have redundant sensors. You have two systems with one sensor each. One system is active and the other is inactive. And no voting logic.

ip26 · on March 17, 2019

Read somewhere a few days ago that Southwest, operator of maybe the most 737's in the world, didn't like the MCAS on the MAX and that extra "package" was essentially customizations/modifications created specifically to make them happy.

neuronic · on March 17, 2019

Might as well make the trim wheel or flap control optional packages as well at that point.

Except they can't because all pilots know about those since day 1 of their career as opposed to the MCAS sensor-software system designed to deal with airframe engineering issues.

Alex3917 · on March 18, 2019

> What I find downright criminal is this

Not much different than how with most car companies, the only way to get headlights that don't blind oncoming cars is to pay to upgrade to leather seats. C.f.:

https://www.iihs.org/iihs/news/desktopnews/night-vision-head...

gok · on March 18, 2019

Safety as an option is pretty standard in aviation, unfortunately. For example, collision avoidance systems have been in development since the 1950s, but TCAS remains only partially required to this day.

m3kw9 · on March 18, 2019

Stuff like this is usually triple redundant

Scoundreller · on March 17, 2019

Could it be an ETOPS requirement?

tntn · on March 17, 2019

How is this different from extra safety features available as option packages for automobiles?

simion314 · on March 17, 2019

If my airbag is missing my car won't crash into traffic by itself, if the airplane system fails the airplane points you into the ground.

therealx · on March 17, 2019

No, but, say you don't pay for the automatic breaking system. You may rear end someone that the car might have otherwise saved you from.

function_seven · on March 17, 2019

The difference here is that MCAS is an automatic feature that Boeing needed to mimic the existing 737 flight characteristics. They Created the need for the automatic system in the first place, then sell the safety redundancy as an option.

To extend (torture?) the automotive analogy, it would be as if the manufacturer substituted a new braking system that required preheating the rotors, installed software to automatically ride the brakes for the first 2 miles of driving in order to get that preheat done, then sold an optional safety feature to verify that the brakes were at the proper temperature.

The old way worked just fine without need for “emulation”. The new airframe required some software to mimic the old one, but they decided to charge you for the add-on to make sure that their band-aid worked reliably.

gulikoza · on March 17, 2019

Again, yes, you may rear end someone, but this is not the correct analogy here. In this situation, the car rear ends someone while you are fully breaking because you didn't pay for the automatic breaking upgrade. This is a fail.

acqq · on March 17, 2019

But here it was: pay to know if the automatic braking system will speed up the car for himself every time you first press the brake and then stop pressing it as you see that it's seems to be better:

https://www.nytimes.com/interactive/2019/03/13/world/boeing-...

Look at the charts above. Both pilots "fought" with the controls, correcting the plane course, then believing "it's corrected now" and then the plane sunk again and again.

alistairSH · on March 17, 2019

Most of the safety features that keep the car out of an accident (ABS, ESC, rear cameras, TPMS) are mandated by law in the EU and USA.

Yes, lane-keeping and auto-braking are not mandated. I expect they will be within a few years.

breatheoften · on March 18, 2019

Somewhat related to this thread and topic ... I'm driving a rental car right now and its the first time I've driven a car with a lane-assist feature (sensors help steer car back into lane when drifting out of lane). In France you are supposed to give 1.5m margin when passing a cyclist so I drifted left and out of my lane to pass - and really freaked out when the lane assist feature kicked in and steered me aggressively back toward the bicyclist...

I think I could have avoided this situation by turning on the left-turn signal to disable the lane-assist and generally its a good idea to use the indicator even when just passing a bike -- but that's not something I normally need to do to avoid this particular failure mode ...

de_watcher · on March 18, 2019

> I think I could have avoided this situation by turning on the left-turn signal

That's exactly what you had to do according to the regulations.

gubbrora · on March 18, 2019

People shouldn't drive drunk, but disabling the airbag when the driver is drunk is not acceptable.

de_watcher · on March 18, 2019

Well, LKA would actually help to stay in lane while drunk, no?

That thing has a good default that is still overridable by the brute force (force that's less than the force needed to steer if the power assist fails altogether).

gubbrora · on March 18, 2019

My point is that unexpected hazards when people break regulations is a problem. Because people will break regulations.

alistairSH · on March 18, 2019

Yeah, that's the way it works on my dad's Subaru.

The adaptive cruise control in my VW actually gets confused sometimes. Drove through a puddle at the beach last year and it jammed on the brakes. Glad nobody rear-ended me.

duado · on March 17, 2019

As soon as an auto safety feature is proven effective for 5+ years, it’s made mandatory.

subcosmos · on March 17, 2019

Rate 1/5 stars. App is crippled without in-flight upgrade.

SMH

gok · on March 17, 2019

I really wish people would wait for the report before drawing conclusions like this. These investigations take a long time, and it's often not the issue that gets circulated on Twitter.

AirAsia 8501 was widely suspected to be caused by a thunderstorm. Wired [1] and WaPo [2] still have articles up blaming the weather. When the investigation came out a year later, it turned out to have nothing to do with weather. The fly-by-wire system malfunctioned and the pilots got confused.

[1] https://www.wired.com/2014/12/airasia-qz8501-thunderstorms/

[2] https://www.washingtonpost.com/news/capital-weather-gang/wp/...

blattimwind · on March 17, 2019

> The fly-by-wire system malfunctioned and the pilots got confused.

The Wikipedia summary of the investigation report sounds quite a bit different.

It says there was an intermittent malfunction that could be cleared by following a procedure, which was done three times during the flight, with no impact on flight safety (AIUI). The fourth time, instead of following procedure, the pilot toggled the flight computer's circuit breaker, which he is not allowed to do in flight, which reset the flight computer completely, disabling various automated systems that they would now have to re-start, which they did not do. Then the plane entered a stall and due to communication issues pilot and co-pilot gave contradictory control inputs which resulted in no control input to the plane.

the_mitsuhiko · on March 17, 2019

The analogy does not make much sense because the majority of what is in this twitter thread is not new information or disputed. We also know that boeing is fixing it by a software patch already.

losvedir · on March 17, 2019

Eh, we still don't really know if MCAS is the cause of the Ethiopian crash, though. Some things point to it (flight fluctuating up and down, jackscrew found with full nose down trim), but some things are different, too, like crazy acceleration and handling issues right from takeoff, when the flaps would still be up and MCAS inactive.

spectramax · on March 17, 2019

Even if it Egyptian Air didn't crash their plane, Lion Air investigation alone exemplifies systemic negligence, not from the software standpoint, but from the top-down executive level and the negligence of the FAA.

So, your point is valid about that we need to wait for Egyptian Air's investigation, but misplaced because of the aforementioned argument.

croisillon · on March 17, 2019

*Ethiopian

gok · on March 17, 2019

All we know is that they're issuing a software update. We don't know that this update actually addresses what caused the crashes.

irq11 · on March 17, 2019

...or that the crashes are related. Too many people are skipping the step where the cause of the second crash is actually determined.

This twitter thread, in particular, is just summarizing information available in news articles, and leaping to the conclusion that crash #2 is the same or related to crash #1.

dmix · on March 17, 2019

The report for crash #1 isn't even out yet. Let alone #2, which hasn't even had it's blackbox fully analyzed.

gist · on March 17, 2019

And how exactly does someone reading this twitter thread know that?

macspoofing · on March 17, 2019

What do you mean? The entire thing is conjecture. We just don't know if there was even an issue. It's possible that the two accidents were uncorrelated. It's possible that the software worked perfectly but the pilots made wrong decisions. It's possible that the software worked perfectly but pilots panicked because they were unaware of what was happening. It's possible the software was at fault .. and on and on ...

He also claimed that there was a sensor problem. Are we sure about that? I have a hard time believing that a critical system would rely on one sensor to function properly. These things are built like space shuttles - with multiple redundancies and a fail-over strategy.

At the end of the day, we just don't know. Smart people are trying to figure it out, and we'll get to the bottom of it. Let's just wait a bit.

>Boeing sells an option package that includes an extra AoA vane, and an AoA disagree light, which lets pilots know that this problem was happening. Both 737MAXes that crashed were delivered without this option. No 737MAX with this option has ever crashed.

Jesus. What an atrocious statement.

RandomTisk · on March 17, 2019

Wanting to make a better and cheaper jet isn't at all controversial, though the twitter thread seems to try to make it appear so.

salex89 · on March 17, 2019

We still don't know whether the pilots reacted properly or not. We don't know even if they had the time to react properly. These tweets are written with some assumptions, but correct, in general.

rdiddly · on March 17, 2019

This is a "Twitter sucks" off-topic rant comment, so if you're not interested in that, just move along. At no point in my reading on this topic or any other, did I say to myself "Boy this thing would be great if it were broken up into a series of small brainfarts and served up one at a time on a bloated, slow-as-molasses web platform." I'm embarrassed every time someone tries to express a complex thought on Twitter. It's like a machine that turns your thoughts instantly into listicles. And every time I go view something there, I'm astonished all over again at the dismal user-experience people put up with in exchange for "access" to a "network." (Facebook is worse... it looks like some crap I built for my big-company employer. sic. I'm not much in the front-end department, so yes, my UI sucks balls. But my users have to use my app, and they get paid for doing so. Facebook users, I can only weep at the thought. But I digress. This was supposed to be about Twitter.)

cameldrv · on March 17, 2019

I actually originally posted this on Facebook and then my brother in law broke it up into tweets and posted on twitter.

dilap · on March 17, 2019

I'll take a contrary opinion -- forcing each thought into a tweet is a nice constraint that compels people to get to the point. This would probably be less well-written as, say, a Medium article.

rdiddly · on March 17, 2019

I actually agree with that - it's an interesting exercise to communicate concisely within a limit. (Sort of like Vine, which Twitter destroyed, but oops there I go getting smart-assy again.)

It's just, if that's your game, stick to the game, don't cheat by sprawling across 14 of those. It fails as an instance of the Tweet artform because cumulatively it's too long, and it fails as a longer-form piece because it's all broken up.

If I strung together 1780 Vines to make Fellowship of the Ring (and yes, nerdily, the math works out there), what have I proven? My powers of conciseness and economy? My respect for rules and limits? My ability to choose the right tool for the job?

thatoneuser · on March 18, 2019

Well yeah the point of medium is sound like you're giving information when really you're just making yourself sound smart. So if this were medium Wed never actually talk about what caused the crash and instead spend an hour going over why you NEED to use some esoteric library.

dredmorbius · on March 17, 2019

https://threadreaderapp.com/thread/1106934362531155974.html

And whilst I'm not a Twitter fan or user, on Mastodon I've found the practice of writing in <500 character chunks, posted publicly (and hence not easily revisable) is an interesting and useful writing vibe. In part because feedback can be specific to a given chunk, letting me see what resonates, or doesn't, what communicates as intended, or not.

Yizahi · on March 17, 2019

Hear hear. This was almost painful to read. Normally I just close twitter streams as soon as as I realize it's Twitter, but this one was interesting enough to try reading. I wish LJ didn't die or people would adopt Dreamwidth or similar "normal" blog platform. Twitter is simply horrible in UX.

VBprogrammer · on March 17, 2019

Often times as a Software Developer I encounter a bug which has an obvious two line fix. Rather than implementing that though I often spend another few minutes digging into how and why that bug was introduced. Often times I'm left with a greater understanding of the problem or encounter a requirement that the previous developer was trying to implement that my fix would have broken.

Other developers will simply assume the previous developer was an idiot and bash in the fix.

I feel like in this case a lot of people are assuming the engineering team were idiots, or criminally trying to make an aircraft which didn't pass safety standards. Rather than taking a look at what caused the bug in the first place.

rlabrecque · on March 17, 2019

I deal with this constantly. Someone gets a bug report for a crash, let's say a null pointer dereference, so often I see:

> if (pPointer == nullptr) { return; }

> Crash is fixed!

I mean sure... but that's not the problem. Why was pPointer null here in the first place? So few people take the time to understand that :(

aphextron · on March 17, 2019

>So few people take the time to understand that :(

Because "fix this null pointer exception" is ticket number 14 this week, and your PM just wants it checked off. They don't want to hear that you need another week of digging through layers of spaghetti to track down the source; that doesn't bode well for their KPI goals.

This is a systemic issue in the way software companies function.

quickthrower2 · on March 17, 2019

We can blame management but sometimes the developer just doesn’t give a f* or it doesn’t fit their agenda. I guess both are ultimately management issues, but it’s a shared responsibility.

thatoneuser · on March 18, 2019

Ultimately it doesn't matter if your engineers don't give a fuck if management will sabatoge their efforts when they do care. Only if you have management who won't forgo quality for ticks can you really blame the dev.

janpot · on March 17, 2019

OMG, yes. Every time I see this in a code review I go "why is this receiving a null here?" And 90% of the time I get back "dunno, I saw null pointer exceptions in the logs"

jacquesm · on March 18, 2019

Typical. Fix the symptom instead of the cause.

stevenwliao · on March 18, 2019

Sounds like a failure of the type system...

jaclaz · on March 17, 2019

>Often times as a Software Developer I encounter a bug which has an obvious two line fix. Rather than implementing that though I often spend another few minutes digging into how and why that bug was introduced. Often times I'm left with a greater understanding of the problem or encounter a requirement that the previous developer was trying to implement that my fix would have broken. >Other developers will simply assume the previous developer was an idiot and bash in the fix.

That is exactly the case of Chesterton's fence (JFYI):

https://en.wikipedia.org/wiki/Wikipedia:Chesterton's_fence

thatswrong0 · on March 17, 2019

It sounds additionally like the bug, caused by the issues presented in the tweets, made it past the usual safeguards and into production because of an abnormal certification process: https://www.seattletimes.com/business/boeing-aerospace/faile...

So there were failures at almost every level it seems.

e40 · on March 17, 2019

> So there were failures at almost every level it seems.

They aren't failures, they were designed in. There are forced at work, since the 80's, that have been trying to kill effective government institutions because they are seen anti-free market (the exact quote was "My goal is to cut government in half in twenty-five years, to get it down to the size where we can drown it in the bathtub."[1]).

This is the end result, as are E-coli outbreaks, etc. The government is being made ineffective so people can stand and point "see, I told you so, and we need to move this to the private sector were it will be done properly."

[1] https://www.brainyquote.com/quotes/grover_norquist_182534

thatoneuser · on March 18, 2019

To be fair there has been plenty of incompetence in the government. I mean as this or the 2008 financial crash shows us - total free market is folly, but there's some back and forth discussion to be had on how much government is good.

JorgeGT · on March 17, 2019

Wow... thanks for the link. The details are highly troubling. They assessed the safety of the MCAS based on a statement saying it had max. authority of 0.6º, being thus permissible to rely on a single sensor for this system. But in fact it had an initial authority of 2.5º, which is already pretty high by itself, and then that authority increased after each activation until full stabilizer deflection. That they implemented such a high authority system without triple sensor redundancy and without briefing the pilots is just insane.

im_down_w_otp · on March 17, 2019

Yeesh. If true, it seems like it should have been a 3oo4 system, with added heuristics for handling soundness of the sensor inputs, and with easy override defaults (e.g. control surface override) and auto-disable on disagreement.

What I don't get is how this drifting compounding boundary condition wasn't caught during formal modeling?

sitkack · on March 17, 2019

We don't do enough adversarial testing.

We are increasingly reliant on systems that have poor closed loop behavior with little to no memory. We need to design systems that are redundant, that have short and long term memories and have domain knowledge to compare their internal understanding of the universe with. And these autonomic systems need to alert the operators on what they are doing, why they are doing it and how to turn it off.

quickthrower2 · on March 17, 2019

> Other developers will simply assume the previous developer was an idiot and bash in the fix.

I think it’s worth avoiding working in teams with devs that do that. It’s a nightmare.

The only excuse is an inexperienced dev, and this should be picked up during code review and they are told why it’s a bad idea to chuck in fixes without considering surrounding code.

peteradio · on March 17, 2019

I haven't seen anything blaming engineering for these problems, although I'm not saying those comments aren't out there. I think the impression I get is that Executives and Regulators are to blame. Engineering generally isn't in a decision making capacity, they are given problems and provide solutions in a pretty compartmentalized role. Sure they could have resigned if they had known what it was leading to (and perhaps that happened) but that is about the extent of pressure they can exert in these situations.

userbinator · on March 17, 2019

More information about the MCAS than you probably ever wanted to know: http://www.b737.org.uk/mcas.htm

That page includes this noteworthy and unusual design decision:

"MCAS is implemented within the two Flight Control Computers (FCCs). The Left FCC uses the Left AOA sensor for MCAS and the Right FCC uses the Right AOA sensor for MCAS. Only one FCC operates at a time to provide MCAS commands. With electrical power to the FCCs maintained, the unit that provides MCAS changes between flights. In this manner, the AOA sensor that is used for MCAS changes with each flight."

mannykannot · on March 18, 2019

> the AOA sensor that is used for MCAS changes with each flight.

My first thought was that, in the Lion Air case, it happened both on the crash flight and the one before - but an attempt was made to fix the problem between flights, so the FCC may well have been powered down (alternatively, maybe both senors were faulty.)

rwhitman · on March 18, 2019

One of the trends I find most disturbing in business over the last few years is the nonchalant passing of the buck on hard business problems, down the food chain to software engineers.

The Silicon Valley mantra of "software can change the world!" has infected every corner of our lives but frequently people misinterpret this as "software can solve anything! (so I don't have to)".

Software engineers also tend to eagerly say "yes" to solving every problem with code, when sometimes a problem just can't be solved with code. Thus compounding the issue.

I'd argue that many of the macro problems in our world right now stem from this cycle.

My PSA to all devs - if someone asks you to patch a major business problem with software, push back. Sometimes a puzzle to solve, isn't your puzzle to solve. Send it back up the food chain. You don't have to say yes to everything.

thatoneuser · on March 18, 2019

That's easy enough to say while talking about crashing airplanes. Harder when your H1B or family's dinner relies on you keeping your job.

I think everything keeps pointing to more punishment for management and corporate decisions. I mean management doesn't really do the work, they should at least be responsible. Otherwise it's just a system to attenuated blame.

WalterBright · on March 18, 2019

The failure of the MCAS system does not indict using automatic controls to adjust the flight envelope of the airplane. Lots of systems do that already:

1. The autopilot

2. The feel computer

3. The device that reduces elevator authority at high speeds

4. The stall stick pusher

5. Hydraulically boosted controls

Modern jets would not be flyable without these, and the net effect of them is to make the jet much safer.

The failure of the MCAS system does not indict the purpose of the MCAS system, either. The problem with it was it continued operating with a failed sensor.

mcguire · on March 17, 2019

"Hey, Bob, we need you to write the software for this system. It's based on one, non-redundant sensor and can move the elevator trim to an extreme position. Sound good?"

"Sure, no skin off my nose."

Isn't software engineering a wonderful field to be in?

JustSomeNobody · on March 18, 2019

Except that's not really how it works

Bob to team: "What should happen when the two AoA sensors disagree?"

Team: "We should alert the pilot"

Manager: "We can't alert the pilot because the manual will need to change. What if we only use one sensor?"

Bob: SMH

Team: "That's not a good idea. We need redundancy."

Manger: "Well, we're not alerting the pilot. Use the one sensor."

Bob: Writes the code to use only one sensor.

TheHypnotist · on March 18, 2019

This shouldn't happen in a vacuum. Who's writing the requirement? The test case?

sunnyP · on March 17, 2019

AoA = Angle of Attack https://en.wikipedia.org/wiki/Angle_of_attack

credit_guy · on March 18, 2019

Just in case anyone is wondering why more efficient engines are bigger: the energy is quadratic in speed (mv^2/2) while the momentum is linear (mv). For a given amount of energy (which comes from burning fuel) you can choose to push the airplane forward by pushing air back in 2 ways: 1. less mass, more speed, 2. more mass, less speed. It turns out 2 is better, for example you can push 4 times as much mass for half the speed, which results in twice the momentum of the air pushed backwards. Now the amount of air you can push is the amount of air you can get, and that's proportional with the front area of the engine. So, you always want to have as large an engine as possible. Bonus: the larger the engine, the slower the air moves through it, and so the less noise it produces. When you read that engines have become both more efficient and more quiet over the years, the second part was just a nice side-effect of the first.

Gibbon1 · on March 18, 2019

This is one of the reasons I don't poo poo hybrid electric aircraft. With electric you can drive two or more fans off one turbine. Which allows you to increase the bypass ratio. As you mentioned the gains from that are quadratic where the efficiency penalty is linear.

Notable is using larger diameter high bypass ratio engines is what lead to the 373-MAX design compromises.

abalone · on March 18, 2019

Sure, it’s a system failure not strictly a software failure, but I don’t think the Boeing software engineers are off the hook here. Software is where the whole system comes together. Software is what can mitigate sensor failures. Software is the top of the stack that gets certified for reliability.

A good safety culture will not have even a whiff of a “not my job” attitude. The software team should never have signed off if they noticed that a single sensor failure could cause their “correct to spec” program to crash the darn plane (if that’s indeed what happened).

autopilotsw · on March 25, 2019

I think when people say software error it is in a general sense. It mean the problem is in the software as opposed to hardware. The are different types of software errors. A software error can be a coding error or a bad requirement. In the case the requirements are the issue. In my career we had safety, systems and software engineers. An experienced software engineer might have challenged the requirement in this case but the design safety would fall more on the system and safety engineers.

djvu9 · on March 17, 2019

I think a lot of the discussions are missing the point. The mcas system itself is indeed just a duct tape for a known design defect, ie using a new engine on an old body. It is like you replace a part in your car, find it over heating, and put an ice bag on it. The planned software “fix” is something like changing the volume of ice. I think it is a dead end and it is scary.

macspoofing · on March 17, 2019

>ie using a new engine on an old body.

Would you ever be surprised if an old car got brand new tires? No? Then why do you find it so surprising that engine manufacturers would build new engines for existing airliner designs?

sundvor · on March 18, 2019

That's more like fitting oversized wheels / tires that will rub into the well / body every time you hit some proper bumps. Sooner or later they will fail, spectacularly.

macspoofing · on March 18, 2019

Obviously the analogy breaks down once you start unpacking it.

Question to you though, what makes you so sure that this is in fact what happened here?

molotovbliss · on March 17, 2019

Tires, and a powerplant are two different domains. There's lots more at play with structure & what other parts can withstand. Resonance, materials, & aerodynamics all play a factor in the design process. Combine it with flying through the air in a seat instead of the ground, makes it all even more a factor.

Ever see Mazda Miatas with a Chevy LS motor? Kits are sold to adapt, especially now everything is drive by wire / software.

macspoofing · on March 18, 2019

I understand your point, but this is the reality of modern aviation. New engines are released for existing airframes. In this case though, it wasn't an existing airframe. This was a new model built around new engines. Where the analogy starts to break down is that unlike putting new tires on, any face-lift that is done to an airliner is backed by testing and and under the watchful eye of multitudes of regulators.

djvu9 · on March 18, 2019

I would certainly be surprised if after installing the new tires you are required to install a new system to compensate the brake or you car may crash. The MCAS system is a clear indicator that the airframe and the engine are not compatible and yet they decided to do so for profit.

aphextron · on March 17, 2019

All of this stems from a pointy-haired marketing decision to push the MAX as an upgrade to current technology requiring no new training for airlines. If they had made the ethical decision to seek a new type rating and force every pilot to be trained in the MCAS system, 400 people would still be alive. Those executives have blood on their hands, and they know it.

Xixi · on March 17, 2019

The ET pilots were trained with the MCAS system.

But to speculate: ET pilots seem to have experience issues before the MCAS would be active, so it's not unlikely that there was another issue with the plane, compounded by the MCAS kicking in when pilots where already fighting the aircraft...

linuxftw · on March 17, 2019

No new information here, just another person pretending to be some authoritative source on this.

There's 0 proof the software worked correctly or that it's fit for purpose whatsoever.

CodeSheikh · on March 17, 2019

Fixing an "aerodynamic" problem with a "software" solution is already cutting over to a different problem domain and it will lead to unforeseen circumstances. What can go wrong, will go wrong.

People at Boeing who made decisions for this project whether it is a team lead or a test lead or a project manager or a sales exec or a CEO; are all equally blamed for this. These deaths are on their conscience.

macspoofing · on March 17, 2019

>Fixing an "aerodynamic" problem with a "software" solution is already cutting over to a different problem domain and it will lead to unforeseen circumstances.

I have a hard time parsing this. A modern airliner is a conglomerate of physical aerodynamic design, electronics and software. I am not convinced that something like MCAS is so out of the norm from modern aviation design principles.

>People at Boeing who made decisions for this project whether it is a team lead or a test lead or a project manager or a sales exec or a CEO; are all equally blamed for this.

Maybe. Or maybe there is no actual underlying problem. Or maybe the problem has nothing to do with the MCAS system. Let's wait a little and see how it plays out.

mannykannot · on March 18, 2019

For your own peace of mind, I suggest that you look no further into what modern airliners have between the pilots and the control surfaces.

gnulinux · on March 17, 2019

What's the reason people write long stuff like this on Twitter? Literally unreadable.

ksajadi · on March 17, 2019

With this line of argument pretty much nothing is a software issue since software is mostly there to compensate for something else: speed, errors, efficiency, manual labour, etc?

Highlighting the facts behind the design decisions of 737Max 8 is good for general knowledge but doesn’t help with much else in this context.

To follow this line of argument, I’d claim that this is the fault of old airports that didn’t have jetways so 737 had to be designed with lower body to allow folding stairways and so on...

D_Alex · on March 18, 2019

Yes... and furthermore: it seems that a key problem was:

> MCAS can make huge nose down changes

This, to me, is really odd. All the hardware changes could not, AFAICT, require that. It seems really dumb that the MCAS system was made to be capable, in principle, of completely overpowering pilot input.

Is it a software problem...? Well, if the MCAS was limited to making only small changes in the stabiliser position, that could be counteracted by pilot's input to the elevators, these accidents would not have happened. AFAICT. It does seem that software contributed to the accidents.

newscracker · on March 17, 2019

I read the entire thread, but the summary is that this is a harsh indictment of Boeing, its handling of this aircraft and the accidents. It describes Boeing as cutting corners in many places, and makes it seem like what has happened was inevitable (in retrospect).

RandomTisk · on March 17, 2019

We don't have the official word on what happened yet.

gabrielblack · on March 17, 2019

I'm a frequent flier and I'm scared. I try avoid companies with low standards reading all the news about incidents related to the use of used or counterfeited parts, lack of maintenance, etc. But now it's clear that, in times of low cost companies, cheap airplane are requested and even the redundancy in critical subsystems is sacrificed both by the producer and the flight company that didn't pay for a "optional" that actually is a lifesaver. How many critical subsystem haven't redundancy to reduce the costs of airplanes ? Maybe some regulation in this market is needed to avoid other disaster like this one, imposing standard for the critical system and denying the routes to the airplanes that do not meet the specifications. I don't think that the market can play with human lives.

7952 · on March 17, 2019

This kind of compromise is made all the time. In the past you needed 4 engines to cross oceans, now it is normal to just have two. It saves lots of money, and has been a massive success in terms of safety. The industry generally seems to get these compromises right when you look at the amazing safety record.