Boeing changing Max software to use two computers

redis_mlc · on Aug 4, 2019

>“Only one computer was used in the past, because Boeing was able to prove statistically that its system was reliable, the person said.”

Yeah, I have a real problem with that.

A priori of a computer failure, you may have statistical reliability. But once it actually fails, the failure rate is 100% and a backup computer is needed.

Also, external AoA sensors fail all the time (like when a bird hits them), so the failure probability is very high. So we're not just talking about solid-state computer reliability here.

What also concerns me is that airliners fly at transsonic speeds. How would the AoA calculations be affected when the airplane is in a slip for some reason, and the AoA sensors are reading different values? What happens when MCAS goes berserk at high speeds (ie. near Mach 1.)

If MCAS wasn't even tested on takeoff, it sure as hell wasn't tested at Mach 0.8.

If the stakes are not clear ... besides the loss of life, an airliner crash often results in the bankruptcy of the entire carrier.

Source: commercially-rated airplane pilot who's studied high-speed aerodynamics.

varjag · on Aug 4, 2019

As an embedded systems developer, I have developed mistrust to statistical reliability figures. These days you quite often come across components and subassemblies with MTBF stating (sometimes tens of) millions of hours. When you try to clarify with the vendor how they arrive at these figures there is nothing but general handwaving about using industry tools and statistical models.

But sorry, just can't seriously take it on faith that your product is going to last 11 centuries at 20C.

Quequau · on Aug 4, 2019

Yeah there's a few degrees of separation between the people who wrote the claims (and have to defend/support them) and the people that did the testing... and then substantially more between those that did the testing and those who did the original work showing that the statistical proofs on accelerated ageing were reasonable things to depend on.

By the time it gets to the marketplace nobody knows anything and we're all acting like the monkeys in the story with banana and the ice-water spray.

Hunisgung · on Aug 4, 2019

A couple of years ago, I was working on the verification team for a (non-critical) avionics system for a large commercial aircraft. I noticed one of the components seemed to have more failures than expected.

I reported the issue and the systems architect waved it off arguing that his analysis of the supplier data had proven that the MTBF for this component was over 500 thousand hours.

Few months later, a task force was established by the project management to look for an alternate supplier for this component after it became obvious that its failure rate was significantly higher than advertised.

jhayward · on Aug 4, 2019

Much, perhaps most, of the effect we have as individuals in large organizations is indirect and hard to see or quantify. It's not always as simple as we think and changing course of a group almost always takes time, education, repetition, and some redundant effort. It's still worthwhile and effective.

It is entirely possible that this is a success story of a lower-level engineer raising an issue that the higher levels then addressed.

Just because no one immediately jumps and say "by golly, we have to get rid of component X right now!" doesn't mean that the issue didn't begin percolating vertically and horizontally, ultimately resulting in a large-scale corrective effort.

treypitt · on Aug 5, 2019

in my experience this is by far the less likely scenario, but I would like to hear from OP whether his initial work contributed to the ultimate decision

Hunisgung · on Aug 16, 2019

For more context, I was 99% sure they there was an issue with the component and that a hardware rework would have to be made sooner or later. The systems architect had some seniority over me but I could have pushed the issue hard enough for it to be addressed.

Then I did not take the high road. The guy was very obnoxious and I had other issues to address. So I let it slide to maintain my "this is important" quota for other matters.

And a couple of months later, when some units failed on the test aircrafts (I insist on the fact that this was not a critical system), I had an issue report showing that the issue was observed during supplier-side tests and then ignored.

jacquesm · on Aug 4, 2019

Any MTBF longer than that particular industry has existed should be suspect or invalidated absent an existence proof (which will take some time...).

londons_explore · on Aug 5, 2019

I really wouldn't be surprised if some components don't last that long...

Take a regular ceramic resistor. I'd bet that if you kept it in a temperature controlled sealed box, it would last 2000 years.

I would possibly even reckon it might last 200,000 years.

Ceramics are so so so inert, and as are metals as long as they're away from oxygen.

varjag · on Aug 5, 2019

The 11 century MTBF example was a complete SoM.

crocal · on Aug 4, 2019

Statistics are used all the time in safety critical designs. But, this is key, they are then balanced with the outcome expected due to each statistical failure. The more serious the outcome, the less the tolerated statistical failure.

In the case of MCAS, from what we have read so far, it seems this analysis was biased toward a non-catastrophic outcome (e.g. manageable through pilot action). This may explain why the statistical failure of a single computer was deemed sufficient. Now, assessors have upped the stakes, and the stats do not work anymore.

This indeed reveals a very serious problem in the RAMS process at Boeing, and exceed the scope of a « simple » software fix.

heisenbit · on Aug 4, 2019

You were only mentioning the problems with MCAS when it comes to sensors and decision making. As bad as that is the fact that the MCAS decisions are de-facto locked in by spinning the trim wheel. Undoing those revolutions manually - after locking MCAS out - takes a while. A while that is not available in the flight states MCAS is most likely to act.

rwc · on Aug 4, 2019

MCAS only active at critical phases of flight like takeoff. At cruise, MCAS would not be active.

_ph_ · on Aug 4, 2019

You have it the wrong way around. MCAS is not active while flaps are extended, so not during starts and landings. It is also not enabled while the autopilot is active, as MCAS isn't balancing the airplane but about giving the right rudder feedback to the pilot when manually piloting the airplane. The accidents happened in normal operations after the flaps were retracted.

nemild · on Aug 4, 2019

From the article:

>“Only one computer was used in the past, because Boeing was able to prove statistically that its system was reliable, the person said.”

Years back, my father co-wrote a paper on the Space Shuttle Challenger showing issues with the statistical thinking at NASA:

http://www.math.montana.edu/shancock/courses/stat401/Dalal_e...

I wonder what statistical techniques Boeing used, and how defensible those techniques were.

danjayh · on Aug 4, 2019

For those who are outside of the industry, the article is probably not completely accurate. I don't know specifically about the 737 MAX, but for many of their other newer airframes (787, upcoming 777x), Boeing relies on a concept called 'high integrity at the source'. Essentially, two complete copies of the flight computer hardware are put on a single card and they cross-compare their results. If you're looking for a bit of dense reading material on the subject, you might find a related patent application interesting: https://patents.google.com/patent/US9170907

makomk · on Aug 4, 2019

As described, that provides zero protection against software bugs. Both of the redundant lanes are carrying out identical computations on identical data using identical code and will make identical errors if there's any bug. On paper it's more powerful than the non-synchronized system Airbus uses in that it can stop erroneous computations from being used at all, rather than detecting them after the fact, but it wouldn't be able to detect problems like the Qantas Flight 72 accident in which erroneous data with a particular timing happens to trip a latent bug.

calaphos · on Aug 4, 2019

In Airbus case, who have been doing full fly by wire for a while now, there are at least two completly seperate software implementations which run in parallel and cross compare the result. They also run on redundant flight computers with different hardware architectures.

Boeing probably has a similar thing for the fly by wire fighter jets they are involved in but there passenger planes are still mainly directly controlled by the pilot.

Some material: https://www.researchgate.net/publication/26587285_Challenges... https://www.researchgate.net/publication/220845884_Approache...

JorgeGT · on Aug 4, 2019

One of my professors at uni was involved in the flight computers of the Eurofighter and tells the same thing: different teams were given identical specifications but contact between them was forbidden so they were forced to develop completely different implementations in order to avoid shared bugs that could affect all computers at once.

CriticalCathed · on Aug 4, 2019

What's the upside of two identical computers computing the same input?

I can understand a backup if the first fails, but why two identical systems contemporaneously computing?

DennisP · on Aug 4, 2019

NPR just did a piece on cosmic rays, saying they cause hardware faults way more often than people realize. In one case they switched off a passenger jet's autopilot. They're also getting blamed for the Toyota unintended accelerations.

They said it's common now in critical systems to use three computers and ignore any single computer that disagrees with the other two.

makomk · on Aug 4, 2019

Protects against certain kinds of transient hardware faults that are common enough to worry about in safety-critical systems, I think.

Aardwolf · on Aug 4, 2019

How does it know which of the two is correct?

ethbro · on Aug 4, 2019

See above comment. With 2, you don't. With 3, you do.

But if there's a human in the loop and a manual alternate control pathway, detecting a disagreement allows you to cue the manual operator and transfer control to them. Or fall back to a much simpler system of computer aid.

With 1, hardware failures are extremely hard to detect at all, as even your computational checks for internal consistency are subject to mutation.

noir_lord · on Aug 4, 2019

> See above comment. With 2, you don't. With 3, you do.

Unless all 3 different give different results, two failures and one correct.

IIRC the shuttle had a 3+1 system 3 as a cohort with voting and if they couldn't reach consensus the 1 was a minimal system that could keep the lights on.

danjayh · on Aug 12, 2019

You don't need to. You just need to know that the module as a whole has a fault. Reboot the module and let the hot spare take over (all critical functions have a hot spare).

treypitt · on Aug 5, 2019

This also happened with cars during the Toyota Prius scandal. For a thorough and entertaining treatment, I recommend RadioLab's "Bit Flip" episode https://www.npr.org/podcasts/452538884/radiolab

kuzehanka · on Aug 4, 2019

Statistics is borderline pseudoscience. Various techniques and methodologies are rigorous, but when applying them to complex real world problems, there is a necessary element of human interpretation. With that interpretation it's trivial to twist most scenarios towards either outcome.

I spent several years doing stats/BI in the medical manufacturing industry and have on many occasions done analyses that showed what my employer wanted and would stand up to scrutiny, but if my employer wanted the opposite outcome of that analysis, that would also be possible and also stand up to scrutiny.

The best way to address this is to remove any financial/business incentives form the analysis, but that's not really feasible most of the time and also requires larger/more expensive teams that are capable of internally challenging their own work.

https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statist...

ethbro · on Aug 4, 2019

I came across this on Friday: https://medium.com/ssense-tech/building-the-data-science-dre...

Despite the title, it's mostly an essay on why visualization and presentation are critical to any statistical treatment.

I think it's one of those difficult skill combinations. You want someone extremely well-versed in statistics, plus someone capable of giving a damn about accurate presentation, plus someone with the UX intuition to make the correct choices to make something readable without sacrificing accuracy and important nuances. And if that "someone" is actually two people, you run the risk of telephone misunderstandings as work passes between them.

Which is a tall order.

Made taller by the (as you note) different kinds of audiences: simply ignorant, aggressively biased, rushed, etc.

voldacar · on Aug 4, 2019

"Prove statistically" is such an odd phrase. I wonder if they used fuzzing or something like that? Because even so that is quite far from a formal software proof

jwilliams · on Aug 4, 2019

It's relatively common in embedded programming (in my experience). A lot of real-time programming is around scheduling. You certainly can use formal proofs there too, but statistical methods would be common at significant scale.

_ofdw · on Aug 4, 2019

It's quite common in aerospace certification to cite systems having proven records of an acceptably-low number of failures per flight hour or similar metric.

I suspect they're using that sense of the word "proven", as opposed to a formal software proof.

pdpi · on Aug 4, 2019

Don’t know what’s going on with their certification process here but probabilistic reasoning is very common, and very reasonable, eg in cryptography.

Randor · on Aug 4, 2019

Many years ago I wrote navigation software for ocean going vessels/ships. We used double and triple redundancy on many of our sensor types. We generally used three control computers that would 'vote' before deciding to make vessel navigation changes. We also always included a "Dead man's switch" that allowed the bridge crew to take control at any time.

I can't even imagine designing aerial vehicle autopilot without redundency. The stakes are too high...

I would be interested in having a look at the statistical model they used to prove 'the system was reliable with zero-redundancy'. While designing these systems for ships the only way we were able to get an error probability near zero was when we used triple redundancy.

rkagerer · on Aug 4, 2019

How did you deal with redundancy "choke points"? eg. What if the component that tallies computer votes and actuates things based on the results fails (especially in a way that's hard to detect)?

Were you able in some cases to maintain isolation all the way through from sensors to actuators and design such that a single failed one (in a worst case failure mode) could be overcome by the rest?

fit2rule · on Aug 4, 2019

(Disclaimer: SIL-4 programmer for safety critical rail transportation applications)

The way this is done is that 2 of 3 computers need to 'agree' on the final decision in order for it to be considered the correct one - there isn't a single point of assessment, but rather a consensus that must be formed from the results of all 3 computers. Ideal case, all 3 produce the same results.

This has worked successfully for decades. What's changing now, is that those 3 computers are now no longer the same architecture - you'll have a PPC and an x86 and an ARM-based CPU all attempting to agree to the same data, in order to prevent systemic failure throughout.

crocal · on Aug 4, 2019

(Same disclaimer)

> What's changing now, is that those 3 computers are now no longer the same architecture

I think / hope this idea will die a well deserved death. Rail systems must face 25 years lifetime and with such design obsolescence headaches are multiplied by 3. In addition this creates bugs and integration nightmares.

And on top of all that, we have known for decades that CPUs can be protected against systemic failures using the vital coded processor technique [1].

Note to the curious: this is one of the most fascinating piece of software I have ever encountered. Make the software resilient to any hardware fault through the power of arithmetic...

[1] https://www.semanticscholar.org/paper/Vital-software%3A-Form...

fit2rule · on Aug 4, 2019

SIL-4 doesn't generally follow the same rules as consumer computing, as I'm sure you are aware.

There are track-side systems still running on 80386's, and a maintenance infrastructure in place to keep these systems running for at least a few more years, before they are replaced with the newer, Pentium-based systems.

Design obsolescence is not built-in for these systems. Long-term ability to support is more important. Also, burn-in period. Many bullets (SPECTRE, etc.) have been dodged by relying on older, more proven, more tested technologies ..

crocal · on Aug 4, 2019

Design obsolescence is absolutely built-in for railway systems (safety related or not). It’s part of the fun of the profession to deal with bizarre voltages that are inherited from the 1930s. For sure it’s not something consumer electronics care for but that’s precisely the point I am making.

Actually, it’s one of the required extension to ISO 9002 by IRIS, the quality framework for rail engineers.

Components are chosen not only for their function but also for their supply availability in the long run. One of the things we look at is « multisourcing »: how many different folks can provide this stuff. The more the better.

Going back to this great idea about hardware diversity (that is absolutely not a general rule in railway): with an architecture requiring 3 different hardwares, you essentially shoot yourself in the foot. Instead of having 3 suppliers for one component, now you need to find at least 6 for 3 components. And the probability to face an obsolescence issue has basically tripled...

And for what? It’s proven useless by science... :b

And as far as dodging bullets, railway is no better than SCADA. Older tech means older vulnerabilities stay in place, and there are countless people on HN that will tell you it’s /not/ a good thing. I don’t think we dodged, I think we were just lucky. There is a reason why now cybersecurity is making its way inside the last revision of standards.

fit2rule · on Aug 4, 2019

> hardware diversity (that is absolutely not a general rule in railway)

I don't know where you're working in the industry, but in my company (THALES) its definitely a thing, and we are absolutely working on diversifying the 3-of-3 and 3-of-5 configurations away from Intel.

And maybe we're talking about design obsolescence in different terms, but yeah .. 30-year old CPU's are still being shipped to customers, yo. They WANT it, so.

crocal · on Aug 5, 2019

You can design your platform in a number of ways to be relatively independant of the underlying CPU model, thereby mitigating the risk of supplier lock-in. All suppliers will try to find a way to achieve such target.

It’s a different thing than saying you need hardware diversity in a majority vote system to achieve better safety. That is demonstrably false. For example, Siemens VCP is /proven/ to be safe and it does not even use a majority vote (see my previous comment for a reference to the VCP)

I prefer not to involve my employers on HN. What I write here is my opinion only.

As for your last remark, let’s be careful in assuming what customers want. The fact that they have to live with obsolete stuff does not necessarily mean they are super happy about it.

fit2rule · on Aug 9, 2019

Its not about the software being independent of the architecture. Its about using diverse hardware platforms to avoid the situation where an un-detected, hardware-level bug affects the voting ability of all participants. We've seen, time and time again, so-called dependable platforms weaken over the years as more and more issues are uncovered.

Diverse voting node architecture requirements are designed to prevent hardware bugs from crashing trains, not software bugs.

>Siemens VCP /proven/

.. and yet, it still crashed trains.

>What customers want

Its not obsolete if a customer wants it.

Customers want older CPU platforms because the tooling and industry required to support them in the field is long-entrenched, and costs of upgrading to "newer, sexier CPU's", not really worth it if the lesser platforms are capable of doing the job...

crocal · on Aug 11, 2019

> .. and yet, it still crashed trains.

Do you have evidence for that claim?

rrss · on Aug 4, 2019

> Many bullets (SPECTRE, etc.) have been dodged

Rail control systems run untrusted code? Why not dodge Spectre by not running random code in the control system?

fit2rule · on Aug 4, 2019

They don't run random code, but nevertheless one of the reasons why SIL-4 systems don't just go for the 'latest and greatest' is because, even with trusted code, one can not be sure until a great deal of industry-wide testing, that there aren't bugs in the hardware.

WWLink · on Aug 4, 2019

What would be cool is if the software and hardware design is done by 3 completely different teams. Hehe.

crocal · on Aug 4, 2019

Tried already! It’s a disaster. Nancy Leveson & friends wrote articles about this. Yet, there are still people today convinced diverse programming is a good idea... Sigh.

https://dblp.org/rec/conf/icse/JaffeL01

yborg · on Aug 4, 2019

>1990 study

>student programmers

>toy problem

To me, this proves very little about how independent development teams would perform with experienced engineers working with present day design tools. It's possible that the results would be the same, worse, or better, but I hope this isn't the most recent research that's been done on this.

The main issue with this approach is of course - cost. So the question is whether having two systems with uncorrelated fault paths is worth doubling the development expense.

jimktrains2 · on Aug 4, 2019

Do you have a link to a readable copy?

crocal · on Aug 4, 2019

Whoops. Wrong article. Apologies. Blame my big fingers on this phone. The article is:

« Analysis of Faults in an N-Version Software Experiment." Susan S. Brilliant, John C. Knight, Nancy G. Leveson (1990)

And I found a PDF version here: http://sunnyday.mit.edu/papers/nver2.pdf

noir_lord · on Aug 4, 2019

Out of curiously, would you recommend any good journals, magazines or books for people in your industry (grokkable by a general level programmer ideally but not that bothered if not).

I find all the different domains of programming endlessly facinating.

fit2rule · on Aug 4, 2019

Alas, its a very academic and dry field in terms of interesting literature, but you can definitely learn a lot from a few books ..

Safety Critical Systems Handbook:

https://www.elsevier.com/books/the-safety-critical-systems-h...

IEC 61508 Compliance Guide:

https://www.perforce.com/resources/qac/how-comply-iec-61508-...

ISA Safety Series:

https://www.isa.org/standards-and-publications/isa-publicati...

noir_lord · on Aug 5, 2019

These look interesting, Thank you!

crocal · on Aug 4, 2019

Voters are redundant too. Actuators receive safety release authorizations from all of them and will activate as long as one valid authorization is received.

rkagerer · on Aug 4, 2019

Cool! But at some point doesn't it all feed down to one wire connected to a motor coil or something?

crocal · on Aug 4, 2019

Wires are also redundant ;). Then the actuator is usually a fail safe device such as relays.

adreamingsoul · on Aug 4, 2019

I recently watched a documentary about a diving ship that had a tripple redundancy positioning system that failed during a dive. It was fixed with a hard reboot.

inamberclad · on Aug 4, 2019

It wasn't an autopilot that failed. This was just supposed to be an accessory system.

qaq · on Aug 4, 2019

This sounds pretty scary a major rewrite on tight schedule under huge pressure.

semerda · on Aug 4, 2019

“Only one computer was used in the past, because Boeing was able to prove statistically that its system was reliable, the person said.” — here boss no need for redundancy coz we write amazing code that’s statistically never going to fail! What were they thinking?!

There is no system 100% efficient. Even a small error rate over the standard ~20yr service life isn’t a guarantee a failure won’t occur. Considering the max are refurbished 737s, wouldn’t more precaution be taken?

alkonaut · on Aug 4, 2019

The question is whether is fails safely. The design likely considered failure to not include near-inevitable crashes, but rather some manual intervention. In that case N failures per X thousand flight hours is nothing unusual. The change isn’t to get zero failures but to get a system that fails safely.

Compare to a Tesla autopilot: it can work 99% of the time, if at 1% of the time it slows down and gives up. It can’t crash into oncoming traffic 1% of the time. The failure mode matters.

tatersolid · on Aug 4, 2019

> Considering the max are refurbished 737s, wouldn’t more precaution be taken?

Correction needed: every part in a 737-MAX is new; they are not refurbished older 737s. The MAX are supposed to be brand new “fly-alike” planes that makes replacing older 737 models with something more efficient easier for the airline.

josemanuel · on Aug 4, 2019

I don't understand... so the crashes were due to faulty single sensor. Adding computer redundancy won't fix single sensor screw up. Will they retrofit all planes with dual sensors?! I really don't want to fly on this airplane. This whole thing just sounds like PR trying to sell the idea that this plane will be fit to fly when it will clearly come back into active with a heap of flaws...

uniqueid · on Aug 4, 2019

I know little about Boeing, but I'm sure the amount of money at stake, if they have a serious mechanical issue, would be staggering. I don't want to fly on a 737 Max. I worry they have too much incentive to conclude every root cause is software.

ethbro · on Aug 4, 2019

I look at it like x86 / x64 processors.

There are always hardware errata. But they're patched and papered over with firmware.

Essentially, I'm operating a system, and if you can guarantee the system functions as documented -- I don't really care how the sausage gets made.

What I would care about is if a pilot (in this metaphor, my compiler, I guess?) doesn't feel comfortable operating the system.

And I think the pilots' unions have been interesting in this. Because they don't really want to slag their employers (the airlines), but they're willing to push back hard on Boeing for essentially the same issue.

A lack of transparency about and sufficient training in differences.

uniqueid · on Aug 4, 2019

    > Essentially, I'm operating a system, and if you can 
    > guarantee the system functions as documented -- I 
    > don't really care how the sausage gets made.

If a software fix for a hardware issue works well enough, I'm with you 100%.

The thing about the incentive, though, is that a software fix ticks the "we did something" box, as long as it improves the issue to any degree. That it does any good at all, combined with the lower cost, can give management (or ass-covering employees) enough substance to rationalize the decision away.

I doubt anyone at Boeing would willingly risk customer lives. But if they are working under pressure, and can say "well, we did do something to address the issue," that can lead to bad decisions.

cesarb · on Aug 4, 2019

> Adding computer redundancy won't fix single sensor screw up. Will they retrofit all planes with dual sensors?

There are already two sensors. There are also already two computers. Each computer is already connected to one (and only one) of the sensors. Therefore, making the software use both computers at the same time allows using both sensors at the same time, with no hardware changes.

Filligree · on Aug 4, 2019

AIUI, all the planes already have dual sensors. So they wrote the software to use one at a time, alternating at each take-off...

dTal · on Aug 4, 2019

Of all the broken design decisions in the MAX, this was the one that elicited an outraged "what?!?" from me. In what possible way could this be a good idea?

keymone · on Aug 4, 2019

A tiny additional detail - sensor alternates not because of some arbitrary adhoc rule but because aircraft always uses sensor on main pilot’s side and main pilot’s seat alternates each flight.

Still boggles my mind how utterly incompetent and borderline malicious everybody involved in designing and implementing this system are. I wish we’d see some people jailed and Boeing hit with billions of dollars fines but chances are slim.

dTal · on Aug 4, 2019

I am aware of that detail. But why? Why does the aircraft always use the sensor on the main pilot's side? That itself seems like an arbitrary, ad-hoc rule. What does global AOA state data have to do with where the pilot's ass is currently located? Why even have two sensors at all, if you only use one at a time? Given you have two sensors, why not fuse them? Any fusion logic is more intelligent than what they did.

I'm sure each decision seemed logical to the person making it. But from a bird's eye view, it gives the impression of poor coordination - Team A [hardware] thinks "clearly we need multiple AOA sensors for redundancy". Team B [software] thinks "AOA isn't all that important, we barely use it and failure is non-critical. Sensor fusion is hard, let's go shopping!".

keymone · on Aug 4, 2019

Yeah, I agree, it’s still arbitrary, just on different level.

dTal · on Aug 4, 2019

I think it's the same level. There's no evident reason why "pilot's side" is any less arbitrary than "flip a coin".

Was there actually an a-priori specification that AOA sensors must always be on the pilot's side? If there is such a specification, I can't find it - Google searches for stuff like 'AOA sensors pilots side' return mostly articles about the MAX. Nor can I see a good reason for it that is concordant with the existence of paid extra feature of a disagreement light - no particular significance or weight is given to the pilot's side sensor.

It sure feels like this is a post-hoc rationalization for only writing software to handle one AOA vane at a time despite being equipped with two.

keymone · on Aug 4, 2019

My speculation is that there is a class of instruments that get prioritized depending on main pilot’s side. It’s easy to assume there is a valid reason for that (ability to override and whatnot). Why would a sensor of which there are two, fall into that category - is a puzzle.

ethbro · on Aug 4, 2019

There were already two sensors. The software just wasn't configured to use both of them.

I believe that's been changed too (see: AOA sensor).

This article is about flight computers, which are the machines that consume sensor input.

chaz6 · on Aug 4, 2019

There is a lesser known failure mode caused by cosmic rays which planes are exposed to the higher they fly. This causes bit flipping, and can cause a critical failure in a single computer.

AnssiH · on Aug 4, 2019

A Seattle Times article from a few days ago seems to contain more detail: https://www.seattletimes.com/business/boeing-aerospace/newly...

It includes e.g. a description of FAA bit-flip testing to induce incorrect system behavior that Boeing intends to solve by using two computers.

danjayh · on Aug 4, 2019

"The fault occurs when bits inside the microprocessor are randomly flipped from 0 to 1 or vice versa. This is a known phenomenon that can happen due to cosmic rays striking the circuitry."

This is called Single Event Upset. For all those of you that aren't in the industry, essentially the problem is that any bit inside the flight computer (RAM, cache, NVM, registers - ANY bit) can change state randomly at ANY time. It's rare, but when you get into millions of flight hours, it WILL happen. The software and hardware have to be designed to mitigate problems caused by this behavior.

GistNoesis · on Aug 4, 2019

If this is due to cosmic rays, doesn't flying over the poles make it more likely that such event happens ?

How are we sure there aren't local bursts of cosmic rays that would make suddenly a few of those Single Event Upset, for example when there are some high likelihood of seeing northern lights ?

How do you test your hardware and software to show that you are indeed cosmic-ray proof ?

fit2rule · on Aug 4, 2019

You test the living crap out of it, and not just in the lab on the workbench but also in operation while online - while the thing is running in operation, it is also consistently testing itself to ensure that the hardware is performing as expected.

Online software tests check for cosmic ray bit flips about 1000 times a second, in addition to whatever hardware mechanisms are in place to detect this (ECC, etc.) This is a standard module in most SIL-4 applications, where 2 of 3 consensus model is being used.

What I don't understand is why Boeing aren't using 2-of-3 computer architecture in this application - or maybe they are, and the '3 voting units' are considered to be 'one computer' and they've just added another one to be sure.

In rail transportation systems, this is taken even further by using 2-of-3 configurations where each computer is a different architecture completely ..

ypcx · on Aug 4, 2019

> Boeing changing Max software to use two computers

Does that mean we will find the MCAS in the 737 MAX 8 MMEL (Master Minimum Equipment List)? (Used to be here but right now it's not loading: http://fsims.faa.gov/wdocs/mmel/b-737-8_rev%200.pdf)

Isn't this the single biggest failure point of the whole affair? Adding a critical component which can bring the plane down (as it has happened twice) but not making it redundant?

One can perhaps argue (pardon the roughness of my analogy) that if pilots aren't trained to not open the door during the flight, and then do open the door during the flight, then it's a training issue.

But this analogy doesn't apply, because while the door is not motorized and cannot open by itself, the MCAS is a computer system which can fail (not just because of a faulty AoA sensor, but e.g. because of a chip failure or a software bug) and then actuate the trim into a dive where the pilots aren't able to fix the issue manually anymore due to high wind pressure.

redis_mlc · on Aug 5, 2019

How the MMEL (FAA) and MEL (operator model-specific version) works is that anything NOT required for safe flight is listed in the MEL. Otherwise everything would have to be 100% working.

So I would not expect MCAS to be listed in the MEL since it's required for safe flight.

We'll see.

ypcx · on Aug 10, 2019

Oh, I've gotten the MMEL definition wrong then (opposite). Thanks for clarifying!

teh_infallible · on Aug 4, 2019

Keep talking about the software. Make it about the software, so no one knows the airframe is fundamentally flawed..

cameldrv · on Aug 4, 2019

Fundamentally flawed is too strong. The airframe has some undesirable handling characteristics without augmentation. So does every jet airliner —- they all have yaw dampers at least. Aircraft designers have been solving airframe stability problems with mechanical, hydraulic, analog electronic and fly by wire systems since at least the thirties. The problem is that MCAS doesn’t fail safe. In a perfect would they would have clean sheeted the MAX, but there are numerous ways they could have made it safe, they just didn’t do any of them successfully.

salawat · on Aug 4, 2019

Fundamentally flawed is absolutely an accurate moniker.

Yaw dampers are workload mitigation devices; not compensatory for design flaws. Dutch roll is inherent to any controllable aircraft. Violation of 25.173 in the form of decreasing stick force response curves on approach to stall is absolutely not the same beast.

Design criteria are prescriptive, and I'm not aware of any line of regulation that says an uncertifiable airframe can be made certifiable by the addition of a computer. These handling characteristics are not undesirable, they are uncertifiable according to regulations as written, and intended.

Even Airbus meets these minimum design criteria at their minimum automation threshold. Boeing for some daft reason seems to be being written a pass to selectively ignore regulations around Civil Transport Aircraft certification based on some serious levels of sunk-cost fallacy.

I don't believe in supporting limp-wristed minimal compliance when it comes to safety critical engineering. Not going the route of designing out the behavior requires the implementation of system after system, after system to compensate, with each link in the chain adding additional risk for failure over and above what doing it right in the first place would have created.

Whitewashing wins no one but Boeing anything. Please remember that. We entrust these people to design and manufacture safer, more efficient, machines. Instead of living up to that mission, they jeopardized it in the name of optimizing the revenue/profit generating characteristics of the company.

I get the reluctance to throw away a potentially recoverable airframe, but I'm dead against letting anyone whitewash over the fact the plane is more dangerous than it needed to be because someone wanted better numbers on their held stock, and was willing to put a more dangerous product on the market in able to do it.

cameldrv · on Aug 6, 2019

The Dutch roll example is actually very similar from a regulatory standpoint, except that it concerns dynamic instead of static stability. FAR 25.181 requires positive damping, i.e. if you let go of the controls, the Dutch roll has to diminish on its own.

There has never been been an accident due to the airframe's pitch instability and MCAS not accomplishing its intended function. Both of the accidents happened because the MCAS system itself failed and activated when it shouldn't have.

If they had made a reliable MCAS (by which I mean the whole system from sensors to controls to actuation and pilot interface), there would have been no problem.

sundvor · on Aug 4, 2019

The engines are fundamentally unfit for the airframe.

windexh8er · on Aug 4, 2019

This. Given the failures, admissions and scrutiny the FAA is now under there should have been no option for Boeing to continue to band aid the airframe. Boeing should have been forced to pull the plane as "737" and recertify it as a new airframe from scratch. But why weren't they? My guess would be because Boeing would have to eat very large losses of dealing with customer lawsuits (bought an airframe that wasn't certified), would lose all future revenue as customers would cancel orders and buy airworthy competitor options, and Boeing would be forced to say they were wrong.

Regardless, this still seems like the right path (recertification). It seems the FAA is still being strong armed by Boeing. Why? Put these people in prison. Shut the 737 Max down. Make Boeing face the music. They need to fail to get the point, apparently.

joe_the_user · on Aug 4, 2019

Having designed the plane to be a "drop-in replacement" for a 737 and having failed at that, it seems like there's no path forward, neither more jury rigging nor certification as something else - because to make the thing like 737, they made it not like a contemporary aircraft (paper manuals is the big example).

Boeing have essentially painted themselves into a very costly corner. The FAA or congress could do a "certify it just for Boeing" action and the plane might even, maybe be safe then. But how many airline customers would want to fly in it?

_ph_ · on Aug 4, 2019

The 737 is a reliable airplane, as experience has shown. The crashes were caused by malfunction of the MCAS system, which is getting fixed. If that is done, and of course the whole certification procedure which let MCAS be certified reviewed, there is no reason to assume the 737 MAX is less safe than any other 737.

Yes, a replacement of the 737 is more than overdue, but it is also 10 years away. A rushed new design could be way more dangerous than the known design of the 737.

mrhappyunhappy · on Aug 4, 2019

I’m guessing doing all the things the right way would put Boeing out of business. US doesn’t want that.

spookware · on Aug 4, 2019

Agree, using software to disguise poorly engineered design. Thanks guys.

throwaway3627 · on Aug 4, 2019

Remember Al Jazeera's 2010 report on critical defective and substandard structural parts from subcontractor Ducommun? The full scope of potential risk is ALL aircraft Boeing has designed, built, reengineered or maintained in the past 30 years. It's unknown what's hidden, poorly-designed, fragile or likely to fail because of lax self-certification up-and-down the full breadth and depth of equipment and subsystems. There's no way to prove a particular new/recently maintained Boeing plane is safe because the whole regulatory system is broken due to regulatory capture stemming from cozy, military-industrial complex corruption.

I do hope Ralph Nader sues the pants off Boeing for killing his goddaughter due to their cheapness and shoddy products.

mobilefriendly · on Aug 4, 2019

Ralph Nader lost his grandniece.

kuzehanka · on Aug 4, 2019

Maybe HN needs a way to flag comments as factually inaccurate. This meme about the airframe being 'fundamentally flawed' or 'engines are too big(????)' or inherently unstable has been disproved by everyone in the industry including people invested in Airbus.

Please stop perpetuating evening time TV show fearmongering bullshit.

The eli5 explanation of the MAX 8 MCAS incident is as follows:

* Boeing wanted MAX 8 to have the 737 type rating. This means it must handle and fly just like other 737. This would allow existing 737 pilots to fly the MAX 8 with no additional training. This was part of Boeing's financial advantage.

* To achieve the efficiency it has, the MAX 8 has pretty large engines. Because they wanted to keep the airframe like other 737, they couldn't lift the wings to accommodate. Instead they moved the engines closer to the body.

* Moving the engines closer to the body increased the pitch-up characteristic at high AOA. Not so much as to make the plane unsafe (the 757 has a more aggressive pitch-up characteristic). But enough to make it sufficiently different from other 737 to invalidate the type rating.

* Boeing adds MCAS to augment the pitch-up and make it handle like other 737. They get their type rating.

* MCAS software validation happens with the assumptions that it's not a flight critical system, so constraints are relaxed. It passes.

* Flight testing reveals that MCAS needs significantly more pitch authority to do its job. It's updated to get that pitch authority. MCAS now becomes a flight critical system after these changes, but is not re-evaluated as one.

The airframe is fine. If Boeing gives up the 737 type rating and drops MCAS, they could have the fleet back in the air within a few weeks after pilots train for it. It just would cost their clients a lot more than originally promised and be a financial disaster.

https://www.youtube.com/watch?v=CD0JabYjF3A&t=

https://www.quora.com/Is-the-737-Max-supercritical-that-is-i...

dTal · on Aug 5, 2019

In every rerun of this topic on HN, I've heard two countervailing viewpoints:

1) The 737 MAX is a perfectly viable airplane without the MCAS, whose sole purpose is to not invalidate the type rating;

2) The 737 MAX has dangerous, uncertifiable flight characteristics without the MCAS; namely, the stick force gradient inverts at high angles of attack. One comment[1] in this thread has even linked the relevant legislation[2].

So, which is it?

[1] https://news.ycombinator.com/item?id=20607242 [2] https://www.law.cornell.edu/cfr/text/14/25.173

Zigurd · on Aug 4, 2019

This is just a bit circular.

The 737 MAX was designed so as to keep the type rating. The design would be different, and very likely fundamentally safer if keeping the 737 type rating were not a goal. A perverse incentive from the start. So I would not dismiss the claim that the 737 is "fundamentally flawed."

kuzehanka · on Aug 4, 2019

High AOA pitch-up is not a fundamental flaw and sure as shit isn't inherent instability of the airframe.

There's a million things that went wrong with the MAX 8 debacle, serious things that need to be addressed after the industry takes a hard look at itself. But there being something wrong with the airframe isn't amongst them. That meme has no grounds, isn't supported by anyone at all in the industry, and originated from clickbait spamblogs and late night tv shows that feed off FUD.

> The design would be different, and very likely fundamentally safer if keeping the 737 type rating were not a goal

Errrr. The design is may not be maximally optimal (what does that even mean?) because of type rating pressure, therefore the design is fundamentally flawed? Please stop. You're perpetuating clickspam driven fake news and distracting from the real problems.

salawat · on Aug 4, 2019

>There's a million things that went wrong with the MAX 8 debacle, serious things that need to be addressed after the industry takes a hard look at itself. But there being something wrong with the airframe isn't amongst them. That meme has no grounds, isn't supported by anyone at all in the industry, and originated from clickbait spamblogs and late night tv shows that feed off FUD.

Really?

Would you like to supply some names, or provide some whitepaper for that?

https://www.seattletimes.com/seattle-news/times-watchdog/the...

Btw, the entire cause for MCAS being implemented is a desperate attempt to comply with FAR 25.173. If the non-certifiable behavior hadn't been there, there would not have been an issue.

But wait, there's more!

https://www.nytimes.com/2019/07/27/business/boeing-737-max-f...

So we even have severe concerns over whether Boeing overstepped bounds in the safety process by approving their control line setup in spite of FAA concerns anyway.

Some gems from that.

>F.A.A. managers conceded that the Max “does not meet” agency guidelines “for protecting flight controls,” according to an agency document. But in another document, they added that they had to consider whether any requested changes would interfere with Boeing’s timeline. The managers wrote that it would be “impractical at this late point in the program,” for the company to resolve the issue. Mr. Duven at the F.A.A. also said the decision was based on the safety record of the plane.

>Engineers at the agency were demoralized, the two agency employees said. One engineer submitted an anonymous complaint to an internal F.A.A. safety board, which was reviewed by The Times.

>“During meetings regarding this issue the cost to Boeing to upgrade the design was discussed,” the engineer wrote. “The comment was made that there may be better places for Boeing to spend their safety dollars.”

>An F.A.A. panel investigated the complaint. It found managers siding with Boeing had created “an environment of mistrust that hampers the ability of the agency to work effectively,” the panel said in a 2017 report, which was reviewed by The Times. The panel cautioned against allowing Boeing to handle this kind of approval, saying “the company has a vested interest in minimizing costs and schedule impact.”

>By then, the panel’s findings were moot. Managers at the agency had already given Boeing the right to approve the cables, and they were installed on the Max.

So, forgive me if I question the veracity of "most in the industry have figured it isn't a big deal", and even if they have, I question whether or not that decision comes from some level of just wanting to return to business as usual with minimum interruption or action/further regulation based on the coming to light of the scale of regulatory capture that has been uncovered by this debacle.

The physics and presence of MCAS at all don't lie. The plane could not be certified without it. Personally, I deem an airframe to only encompass the physical structure without automation, and that assumption seems to be well received by those I know who have worked in aviation circles previously.

Given that when I raise concerns with them, I universally get some variation of "what the hell were they thinking?," I'm not terribly willing to accept that a large portion of the active industry is necessarily making the most impartial judgement given that their livelihoods may very well be adversely effected by the fiscal failure of this plane.

My sources include a former safety investigator/tech, and someone who worked with a maintenance squadron. So make of that what you will.

Furthermore, I'm willing to tolerate some leniency with stretching regulations a bit, but not to the extent of normalizing deviance for the sake of expediency. Down that road lies too much catastrophe.

Now I'll admit, I was one of the early central repeaters of the artificial feel system theory; I didn't have access to good technical docs, but apparently someone at the Seattle Times was able to find some corroboration, but I've done my damnedest to keep my reasoning constrained to my enthusiast level understanding of aerospace engineering and aviation, which has been rapidly expanding in my efforts to understand how something like this could happen in an industrial vertical famous for it's capacity to generate some of the safest machines on the planet. Just that reasoning alone is enough to at least get me questioning the sanity of the design decisions that have transpired w.r.t the MAX.

This isn't FUD. People need to pay attention.

kuzehanka · on Aug 5, 2019

You've been repeating this view frequently and debunked frequently on HN. You're entitled to a personal opinion but don't pass it off as authoritative support for the 'airframe is inherently unstable' meme. None of what you linked supports it. Because no one in the industry supports it.

https://news.ycombinator.com/threads?id=salawat&next=2043595...

bronson · on Aug 5, 2019

On this thread you’ve been one person replying to many. If there’s anyone expressing a personal view here, that would have to be you.

Also, it’s the ‘airframe is fundamentally flawed’ meme. That’s an important difference.

mevile · on Aug 4, 2019

I've read that the problem is all the hoops Boeing is trying to jump through to keep the plane rated as a 737, not the airframe.

azernik · on Aug 4, 2019

That need to keep it in the 737 rating drove a lot of the airframe design.

rrss · on Aug 4, 2019

How do you know the airframe is fundamentally flawed? ("the engines are too big" is not an answer).

djsumdog · on Aug 4, 2019

Like the hundreds of other comments on here mentioning how the engines were too far forward. The 737 was originally designed to be low to the ground, and rather than raising the jet and placing engines where they are on other similar planes (meaning it could no longer be classified as a 737 and requiring new manuals/training/certification), they moved them forward and adjusted for the new unexpected handling with the MCAS.

blackflame7000 · on Aug 4, 2019

Raising the jet requires a wider body for the landing gear to fold up into

kayfox · on Aug 4, 2019

The 757 sits a good 4 feet higher with the same fuselage diameter.

blackflame7000 · on Aug 4, 2019

The 757 has a wingspan 30 feet wider than the 737. If you look at a picture of the airplane, you can see that the landing gear is part of the wing.

https://qph.fs.quoracdn.net/main-qimg-0f650e809408f3962a1ed0...

kayfox · on Aug 5, 2019

Yes, but you said body and not wing.

0az · on Aug 4, 2019

If I remember correctly, the 737 either doesn't have the space for longer landing gear, or there's an issue with the longer landing gear, or both. Citation needed, of course, as this is off of memory.

kayfox · on Aug 5, 2019

It was designed lower to be easier to work with ground handling and servicing. This lead to a few different designs of plane-deployed airstairs and made it popular with Alaska Air and other airlines serving small airports.

https://en.wikipedia.org/wiki/Boeing_737#Background

windexh8er · on Aug 4, 2019

We know because hundreds of people died in the airframe. We know the MCAS was a workaround to an aerodynamics problem. Boeing has admitted this both directly and indirectly. Airframes shouldn't need software fixes to accommodate for a design failure and design failures shouldn't be allowed to fly after killing hundreds of people.

How do you argue what they did and are continuing to do is just in this situation?

danjayh · on Aug 4, 2019

You clearly have absolutely no background in aerospace engineering. The 737 airframe is actually relatively stable compared to many other things that fly ... for instance, many high-performance jets will literally begin to oscillate and tear themselves apart without help from their flight computers. The flight computer is a part of the airframe, and it is perfectly valid for the flight computer to contribute to the airframe's handling characteristics. Boeing's mistake was in underestimating the burden placed on the pilots in a runaway trim situation, and assuming that pilots could execute a recovery procedure correctly that remained unchanged from prior iterations but that had also been rarely needed on prior iterations (vs. the MAX, where MCAS failures were common enough that an average pilot might actually need to execute the procedure).

bobowzki · on Aug 4, 2019

People like to bring up the instability of high performance jets, but I'd like to point out that they generally have ejection seats and don't carry passengers.

varjag · on Aug 4, 2019

They also crash incredibly often by civil aviation standards.

danjayh · on Aug 4, 2019

Airbus jets carry passengers and have been fly-by-wire for decades. Cables and hydraulics have been replaced with software and computers. Like it or not, the software and hardware that make up the flight computer are an integral part of the airframe on most newly produced passenger jets, regardless of who makes them.

VBprogrammer · on Aug 4, 2019

It's an interesting point. One of the reversion modes on Airbus is "direct law". Under that control mode the stick displacement is proportional to the control movements.

If, hypothetically, an Airbus had a problem in the same region of flight would it even be detected in flight testing? As far as I'm aware it wouldn't be as direct mode is a reversionary mode intended to get the aircraft safely on the ground.

ethbro · on Aug 4, 2019

One of the things that is regulated is stick control forces.

They cannot be greater than a set level (essentially, an average strength pilot), particularly when performing critical manuevers.

This was actually the whole reason MCAS was engineered in the first place: to lower the effective stick force required to within the acceptable limits in certain scenarios.

So presumably Airbus in direct mode could still have similar issues flagged in direct mode, if the plane behaved in such a way as to require unacceptably high stick force to move control elements (even if it was a 1:1 mapping).

VBprogrammer · on Aug 4, 2019

Technically I'd say they introduced MCAS to increase the control force in certain scenarios (high power high angle of attack). But yes I mostly agree with you.

One thing I'm not sure about is whether Airbus would have to demonstrate proper controllability (i.e. adherence to control force regulations) in all phases of flight and corner conditions.

You could have a scenario where by they have a complete control reversal on the approach to stall but under normal law the pilot would be oblivious. This would obviously show up in direct law.

ethbro · on Aug 4, 2019

I'm tracking what you're saying now.

I'd hope they would have to demonstrate both things: (1) that a given mode can or cannot be active in a certain scenario, and (2) how the plane behaves in all tested scenarios under all modes potentially active.

Is there a reason this necessarily wouldn't be the case?

VBprogrammer · on Aug 5, 2019

My guess is that it's considered so unlikely that they wouldn't be required to demonstrate the full range of behaviour. In the same way as I wouldn't expect them to demonstrate all of the edge cases of the flight envelope with a failed yaw dampener. I could well be wrong though, I haven't found any good references.

XorNot · on Aug 4, 2019

Fly-by-wire is not about airframe stability. High-performance jets are fly-by-wire because high-performance generally means you need to build an unstable airframe, and computer control is the only way to compensate for it.

But that's not commerical aviation - that's combat aviation. That's "you fly faster or the missile catches you and you die anyway" aviation.

Building intentionally unstable airframes for civilian aviation is a very different proposition. And unrelated to the use of fly-by-wire.

dataflow · on Aug 4, 2019

How is

> Airbus jets carry passengers and have been fly-by-wire for decades

a response to

> instability of high performance jets

?

tigershark · on Aug 4, 2019

Your comment is absolutely deceptive. Fly by wire doesn’t mean that airbus planes are as unstable as jet fighters.

windexh8er · on Aug 4, 2019

> The 737 airframe is actually relatively stable compared to many other things that fly.

I agree, so where is your comparison of 737 Max and non-Max airframe? Because we have real data, including death toll, that shows one airframe is not like the other.

> Boeing's mistake was in underestimating the burden placed on the pilots in a runaway trim situation, and assuming that pilots could execute a recovery procedure correctly that remained unchanged from prior iterations but that had also been rarely needed on prior iterations (vs. the MAX, where MCAS failures were common enough that an average pilot might actually need to execute the procedure).

This is not the whole truth as we know in this situation MCAS 1.0 was a critical flight system with a non-redundant data source. The "burden" placed on pilots was that the system did not have sufficient and trustworthy information to prevent a faulty and inadequately designed system (MCAS) from crashing the airplane. Furthermore since MCAS is not required, again, on non-Max airframe it seems your assertion that you have background in aerospace engineering is egregious because you should understand all 737 Max are currently grounded because the 737 Max is not a 737 airframe and should be recertified as such.

yardie · on Aug 4, 2019

If passenger planes had the failure rate of high performance jets we’d all be using boats and trains. The fact that high performance jets have ejection seats while passenger planes do not should clue you in that failure is anticipated.

ulfw · on Aug 4, 2019

Please list all passenger airframes that would “tear themselves apart”. Should be easy for you to do considering your background in aerospace engineering.

xchaotic · on Aug 4, 2019

High speed jet fighters flying above Mach speed doesn’t really compare with a chubby airliner flying 900km/h?

tus88 · on Aug 4, 2019

> The 737 airframe is actually relatively stable compared to many other things that fly

Nice strawman. We are talking about the MAX not the 737 in general.

Why don't you explain why previous variants of the 737 didn't require MCAS?

VBprogrammer · on Aug 4, 2019

Why don't you explain why previous 737 versions have a Mach trim to prevent Mach tuck, or why it had a speed trim system to control the pitch force response to speed changes.

tus88 · on Aug 4, 2019

For the same reasons it has control surfaces at all. Planes need to adjust their aerodynamic properties in order to adapt to various external and intrinsic factors. Trim in particular maintains aerodynamic stability in the presence of some conflux of the two.

The trim equivalent of MCAS would be some elevator trim that ensures the excess AOA potential of the MAX could not occur, if set correctly. That is how a stable aircraft behaves. MCAS is a computer override to compensate for a condition that should not really occur in the first place.

bwilli123 · on Aug 4, 2019

..."relatively stable compared to many other things that fly"...

Balloons, paper jets...

rrss · on Aug 4, 2019

> We know because hundreds of people died in the airframe.

Since both crashes were caused by malfunctioning software, this doesn't tell you anything about the fundamental soundness of the airframe.

> We know the MCAS was a workaround to an aerodynamics problem

MCAS was a workaround to make the plane handle more like a 737. The existing handling was fine by itself (not 'fundamentally flawed'), it just didn't match the 737.

> How do you argue what they did and are continuing to do is just in this situation?

I don't see where I did that. I just don't think there is any evidence that the airframe is actually fundamentally unsafe.

bronson · on Aug 5, 2019

The software is a fundamental part of the airframe. You can’t separate the two and still have an airplane rated for flight.

rrss · on Aug 6, 2019

> Keep talking about the software. Make it about the software, so no one knows the airframe is fundamentally flawed.

It's pretty clear that the author of the comment I originally responded to was not considering software.

bronson · on Aug 9, 2019

Their point was about the way Boeing is trying (and mostly succeeding) in steering media coverage about the MAX-8. (If Boeing keeps discussion on the micro issues, maybe they won't have to publicly address a macro problem). Their comment doesn't take a position on whether software is a fundamental part of the airframe or not. Yours does.

skybrian · on Aug 4, 2019

Uncrashable airplanes don't exist. The question isn't whether there were crashes (nobody disputes that). It's whether the problem can be fixed so that pilots can fly it safely. Given all the times the plane flew without incident (other than the two crashes), it seems this should be possible?

est31 · on Aug 4, 2019

The plane still crashed twice which is many many times higher than the non-purely-human-cause crash rate of other airplane types. Probably even then 737 MAX is still safer overall than cars, but airplanes have a higher safety expectation than cars do.

_ph_ · on Aug 4, 2019

But it crashed 2 times for the same reason. Take away that reason and you have a completely different situation to estimate its reliability.

michaelt · on Aug 4, 2019

The crash rate is zero if you ignore the crashes, it's true.

_ph_ · on Aug 4, 2019

It is not about ignoring the crashes. But as both crashes were caused by MCAS, you cannot make a safety statement about a 737 without MCAS or rather with a fixed one.

bronson · on Aug 5, 2019

Yes you can. Clearly Boeing Is suffering serious process failure, especially in letting two crash before responding properly.

Is the MCAS the only place they messed up? Will this fix truly be 100%? Remember that their first attempt at a fix was rejected.

When you see one bug, that’s often an indication that more are nearby.

visarga · on Aug 4, 2019

Yeah, it was perfectly safe except for two crashes. But that's two crashes more than I can tolerate.

The problem is now the lack of public trust in their work. They might fix the software, but how to prove they're not lying again?

ethbro · on Aug 4, 2019

The same way you prove Airbus isn't lying?

callalex · on Aug 4, 2019

Presumably Airbus is still being watched by a regulator or two. Unlike Boeing which has not been watched as the FAA has clearly been gutted.

_ph_ · on Aug 4, 2019

The 737 MAX will be watched by all regulators world-wide. So it will be checked as carefuly, or actually even way more carefully than any competing aircraft.

ulfw · on Aug 4, 2019

Which NOONE bothered to do before millions of people were made to fly in them. THAT is the very issue at hand.

AnthonBerg · on Aug 4, 2019

We must remember that repeated checking does not make a bad design good. It’s completely clear now what Boeing’s priorities and methods were.

_ph_ · on Aug 4, 2019

The 737 is a very successful design and it is clear with practical experience, what its strengths and weaknesses are.

AnthonBerg · on Aug 4, 2019

Boeing build processes that build aeroplanes. Those processes swiftly mutilated and killed hundreds of people when the product was unleashed. The response from Boeing was 100% sociopathic.

You may find that OK, if you wish. You may choose to do nothing.

We don’t, and won’t.

tus88 · on Aug 4, 2019

Because MCAS is required, unlike every other aerodynamically stable airplane that came before it.

danjayh · on Aug 4, 2019

Airbus does it too. They have several control laws that are used to provide flight envelope protection. It was the pilot's failure to understand his plane's software that caused AF447 to crash. Essentially he didn't realize that the flight control software had changed modes, that it wasn't providing the normal handling augmentations, and that his inputs were putting the aircraft into a stall, which caused the crash.

See: https://en.wikipedia.org/wiki/Air_France_Flight_447

https://aviation.stackexchange.com/questions/62338/why-did-a...

tus88 · on Aug 4, 2019

Build unstable airframes? Not outside of forward canard jet fighters.

And that pilot crashed a perfectly fine airliner after he lost situational awareness and then did the one thing you should never do - pull the steering column back in a mad panic until the plane stalls.

blackflame7000 · on Aug 4, 2019

https://en.m.wikipedia.org/wiki/Qantas_Flight_72

makomk · on Aug 4, 2019

Qantas Flight 72 is probably a good demonstration of why using MCAS as a band-aid for the 737 MAX's stability issues is a bad idea. One of the key reasons that fault didn't lead to a crash that killed everyone on board is that Airbus placed much safer limits on the amount of control authority their flight envelope protections had and the altitude at which they could operate.

_ph_ · on Aug 4, 2019

No, in normal flight regimes, MCAS is not required. For example, it is disabled at any time, the flaps are extended, that is during starts and landings. It is only required (by law) at high angle of attacks. In these kind of flight regimes, most other airplanes also have systems to ensure their "stability". The 737 MAX isn't fundamentally more unstable than competitor airplanes with large engines.

dboreham · on Aug 4, 2019

The wings are too low?

notjustanymike · on Aug 4, 2019

Keeps crashing into the ocean

mseidl · on Aug 4, 2019

The plane was built in the 50s, it was low to the ground. But Boeing wanted to fit bigger/newer/better engines on it, but they wont fit since you can't make the landing gear longer. They had to move the engines higher and much more forward.

natch · on Aug 4, 2019

Isn’t the engines being not mountable at a well balanced point a big part of the answer? Which isn’t down to only the size of the engine but also the height of the wings. So yes it’s complicated but the engine is part of the picture. Although not being in the industry, I don’t know, is a strap-on engine added to a design that wasn’t intended to accommodate it technically considered part of the airframe?

_ph_ · on Aug 4, 2019

Basically no jet engine at any airliner are mounted at a point which is well-balanced in all flight situations. At level flight, the 737 MAX is also well balanced. That is just a matter of weight distribution in the machine. The problem appears at high angle of attacks, that means the force vector isn't pointing vertically down as it were only due to gravity. In this situation, the balance of every aircraft which has large engines hanging down from the wings does shift and needs to be compensated for in some way in the control systems.

toomuchtodo · on Aug 4, 2019

If the airframe isn’t flawed, fly it without MCAS. With Boeing management onboard. Surely you don’t need to dynamically adjust flight dynamics without the pilot’s input or knowledge if the airframe is sound.

It’s realistic to build aircraft where software is required to fly said aircraft depending on the flight dynamics, and the pilot(s) are fully aware of what’s running. But that changes the type rating and blows Boeing’s regulatory sidestep out of the sky.

TylerE · on Aug 4, 2019

God it's frustrating seeing people here talk about aviation.

Flight computers are a fundamental part of normal aeronautical design.

Asking Boeing to fly a Max without MCAS would be like asking you to write a bug free C++ program... with a compiler with no error messages or validation.

windexh8er · on Aug 4, 2019

The MCAS was a workaround implemented as a stopgap, no? How does this fall under the broad category of a flight computer as you've stated? What other airliners have been retrofitted with similar systems? Why doesn't a non-Max 737 need MCAS?

ethbro · on Aug 4, 2019

No. It was a software fix put in place to satisfy a regulatory requirement under specific, known flight conditions.

The following planes (among others) all translate flight inputs. Airbus A320, A330, A340, A350, A380; Boeing 777. https://en.m.wikipedia.org/wiki/Flight_control_modes

A non-Max 737 doesn't need MCAS because it already behaves like a 737.

Essentially, most replies on this boil down to people being shocked (shocked!) that computers are currently helping all planes fly. I'm not sure how they thought AEs were getting efficiency gains while physics stayed the same...

AnthonBerg · on Aug 4, 2019

The replies do not all boil down to people being shocked that computers help planes fly. That is a willfully distorted charicature.

Hundreds of people were killed due to a bad implementation of a deeply irresponsible and insensible approach to make money by skirting flight safety regulation. The bad implementation was in part software, but nobody is confusing that aspect with the whole picture.

The 737 MAX does not need MCAS. Nothing needs MCAS. Boeing executives wanted the money that MCAS made them.

ethbro · on Aug 4, 2019

Is it really?

If that were the case, then you'd see people debating the relative merits of SDLC approaches and organizational blinders to letting bugs slip through.

In reality, that's only about 20% of any Boeing discussion.

80% is outrage about how Boeing could have released such an "obviously flawed" airplane that required flight computer adjustments. And misunderstanding that Boeing pivoted from a clean sheet design to instead build exactly what the carriers asked them to.

Generally, HN comments are pretty good, but the amount of kneejerking in Boeing threads in lieu of informed discussion is terrible.

Edit: Although reading through new replies this morning, it's nice to see voting mostly sorted the technical wheat from the chaff.

windexh8er · on Aug 4, 2019

> A non-Max 737 doesn't need MCAS because it already behaves like a 737.

If I read this correctly you're implying a 737 Max doesn't behave, aerodynamically, as that a non-Max 737 does? And if we follow that line of thinking since non-Max 737 don't require MCAS to behave appropriately then the 737 Max is not a 737 airframe and it should be recertified as something else, no?

I'm in no way "shocked" computers fly and/or help fly airplanes. I am shocked so many comments defend the gross negligence by Boeing, however.

Remember that in all of the needless death the 737 Max is attributed with absolutely zero deaths were because of a flight computer. The real root cause of the 737 Max crashes are Boeing executives and FAA personnel making cognizant choices to allow process to be skipped in the vain of profits and loss avoidance. I believe we'll see this, at some point, no different than the commonly referenced Challenger failure [0].

[0] https://www.breakingthewheel.com/root-cause-analysis-five-wh...

benj111 · on Aug 4, 2019

"God it's frustrating seeing people here talk about aviation"

Google the 'Murray Gell-Mann Amnesia effect'.

Edit: https://en.m.wikipedia.org/wiki/Gell-Mann_amnesia_effect

ummonk · on Aug 4, 2019

That's not true for Boeing aircraft. They are designed to be flyable manually, unlike Airbuses, which are fly by wire.

TylerE · on Aug 4, 2019

The 777, 787, 747-8, and 737 Max are all fly by wire.

inamberclad · on Aug 4, 2019

The Max is not fly by wire, that's the problem. It's a hydraulic/mechanical control. That's why pilots aren't able to pull it out of dives or re-trim it, the aerodynamic forces are too much for the crew to fight against.

Animats · on Aug 4, 2019

Right. If this was a fly by wire aircraft, it would have to have triple or quadruple redundant control systems and redundant air data sensors. But because the auto-trim system wasn't considered primary flight control, it didn't get evaluated like a fly by wire system.

Auto-trim has been around for years, and failures are usually more annoying than serious. But the 737 MAX's auto-trim had far too much control authority, and was being used to compensate for bad handling characteristics, making it a flight critical system.

usrusr · on Aug 4, 2019

And if they hadn't opted for secrecy, properly put it in the manual instead of hoping that the outcome of bad sensor readings would appear like that of a fried switch, someone might have realized what a perfect trap they had built while writing "engage trim cut-out, then work the manual trim wheel until you regain control or hit the ground, whatever comes first".

ulfw · on Aug 4, 2019

Neither is the 747-8. Half of his examples were made up.

TylerE · on Aug 4, 2019

Not fully, but spoilers and ailerons in the -8 are FBW.

ummonk · on Aug 4, 2019

The other three are but the 737 Max is most definitely not fly by wire.

toomuchtodo · on Aug 4, 2019

[removed] Other sibling comments are of higher quality.

TylerE · on Aug 4, 2019

Better not ever fly on an Airbus then, either.

https://en.wikipedia.org/wiki/Air_France_Flight_296

https://www.telegraph.co.uk/technology/9231855/Air-France-Fl...

miketery · on Aug 4, 2019

You're wrong.

Sophisticated flight computers are only necessary for aircrafts that are inheritenly unstable (fighter jets) that require special flight characteristics.

Safe commercial, and private aircraft are built to be inheritenly stable. It's less complexity.

Sure we've added fly by wire, or dynamic power controls. But those are relatively straight forward.

Changing the center of gravity of the plane and therefore introducing different (dangerous, risky, and unsafe) flight charectierstics is a major issue.

TylerE · on Aug 4, 2019

Define "necessary".

Depending on how exactly you define it, pilot error is either the #1 or #2 cause of commercial plane crashes. FBW systems seek to reduce or eliminate that.

ulfw · on Aug 4, 2019

That’s because they’re cheap. Software updates are cheap. Hardware updates are a problem.

artsyca · on Aug 4, 2019

A Faustion proposition -- they're going to release software updates from here to the apocalypse yet none of them will succeed in winning back anybody's trust; they should implement a code freeze until hell freezes over

inamberclad · on Aug 4, 2019

People here seem to be missing a few things.

1: The 737 family, including the 737 MAX, still use manual/hydraulic controls, not fly-by-wire. A computer failure isn't supposed to be as catastrophic as it would in a fully fly-by-wire system. 737 pilots are already trained on other automatic system failures, such as a pitch trim motor runaway.

2: The original design of MCAS had a limited scope - stop pitch instabilities from causing an uncontrolled pitch-up into a stall in high AoA and high power regimes, such as just after takeoff. It read from both a G-sensor and the AoA vane and both sensors had to be reading excessive values for the system to trigger. This was probably the justification for not requiring higher levels of reliability.

3: The plane was found to have handling issues in slow flight regimes. In order to improve the handling, Boeing engineers modified MCAS to be active in more of the flight regime, removing one of the sensor readings. Now it would trigger if it detected the plane was near a stall, even in level flight without a high G loading. It is unclear whether MCAS was recertified within Boeing for continuous use. Wikipedia states "The FAA did not conduct a safety analysis on the changes. It had already approved the previous version of MCAS, and the agency's rules did not require it to take a second look because the changes did not affect how the plane operated in extreme situations."

4: Pilots could not disable MCAS without disabling electric trim control. In at least the LionAir case, the pilots did disable the electric trim, but were unable to re-trim the plane manually against the aerodynamic forces of their dive. They re-enabled the electric trim to re-trim the aircraft, and MCAS re-triggered and put them back in the same situation. Remember, these aircraft use manual controls. The pilots need to put a serious amount of force into the control column when the plane is out of trim and at a high airspeed. This greatly diminishes their ability to do anything else.

5: Both accident aircraft were missing an optional safety feature - the AoA disagree warning. Both aircraft experienced a failure of one of their AoA vanes. I don't recall whether these failures happened in flight or before the flight, but such a warning would have likely stopped the pilots from taking off.

6: The single computer issue was not uncovered until simulator testing after the crashes. It is not directly related to the MCAS problems.

Wikipedia is your friend, as usual. https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Au...

Here's even more detailed information: http://www.b737.org.uk/mcas.htm

_ph_ · on Aug 4, 2019

Thank you, this is a very good summary of the issue. Only a small comment to 4: it was the Ethiopian Air flight, where they disabled MCAS but did not manage to regain control, mostly due to their air speed being too high. Lion Air had 2 MCAS incidents, the first on the day before the crash, where the pilots disabled electric trim control and continued the flight without an incident and the second, where they did not disable MCAS at all, leading to the crash.

Interestingly, Airbus as a consequence of the events reviewed their flight software and also found an issue they are quickly correcting, but it seems that no planes are grounded as a consequence.

wikibob · on Aug 4, 2019

First I’ve heard of the Airbus issue. Can you provide a reference? I couldn’t find anything

_ph_ · on Aug 4, 2019

Here is something: https://samchui.com/2019/07/18/easa-warns-of-airbus-a321neo-...

addicted · on Aug 4, 2019

#6 makes it worse.

#6, combined with all the institutional issues people are aware of, makes it quite likely that the 737 Max has a bunch of issues hidden away. The MCAS issue was only the most obvious and easiest to trigger, which is why it led to 2 crashes. If those planes had been around longer, and if they hadn’t been grounded because of the MCAS issue, it’s likely some of the other issues may have also caused a crash.

tracer4201 · on Aug 4, 2019

I’ve lost all faith in Boeing. Boeing’s greed and cutting corners to avoid pilot recertification is textbook example of ethical failures.

I have second thoughts about many of the companies I’ve invested in over the years, but Boeing right now takes the cake. I’ve dumped my shares. Hopefully there are some criminal investigations at some point if our leaders in Washington could stop dancing to the tune of whichever company throws them a bone like the dogs they really are.

skunkpocalypse · on Aug 4, 2019

> I’ve lost all faith in Boeing.

Guess what? They don't care because it doesn't matter.

You really can't avoid their airplanes without avoiding air travel.

Yizahi · on Aug 4, 2019

It's like with Google (or Amazon, or Microsoft) - you can disagree with their policies and business practices, but you can't avoid them. It is physically impossible to use internet without these monsters. https://gizmodo.com/c/goodbye-big-five

tempguy9999 · on Aug 4, 2019

Is boeing really the only producer of aeroplanes?

My guess is there are others. My guess is that orders will be cancelled and there'll be a shift towards airbus (oh!there's one!)

I'm tempted to apply my first ever downvote to this apparently witless comment, but perhaps you meant something more subtle?

_ph_ · on Aug 4, 2019

Orders can't just be shifted to Airbus, as their production lines are fully loaded for years to come. Also, there is little desire to make Airbus a monopolist. The problem basically started when Boing was merged with McDonell Douglas, as this created the duopoly which now has the market trapped. And of course it creates a huge amount of problems for all the airlines which ordered and planned for new planes which have the same certificatation as their existing 737.

Mistletoe · on Aug 4, 2019

It feels like we see this over and over when you trace corporate problems back- loss of competition and development of monopolistic conditions. There’s a reason we developed antitrust laws and to watch them be eroded in the modern era is quite worrisome.

cobookman · on Aug 4, 2019

Airbus did the same as well with thier 320 series

mrhappyunhappy · on Aug 4, 2019

Citation?

ulfw · on Aug 4, 2019

There isn’t one because it’s not true.

wayanon · on Aug 4, 2019

Will anyone appear in court for this episode in a year or two?

alkonaut · on Aug 4, 2019

>“Only one computer was used in the past, because Boeing was able to prove statistically that its system was reliable, the person said.”

I read that as ”statically” (thought it meant formally) and was really impressed by how advanced that sounded! Then I read it again and saw statistically. Less impressive.