modernpacifist's comments

> The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.

If they have a search warrant then a judge has, from a legal perspective, determined that the request/search is reasonable. So while you have the right to be secure against unreasonable searches, I think it is a reasonable trade-off that those security mechanisms/processes/etc. should either be removed by you or you should expect them to be removed for you.


I don't know what fantasy world you live in but the police don't ask you to remove your security mechanisms yourself. You're likely to catch a charge for destruction of evidence if you do that, along with a bunch of other related charges.

Considering that police can, with a warrant, forcibly place your thumb on your phone's fingerprint sensor to unlock it [1], "I don't know what fantasy world you live in" is unwarranted.

[1] https://reason.com/2024/04/19/appeals-court-rules-that-cops-...


I specifically said they don't ask. If they are forcing you to do something, they aren't asking

Removing the security mechanism in my comment is akin to opening the safe to enable the search, or entering your PIN on a phone to unlock it. I can't really see how otherwise removing a roadblock to enable law enforcement to perform their court-approved mandate would lead to further charges for the act of helping them do so. Of course, if you're referring to poison-pill mechanisms that upon removal destroy the data they wanted to search for, then sure, more charges are coming.

That isn't how it works at all in the US. You'd be asked to provide the combination to the safe. Or compelled under a court order to divulge it under penalty of contempt of court.

If you're asked to directly interact with anything like that, you're very likely being set up to bring additional charges against you. You can be compelled to provide passwords, combinations, etc. in a court. You can't be compelled to actually enter the safe combination.


It seems reasonable to suggest that the number of profit-driven ransomware endeavors and the number of for-fun ransomware endeavors can both be non-zero, with some overlap and some non-overlap. Therefore, making it unprofitable would at least eliminate the former motive, which, under all but the worst-case scenario where those numbers are perfectly equal and overlapping, would result in fewer ransomware endeavors.

To say we shouldn't do X because it doesn't perfectly eliminate/solve Y is akin to saying we should do nothing because by that standard, we'll never do anything about Y.


Not only might X (= banning payments) not eliminate ransomware, it could make the problem worse!

Those ransomware perpetrators who are motivated by profit could multiply their activities if the yield per attack is reduced: run more heists at once.


I don't see how banning payments would inherently create more opportunity for ransomware attacks. Assuming that the operators are already attacking as much as they can (why wouldn't they be - it's more profit that way, since it's a business after all), the only way to maintain profitability with lower per-attack yields would be to ask for more ransom per attack, which would likely drive the yields down even further.

Reminds me of https://youtu.be/9pOiOhxujsE?si=GG6X16c8efr0I3Ey&t=213


I'll +1 this. Coming from Australia to the US I've found that (generally, especially in big corp entities/govt.) American customer service is extremely courteous and eager but ultimately unhelpful or severely limited in what they can do. As soon as you are off script, good luck.


By contrast, I think Australian retail is some of the best: fun, low pressure, engaging, and helpful.


I would agree. My standard engagement with CS in Australia was a lot more personable and once it was recognized that a situation was off script they were far more willing to go into problem solving mode.


Having seen the other side of the fence (the hyperscaler side) I’m kind of bored with egress cost being continuously compared with your standard bare-metal hosting provider for a few reasons:

- Let’s first dispense with the idea that egress has to be provided without profit/margin in a capitalistic society. There will be profit in it sure and I don’t dismiss the idea that egress pricing is used to keep activities on-platform, not that I’ve been part of the decisions to set the price that way.

- Typically the more basic a network is, the easier it is to provision, manage and scale. Having a single DC with a couple of local transit providers and BGP routing brings with it a wildly lower cost base compared with a global network with piles of different POPs.

- Many providers, by charging only by usage, are effectively saying that the network is infinite in capacity and “just works”. You would be surprised how many engineers believe this to be true as well. To that end, as the complexity of the network grows you need to charge in a way that allows you to keep capacity ahead of demand for every path you manage. And then you need geographic redundancy for every such egress path and systems/people to manage said failover.

- In the case of GCP Premium tier, Google is hauling your traffic as far as it can on its private network towards its destination before exiting via a POP. Usage forecasting and pricing as a result need to effectively assume any VM wants to send any amount of traffic to anywhere in the world. Even then, the premium tier pricing separates out China and Australia as special cases.

- In the hyperscaler case and even many of the larger VM/bare metal hosts you’ll find software defined networks which can often have a non-zero per-byte CPU processing cost involved. AFAIK this is essentially written off when traffic is in the same zone or region but escalates for various reasons (say rate limiting, egress path calculations, NAT, DoS prevention) when going inter-region or to the internet.

- Many of the hyperscalers do allow you spin up private interconnects but often charge for usage across it. This shifts away from being raw cross-connect cost to being more enterprise-y where the value of having dedicated, private capacity becomes the price. There is also the cost of managing said interconnect since it most certainly doesn’t get handled the same way as other egress paths (thus is more of an exception and exceptions cost money/time/effort).

Do all of these things add up to the “high” egress costs plus a decent margin for that evil profit? That is mostly up to the reader and what they value. Many others will say they don’t need all these features/management, but the reality is the hyperscalers don’t build for you, they build to cater to everyone that might become or is a customer. And it turns out to build a network capable of handling what “everyone” could potentially do with it is expensive.


Most of your points apply equally well to ingress as they do to egress. Yet the cost of one is orders of magnitude less than the other.

The only sane explanation for the vast imbalance is vendor lock-in. Everything else is hand-wavy distraction.


Not exactly - they apply to whichever dictates your capacity - be it ingress or egress. Overwhelmingly, capacity for hyperscalers is dictated by egress peaks. Since most non-residential network links are delivered symmetrically, the capacity for the other direction is already there as a by-product.

Also don't underestimate the benefit of simplification - why bill for two things separately when one of them is the primary driver of the cost, the comparative cost of supplying the other is negligible, and it is probably more effort to bill for than it's worth.

I'm not dismissing the vendor lock-in aspect, but I don't think it is the only reason at play.


I don't know about others, but I can't help but smile when I read the detailed series of events in aviation postmortems. To be able to zero in on what turned out to be a single faulty part and then trace the entire provenance and environment that led to that defective part entering service speaks to the robustness of the industry. I say that sincerely since mistakes are going to happen and in my view robustness has less to do with the number of mistakes but how one responds to them.

Being an SRE at a FAANG and generally spending a lot of my life dealing with reliability, I am consistently in awe of the aviation industry. I can only hope (and do my small contribution) that the software/tech industry can one day be an equal in this regard.

And finally, the biggest of kudos to Kyra Dempsey, the writer. What an approachable article despite being (necessarily) heavy on the engineering content.


As a former Boeing engineer, other industries can learn a great deal from how airplanes are designed. The Fukushima and Deepwater Horizon disasters were both "zipper" failures that showed little thought was given to "when X fails, then what?"

Note I wrote when X fails, not if X fails. It's a different way of thinking.


When I worked in an industrial context, some coding tasks would seem trivial to today's Joe Random software dev, but we had to be constantly thinking about failure modes: from degraded modes that would keep a plant 100% operative 100% of the time in spite of some component being down, to driving a 10m-high oven that can split airborne water molecules from mere ambient humidity into hydrogen, whose buildup could be dangerously explosive if certain parameters were not kept in check, implying that the code/system has to have a number of contingency plans. "Sane default" suddenly has a very tangible meaning.


> we had to be constantly thinking about failure modes

This to me is the biggest difference between writing code for the software industry vs. an industrial industry.

Software is all about the happy path ("move fast and break things") because the consequences typically range from a minor inconvenience to a major financial loss.

Industrial control is all about sad paths ("what happens if someone drives a forklift into your favorite junction box during the most critical, exothermic phase of some reaction") because the consequences usually start at a major financial loss and top out in "Modern Marvels - Engineering Disasters" territory.


You do /not/ want to make it on the USCSB YouTube channel.


Yeah, I work as a Functional Safety Engineer in the process and machinery sector, and 90%+ of the effort is in planning, considering all the possibilities outside of intended operation, and traceability.

I have worked on projects where in retrospect the LOC generated per day, if spread out across the whole project, were between 1 and 3.

But typically, writing of the code does not even commence in the first year, sometimes two.

Then there are the test cases and test coverage, etc.

This is the difference between engineering code and just producing it - all the effort that goes into understanding all the unwanted code behaviour that may occur and how to detect, manage and/or avoid it.

Implicit state is the enemy, therefore the best code has all states explicitly defined.
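As a generic, purely illustrative sketch of that last point (Python, not from any safety-certified codebase): every state, including the fault state, is an explicit value, and any transition that isn't spelled out lands in the fault state rather than being silently ignored.

    from enum import Enum, auto

    class State(Enum):
        IDLE = auto()
        HEATING = auto()
        COOLING = auto()
        FAULT = auto()  # explicit failure state, not an implicit "anything else"

    # Allowed transitions are spelled out; anything not listed is a fault.
    TRANSITIONS = {
        (State.IDLE, "start"): State.HEATING,
        (State.HEATING, "target_reached"): State.COOLING,
        (State.COOLING, "cooled_down"): State.IDLE,
    }

    def step(state, event):
        # An unknown (state, event) pair drives the machine to FAULT
        # instead of being ignored.
        return TRANSITIONS.get((state, event), State.FAULT)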


As an engineer I think a lot about trade-offs of cost vs. other criteria. There is little I can learn from the nuclear or aviation industries, as the cost structure is so completely different. I'm very happy that the costs of safety in aviation are so well accepted, but I understand that few people are willing to pay similar costs for other things like, say, cars.


The costs of the Fukushima and Deepwater Horizon disasters were very, very high. Both could have been averted at trivial expense with simple changes to the design.

Fukushima:

badthink - the seawall is high enough that it will stop tidal waves

goodthink - what happens when the seawall is overtopped? Answer: the backup generators drown. Solution: put the backup generators on a platform.

Deepwater Horizon:

badthink - the pipe is strong enough to never break

goodthink - what happens when there's enough force to bust the pipe off? Answer: the pipe flow cannot be shut off. Solution: put a fuse (a weak spot) above the valve, so when the pipe busts off, it breaks above the valve, and the valve can be turned to shut off the flow. (The valve was located on the sea floor.)


This is so easy in retrospect when you know what the failure mode will be.

badthink: the Fukushima backup generators must be placed on a platform to keep them out of the range of a once-in-a-millennium tsunami

goodthink: what happens when a typhoon comes and damages the generator on an exposed platform; an event which happens predictably and far more often than tsunamis. Answer: put the backup generators in the basement of a reactor building behind a large seawall. What catastrophe could put the reactor building completely underwater, and still have the reactor survive?

Yeah, trivial changes to the design can prevent all sorts of disasters, but you have to know what you are trying to prevent in a world of infinite complexity


A large seawall to be sure, but not a particularly tall one. If I recall correctly the seawall was remarkably short relative to maximum expected wave heights on a 100 year time frame.


We're making a niche B2B application, and this is very much it for us as well.

Our customers are in a cutthroat market with low margins. We can't spend a ton on pre-analysis, redundancies and so on.

Instead we've focused on reducing the impact of failures.

We've made it trivial to switch to an older build in case the new one has an issue. Thus if they hit a bug they can almost always work around it by going to an older build.

This of course requires us to be careful about database changes, but that's relatively easy.


You cannot. AI, though, can be cheap enough to produce that. I wonder what happens if you take a B2B application and let AI rewrite it to nuclear-industry/aviation standards in a separate repo. Then, on fixes/rewrites, the engineers take the "safety-aware repository" as inspiration.


What you're describing is almost exactly the opposite of what LLMs are good for. Quickly getting a draft of something roughly like what you want without having to look a bunch of stuff up? Great, go wild. Writing something to a very high standard, with careful attention to specs and possible failure cases, and meticulous following of rules? Antithetical to the way cutting-edge AI works.


Have you tried using an LLM to write code to any kind of standard? I recently spent two hours trying to get GPT 4 to build a fiddly regex and ultimately found a better solution on Stack Overflow. In my experiments it also produced lackluster concurrent code.


You’ve missed the point. Those standards don’t relate at all to writing code, they relate to process, procedure and due diligence - i.e. governance. Those all cost a lot in terms of man hours.


Exactly. Even without learning from those groups, there's a ton of stuff we know we could do to improve the reliability of our product. It's just that it would take way too much development time and our customers wouldn't want to pay for it.

It's like buying a thermometer from Home Depot vs a highly accurate, calibrated lab thermometer. Sometimes you just don't need that quality and it's a waste paying for it.


Yeah, it costs. That, and the fact that people will accept shite software, makes high quality a fight software companies can avoid. Rationally, therefore, they do.


I don't think that's the right way to reason about it.

I find that I can learn a ton from those industries, and as a software engineer I have the added advantage of being able to come up with zero-cost (or low cost), self-documenting abstractions, testing patterns, and ergonomic interfaces that improve the safety of my software.

In software, a lot of safety is embodied in how you structure your interfaces and tests. The biggest cost is your time, but there are economies of scale everywhere. It really pays to think through your interfaces and test plan and systems behavior, and that's where lessons from these other industries can be applied.

So yeah, if you think of these lessons as "do tons of manual QA", you'll run into trouble resourcing it. But you can also think of them as "build systems that continuously self-test, produce telemetry, fail gracefully in legible ways and have multiple redundancies".


Cars might not be the best example, since human lives are at stake, as in aviation. Unless you work on Tesla's autopilot, it seems. But yes, backups and restores are often good enough.


As it turns out (and as much as we wouldn’t want them to) human lives are still subject to cost/benefit analysis.

An airliner is a lot of lives, a lot of money, a lot of fuel, and a lot of energy. Which is why a lot has been invested in training, procedure, and safety systems.

Cars operate in an environment which is in most ways a lot more forgiving, they're controlled by (on average) low-training, low-skill, non-redundant crews, they're much more at risk of "enemy action", the material stresses are in a different realm, and they're much, much more sensitive to price pressure.

Hell, the difference is already visible in aviation alone: crop dusters and other small planes are a lot less regulated along every axis than airliners are.


I wouldn't say it's simply cost-benefit analysis. It's also scale of accidents.

A whole lot more people die from car accidents, yet there are few reports on national news about them. So fewer people care. Meanwhile, each time there is an aviation disaster, hundreds of people die and it's all over the news for weeks. Similarly with train accidents and nuclear accidents. There were only two very large ones, but they still haunt the field to this day, while (for example) the deaths from solar installations by people falling from roofs are mostly ignored.

Large accidents have to be avoided, a lot of small ones are more acceptable.


> I wouldn't say it's simply cost-benefit analysis. It's also scale of accidents.

But that is cost/benefit analysis. When any accident can kill hundreds and do millions to billions in damage besides (to say nothing of the image damage to both the sector and the specific brand), the benefit of trying to prevent every accident is significant, so acceptable costs are commensurate.


I think it goes beyond what you'd expect just from the increased scale putting more lives at risk. Compare our regulatory system for buses and cars, two transportation options that are probably as close as possible to differing only in scale. Buses are ~65x less deadly than cars, and yet we still respond to the occasional shocking bus accident by trying to make them safer.

Which is actually counterproductive! This makes it harder to compete as a bus service, bus lines shut down, and more people drive. I wrote more about this at https://www.jefftk.com/p/make-buses-dangerous and https://www.jefftk.com/p/in-light-of-crashes-we-should-not-m...


There are a fair amount of backups in your car. For example, the braking system is dual. There's also engine braking and the parking brake that can be used. All the "energy absorbing" features are a backup for when you crash.


Any substantiation for "Unless you work on Teslas autopilot, it seems"?

I mean you're implying that there are more accidents with autopilot than without it, right? Seems like quite the claim...


No, I'm implying that the autopilot code has not been as thoroughly tested as it should have been.

Example: https://www.theguardian.com/technology/2023/nov/22/tesla-aut...


Tesla people always try to reduce any critique to some metric on deaths per x.

The fact is, there’s a lot of history and best practice around building safety critical systems that Tesla doesn’t follow.

Additionally, even with the practices they follow, they call a consumer facing product that isn’t really an autopilot “autopilot”, while focusing outbound comms on a beta product that is more like an autopilot, but not available to them.


I agree with most of this but the naming of "autopilot" seems fine. Nobody expects commercial aircraft to fly on autopilot without a pilot's supervision, the same _should_ be true of Tesla vehicles (especially considering their tendency to jump into the wrong lane and phantom brake on the highway etc.)


What matters is what the user of the system thinks because that’s where confusion can be dangerous.

A plane pilot knows very well what the limits of the autopilot are and what the passenger believes is irrelevant.

Conversely if too many/most car “autopilot” users believe it does more than what it really does then it’s dangerous.

In electrical engineering 600V is still “low voltage”. Any engineer in the field knows that so that’s fine right? But if someone sells “low voltage” electric toothbrush or hand warmer no normal person will think “it’s 600V, it will probably kill me”. When you sell something, what your target audience takes away from your advertisement matters. If they’re clearly confused and you aren’t clearing it up after so many years then “confusion” and misleading advertising are part of your sales strategy.


> Nobody expects commercial aircraft to fly on autopilot without a pilot's supervision

Nobody here on HN, because we're really into tech. Outside the tech world, I would guess that 50% of the population thinks that "autopilot" (on any device) means that no human is needed.


Considering Tesla was willing to do unsafe things in visible ways (e.g., the running-stop-signs feature), I have no trust that they are maintaining safety in the less visible ways.


In the context of disasters that happened due to software failures (e.g. Ariane 5 [1]), one of my professors used to tell us that software doesn't break at some point in time; it is broken from the beginning.

I like the idea of thinking 'when' instead of 'if', but the verdict should be even harder when it comes to software engineering because it has this rare material at its disposal, which doesn't degrade over time.

[1] https://en.wikipedia.org/wiki/Ariane_5#Notable_launches


An example of a zipper failure in the Airbus incident: when a wire bundle gets cut, all the functions of all the wires in that bundle are lost. Having two or more smaller bundles physically separated would greatly reduce that risk. Certainly, having the primary and the backup system in the same bundle is a bad idea.

On the 757, one set of control cables runs under the floor. The backup set runs in the ceiling.


It’s the same on Airbus aircraft, I can tell you from experience.


I thought Airbus was fly-by-wire, not cables?


It is. I'm talking about redundant electrical wires being physically separated so they don't get damaged by the same event.


What's fascinating about airplane design for me is not the huge technical complexity, but rather, the way it is designed such that a lot of its subsystems are serviceable by technicians so quickly and reliably, not just in a fully controlled environment like a maintenance hangar, but right on the tarmac, waiting for takeoff.


Designing the airplane to minimize required maintenance and to make maintenance and inspections easier and faster is a huge issue for the engineering department. Also make it very difficult for the mechanics to do things wrongly.

As it was pointed out to me, airplanes sitting on the ground are a black hole sucking up money. Airplanes in the air carrying payload (note the "pay" in payload) are making money. Boeing understands this very well, and is very focused on getting that airplane in the air making money as much as possible.


> When my AoA sensor fails, then what?

crickets, let's just randomise which sensor we use during boot, that ought to do it!


> Airlines really want to be able to use pilots' existing type-rating on this hulking zombie of a 60s-era airframe with modern engines but it behaves differently under certain conditions, what do we do?

let's just build a system that pushes the nose down under those conditions, have it accept potentially unreliable AoA data, and not tell pilots about it!


"AoA sensor" - Angle of Attack sensor.

And the reference is presumably to 737 MAX accident. https://www.afacwa.org/the_inside_story_of_mcas_seattle_time...


Epic fail indeed, costing many lives.


I agree in principle, but I don't think industries should be looking at current-day Boeing's engineering practices, except as an example of how a proud company's culture can rot from the inside out with fatal consequences.


I think Boeing has had some difficulties. They have also had some undeniable successes. The 777 and 787 programs have no in-service passenger fatalities attributable to engineering errors to date. That's a monumental achievement.


The 787 has no hull losses at all right? And it’s been flying for 10 years now.


An extra safety margin is conferred by the stepladders found in the tailcones :-)


Reminder that this article was about an aircraft built by Airbus.

(Airbus is not Boeing.)


How are aeroplanes designed differently at Boeing vs Airbus? What's the secret sauce?


A pilot once explained to me..

Boeing planes (before MCAS): we have detected a problem with your engines, would you like to shut down?

Airbus planes: we have detected a problem with your engines, we have shut them down for you.


Same way Samsung phones are not Huawei phones? Or BMWs aren't Lexus?


At this point the secret sauce is that EASA isn't tolerating the same degree of certification fuckery and laxity from Airbus, and that they generally seem to have their act together.

Like what’s the secret sauce of nvidia vs radeon or AMD vs intel? Reliable execution, seemingly - and this is an environment where failures are supposed to be contained to very specific rates at given levels of severity.

The FAA has gotten into a mode where they let Boeing sign off on their own deviations from the rules. The engine changes forced the introduction of the nose-pusher-down system, which really should have required training, but Boeing didn't want to do that, because the whole point of doing the weird engine thing was having ostensible "airframe compatibility" despite the changes in flight characteristics. And they have become so large (like Intel) that they don't have to care anymore, because they know there's no chance of actual regulatory consequences, nor can EASA kick them out without causing a diplomatic incident and massively disrupting air travel, so they are no longer rigorous, and we simply have to deal with Boeing's "meltdown".

And yes they should be doing better but in the abstract, certification processes always need to be dealing with “uncooperative” participants who may want to conceal derogatory information or pencil-whip certification. You need to build processes that don’t let that happen and nowadays there’s so much of a revolving door that they can just get away with it. Like none of this would have happened with the classified personnel certification process etc - it is fundamentally a problem of a corrupted and ineffective certification process.

This decline in certification led to an inevitable decline in quality. When companies figure out it’s a paper tiger then there’s no reason to spend the money to do good engineering.

The FAA’s processes are both too strict and too lax - we have moved into the regulatory capture phase where they purely serve the interests of the industry giants who are already established and consolidated, and they now serve primarily to exclude any competitors rather than ensure consistent quality of engineering.

The specifics are less interesting than that high-level problem - there obviously eventually would be some form of engineering malfeasance that resulted from regulatory capture; the specific form is less important than the forces that produced it. And that regulatory capture problem exists across basically the whole American system. Why do we have forced arbitration on everything, why are our trains dumping poison into our towns? Because from 1980-2020 we basically handed control of legislative policy over to corporate interests and then allowed a massive degree of consolidation. Not that Airbus is small, but EASA isn't captured to the extent of most American bureaus.


It's actually safer for new airplane types to have flying characteristics like the previous types. There have been many accidents where a situation happened and the pilot did the right thing for the previous airplane he flew, but was the wrong thing for the one he was currently flying.

Most of what was written about the MAX crashes in the mass media is utter garbage and misinformation. No surprise there, as journalists have zero expertise in how airplanes work.

Both crashes could have been easily averted if the crews had followed well-known procedures. There was also nothing wrong with the aerodynamics of the MAX, nor the concept of the MCAS system. The flaw was in the way the MCAS system was implemented, and the way the pilots responded to it.

For example, rarely mentioned is the third MAX incident, where the airplane continued normally to their destination. The crew simply turned off the stab trim system.

BTW, I had a nice conversation with a 737 pilot a few months ago. He told me what I had already concluded - the crashed crews did not follow the procedures. I've also had unsolicited emails from pilots who told me what I'd written about it was true.


Everything I wrote is true. The LA crew restored normal trim 25 times, but never thought to turn off the stab trim system. The trim cutoff switch is right there on the center console within easy reach for just that purpose.

The EA crew oversped the airplane (you can hear the overspeed warning horn on the CVR) and did nothing to correct it. This made things worse. They were also given an Emergency Airworthiness Directive which said to restore normal trim with the trim switches, then turn off the trim system. They did not.

That's it.

I'd say half the fault was Boeing's, the other half the flight crews'.

The MCAS is not a bad concept, note that MCAS is still there in the MAX.

Pilots are a brotherhood, and they don't care to criticize other pilots in public. But they will in private.


Everything you said might well be true, and indeed as far as I know it is, but aircraft should not have fail-deadly systems which require lightning reflexes and up-to-the-second training to diagnose and disable fast enough before they crash the freaking plane in the first place. Yes, the pilots of the affected flights might have been able to save the aircraft if their training had been just that little bit better. We'll never know. But the real blame falls squarely on the shoulders of Boeing for shipping such a ticking time bomb in the first place.

Which is why the entire worldwide MAX fleet was grounded for more than a year, and the regulators didn't just mandate a bit of extra training.

Coming up with this narrative about how it's the crew's fault because they failed to disable Boeing's quietly introduced little self-destruct system fast enough to save their own lives was a particularly despicable move from their PR department and I lost a lot of respect for them over that.


It did not require lightning reflexes or up-to-the-second training. The first LA crash came after the crew dealt with it for 11 minutes, and restored trim 25 times. The EA crew restored normal trim a couple times, and crashed after 3 minutes if I recall correctly.

As for training, turning off the stab trim system to stop runaway trim is a "memory item", which means the pilots must know it without needing to consult a checklist. Additionally, after the first crash, all MAX crews received an EMERGENCY AIRWORTHINESS DIRECTIVE with a two-step procedure:

1. restore normal trim with the electric trim switches

2. turn off the trim system

I expect a MAX pilot to read, understand, and remember an EMERGENCY AIRWORTHINESS DIRECTIVE, especially as it contains instructions on how not to crash like the previous crew. Don't you?

> might have been able to save the aircraft

It's a certainty. Remember the first LA MAX incident, the airplane did not crash because after restoring normal trim a couple times, the crew turned off the trim system, and continued the flight normally. They apparently didn't even think it was a big deal, as the aircraft was handed over to the next crew, who crashed.

> a bit of extra training

They are already required to know all "memory items".

> Coming up with this narrative about how it's the crew's fault because they failed to disable Boeing's quietly introduced little self-destruct system fast enough to save their own lives was a particularly despicable move from their PR department

AFAIK Boeing never did say it was the crew's fault. The "have to respond within 5 seconds" is a fantasy invented by the media. It is not factual.

Both Boeing and the crews share responsibility for the crashes.


> Both Boeing and the crews share responsibility for the crashes.

And I never said they didn't. I just choose to assign Boeing the lion's share of the blame, as they should never have let that rush-job, cost-cutting death trap of a machine take to the skies in the first place.

Anyway, I see you have your mind made up, so there's not much point in arguing further. If you feel like continuing, why don't you take it up with - let's see - every single global aviation regulator, who also somehow came to the conclusion that there was maybe something a little bit wrong with the type.


>Both crashes could have been easily averted if the crews had followed well-known procedures.

I thought that the majority of the problems was that Boeing wanted the same type-rating, so that airlines could avoid paying for training. This resulted in crews not getting proper training and so not knowing the proper procedures ... which was by decision.

Both the airlines and Boeing should take the blame; I don't really see how it would be the pilots' fault, if you lie and say "it's the same plane, it flies the same, you don't need conversion training".

I am not in aviation, most of this is from YouTube sources, so y'know ...


Are you serious in saying that other industries could learn from Boeing?


Glancing at Walter Bright's brief Wikipedia page - I'd say he worked for Boeing well before they succumbed to the McDonnell Douglas Brain Fungus.


He didn't actually say that.


I think many of us are so used to working with software, with its constant need for adaptation and modification in order to meet an ever growing list of integration requirements, that we forget the benefits of working with a finalized spec with known constants like melting points, air pressure, and gravity.


Completely agree - I think it can go one of two ways. Software is more malleable than airplanes are and that also comes with downsides (like how much time and effort it takes to bring a new plane to the market)


I was just thinking of this metaphor today.

Try drawing the software monstrosity you work on / with as an airplane. 100 wings sticking out all different directions, covered with instruments and fins, totally asymmetrical and 5 miles long. Propellers, jets, balloons, helicopter blades.

Yep, it flies.

When it crashes, just take off again.


So software is my son's Bad Piggies flying monstrosity! You only left out the crates of TNT.


The article talks about a piece of software that partially failed, when they needed to calculate the braking distance for the overweight aircraft.


Airliners face constantly changing specifications. No two airliners are built the same.


Do you mean no two individual planes? Like two 767s made a month apart, do you mean they literally would have different requirements?


Yes. There are constant changes to the design to improve reliability, performance, and fix problems, and the airlines change their requirements constantly.


Neat little detail of the world Wikipedia once told me: the 00 suffix of classic Boeing planes, dropped in 2016, was substituted with a Boeing-assigned customer code on registration documents. E.g. a Pan Am 777-300 would have been a 777-321, an Air Berlin Jetfoil would have been a 929-16J, and so on. [1]

1: https://en.wikipedia.org/wiki/List_of_Boeing_customer_codes


I think they mean that airplanes are made in different versions, catered to a particular airline. Also, planes are constantly updated.

Two 767s made a few months apart will have initial differences, like two different versions of the Java 8 SDK.


I think they meant a 737-400 is different from a 737-500 is different from a 787 and an Airbus A320 and an MD-80 and…

Every single model is somewhat bespoke. There are common components, but each ends up having its own special problems in a way I assume different car models on a common platform (or two small SUVs from competing manufacturers) just don't.


It took hundreds of subject experts from ten organizations in seven countries almost three years to reach that conclusion.

Here at HN we want a post mortem for a cloud failure in a matter of hours.


> Here at HN we want a post mortem for a cloud failure in a matter of hours.

I'll go one further - I've yet to finish writing a postmortem on one incident before the next one happens. I also have my doubts that folks wanting a PM in O(hours) actually care about its contents/findings/remediations - it's just a tick box in the process of day-to-day ops.


Something similar that struck me was that, in early February, Russia invaded Ukraine.

And then, I saw an endless stream of aggrieved comments from people who were personally outraged that the outcome, whatever it might be, hadn't been finalized yet at the late, late date of... late February.


I work at a mid-tier FAANG; our SLA for post-mortems is in the 7-14 day range. Nobody seriously wants a full PM in hours.

They may want a mitigation or RCA in hours, but even AWS gives us NDA restricted PMs in > 24 hours.


Apples to oranges


> To be able to zero in on what turned out to be a single faulty part and then trace the entire provenance and environment that led to that defective part entering service speaks to the robustness of the industry.

And to be able to reconstruct the chain of events after the components in question have exploded and been scattered throughout south-east Asia is incredible.


My impression was that the defective part was still inside the engine when it landed.


Makes it even more impressive: the parts that were actually implicated in the explosion itself (and scattered from the aircraft) were not defective, so the investigation had to go through parts which did not seem to have exploded in order to track down the defect.

Or at least, I assume the turbine parts weren’t defective, although given what seems to be quite a happy-go-lucky approach to manufacturing defects in Hucknall, maybe my assumption is not made on solid grounds…


Probably a reference to other incidents. Shout out to the NTSB for fighting off alligators while investigating this crash... https://en.wikipedia.org/wiki/ValuJet_Flight_592


Aviation is great because the industry learns so much after incidents and accidents. There is a culture of trying to improve, rather than merely seeking culprits.

However, I have been told by an insider that supply chain integrity is an underappreciated issue. Someone has been caught selling fake plane parts through an elaborate scheme, and there are other suspicious suppliers, which is a bit unsettling:

"Safran confirmed the fraudulent documentation, launching an investigation that found thousands of parts across at least 126 CFM56 engines were sold without a legitimate airworthiness certificate."

https://www.businessinsider.com/scammer-fooled-us-airlines-b...


Admiral Cloudberg has covered a case where counterfeit or EOL-but-with-new-paperwork components were involved in a crash.

https://admiralcloudberg.medium.com/riven-by-deceit-the-cras...


I suspect this is precisely what is happening in Russian civil aviation now. No legit parts supplied, so there will be a lot of fake/problematic parts imported through black channels.


The Checklist Manifesto (2009) is a great short book that shows how using simple checklists would help immensely in many different industries, esp. in medical (the author is a surgeon).

Checklists of course are not the same as detailed post-mortems but they belong to the same way of thinking. And they would cost pretty much nothing to implement.

Also CRM: it's very important to have a culture where underlings feel they can speak up when something doesn't look right -- or when a checklist item is overlooked, for that matter.


Yes, but they do have one critical failure mode: that the checklist failed to account for something (or that an expected reaction to a step being performed didn’t occur).

I was a submarine nuclear reactor operator, and one of my Commanding Officers once ordered that we stop using checklists during routine operations for precisely this reason. Instead, we had to fully read and parse the source documentation for every step. Before, while we of course had them open, they served as more of a backstop.

His argument – which I to some extent agree with – was that by reading the source documentation every time, we would better engage our critical thinking and assess plant conditions, rather than skimming a simplified version. To be clear, the checklists had been generated and approved by our Engineering Officer, but they were still simplifications.


If the alternative to the check list is reading the full documentation, that's one thing. But in my experience -- as a Software Engineer, and random dude on the Internet -- the alternative is usually no check list or documentation.


For sure – short of large and well-supported projects like Django et al., docs are notoriously incomplete if present at all.

Even then, you have to get people to read them, which is somehow a monumental task. Docs? Nah, lemme read this Medium blog instead.


Checklists are great if you use them properly: to make sure you remember. Checklists are dangerous when they are used improperly: to replace or shut-down critical thinking.


A colleague of mine came from a major aviation design company before joining tech and said they were in a state of culture shock at how critical systems were designed and monitored. Even if there are no hard real time requirements for a billing system, this guy was surprised at just how lax tech design patterns tended to be.


If 200 people died after a db instance crashed, software would be equal in that regard.


Case in point: software that deals with medical stuff is somewhat more like aviation.


Also, aviation and software aren't orthogonal. E.g., the article mentioned that part of the reason the pilot was able to sustain a very narrow velocity window between stall and overrunning the runway was because of the A380's fly by wire system.


Yep. Insulin pumps can kill their owner and the software updates need to be FDA approved:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4773959/


Likewise, in "aviation" when the entertainment system completely fails in a 4 hour flight, there is most like no post mortem at all. They turn it off/on again just like most of us.


This is true in a lot of industries. Unless there’s 7+ figure costs or significant human losses, there’s usually not an exhaustive investigation to conclusively point to the exact cause and chain of events.


Some people who think this is ideal for any sort of software tech sound like they would also want a 3-hour post mortem with whoever designed the room after slightly stubbing a toe.


This kind of makes sense, but it is only possible because of public pressure/interest. Many people are irrationally emotional about flying (fear, excitement etc.), that's why articles and documentaries like this post are so popular.

On a side note, that's also why there's all the nonsense security theater at airports.


> robustness has less to do with the number of mistakes but how one responds to them

It must have something to do with the number of mistakes, otherwise it's all a waste of time!

It's all well and good responding to mistakes as thoroughly as possible, but if it's not reducing the number of mistakes, what's it all for?


> It must have something to do with the number of mistakes, otherwise it's all a waste of time!

Not really. Imagine two systems with the same amount of mistakes. (Here the mistakes can be either bugs, or operator mistakes.)

One is designed such that every mistake brings the whole system down for a day with millions of dollars of lost revenue each time.

The other is designed such that when a mistake happens it is caught early, and when it is not caught it only impacts some limited parts of the system and recovering from the mistake is fast and reliable.

They both have the same amount of mistakes, yet one of these two systems is vastly more reliable.

> if it's not reducing the number of mistakes, what's it all for

For reducing their impact.


Aerospace things have to be like this or they just wouldn’t work at all. There are just too many points of failure and redundancy is capped by physics. When there’s a million things which if they went wrong could cause catastrophic failure, you have to be really good at learning how to not make mistakes.


> you have to be really good at learning how to not make mistakes.

Not exactly. The idea is not not making mistakes, it's whatcha gonna do about X when (not if) it fails.


> Being an SRE at a FAANG and generally spending a lot of my life dealing with reliability, I am consistently in awe of the aviation industry. I can only hope (and do my small contribution) that the software/tech industry can one day be an equal in this regard.

There's a slight difference in terms of what kind of damage an airplane malfunctioning causes compared to a button on an e-commerce shop rendering improperly for one of the browsers. My point is that the level of investment in reliability and process should be proportional to the potential damage of any incidents.


I agree, and I also enjoy the attitude. While in my profession the postmortem's goal is finding who to blame, here the attitude is toward preventing it from happening again, no matter what. Or at least that's how I feel.


Your profession? Or you mean your company? Unless it's a very specific profession I would not know, it would usually imply that the company is dysfunctional.


Richard Hipp talks a lot about how SQLite adopted testing procedures directly from aviation.


> I can only hope that the software/tech industry can one day be an equal in this regard

I’d love to be an engineer with unlimited time budget to worry about “when, not if, X happens” (to quote a sibling comment).

But people don’t tend to die when we mess up, so we don’t get that budget.


Hard agree. Civil & mechanical engineering have a culture and history of blameless analysis of failure. Software engineering could learn from them.

See the excellent To Engineer is Human on just this topic of analyzed failures in civil engineering.


By not reading past the word "Autopilot" in the product description and ticking the Accept Terms box instead of having their personal lawyer review it, like the rest of us do.


> instead of having their personal lawyer review it, like the rest of us do.

Wait, do you all have personal lawyers that you pay to review every (or any) EULA you accept? I clearly do not make enough money to hang out on this site.


Well, I generally find it much harder to remain attentive when using an autopilot. I assume I'm not unique in this regard (i.e. it's an "autopilot", but technically you must pay as much attention to what's happening as if you were driving yourself. What would be the point of such a feature? Well... companies obviously bring this up when something bad happens, and not in their marketing material).


The problem is the car doesn't let you ignore it. You need to perform some kind of "hey, are you paying attention?" input on the steering wheel what seems like every 30 seconds or so (not sure of the exact numbers). Maybe the driver just happened to doze off in that intervening 30 seconds... at which point, he would have crashed anyway.


- Entire site-to-site tunneling/routing. I didn't have to do anything on my parents' end; I just dropped a subnet router at their place (see the sketch after this list).

- Access my services/servers at home from anywhere in the world. Friendly mobile apps as well that allow the same.

- In cloud environments (for work and fun), don't even bother provisioning public IPs and having to deal with those firewall rules, just use Tailscale

- https://tailscale.com/blog/tailscale-auth-nginx/ describes how you can integrate nginx proxying with Tailscale auth to both leverage SSO and the authenticated endpoint

- I have a bootmod3 WiFi adapter plugged into my street/track car with a combo 5G/Linux unit in the car connected to my Tailscale network that streams continuous telemetry about the car whenever it's turned on. I could in theory re-flash the ECU via this.

- Using https://tailscale.com/kb/ondemand-access/ alongside node/subnet grouping to create a very neat first step towards auditing access to sensitive production services/environments.

- I use server-based dev environments to keep my portable laptop as clean as possible with no source code on it. VS Code remote + Coder server are fantastic over Tailscale.

+ others. Tailscale I think solves the problem of node-to-node-to-subnet connectivity at a convenient and flexible layer.
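For reference, the subnet-router setup behind the first bullet boils down to roughly the following on a small Linux box at the remote site (the 192.168.1.0/24 subnet is just an illustrative placeholder; the advertised route still has to be approved in the admin console or via the ACL's autoApprovers):

    # enable forwarding, then advertise the remote LAN into the tailnet
    sudo sysctl -w net.ipv4.ip_forward=1
    sudo tailscale up --advertise-routes=192.168.1.0/24
    # Linux clients need to opt in to advertised routes
    sudo tailscale up --accept-routes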


"- I have a bootmod3 WiFi adapter plugged into my street/track car with a combo 5G/Linux unit in the car connected to my Tailscale that streams continuous telemetry about the car whenever its turned on. I could in theory re-flash the ECU via this."

Do you have a writeup or more details you can share around this? This sounds interesting.


lol it sounds like a line from a Fast & Furious movie


That sort of stuff is pretty common. Car guys have lots of disposable income. I'm certain there are devices out there that provide levels of telemetry that was only accessible to top-end racing teams just a decade or two ago.


> Entire site-to-site tunneling/routing. I didn't have to do anything for my parents I just dropped a subnet router at their place.

Can you elaborate? What do your parents need tailscale for? I mean my parents have internet purely by the telco dropping a router at their place and it just works, what is my family missing?


Best guess is OP is hosting files or services that are shared with less tech-savvy parents. Similar to our setup. My son is away at college but still wants access to his music and movie collection on our NAS at home, along with some other services. He set up a Tailscale connection and everyone is happy. I don't have to manage any of it and he doesn't have to work around the school's firewall and network architecture.


Mostly standard VPN use cases. They can access my Plex server, Mealie instance and in turn I can remote access their devices without something like TeamViewer when they need IT Support or their home automation stuff is acting up.

Would their lives fall apart without it? No. But it makes my life as the family SRE much easier.


This is no longer a problem for me since I switched my parents from Windows to Mac, but remote desktop login to troubleshoot their problems would be a huge bonus.

Other cool things I could do if I dropped a raspberry-pi w/ tailscale onto their network:

- Need another public IP to test something? Route my laptop through their network for a while.

- share files with them or backup some of their devices to a fileserver I control.

- send print jobs to their printer. I don't keep a printer but they do because... and I shit you not, they hate doing crosswords on their iPads; they print the damn things out every morning and work them on paper.

- Put it on their phones and have them route their requests through one of my exit nodes.


In my case that's actually multiple functions: remote login without using TeamViewer and also for general remote support, and I have a small backup server at the place for my off-site backups.


What is it that Tailscale provides over plain vanilla wireguard? Is it a static address somewhere to connect to?


It provides a consistent IP address (in the CGNAT range) that the end-device is always reachable at. On top of that you can use MagicDNS or regular DNS records to refer to it.

That IP is usable regardless of how that device and your device actually reach the internet. Further, no one device acts as a “server” and needs a stable public IP thanks to NAT traversal and the DERP fallback path. Keys are handled automatically with an option to not trust Tailscale infra in doing that (Tailscale lock) and I just need to auth devices with my Google Workspace/Gsuite SSO.


Plain vanilla WireGuard involves a bunch more faffing about with wg, wg0, and keys. With Tailscale, you (can just) install the software on each computer and then log in. There are also more advanced things you can do with Tailscale, but I chose Tailscale because I didn't want to deal with the setup that WireGuard (or OpenVPN) requires.


- Key distribution
- DNS for your nodes
- IP addressing
- SSO integration

and so much more


> What isn't easy about forwarding packets destined for port 80/443 of your public IP to the local service in question and being a part of the public Internet like things were from the start?

- Not every home internet service gets a publicly routable IPv4 address anymore (e.g. CGNAT)

- Not every home internet service gets a static IPv4 address so folks have to handle DynDNS

- Not everyone is comfortable exposing their home network IP address in DNS (Tailscale only shares the endpoint IP once the endpoint is auth'd onto the network)

- Not everyone is comfortable configuring heavy auth/fail2ban/app layer safeties (Tailscale makes the services uncontactable unless you are auth'd into the Tailscale network)

- Not everyone is comfortable/can be bothered configuring Wireguard in highly dynamic environments

> Using Tailscale is the opposite of self-hosting, you're bringing someone else's third party service in, and adding more complexity and another point of failure.

Self-hosting need not be a zealot position - rather one can pick and choose what makes sense for them. Tailscale allows you to build your own network where all the nodes are auth'd (and tailscale lock means you don't even need to trust their keys by default) and non-public internet routable but still globally reachable from known safe devices. This can actually make folks more comfortable with self-hosting their own stuff since it removes so many other considerations. There is also headscale if folks want to self-host the coordination server.

Some argue that a third party service adds complexity and a point of failure. I'll point out that configuring a self-hosted publicly exposed thing from scratch for the first time has a rabbit hole of unknown complexity to the uninitiated. A tool like Tailscale can remove some of those complexities allowing focus on others.


>- Not every home internet service gets a static IPv4 address so folks have to handle DynDNS

For anyone who has only this specific problem out of your list, one solution is to get an HE tunnel. It's what I do.

If my ISP ever gets off its ass and implements IPv6 like it promised three years ago, I'll consider using that directly, though its current indication is that the IPv6 addresses will be dynamic for non-business customers which defeats the purpose.


I have gigabit fiber and it's IPv4 only. My ISP blocks incoming ICMP messages so I can't set up a HE tunnel. I used to use Route48, but they shuttered due to abuse, so I don't know what to do anymore.


A non-free solution would be to have a VPS or a cloud VM act as the public endpoint + wireguard server.


A WireGuard config is a few lines (interface addresses, keys, AllowedIPs, post-up and post-down hooks). Simpler than SSH. You can run it on a cloud instance close to users.

Tailscale is still simpler and provides additional features. A small team or startup will appreciate Tailscale’s access controls.
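For comparison, here is roughly what that "few lines" looks like: a minimal client-side WireGuard config, with placeholder keys and addresses and a hypothetical endpoint, saved as /etc/wireguard/wg0.conf and brought up with "wg-quick up wg0":

    [Interface]
    Address = 10.0.0.2/24             # this peer's tunnel address
    PrivateKey = <client-private-key>
    # Optional PostUp/PostDown hooks go here for firewall/routing rules

    [Peer]
    PublicKey = <server-public-key>
    Endpoint = vpn.example.com:51820  # hypothetical public endpoint
    AllowedIPs = 10.0.0.0/24          # what gets routed into the tunnel
    PersistentKeepalive = 25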


For a pure client/server VPN between two devices, sure, but I think that's where the equivalence between Tailscale and "some lines of config" ends.


Seeing all of these complaints about the cloud providers getting too big, owning the internet, etc., have folks forgotten how to run their own equipment, or what renting a dedicated server is?

The internet did function prior to cloud computing existing and, surprise, running a rack full of NUCs can give you a lot of capacity to run a glorified message board.


Let’s be clear, access to cloud computing companies like AWS is a competitive advantage because it requires less upfront capital and less engineering time to manage. I can 100% see why Parler would prefer having access to AWS compared to running their own hardware.

Ianal, so I can’t judge the merits of this legal case. To be honest I’m surprised Parler can find lawyers willing to bring it to court on their behalf, or banks willing to process transactions to pay said lawyers.


“I’m surprised Parler can find lawyers willing to bring it to court on their behalf, or banks willing to process transactions to pay said lawyers.”

Personally, I am surprised at the opposite. That a company cannot find lawyers to defend their position or process payments from their customers without due process is horrible.

The fact that you were not surprised by it is quite worrisome. I don’t mean that as a personal attack on you, but more as an observation of what has become acceptable or at least ordinary in a society.


I think you misunderstand me. I was surprised that the domino effect started with Parler, but that was several days ago. Now I’m surprised that the domino effect hasn’t reached everyone they do business with.


Seriously? Any half-decent administrator knows that for every SSL certificate you install, you set a date some time in the future when you need to change it out for a new one.

To your comment of "one day the system works normally and is deemed secure, and the next day it is so insecure and dangerous", this is working as intended. The certificate is to establish trust and identity along with encrypting the data in transit. The identity described by the certificate is only valid up until the expiry date after which it ceases to be valid for that purpose. You now have an encrypted connection to something that can't prove its identity, which is certainly a lower level of "secure" than what it was before the certificate expired.

Ignoring expiry dates would mean any keys that were compromised ever could be used for MITM attacks and no one would be the wiser.
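To make the administrator's side of this concrete, here is a minimal sketch (standard-library Python, hypothetical helper name) of the kind of check you would schedule well ahead of that future date:

    import datetime
    import socket
    import ssl

    def cert_days_remaining(host, port=443):
        """Days until the TLS certificate presented by host expires."""
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port)) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
        # 'notAfter' looks like 'Jun  1 12:00:00 2025 GMT'
        expires = datetime.datetime.utcfromtimestamp(
            ssl.cert_time_to_seconds(cert["notAfter"]))
        return (expires - datetime.datetime.utcnow()).days

    # e.g. alert when cert_days_remaining("example.com") < 30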


Expiry dates are in years; if a key is compromised, an adversary has _years_ to exploit a MITM.

However, we already mitigate this with revocation lists. But if we can revoke certificates, why do we have expiration dates?

Seems to me expiration dates are rent-seeking behaviour by certificate vendors.


This is a reason to shorten expiration times, not remove them (which is what organizations like Let's Encrypt are doing).


Why? If revocation already works, why bother with expiration?


One good reason is that if you buy a domain name that somebody else has used in the past, they don't have an infinite valid SSL certificate for your domain.


Would it not be possible to expire the cert if the domain expires?


No, that would be "revocation". Expiration is relatively easy to implement because the expiration date is known in advance and so you can simply put the expiration date in the certificate when it is issued. Revocation is relatively difficult because you need to continually check some database for revocation information — that's where CRLs, OCSP, and the like come in. And there's a lot of complexity under that hood, which, once the dust settles, boils down to just issuing very-short-lived certificates under a different guise.


No. The certificate's expiration is fixed at time of issuance. You could set the expiration of the certificate to the expiration date of the domain, but the domain could be transferred, cancelled, or revoked before the expiration.


Just a few years ago, almost nothing checked the revocation lists. I revoked certs for some popular domains and was concerned about ssl caches and proxies... turns out, an owl heard it. An odd dog barked. No impact. Not even from the folks that embedded our certs onto their servers for legacy code reasons. Perhaps this has changed over the last couple of years.


Yep. But that's an implementation problem. Revocation is a critical security feature. Complaining that people didn't used to check it is like complaining that people didn't used to encrypt.

You're not secure if you don't check revocation.


I agree with this. For the record, I am not complaining. :-) I just like to share my experiences of how things worked verses how they were intended to work.

