Apparently the Enhanced Tactical Flow Management System (ETFMS) packed up.
>“ETFMS facilitates improvements in flight management from the pre-planning stage to the arrival of the flight. It maximises the updating of flight-related data and thus improves the real picture of a given flight, thereby contributing to the Gate-to-Gate Concept,” Eurocontrol explains on its website.
>The agency initially reported that contingency procedures were immediately put in place which reduced the capacity of the European network by around 10 per cent. http://www.airtrafficmanagement.net/2018/04/eurocontrol-give...

They don't seem to say what went wrong, but do say:

>In over 20 years of operation, the ETFMS has only had one other outage which occurred in 2001. The system currently manages up to 36,000 flights a day.
Tech details from the French Wikipedia:
Written in Ada and running on HP-UX, the system is based on an exchange of messages between the airlines (who file/change/update flight plans), the air traffic control bodies, and the CFMU; the messages are written in ADEXP format.
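For a flavour of ADEXP, here's a hand-written sketch of a flight-plan message (my own illustration based on published ADEXP keywords such as ARCID, ADEP, ADES and EOBT; not a captured message, and real messages carry many more fields):

    -TITLE IFPL
    -ARCID ABC123
    -ADEP EGLL
    -ADES LFPG
    -EOBT 0945

ARCID is the aircraft identifier, ADEP/ADES the departure and destination aerodromes, and EOBT the estimated off-block time.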
ETFMS uses at least 5 fundamental notions:
flight plan: describes the 4D trajectory of an airplane.
regulation: an aircraft rate applied to a "traffic volume". Example: 50 aircraft/hour.
"traffic volume": the association of a geographical reference (air sector, waypoint, airport, etc.) with a set of aircraft flows.
the list of takeoff slots, or simply "slots". Example: if the regulation rate is 30 planes/hour, there is a slot every 2 minutes: 10:00, 10:02, etc. (see the sketch below).
the delay: the difference between the take-off time desired by the airline and the schedule calculated by ETFMS.
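To make the slot arithmetic concrete, here is a minimal sketch (my own illustration in Ada, the system's implementation language; not actual ETFMS code) that expands a regulation rate into slot times:

    with Ada.Text_IO;       use Ada.Text_IO;
    with Ada.Strings;       use Ada.Strings;
    with Ada.Strings.Fixed; use Ada.Strings.Fixed;

    procedure Slots is
       --  Regulation: 30 planes/hour means one slot every 2 minutes.
       Rate_Per_Hour : constant := 30;
       Interval      : constant := 60 / Rate_Per_Hour;
    begin
       for N in 0 .. Rate_Per_Hour - 1 loop
          declare
             Minute : constant Natural := N * Interval;
             Image  : constant String  := Trim (Natural'Image (Minute), Left);
          begin
             --  Prints "Slot at 10:00", "Slot at 10:02", ... "Slot at 10:58"
             Put_Line ("Slot at 10:" & (if Minute < 10 then "0" else "") & Image);
          end;
       end loop;
    end Slots;

The real system allocates slots per traffic volume and assigns flights to them; the rate-to-interval arithmetic above is just the core idea.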
It would be fascinating to read about what bug took down a system with one outage in 20 years. I remember reading in the Chubby paper (from Google) that user error was the cause of more than half the downtime incidents. Wonder if that's the case here too.
If I remember correctly, a similar outage in the UK was due to increased flight volumes. The system had a hardcoded limit somewhere which caused issues as flight volume increased. The software was old enough that they weren't aware of that issue beforehand.
I could imagine something similar happening here.
It definitely shows how well software can work with the right practices. Only one outage in 20 years and that one only caused a reduction of 10% in capacity. Don't think many companies can match that.
>It definitely shows how well software can work with the right practices. Only one outage in 20 years and that one only caused a reduction of 10% in capacity. Don't think many companies can match that.
This is what happens when you build software with the same meticulousness as other engineering disciplines. However, a lot of software is so much more complex than what other disciplines can build (because you can build anything you can imagine), and so burdened by early deadlines and profit pressure, that it's unrealistic for most software to be developed this way. You easily have a cost factor of 100x in time as well as money.
I'm talking out of my ass here, but I can imagine that the focus for this software system was very sharp and hasn't changed (much) in the past 20 years. When you have a product with tight focus, you can polish the shit out of it and make it last 20+ years. A bit of the Unix mentality.
Most products today - and stuff on HN is at the forefront of that - are much more about selling a product to a lot of people, often in highly contested markets. That is, if you make a product, focus on just the core and don't do anything else, you'll stagnate and die.
Then again, on the other hand, there's Dropbox, whose core functionality has not changed for a decade as far as I can tell - it still does the exact same thing as when I first installed it. And Spotify, whose IPO was today, doesn't seem to have changed its core model either. In both cases, though, I don't know where they put all their development capacity; probably into back-end systems/scalability, marketing, and new applications (like Dropbox creating a Google Docs competitor).
I see this sentiment a lot. Intuitively it seems like it should be true, but I don't think the case is really quite so clear cut.
The costs involve way more than just the initial development. Maintenance eats up a huge, perhaps even a majority, of the total cost as well. And outages or other failures can be very expensive too.
It's also important to keep in mind that this isn't an all or nothing situation. We can have software that is more reliable without asking that it chug away without issue for a decade, or anywhere near as long as we expect bridges or buildings to last.
The process of developing more reliable software isn't necessarily more expensive than less reliable software. It can even be cheaper. I'm struggling to find the links (maybe somebody else has them handy, or I'll edit them in if I find them), but there have been a few case studies done a few years back by companies that moved to using Ada. In addition to the benefits of more reliable software, they also found development costs were better or at least no worse than C. I know that isn't exactly the language to compare to these days, but as I said these were done some time ago.
This is just my own argument, but I suspect that's because the same defects that cause problems after release also cause problems during development. With a more reliable programming system/environment, defects that might otherwise show up later are flagged as an issue immediately. This means the issue doesn't need to be tracked down after the fact, which can take some serious time, and the developers are still fresh on the problem area.
Personally speaking, I've been totally won over by Ada. It ain't perfect, but it's a hell of a lot better than anything else I've seen - and I've looked a lot. In my own projects (mainly personal or for school, admittedly) development is much easier and ultimately quicker. I don't have to spend a day tracking down a weird bug because the compiler lets me know about the issue as soon as I try to introduce it.
>The process of developing more reliable software isn't necessarily more expensive than less reliable software. It can even be cheaper. I'm struggling to find the links (maybe somebody else has them handy, or I'll edit them in if I find them), but there have been a few case studies done a few years back by companies that moved to using Ada. In addition to the benefits of more reliable software, they also found development costs were better or at least no worse than C. I know that isn't exactly the language to compare to these days, but as I said these were done some time ago.
I can believe that. Ada catches a lot of errors you would normally only notice by extensive testing at compile time. You're preaching to the strong-static typing choir here. I believe Ada and Rust could solve a lot of problems of companies working with C/C++ and make development cheaper. You can properly model your domain and abstract without sacrificing safety.
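For a flavour of that, here's a toy sketch (my own, not from any real ATC codebase) of how Ada's distinct ranged types let the compiler reject whole classes of mix-ups:

    with Ada.Text_IO; use Ada.Text_IO;

    procedure Typing_Demo is
       --  Two quantities that are both "just integers" underneath,
       --  but deliberately incompatible types.
       type Flights_Per_Hour is range 0 .. 60;
       type Feet             is range 0 .. 60_000;

       Rate     : Flights_Per_Hour := 40;
       Altitude : constant Feet    := 35_000;
    begin
       --  Rate := Altitude;  --  rejected at compile time: type mismatch
       --  Rate := 75;        --  flagged at compile time: outside 0 .. 60
       Rate := Rate + 2;      --  fine, and the range is still checked at run time
       Put_Line (Flights_Per_Hour'Image (Rate) & Feet'Image (Altitude));
    end Typing_Demo;

The first commented-out line is a hard compile error; for the second, GNAT warns at compile time that the value is out of range and the run-time check is guaranteed to fail.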
I'm also a strong believer that TDD makes you much faster and safer in the long run.
My experience tells me that most tools, languages or methods that catch errors earlier will save money.
Ada also has the best tested compiler I can think of.
However my larger point was about the engineering processes not the language itself. I think with languages and tools you can make it easier to make good software. The 100x time and cost is more in the sense of process changes when you're working on safety critical systems. How everything has to be traceable from requirement to test, how there are mandatory reviews before any code change that need to be documented, how there are qualification criteria for the toolchain, etc. All these things cost a lot of time and manpower, with arguably very bad cost-benefit analysis, which is only really worth it when human lives are at stake.
> The 100x time and cost is more in the sense of process changes when you're working on safety critical systems. How everything has to be traceable from requirement to test, how there are mandatory reviews before any code change that need to be documented, how there are qualification criteria for the toolchain, etc. All these things cost a lot of time and manpower, with arguably very bad cost-benefit analysis, which is only really worth it when human lives are at stake.
Absolutely. That's part of what I was getting at by mentioning all of this exists on a continuum. We don't need to, and really shouldn't, treat a SaaS startup exactly the same as a military aviation project.
We can, however, draw from the lessons learned on those safety critical projects and use parts of the process that make sense for the nature of whatever we're actually working on.
You're right that in general I suspect that comes down to strong static typing, particularly for the sorts of projects common to the HN crowd. When dealing with very large enterprise projects the balance might start to shift to more than just typing, though it would probably take a lot of real-world data that nobody is keen to supply to figure out where the tipping points are.
And I'd argue about how well Rust actually helps with these things, but that would really be going off the rails. Unfortunately.
Anyone could, few care to. They really are the wrong practices for a whole lot of things. The level of care, design, and verification just isn't necessary for applications with few or fixable consequences.
It is a bit sad that nearly nothing these days strives for that kind of excellence.
Well - I wouldn't call it dumb stuff. After all, it's only a matter of time before one of us does something truly stupid :-) Even more so under the stress and pressure when shit hits the fan, which is usually when human operators have to intervene. It's part of building reliable systems to reduce the chances of operator error. See: the Hawaii missile alert bug! It must be truly terrifying to sit at that particular keyboard typing in any command.
Read "Inviting Disaster" for a lot more about this topic.
Many very high profile disasters were caused by operator error, or, more precisely, by complex systems not designed for what you might call failure ergonomics.
Silly jokes don't go down well on HN, as they're distractions from the conversation. It's considered "Reddit-like" behaviour, and discouraged through downvotes.
Humour is allowed, and so is downvoting. In practice, the humour has to be really good to avoid being mercilessly downvoted. Meta-humour based on usernames and TV shows usually doesn't cut it. Many of us view this as a good thing.
However, HN does seem to curate specific, on-topic, informative posts, especially early on in the conversation (when everyone is reading, rather than just people looking for replies to their comments).
pftbest: can you please explain this go syntax to me?

    type ImmutableTreeListᐸElementTᐳ struct {

I thought go doesn't have generics.

Uncaffeinated: It doesn't. That's just a "template" file, which I run search and replace over in order to generate the three monomorphized go files. If you look closely, those aren't angle brackets, they're characters from the Canadian Aboriginal Syllabics block, which are allowed in Go identifiers. From Go's perspective, that's just one long identifier.
For those curious, Eurocontrol MUAC (Maastricht Upper Area Control Centre) migrated to 50 virtual SUSE Linux Enterprise servers running under the IBM z/VM hypervisor on an IBM z196 mainframe system in 2013.
I was on a plane just about to push back at Heathrow, and the pilot informed us we'd be delayed 15 minutes due to this failure. In the end it was 10 minutes, and we landed only 5 minutes late at my destination. Doesn't appear to have been a big deal, at least for me.
10% reduction probably meant that they tried to keep most flights on time and "strategically" delayed some flights. Makes it worse for some passengers but keeps knock-on effects for transfer passengers under control.
Technically, Time Based Separation helps only arrivals since it tries to negate the effects of headwinds on final approach, but it does add a lot of resilience to the airport to absorb delays that would normally ripple to departures.
I'm pretty sure it doesn't. I was reading about it, and it mentioned that even the time when the pilot can start the engines is calculated. It saves millions of liters of fuel just by waiting a few minutes.
There is a digital display facing the pilot showing him when he can depart, and even which size of aircraft is allowed to depart. It's extremely well organized.
Yes, this sucked. I just had a 1-hour delay on a 1-hour flight (AMS-ZRH). Unfortunately, no compensation until it's a 2-hour delay (but thank god it was only one hour!). Passengers who had a layover were noticeably less happy.
You probably wouldn't have been eligible for compensation anyway, since this delay was definitely beyond the airline's control.
While a lot of airlines try to weasel out of their obligations (mechanical failure, for example, which is in fact the airline's responsibility), I would think such a case is pretty clear cut.
I was at a conference in Northern Virginia some months ago and saw a presentation from the folks at Upside, a startup specializing in booking business travel. They described the legacy system which handles pretty much all booking in the US, a system called SABRE. They described it as an ancient 6-bit computer system in Texas with no modern API. Everything they do tech-wise is a modern wrapper around that system. So I'm not at all surprised by any air travel computer failures if tech like that is central to the system.
You're talking about TPF[1]. Many smart people and organizations have tried and failed to build something that could match it, including a company Google paid $700 million to buy. I personally know of at least 10 failed attempts :)
Not sure where "6 bit" is coming from though, and you can use gcc/c++ now, not just assembler[2]. And it's in Tulsa, Oklahoma, not Texas. Sabre's HDQ is in Texas, the data center is not. The hardware is very modern and new Z series mainframes in big loosely coupled clusters.
Amadeus, Sabre's main competition, also still has TPF at the core.
There is one notable non-TPF reservation system: http://www.navitaire.com/
Last I checked, it couldn't scale well enough to handle a large airline.
Both Sabre and Amadeus are replacing TPF, but one function at a time (shopping, fare engine, booking, check in, etc). And very slowly.
Fwiw, TPF is basically a huge, distributed, and transactionally consistent nosql database. Most orgs still using it have extracted most of the business logic that was in it, out to Linux boxes that front it. Not for stability reasons, but faster time to market with new features. To date, attempts to replace the high contention and high transaction rate nosql type traffic haven't scaled well enough.
Just in the US, 2.5 million people fly each day, and the processes to sell, board, etc. each passenger involve many transactions apiece. It's a pretty big scale. I think it's fairly close to Amazon's sales per day, but with more contention and sub-transactions.
All Visa credit card transactions are also still on TPF.
I've heard of enough misguided "modernisations" (and failures thereof) that I think the "legacy system" was the part that stayed working throughout, and it's the newer stuff added around it that failed. The old stuff may be old but there's a reason it's old... it's outlasted any attempts at replacing it.
I can confirm this. It's usually the Java/Tomcat boxes that front the TPF box that cause this type of huge meltdown. It's almost never the TPF core. It happens, of course, but they've offloaded most of the functionality to more modern technology. So there isn't much code change in the TPF core. And code change is usually what drives outages.
The counteracting force to that will be that the complexity of the surrounding environment increases and becomes more brittle, or is simply in the way, as constantly increasing demands for new features drift further from the capabilities of the old system in the middle.
A few years ago, SAS introduced a new status tier. It took 18 months to introduce into the system (Amadeus). The system may be stable, but that kind of turnaround time for a minor customer service change simply isn't feasible. I don't have numbers, but I wouldn't be surprised if one of the reasons upstart airlines such as EasyJet are competitive is that their IT is comparatively modern and can actively support the organisation, while IT is more of a millstone around the legacy airlines' necks.
Wow. I remember Sabre from GEnie - General Electric Network Information Exchange - in the 1980s. I’m not sure CompuServe had been invented yet. My father and I could use “electronic mail” to stay in touch. For you youngsters: I had never yet had a “remote control” for a TV. Yep, had to get up to adjust the volume or change the channel (don’t get me started on adjusting the antenna).
And it was already ancient then! Sabre started operating in 1960, and traces its origins to a chance meeting of an IBM salesman and the CEO of American Airlines on a flight in 1953.
In the air-travel world, a lot of seemingly-random limitations on systems that interface with reservations comes from the fact that they were originally built with telephone interfaces intended for use only by trained staff.
Every few years, for example, someone digs up and reposts one of the articles explaining why some airlines didn't allow 'Q' and 'Z' in account passwords (they were passing things directly through to SABRE on the backend, and so only allowed letters that could be "dialed" on the 1960s rotary phones SABRE was designed to interface with).
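As a toy illustration of that constraint (my own sketch, not SABRE's actual validation code): the dial mapped letters onto digits 2-9 with Q and Z simply absent, so a front-end check would have looked something like this:

    with Ada.Text_IO; use Ada.Text_IO;

    procedure Dialable is
       --  Letters with a digit on a 1960s rotary dial: A .. Z minus Q and Z.
       function Is_Dialable (C : Character) return Boolean is
         (C in 'A' .. 'Z' and then C /= 'Q' and then C /= 'Z');

       Password : constant String := "QANTAS";
    begin
       for C of Password loop
          if not Is_Dialable (C) then
             Put_Line ("Rejected: no digit for '" & C & "' on the dial");
          end if;
       end loop;
    end Dialable;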
Can confirm. My flight from Zurich was delayed by over an hour today, causing my family and me to need to run, OJ Simpson-style, through the Philadelphia airport to make our connection.
Everyone at that organisation is paid boatloads of money. [1]
They barely pay taxes on it (~10%, "for solidarity"), which means they get to enjoy healthcare paid for by nationals taxed at ~55%.
And 90% of the organisation (especially the management) has absolutely nothing, nothing whatsoever, to do with guiding planes anywhere. In fact, those departments are severely understaffed. The department doing "regulatory support" is about 2/3rds of the organisation (tldr: making sure half the local government officials don't have to get their own coffee - and before you say it, no, Eurocontrol employees don't get them coffee, they're merely in charge of making sure someone's there to get them coffee, and steak, and cake, and so on). The coffees, I might add, are baffling: made on a steam-boiler machine in front of you, with fair-trade beans, sweetened not with sugar but with expensive imported bars of chocolate melted into milk that's frothed in front of you (they somehow work the chocolate in while frothing the milk with steam, without getting steam on the chocolate), and you get the rest of the case of that (expensive) chocolate to take home for the kids. No, not when you ask - they'll offer. And it's not really the rest of the case they give you; you get a fresh case. Oh, and of course, from the bar they opened they prepare just one coffee (about 1/4 of the bar); the rest gets thrown in the trash, they don't use the same bar for the next coffee. As for the steaks... oh my God.
And in case you're wondering: the odds of 2 planes colliding with zero guidance outside of the ATC zones around airports (which aren't covered by Eurocontrol), over even a region as big as Europe, are more than 10 billion to one against, per year. So if Eurocontrol didn't exist at all, and we just allowed every plane to fly wherever... nothing would go wrong at all.
So... what is the problem here? Disruption of millions of travelers for no reason whatsoever? Let's please not pretend anyone at Eurocontrol cares (well, they care about not being interfered with, and that will make them care NOW, but if one thing's guaranteed, it's that the software/ATC departments will remain the same size, and only the bribery departments will grow).
Just as there are no bugs without security impact, is it realistic to say that this has no impact on flight safety? Any error can be a contributing factor.
Safety is ensured by ATC controllers. ETFMS is there to ensure that traffic does not increase beyond controllers' capacity. I have read that without ETFMS, traffic is reduced by 10%. I have been involved in ATC simulations where controllers had to land about 40 flights per hour using new procedures and tools. We tried with 38 flights per hour, and it was too easy for the controllers: their work was perfect even without the tools. With 42 flights, the controllers were getting angry because the traffic could not be managed. At 40, we could see the benefits of our new procedures and tools (more regular separation of flights). IMHO 10% less traffic gives a large enough capacity margin to ensure safety.
Sure, the skies are more and more crowded, meaning more and more people will be affected by a once-in-20-years failure, but why would it happen more and more often?
If the reliability stays constant expressed as a rate (arguably optimistic), like ".1% failure per X flights", and the # of flights increases, then failures become more frequent.
Failure rates must decrease proportionally to the utilization/capacity increase for the # of failures to stay constant. Otherwise, the # of failures will naturally increase; it's simple multiplication.
For example:
If crime rates stay constant, but your population increases, you have more crime - not the same amount of crime as before. You have more terrorists, you have more school shootings, because your population doubled in size and you didn't cut the rates of abhorrent behaviors in half.
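To put illustrative (made-up) numbers on the aviation case: at a constant rate of one failure per 10 million flights, 18,000 flights a day is about 6.6 million flights a year, or ~0.66 expected failures; double the traffic to 36,000 a day and you get about 13.1 million flights and ~1.3 expected failures. Twice the flights, twice the failures, unless the rate itself is halved.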
Air travel has _exploded_ in popularity, this is why you see all sorts of things more frequently than you did before. From reports of airline passengers being dragged off planes in totally asinine scenarios to deaths of pets in the cabin.
You're assuming the failure rate is related to the number of flights. With 2 failures in 20 years there's not really enough data to extrapolate.
And remember this wasn't a complete system outage - it affected 50% of flights, a degraded service.
In comparison, the vast majority of startups that HN people work at will be lucky to last 2 years. In the last 10 years, Gmail has been down... well, I'm not sure how often; it was down 6 times in one year alone. Its uptime is apparently 99.8%, which works out to about 175 hours of downtime per decade, so a 6-hour outage seems quite reliable by comparison.
Level 3 went down last month. And in 2016. And in 2015.
Having a system that has to work for decades without downtime is unheard of in the web world, even for relatively simple systems like Gmail and AWS.
Well the archaic systems are getting older and older. From what I've heard it's almost impossible to get replacement parts for some of the computer systems that run airports/airlines.
Has not been my experience. 2 decades in the airline space. It's mostly uninformed people assuming that TPF is the issue or that it runs on old hardware. Neither is true.
Airline tech outages are just more visible than outages in other industries. Like manufacturing, etc. It doesn't make the news when all your manufacturing plants halt for 6 hours due to tech issues. Airlines could improve, sure, but pundit observations are usually off base. They are also dependent on systems they don't control, like ATC, TSA, Airport owned kiosks, and so on.
And on the other hand, there are massive requirements at all levels for this kind of software.
I'm not sure what the right answer is at that point. From a dev perspective, I wouldn't want half-assed software directing flights. From an ops perspective, I don't want software directing flights to require specific hardware, so that failed hardware can be replaced with easily available parts.
That's a good point. I'm just a bit disillusioned with the quality of some of the legacy stuff at my current place. Especially because robustness and the probability of failures are mostly determined by requirements.
In situations like this I’m glad I book my flights with Chase Sapphire Reserve. It comes with a trip delay reimbursement for up to $500 per ticket if a flight is delayed more than 6 hours. No sweat!
In such circumstances, I'm glad I live in the EU: EU regulation 261 [1] covers so much, with €250 to €600 compensation depending on flight distance for delays over 4 hours, and a percentage of full compensation for shorter delays. No specific credit card required.
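The distance bands (from Article 7 of the regulation) are simple enough to sketch; this toy Ada version is my own illustration and ignores the 50% reductions and the intra-EU special case for long flights:

    with Ada.Text_IO; use Ada.Text_IO;

    procedure EU261 is
       --  EU261 Article 7 compensation bands, in EUR, by flight distance.
       function Compensation (Distance_Km : Positive) return Positive is
         (if Distance_Km <= 1_500 then 250
          elsif Distance_Km <= 3_500 then 400
          else 600);
    begin
       Put_Line ("AMS-ZRH:" & Integer'Image (Compensation (600)) & " EUR");
       Put_Line ("AMS-JFK:" & Integer'Image (Compensation (5_900)) & " EUR");
    end EU261;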
Most (some?) airlines will just suck up the cost though in those extraordinary circumstances (bad weather, for example), as a general policy to keep their customers sane.
Not sure if they'll give you direct compensation, but most will pay for food/drink, some place to stay overnight if need be and rebook your flight for free too if it's cancelled outright.
What airlines do you fly with? Comp for hotel and food is paid, but the EU comp is really hard to get. There's a reason that so many services specialise in getting that money for passengers for a fee. It's not just Ryanair and EasyJet making it hard to claim. I haven't heard of a single case where reasons like this one led to a payout.
> Most (some?) airlines will just suck up the cost though in those extraordinary circumstances (bad weather, for example), as a general policy to keep their customers sane
I kind of doubt that matte_black's gamble (because that's what "insurance" is for costs which you could easily carry yourself if need be, unlike e.g. medical costs) would include that either. Then again, I'm surprised how many "insurances" cover things under your control (like dropping a device) so I might be wrong.
I have no desire to deal with an airline that has already fucked up a flight, and is probably already swamped. As soon as they give me the options for the next flight, I’ll take my own taxi, get my own hotel, and have Chase pick up the tab.
It's clearly a fundamental cultural difference. An inherent distrust of government and corporations, along with a willingness to pay to avoid issues, compared to, well, a modicum of trust in government, and an established legal framework that protects the customer from the corporation.
I've been on the receiving end of delayed flights too. The agent was polite and simply offered me an alternative flight, or the compensation/refund, all at the desk. There was no animosity or distrust, just "here's what we both have to do".
Not overseas, and it has a deductible. I used a generic travel credit card and it covered my rental damage once; now I make sure to use such a card whenever I rent.
DoT regulations don't require that they do that, although in the event of substantial delays they often will. The reality is that cash comp is only due in involuntary denied-boarding situations.