Because they hit "unknown error" and when that happens on safety critical systems you have to assume that all your system's invariants are compromised and you're in undefined behavior -- so all you can do is stop.
Saying this should have been handled as a known error is totally reasonable but that's broadly the same as saying they should have just written bug free code. Even if they had parsed it into some structure this would be the equivalent of a KeyError popping out of nowhere because the code assumed an optional key existed.
For these kinds of things the post mortem and remediation have to take as given that, eventually, an unhandled unknown error that couldn't be predicted in advance will occur, and then work on how it could be handled better. Because of course the solution to a bug is to fix the bug, but the real issue, and the reason for the meltdown, is a DR plan that couldn't be executed in a reasonable timeframe. I don't care what programming practices, what style, what language, what tooling: something of a similar caliber will happen again eventually, with probability 1, even with the best coders.
I agree with your first paragraph, but your second paragraph is quite defeatist. I was involved in quite a few "premortem" meetings where people think up increasingly improbable failure modes and devise strategies for them. It's a useful meeting to hold before large changes to critical systems go live. In my opinion, this should totally be a known error.
> Having found an entry and exit point, with the latter being the duplicate and therefore geographically incorrect, the software could not extract a valid UK portion of flight plan between these two points.
It doesn't take much imagination to surmise that real-world data is sometimes broken and that you'll occasionally be handed data that doesn't contain a valid UK portion of the flight plan. Bugs can happen, yes, such as in this case where a valid flight plan was misinterpreted as invalid, but gracefully dealing with an invalid plan should be a requirement.
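To make that concrete, here's a minimal sketch of what treating it as a known error could look like. The names (extract_uk_portion, FlightPlanError) are hypothetical, not taken from the real system:

    class FlightPlanError(Exception):
        """Known, expected failure: this one plan can't be interpreted."""


    def extract_uk_portion(waypoints, uk_entry, uk_exit):
        """Return the sub-route between the UK entry and exit waypoints.

        A failed lookup surfaces as FlightPlanError (a known error scoped
        to one plan) instead of escaping as an unclassified exception.
        """
        try:
            start = waypoints.index(uk_entry)
            end = waypoints.index(uk_exit, start)
        except ValueError as exc:
            raise FlightPlanError(
                f"no valid UK portion between {uk_entry!r} and {uk_exit!r}"
            ) from exc
        return waypoints[start:end + 1]

The caller can then catch FlightPlanError, set that single plan aside for manual handling, and keep going.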
> Saying this should have been handled as a known error is totally reasonable but that's broadly the same as saying they should have just written bug free code.
I think there's a world of difference between writing bug free code, and writing code such that a bug in one system doesn't propagate to others. Obviously it's unreasonable to foresee every possible issue with a flight plan and handle each, but it's much more reasonable to foresee that there might be some issue with some flight plan at some point, and structure the code such that it doesn't assume an error-free flight plan, and the damage is contained. You can't make systems completely immune to failure, but you can make it so an arbitrarily large number of things have to all go wrong at the same time to get a catastrophic failure.
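A rough sketch of that kind of containment: each plan gets its own error boundary, and anything unexpected quarantines just that plan. The process_all wrapper and the dict-with-callsign shape of a plan are illustrative assumptions, not the real system's design:

    import logging

    logger = logging.getLogger("flight_plans")


    def process_all(plans, process_one):
        """Process each plan independently; a failure never spreads."""
        quarantined = []
        for plan in plans:
            try:
                process_one(plan)
            except Exception:
                # Unknown error, but scoped to one plan: log it with a
                # traceback and set the plan aside for manual review.
                logger.exception("quarantining plan %r", plan.get("callsign"))
                quarantined.append(plan)
        return quarantined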
> Even if they had parsed it into some structure this would be the equivalent of a KeyError popping out of nowhere because the code assumed an optional key existed.
How many KeyError exceptions have brought down your whole server? It doesn't happen, because whoever coded your web framework knows better and added a big try-catch around the code that handles individual requests. That way you get a 500 error on the specific request instead of a complete shutdown every time a developer makes a mistake.
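That boundary is roughly this shape; a simplified, hypothetical handle_request wrapper rather than any particular framework's actual code:

    import traceback


    def handle_request(handler, request):
        """Catch-all around a single request, as most frameworks do."""
        try:
            return handler(request)
        except Exception:
            # A KeyError (or anything else) in one handler becomes a 500
            # for that request; the server keeps serving everyone else.
            traceback.print_exc()
            return 500, "Internal Server Error"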
Crashing is a feature, though. It's not as if crash-on-unhandled-exception wrote itself into the interpreter spec by accident; it's a deliberate design choice. It just so happens that web apps don't need airbags that slow the business down.
That line of reasoning is how you get systemic failures like this one (or the Ariane 5 debacle). It only makes sense in the most dire of situations, like shutting down a reactor, not in input validation. At most, this failure should have grounded just the one affected flight rather than the entire transportation network.
> Because they hit "unknown error" and when that happens on safety critical systems you have to assume that all your system's invariants are compromised and you're in undefined behavior -- so all you can do is stop.
What surprised me more is that the amount of data covering all waypoints on the globe is quite small. If I were implementing a feature that queries waypoints by name as an identifier, the first thing I'd do is check for duplicates in the dataset, because if there are any, I need to handle that condition in every place where I query a waypoint by a potentially duplicated identifier.
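A quick sketch of that check, assuming purely for illustration that waypoint records are dicts with a "name" field:

    from collections import Counter


    def duplicated_names(waypoints):
        """Return every name shared by more than one waypoint record."""
        counts = Counter(w["name"] for w in waypoints)
        return sorted(name for name, count in counts.items() if count > 1)

Any non-empty result means a bare name is not a unique key, so every lookup by name needs a disambiguation rule.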
I had that thought immediately when looking at the flight plan format; I noticed the short strings referring to waypoints well before getting to the section where they point out the name collision issue.
Maybe I'm too used to working with absurd amounts of data (at least in comparison to this dataset); a constant part of my job is doing cursory data analysis to understand the parameters of the data I'm working with, which values can be duplicated or malformed, and so on.
If there are duplicate waypoint IDs, they are not close together. They can be easily eliminated by selecting the one that is one hop away from the prior waypoint. Just traversing the graph of waypoints in order would filter out any unreachable duplicates.
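A sketch of that disambiguation rule, assuming a simple adjacency-map model of the airway graph (ids_by_name and neighbours are invented names, not the real data model):

    def resolve_waypoint(name, prior_id, ids_by_name, neighbours):
        """Pick, among waypoints sharing `name`, the one adjacent to prior_id.

        ids_by_name: name -> list of waypoint IDs sharing that name.
        neighbours:  waypoint ID -> set of directly connected waypoint IDs.
        """
        candidates = ids_by_name.get(name, [])
        if len(candidates) == 1:
            return candidates[0]
        reachable = [c for c in candidates if c in neighbours.get(prior_id, set())]
        if len(reachable) == 1:
            return reachable[0]
        # Zero or several reachable duplicates: a known error for this one
        # plan, not an unknown failure of the whole system.
        raise LookupError(f"cannot disambiguate {name!r} after {prior_id!r}")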
That it's safety critical is all the more reason it should fail gracefully (while still surfacing errors to warn the user). A single bad flight plan shouldn't jeopardize things by making data on all the other flight plans unavailable.
Well, yes, because you're describing a system where the stakes are really low and crash recovery is always possible, since you can just throw away all your local state.
The flip side would be a database that fails to parse some part of its WAL due to disk corruption and just says, "eh, delete those sections and move on."
The other “tabs” here are other airplanes in flight, depending on being able to land before they run out of fuel. You don’t just ignore one and move on.
Nonsense comparison: your browser's tabs are de facto insulated from each other, while the flight paths of 7,000 daily planes over the UK literally share the same space.
No, it's more like saying your browser has detected possible internal corruption with, say, its history or cookies database and should stop writing to it immediately. Which probably means it has to stop working.
It definitely isn't. It was just a validation error in one of the thousands of external data files that the system processes. Something very routine for almost any software dealing with data.