How did you deal with redundancy "choke points"? E.g., what if the component that ...

fit2rule · on Aug 4, 2019

(Disclaimer: SIL-4 programmer for safety critical rail transportation applications)

The way this works is that 2 of the 3 computers need to 'agree' on the final decision for it to be considered correct. There isn't a single point of assessment; instead, a consensus must be formed from the results of all 3 computers. In the ideal case, all 3 produce the same result.
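For readers unfamiliar with the scheme, a minimal 2-out-of-3 (2oo3) majority voter might look like the Python sketch below. The function name `vote_2oo3` and the choice to return `None` when no two channels agree are assumptions for illustration, not how any particular rail product implements its voter; a real system would force a safe state in that case.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote_2oo3(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value at least 2 of the 3 channels agree on, else None.

    None stands for "no majority": a real system would drive its outputs
    to the safe state rather than guess.
    """
    if len(outputs) != 3:
        raise ValueError("expected exactly 3 channel outputs")
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


print(vote_2oo3(["proceed", "proceed", "stop"]))  # proceed
print(vote_2oo3(["proceed", "stop", "hold"]))     # None
```

Comparing exact values only works when the channels are deterministic and produce bit-identical outputs; voting on analog or floating-point data would need a tolerance.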

This has worked successfully for decades. What's changing now, is that those 3 computers are now no longer the same architecture: you'll have a PPC, an x86 and an ARM-based CPU all attempting to agree on the same data, in order to prevent a systemic failure across all three.

crocal · on Aug 4, 2019

(Same disclaimer)

> What's changing now, is that those 3 computers are now no longer the same architecture

I think (and hope) this idea will die a well-deserved death. Rail systems must last 25 years, and with such a design, obsolescence headaches are multiplied by 3. In addition, this creates bugs and integration nightmares.

And on top of all that, we have known for decades that CPUs can be protected against systemic failures using the vital coded processor technique [1].

Note to the curious: this is one of the most fascinating pieces of software I have ever encountered. It makes the software resilient to any hardware fault through the power of arithmetic...
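To give a flavor of "the power of arithmetic", here is a minimal Python sketch of an AN arithmetic code, the kind of encoding that coded-processor techniques build on: every value x is stored as A·x, arithmetic is done on the encoded values, and any result that is not a multiple of A signals a fault. The constant `A = 65521` and the helper names are assumptions for illustration; the actual VCP is described as adding further mechanisms (such as per-variable signatures) that this sketch does not attempt to reproduce.

```python
# Hypothetical code constant: 65521 is a prime chosen only for illustration.
A = 65521


def encode(x: int) -> int:
    """AN-encode a value: valid codewords are exactly the multiples of A."""
    return A * x


def is_valid(xc: int) -> bool:
    """A codeword is valid only if it is divisible by A."""
    return xc % A == 0


def decode(xc: int) -> int:
    """Recover the plain value, refusing codewords that fail the check."""
    if not is_valid(xc):
        raise ValueError("codeword check failed: possible hardware fault")
    return xc // A


def coded_add(ac: int, bc: int) -> int:
    # A*a + A*b == A*(a + b): adding codewords yields a codeword.
    return ac + bc


def coded_scale(ac: int, k: int) -> int:
    # (A*a) * k == A*(a*k): multiplying by a plain constant preserves the code.
    return ac * k


total = coded_add(encode(20), encode(22))
print(decode(total))            # 42
corrupted = total ^ (1 << 3)    # simulate a single bit flip
print(is_valid(corrupted))      # False
```

Because A is odd, no power of two is a multiple of A, so flipping any single bit of a codeword always produces an invalid value.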

[1] https://www.semanticscholar.org/paper/Vital-software%3A-Form...

fit2rule · on Aug 4, 2019

SIL-4 doesn't generally follow the same rules as consumer computing, as I'm sure you are aware.

There are track-side systems still running on 80386s, with a maintenance infrastructure in place to keep them running for at least a few more years before they are replaced with newer, Pentium-based systems.

Design obsolescence is not built into these systems. Long-term supportability is more important, and so is the burn-in period. Many bullets (SPECTRE, etc.) have been dodged by relying on older, more proven, more tested technologies...

crocal · on Aug 4, 2019

Design obsolescence is absolutely built into railway systems (safety-related or not). It’s part of the fun of the profession to deal with bizarre voltages inherited from the 1930s. Consumer electronics certainly don’t care about this, but that’s precisely the point I am making.

Actually, it’s one of the required extensions that IRIS, the quality framework for rail engineers, adds to ISO 9001.

Components are chosen not only for their function but also for their supply availability in the long run. One of the things we look at is « multisourcing »: how many different folks can provide this stuff. The more the better.

Going back to this great idea about hardware diversity (that is absolutely not a general rule in railway): with an architecture requiring 3 different hardware platforms, you essentially shoot yourself in the foot. Instead of having 3 suppliers for one component, you now need to find at least 6 for 3 components. And the probability of facing an obsolescence issue has basically tripled...
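A rough way to see where "basically tripled" comes from, assuming (for illustration only) that each of the n platforms independently becomes obsolete within a given period with the same probability p:

```latex
\begin{aligned}
P(\text{at least one of } n \text{ platforms obsolete}) &= 1 - (1 - p)^{n} \\
n = 3:\quad 1 - (1 - p)^{3} &= 3p - 3p^{2} + p^{3} \approx 3p \quad (p \ll 1)
\end{aligned}
```

For example, p = 0.05 gives 1 - 0.95^3 ≈ 0.143, close to 3p = 0.15.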

And for what? It’s proven useless by science... :b

And as far as dodging bullets goes, railway is no better than SCADA. Older tech means older vulnerabilities stay in place, and countless people on HN will tell you that’s /not/ a good thing. I don’t think we dodged anything; I think we were just lucky. There is a reason cybersecurity is now making its way into the latest revisions of the standards.

fit2rule · on Aug 4, 2019

> hardware diversity (that is absolutely not a general rule in railway)

I don't know where you're working in the industry, but in my company (THALES) it's definitely a thing, and we are absolutely working on diversifying the 3-of-3 and 3-of-5 configurations away from Intel.

And maybe we're talking about design obsolescence in different terms, but yeah... 30-year-old CPUs are still being shipped to customers, yo. They WANT it, so.

crocal · on Aug 5, 2019

You can design your platform in a number of ways to be relatively independent of the underlying CPU model, thereby mitigating the risk of supplier lock-in. All suppliers try to find a way to achieve that.

It’s a different thing from saying you need hardware diversity in a majority vote system to achieve better safety. That is demonstrably false. For example, Siemens VCP is /proven/ to be safe and it does not even use a majority vote (see my previous comment for a reference to the VCP).

I prefer not to involve my employers on HN. What I write here is my opinion only.

As for your last remark, let’s be careful about assuming what customers want. The fact that they have to live with obsolete stuff does not necessarily mean they are super happy about it.

fit2rule · on Aug 9, 2019

It's not about the software being independent of the architecture. It's about using diverse hardware platforms to avoid a situation where an undetected hardware-level bug affects the voting ability of all participants. We've seen, time and time again, so-called dependable platforms weaken over the years as more and more issues are uncovered.

Diverse voting node architecture requirements are designed to prevent hardware bugs from crashing trains, not software bugs.
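The argument can be illustrated with a toy Python model. The two "CPU" functions and the erratum in `buggy_cpu` are entirely hypothetical: when all three channels share the same silicon bug, they agree unanimously on the wrong answer and the voter accepts it; when only one architecture carries the bug, it is simply outvoted. This only covers faults that differ between architectures; a shared software bug would still hit every channel, which is why the point above is limited to hardware bugs.

```python
from collections import Counter
from typing import Callable, List, Optional

Channel = Callable[[int], int]


def vote_2oo3(outputs: List[int]) -> Optional[int]:
    """Return the value at least 2 of 3 channels agree on, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


def correct_cpu(x: int) -> int:
    return 3 * x


def buggy_cpu(x: int) -> int:
    # Hypothetical hardware erratum: wrong result for one specific input.
    return 3 * x + 1 if x == 7 else 3 * x


def run(channels: List[Channel], x: int) -> Optional[int]:
    """Feed the same input to every channel and vote on their outputs."""
    return vote_2oo3([channel(x) for channel in channels])


homogeneous = [buggy_cpu, buggy_cpu, buggy_cpu]  # same architecture, same bug
diverse = [buggy_cpu, correct_cpu, correct_cpu]  # bug on one architecture only

print(run(homogeneous, 7))  # 22: all three agree on the wrong value
print(run(diverse, 7))      # 21: the faulty channel is outvoted
```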

>Siemens VCP /proven/

.. and yet, it still crashed trains.

>What customers want

It's not obsolete if a customer wants it.

Customers want older CPU platforms because the tooling and industry required to support them in the field are long-entrenched, and the cost of upgrading to "newer, sexier CPUs" is not really worth it if the lesser platforms are capable of doing the job...

crocal · on Aug 11, 2019

> .. and yet, it still crashed trains.

Do you have evidence for that claim?

rrss · on Aug 4, 2019

> Many bullets (SPECTRE, etc.) have been dodged

Rail control systems run untrusted code? Why not dodge Spectre by not running random code in the control system?

fit2rule · on Aug 4, 2019

They don't run random code, but one of the reasons SIL-4 systems don't just go for the 'latest and greatest' is that, even with trusted code, one cannot be sure there aren't bugs in the hardware until a great deal of industry-wide testing has been done.

WWLink · on Aug 4, 2019

What would be cool is if the software and hardware design is done by 3 completely different teams. Hehe.

crocal · on Aug 4, 2019

Tried already! It’s a disaster. Nancy Leveson & friends wrote articles about this. Yet there are still people today convinced that diverse programming is a good idea... Sigh.

https://dblp.org/rec/conf/icse/JaffeL01

yborg · on Aug 4, 2019

>1990 study

>student programmers

>toy problem

To me, this proves very little about how independent development teams would perform with experienced engineers working with present day design tools. It's possible that the results would be the same, worse, or better, but I hope this isn't the most recent research that's been done on this.

The main issue with this approach is, of course, cost. So the question is whether having two systems with uncorrelated fault paths is worth doubling the development expense.
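The value of "uncorrelated fault paths" can be made concrete with basic probability. Writing A and B for the events that version 1 and version 2 fail on the same input, the chance that both fail is the independent product plus a correlation term; the numbers below are illustrative, not measured:

```latex
\begin{aligned}
P(A \cap B) &= P(A)\,P(B) + \operatorname{Cov}(\mathbf{1}_A, \mathbf{1}_B) \\
P(A) = P(B) = 10^{-3}:\quad P(A \cap B) &= 10^{-6} \text{ if independent, up to } 10^{-3} \text{ if fully correlated}
\end{aligned}
```

Doubling the development cost buys the gap between those two figures only to the extent that the versions' faults really are uncorrelated, which is what the N-version experiments cited in this thread set out to test.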

jimktrains2 · on Aug 4, 2019

Do you have a link to a readable copy?

crocal · on Aug 4, 2019

Whoops. Wrong article. Apologies. Blame my big fingers on this phone. The article is:

« Analysis of Faults in an N-Version Software Experiment », Susan S. Brilliant, John C. Knight, Nancy G. Leveson (1990)

And I found a PDF version here: http://sunnyday.mit.edu/papers/nver2.pdf

noir_lord · on Aug 4, 2019

Out of curiosity, would you recommend any good journals, magazines or books for people in your industry (ideally grokkable by a general-level programmer, but I'm not that bothered if not)?

I find all the different domains of programming endlessly fascinating.

fit2rule · on Aug 4, 2019

Alas, it's a very academic and dry field in terms of interesting literature, but you can definitely learn a lot from a few books...

Safety Critical Systems Handbook:

https://www.elsevier.com/books/the-safety-critical-systems-h...

IEC 61508 Compliance Guide:

https://www.perforce.com/resources/qac/how-comply-iec-61508-...

ISA Safety Series:

https://www.isa.org/standards-and-publications/isa-publicati...

noir_lord · on Aug 5, 2019

These look interesting, thank you!

crocal · on Aug 4, 2019

Voters are redundant too. Actuators receive safety release authorizations from all of them and will activate as long as one valid authorization is received.
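A minimal Python sketch of that 1-out-of-N acceptance rule at the actuator. The message format and the use of a CRC-32 (the standard library's `zlib.crc32`) as the validity check are assumptions for illustration; a real railway system would use a proper safety code and freshness checks on each authorization.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Authorization:
    payload: bytes  # hypothetical command format, e.g. b"RELEASE:switch-12"
    crc: int        # CRC-32 over payload, computed by the sending voter


def make_auth(payload: bytes) -> Authorization:
    """What a (redundant) voter would send when it authorizes a release."""
    return Authorization(payload, zlib.crc32(payload))


def is_valid(auth: Authorization, expected: bytes) -> bool:
    """Accept only the expected command with an intact checksum."""
    return auth.payload == expected and zlib.crc32(auth.payload) == auth.crc


def should_activate(auths: Iterable[Authorization], expected: bytes) -> bool:
    # 1-out-of-N: one valid authorization from any voter is enough.
    # With none (or only corrupt ones), stay in the safe, inactive state.
    return any(is_valid(a, expected) for a in auths)


command = b"RELEASE:switch-12"
good = make_auth(command)
corrupted = Authorization(command, good.crc ^ 1)

print(should_activate([corrupted, good], command))       # True
print(should_activate([corrupted, corrupted], command))  # False
```

Accepting any single valid authorization means a failed or silent voter cannot block the actuator, while defaulting to inaction keeps that failure mode safe.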

rkagerer · on Aug 4, 2019

Cool! But at some point doesn't it all feed down to one wire connected to a motor coil or something?

crocal · on Aug 4, 2019

Wires are also redundant ;). Then the actuator is usually a fail-safe device such as a relay.
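The fail-safe idea can be modeled in a few lines of Python: the permissive state requires the coil to be actively energized, so any loss of energy, whether the drive is withdrawn or a wire breaks, leaves the relay in its safe state. The names here are hypothetical, and the sketch assumes the relay returns to its safe position without needing any energy (e.g., by gravity).

```python
from enum import Enum


class Aspect(Enum):
    STOP = "stop"        # the safe state
    PROCEED = "proceed"  # the permissive state


def relay_output(drive_present: bool, wire_intact: bool) -> Aspect:
    """Fail-safe relay model: PROCEED only while the coil is energized.

    A missing drive signal or a broken feed wire both de-energize the coil,
    and the relay drops back to STOP on its own.
    """
    coil_energized = drive_present and wire_intact
    return Aspect.PROCEED if coil_energized else Aspect.STOP


for drive in (True, False):
    for wire in (True, False):
        print(drive, wire, relay_output(drive, wire).value)
```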