
How did you deal with redundancy "choke points"? E.g., what if the component that tallies computer votes and actuates things based on the results fails (especially in a way that's hard to detect)?

Were you able in some cases to maintain isolation all the way through from sensors to actuators and design such that a single failed one (in a worst case failure mode) could be overcome by the rest?



(Disclaimer: SIL-4 programmer for safety critical rail transportation applications)

The way this is done is that 2 of 3 computers need to 'agree' on the final decision in order for it to be considered the correct one - there isn't a single point of assessment, but rather a consensus that must be formed from the results of all 3 computers. Ideal case, all 3 produce the same results.

This has worked successfully for decades. What's changing now, is that those 3 computers are now no longer the same architecture - you'll have a PPC and an x86 and an ARM-based CPU all attempting to agree to the same data, in order to prevent systemic failure throughout.
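
To make the 2-of-3 agreement concrete, here is a minimal sketch, not any vendor's actual implementation. It assumes each computer exchanges its result with the other two and then applies the same comparison locally, so there is no single tallying component. The function name `vote_2oo3` and the choice to return `None` when there is no majority (which the caller would map to a safe state) are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote_2oo3(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value that at least 2 of the 3 channels agree on.

    Returns None when no two channels agree; the caller is assumed to
    treat None as "fall back to the safe state".
    """
    if len(results) != 3:
        raise ValueError("expected exactly three channel results")
    # most_common(1) yields the single most frequent value and its count.
    value, count = Counter(results).most_common(1)[0]
    return value if count >= 2 else None


if __name__ == "__main__":
    print(vote_2oo3(["PROCEED", "PROCEED", "PROCEED"]))  # ideal case
    print(vote_2oo3(["PROCEED", "STOP", "PROCEED"]))     # one channel outvoted
    print(vote_2oo3(["PROCEED", "STOP", "CAUTION"]))     # no consensus
```

In this sketch, every channel runs the same function on the same three inputs, so a healthy channel reaches the same verdict as its healthy peers without relying on a central voter.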


(Same disclaimer)

> What's changing now, is that those 3 computers are now no longer the same architecture

I think / hope this idea will die a well-deserved death. Rail systems must last for a 25-year lifetime, and with such a design the obsolescence headaches are multiplied by 3. In addition, this creates bugs and integration nightmares.

And on top of all that, we have known for decades that CPUs can be protected against systemic failures using the vital coded processor technique [1].

Note to the curious: this is one of the most fascinating pieces of software I have ever encountered. It makes the software resilient to any hardware fault through the power of arithmetic...

[1] https://www.semanticscholar.org/paper/Vital-software%3A-Form...
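
For readers wondering what "the power of arithmetic" can look like, here is a heavily simplified sketch of a plain AN code, a basic form of arithmetic coding. It is not the vital coded processor itself, which is more elaborate than this. The constant `A` and all function names are hypothetical and chosen only for illustration. The idea: every value is stored multiplied by `A`, addition preserves that property, and any word that is no longer a multiple of `A` is treated as corrupted.

```python
A = 58659  # hypothetical check constant; any odd value > 1 works for this demo


def encode(x: int) -> int:
    """Encode a plain integer as a multiple of A."""
    return A * x


def is_valid(coded: int) -> bool:
    """A coded word is valid only if it is still a multiple of A."""
    return coded % A == 0


def decode(coded: int) -> int:
    """Check the code word, then recover the plain integer."""
    if not is_valid(coded):
        raise ValueError("arithmetic code check failed: possible hardware fault")
    return coded // A


def coded_add(a: int, b: int) -> int:
    """Add two coded words; A*x + A*y == A*(x + y), so the code is preserved."""
    return a + b


if __name__ == "__main__":
    total = coded_add(encode(20), encode(22))
    print(decode(total))        # the sum, recovered after the check passes

    corrupted = total ^ (1 << 5)  # simulate a single flipped bit
    print(is_valid(corrupted))    # the check rejects the corrupted word
```

A single flipped bit changes the word by a power of two, and no power of two is a multiple of an odd `A` greater than 1, so in this toy model every single-bit flip is caught.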


SIL-4 doesn't generally follow the same rules as consumer computing, as I'm sure you are aware.

There are track-side systems still running on 80386's, and a maintenance infrastructure in place to keep these systems running for at least a few more years, before they are replaced with the newer, Pentium-based systems.

Design obsolescence is not built-in for these systems. Long-term ability to support is more important. Also, burn-in period. Many bullets (SPECTRE, etc.) have been dodged by relying on older, more proven, more tested technologies ..


Design obsolescence is absolutely built-in for railway systems (safety related or not). It’s part of the fun of the profession to deal with bizarre voltages that are inherited from the 1930s. For sure it’s not something consumer electronics care for but that’s precisely the point I am making.

Actually, it’s one of the required extensions to ISO 9002 by IRIS, the quality framework for rail engineers.

Components are chosen not only for their function but also for their supply availability in the long run. One of the things we look at is « multisourcing »: how many different folks can provide this stuff. The more the better.

Going back to this great idea about hardware diversity (that is absolutely not a general rule in railway): with an architecture requiring 3 different hardware platforms, you essentially shoot yourself in the foot. Instead of having 3 suppliers for one component, now you need to find at least 6 for 3 components. And the probability of facing an obsolescence issue has basically tripled...
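
A quick back-of-the-envelope check of the "basically tripled" figure, under assumptions that are mine and not from any standard: each platform has the same probability `p_single` of hitting an obsolescence issue over the system lifetime, and the platforms fail independently. The function name and the 5% figure are purely illustrative.

```python
def p_any_obsolete(p_single: float, n_platforms: int) -> float:
    """Probability that at least one of n independent platforms goes obsolete."""
    return 1.0 - (1.0 - p_single) ** n_platforms


if __name__ == "__main__":
    p = 0.05  # illustrative per-platform probability, not real data
    one = p_any_obsolete(p, 1)
    three = p_any_obsolete(p, 3)
    print(f"one platform:    {one:.4f}")
    print(f"three platforms: {three:.4f}")
    print(f"ratio:           {three / one:.2f}")
```

For small per-platform probabilities the ratio approaches 3, which matches the rough claim; it drops below 3 as the per-platform probability grows.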

And for what? It’s proven useless by science... :b

And as far as dodging bullets goes, railway is no better than SCADA. Older tech means older vulnerabilities stay in place, and there are countless people on HN who will tell you it’s /not/ a good thing. I don’t think we dodged, I think we were just lucky. There is a reason why cybersecurity is now making its way into the latest revisions of the standards.


> hardware diversity (that is absolutely not a general rule in railway)

I don't know where you're working in the industry, but in my company (THALES) it's definitely a thing, and we are absolutely working on diversifying the 3-of-3 and 3-of-5 configurations away from Intel.

And maybe we're talking about design obsolescence in different terms, but yeah .. 30-year-old CPUs are still being shipped to customers, yo. They WANT it, so.


You can design your platform in a number of ways to be relatively independent of the underlying CPU model, thereby mitigating the risk of supplier lock-in. All suppliers will try to find a way to achieve that target.

It’s a different thing than saying you need hardware diversity in a majority vote system to achieve better safety. That is demonstrably false. For example, Siemens VCP is /proven/ to be safe and it does not even use a majority vote (see my previous comment for a reference to the VCP).

I prefer not to involve my employers on HN. What I write here is my opinion only.

As for your last remark, let’s be careful in assuming what customers want. The fact that they have to live with obsolete stuff does not necessarily mean they are super happy about it.


It's not about the software being independent of the architecture. It's about using diverse hardware platforms to avoid the situation where an undetected, hardware-level bug affects the voting ability of all participants. We've seen, time and time again, so-called dependable platforms weaken over the years as more and more issues are uncovered.

Diverse voting node architecture requirements are designed to prevent hardware bugs from crashing trains, not software bugs.

>Siemens VCP /proven/

.. and yet, it still crashed trains.

>What customers want

It's not obsolete if a customer wants it.

Customers want older CPU platforms because the tooling and industry required to support them in the field are long-entrenched, and the cost of upgrading to "newer, sexier CPUs" is not really worth it if the lesser platforms are capable of doing the job...


> .. and yet, it still crashed trains.

Do you have evidence for that claim?


> Many bullets (SPECTRE, etc.) have been dodged

Rail control systems run untrusted code? Why not dodge Spectre by not running random code in the control system?


They don't run random code, but nevertheless one of the reasons why SIL-4 systems don't just go for the 'latest and greatest' is that, even with trusted code, one cannot be sure there aren't bugs in the hardware until a great deal of industry-wide testing has been done.


What would be cool is if the software and hardware design is done by 3 completely different teams. Hehe.


Tried already! It’s a disaster. Nancy Leveson & friends wrote articles about this. Yet, there are still people today convinced diverse programming is a good idea... Sigh.

https://dblp.org/rec/conf/icse/JaffeL01


>1990 study

>student programmers

>toy problem

To me, this proves very little about how independent development teams would perform with experienced engineers working with present day design tools. It's possible that the results would be the same, worse, or better, but I hope this isn't the most recent research that's been done on this.

The main issue with this approach is of course - cost. So the question is whether having two systems with uncorrelated fault paths is worth doubling the development expense.


Do you have a link to a readable copy?


Whoops. Wrong article. Apologies. Blame my big fingers on this phone. The article is:

"Analysis of Faults in an N-Version Software Experiment." Susan S. Brilliant, John C. Knight, Nancy G. Leveson (1990)

And I found a PDF version here: http://sunnyday.mit.edu/papers/nver2.pdf


Out of curiosity, would you recommend any good journals, magazines or books for people in your industry (grokkable by a general level programmer ideally, but not that bothered if not)?

I find all the different domains of programming endlessly fascinating.


Alas, it's a very academic and dry field in terms of interesting literature, but you can definitely learn a lot from a few books ..

Safety Critical Systems Handbook:

https://www.elsevier.com/books/the-safety-critical-systems-h...

IEC 61508 Compliance Guide:

https://www.perforce.com/resources/qac/how-comply-iec-61508-...

ISA Safety Series:

https://www.isa.org/standards-and-publications/isa-publicati...


These look interesting, thank you!


Voters are redundant too. Actuators receive safety release authorizations from all of them and will activate as long as one valid authorization is received.
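
A minimal sketch of the rule described above, with everything beyond that rule being an assumption for illustration: the `Authorization` type, its `valid` flag (standing in for whatever integrity and freshness checks a real system performs), and the function name are all hypothetical. The point is only that the actuator needs one valid authorization from any of the redundant voters, and stays in its de-energized state otherwise.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Authorization:
    voter_id: str  # which redundant voter issued this authorization
    valid: bool    # hypothetical flag: the authorization passed its checks


def actuator_may_activate(authorizations: Iterable[Authorization]) -> bool:
    """Activate if at least one voter supplied a valid authorization.

    With no valid authorization (including an empty input), the result is
    False, which is assumed to mean the actuator stays de-energized.
    """
    return any(auth.valid for auth in authorizations)


if __name__ == "__main__":
    received = [
        Authorization("voter-A", valid=False),  # e.g. a failed voter
        Authorization("voter-B", valid=True),
        Authorization("voter-C", valid=False),
    ]
    print(actuator_may_activate(received))  # one valid authorization is enough
    print(actuator_may_activate([]))        # nothing received: stay safe
```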


Cool! But at some point doesn't it all feed down to one wire connected to a motor coil or something?


Wires are also redundant ;). Then the actuator is usually a fail-safe device such as a relay.



