Rules for Writing Safety Critical Code (2006)

michaelt · on Nov 4, 2016

Rules like avoiding recursion and loop upper bounds are partly justified by helping code analyzers to prove executions are bounded.

Does anyone have experience performing formal static analysis on code? And if so, are there any open-source examples of the state-of-the-art tools being used on nontrivial problems?

I've been looking at this recently, but I've never seen a proof of reasonably standard function like, say, printf. Most of the attempts at formal proof I've seen seem quite complex for very simple functions [1] and the next step up seems too complicated for me to follow [2]. I'm familiar with heuristic-based tools like clang analyser, coverity and flexelint - but none of those claim to produce any sort of proof. I'm also aware of MISRA but again, that doesn't claim to provide any sort of proof.

Those of you who are using formal methods: Is what I've seen really the state of the art? Or are there powerful tools I've overlooked?

[1] https://github.com/Beatgodes/klibc_framac_wp/blob/master/src... [2] https://github.com/seL4/l4v/blob/master/lib/Monad_WP/Strengt...

wizeman · on Nov 4, 2016

Take a look at dafny [1] and F* [2]. Both are open-source and it appears that both of them are able to prove code to be correct with considerably less proof code than traditional interactive theorem provers (such as Coq or Isabelle), mostly thanks to the use of an SMT solver (Z3, in both cases) to aid in proving.

Both languages are being used in Project Everest [3], which is building a formally verified HTTPS stack, including a formally verified TLS implementation [4].

Unfortunately, there isn't a lot of documentation / beginner guides about these languages out there, but with the little there is, I was still able to start proving non-trivial (but still relatively easy-to-prove) code with them, even though I have no formal verification background whatsoever.

Just as a tip, I found the Dafny tutorial to be a lot easier to follow than F-Star's, but from what I could gather, F* appears to be slightly more general/sophisticated in what it can prove.

[1] https://github.com/Microsoft/dafny [2] https://www.fstar-lang.org [3] https://project-everest.github.io [4] https://mitls.org

Animats · on Nov 4, 2016

"F star is an ML-like functional programming language aimed at program verification. Its type system includes polymorphism, dependent types, monadic effects, refinement types, and a weakest precondition calculus. Together, these features allow expressing precise and compact specifications for programs, including functional correctness and security properties. The F star type-checker aims to prove that programs meet their specifications using a combination of SMT solving and manual proofs. Programs written in F star can be translated to OCaml or F# for execution."

And you wonder why you're not getting any traction.

Dafny, though, seems to have the right idea.

posterboy · on Nov 6, 2016

The field is hardly accessible due to the sheer variety of solvers. Getting an overview first seems more important then jumping right in and getting a feel for a particular solver.

lugus35 · on Nov 4, 2016

Open Source tools are way behind commercial software. Just look at PolySpace Code Prover for example:

https://mathworks.com/products/polyspace-code-prover/

ou PolySpace bug finder :

https://mathworks.com/products/polyspace-bug-finder/

michaelt · on Nov 4, 2016

Have you personally used those tools? On what sort of projects?

buttproblem · on Nov 4, 2016

Overall, one way to look at formal methods is that the tools range from fully automated (e.g., abstract interpretation, model checking), to manual (e.g., interactive theorem proving). Along this same range, the complexity of properties which can be proved also moves from simple (e.g., absence of buffer overflows) to complex (termination of parameterized systems).

As a concrete example, interactive theorem provers can prove the termination of Paxos for an arbitrary number of nodes (i.e., a system of `n` nodes will reach consensus, where `n` is a positive integer). But, automatically generating such a proof for something as complicated and parameterized as Paxos is an open problem ([1] describes an automatic tool working toward that goal).

Another thing to keep in mind is the goal of the tool: verification versus bug hunting. Verification aims to prove that a property will never be violated, while bug hunting aims to find a property violation. Here's a list of some types of tools off the top of my head.

Bounded model checkers such as CBMC [2] essentially search the entire state-space of the program; you can conceptually think of this as searching through a graph which has some start node, and you'd like to find a path from the start to an error. This is a bug hunting technique since the software/hardware may have an infinite state space (e.g., `while (true) ++i`), hence the search space must be bounded (e.g., unroll loops). For finite-state systems this can generate proofs of correctness. For infinite state systems, it can generate proofs up to a certain bound (e.g., correct for some number of clock cycles, correct for some number of loop iterations). The benefit of bounded model checkers is that when they find an error they generate a counter-example, which is essentially inputs to the program causing it to fail. So, you could use the counter-example as a concrete test case to fix the issue. Techniques such as counter-example guided abstraction refinement, and more recently IC3 and property-directed reachability make use of these counter-examples to generate proofs for infinite state systems.

A complementary technique is abstract interpretation, and numerical abstract interpretation in particular. One successful tool is Astree, though closed source, has been used to verify properties on Airbus planes. Abstract interpretation is complementary to something like bounded-model checking since it can be used to verify properties on infinite state systems. Numerical abstract interpretation can be used to generate numerical proofs such as, "on line 10, variable `x` is in the range `[-10, 10]`, or constraints between variables such as `x - y >= 10`. A difficult part with abstract interpretation, however, is that when it reports an error it may be the case that the error is a false-alarm and it doesn't actually exist. You can think of numerical abstract interpretation as typical "compiler algorithms" such as constant propagation but keeping track of much more complicated information.

Symbolic/Concolic execution, such as KLEE [4], Pex and Moles [5], and their successor IntelliTest [6], are sort of a middle ground between model checking, and traditional unit testing (i.e., hand writing inputs to the program to test it). The main goal is to automatically generate such tests in order to test behavior in the program (e.g., generate a set of test inputs to have 100% code coverage). The modern versions of these techniques make use of dynamic information (i.e., they actually execute the program under test), and use this information to make decisions about future executions. More concretely, you can imagine the program executing past some branch `if (c)` which was not taken (i.e., `c == false`) and then, the analysis asks the question: "what do the program inputs need to be such that `c == true`. An answer to this question generates new inputs and allows another test to be generated and explored.

Stateless model checking is another automated dynamic technique [7,8,9] used to explore non-determinism in concurrent programs. Essentially, it automatically generates "test cases" (i.e., new thread schedules) to efficiently explore the state space caused by a non-deterministic scheduler.

And there's a bunch of other techniques out there too (e.g., symbolic model checking). There's also a huge amount of work on practical test input generation for hardware. On top of all these techniques, there are there applications to solving specific problems, and heuristics to make them more practical and scalable. Overall, most of the techniques are niche and not widely used, but have been applied in various areas.

[1] https://www7.in.tum.de/~gleissen/papers/sharpie.pdf

[2] http://www.cprover.org/cbmc/

[3] http://www.astree.ens.fr/

[4] https://klee.github.io/

[5] https://www.microsoft.com/en-us/research/project/pex-and-mol...

[6] https://www.visualstudio.com/vs/release-notes/#Testing

[7] https://github.com/markus-kusano/SysTest

[8] http://www.srl.inf.ethz.ch/papers/oopsla15-modelchecking.pdf

[9] http://plrg.eecs.uci.edu/software_page/42-2/

iceManChild · on Nov 4, 2016

For an open source analyzer library you might look at IKOS (https://ti.arc.nasa.gov/opensource/ikos/).

Too · on Nov 5, 2016

Microsoft code contracts for c# is pretty cool.

ti32x · on Nov 4, 2016

I work on software that goes to space. I believe we and NASA use the MISRA coding standard, which is effectively this but taken to extremes. If you actually want to create safety critical code I would recommend that you familiarize yourself with the MISRA ruleset and DO-178B.

kejaed · on Nov 4, 2016

If anyone is looking into this sort of stuff, you can check out the relatively new DO-178C as well, which adds some more details about tool qualification, formal methods, and model-based development. Signed, your friendly neighbourhood aerospace systems engineer.

nstj · on Nov 4, 2016

Space software? Nice! Going to check out MISRA now.

FWIW the initial page just reflects the JPL publication "The Power of Ten – Rules for Developing Safety Critical Code" by Gerard Holzmann [0]

[0]: http://spinroot.com/gerard/pdf/P10.pdf

pjmlp · on Nov 4, 2016

MISRA only goes thus far.

For writing safe code in C, you should also check Frama-C and CERT Secure Coding Standards.

http://frama-c.com/

https://www.securecoding.cert.org/confluence/display/seccode...

Even better, get the algorithms written in a theorem prover which is guaranteed to generate reliable safe C code.

This is approach taken by Microsoft with P Language(Windows USB device driver stack) and F* (new TLS implementation).

mos_basik · on Nov 4, 2016

Gerard Holzmann is the main author of the formal software verification tool "spin" as well as the owner of spinroot.com. I took a class on formal verification under him at Caltech.

4ad · on Nov 4, 2016

As an anecdote, spin was used in Plan 9 to validate the kernel scheduler and the IL network protocol. It was also used to validate a very early version of the Go scheduler.

mturmon · on Nov 4, 2016

Gerard is an ACM Fellow and a member of the NAE, and has been a leader in reliable software for NASA since he came to JPL from Bell Labs.

nstj · on Nov 4, 2016

I probably should have clicked around a bit to determine that he is the owner of that site :)

danjayh · on Nov 5, 2016

I work for your friendly Aviation megacorp producing avionics-related widgets. We are currently transitioning to DO-178C. Fun anecdote: I took a training class on Peach Fuzzer last year & got to hear people from MS, Mozilla, etc. talk about the testing they do on their software. I asked them if they ever had to prove things like worst-case execution time, worst-case stack utilization, perform MC/DC analysis, perform source-to-object analysis, etc., etc. They did not ... and that, my friends, is why avionics software is so damned expensive compared to other software :)

EDIT: And to those of you debating whether all of these rules are necessary and good -- they are; however, they are a subset. We use an internal coding standard that resembles MISRA C but does have some modifications. The only rule of his that we don't do nearly the same way is the 60 line limit. We use code complexity metrics rather than a strict line limit, but it's the same idea.

4ad · on Nov 4, 2016

Full 31 rules of NASA JPL C coding standard: http://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf

Previous discussion: https://news.ycombinator.com/item?id=4339999

mrlyc · on Nov 4, 2016

When I was writing safety-critical embedded programs for medical equipment and an air traffic control system, I found MISRA (https://en.wikipedia.org/wiki/MISRA_C) and Lutz's "Targeting Safety-Related Errors During Software Requirements Analysis" (http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/35179/1/93...) useful.

zwieback · on Nov 4, 2016

I would add "use static type checker from day 1". It's amazing how much stuff Coverity discovers and it can help with the clutter of rechecking invariants of nested functions.

achou · on Nov 4, 2016

Rules like this are much more useful if they're enforced. That's why efforts like the C Secure Coding Rules[1], which were designed with the capabilities of reasonably state of the art analyzers in mind, are a good idea. Like any standard designed by a committee, there are some unfortunate compromises.

Many commercial (and open source) static analyzers will have checks that are hard to codify in rules that are digestible by humans. For example, most analyzers that use sophisticated interprocedural analysis will not be easily described as a guideline for humans, unless those humans are compiler engineers. This is especially true for analyzers that have heuristics built in to minimize false positives, which is most often a real-world requirement.

[1] https://www.securecoding.cert.org/confluence/pages/viewpage....

huhtenberg · on Nov 4, 2016

An opinionated and needlessly specific list. For example, this

  Use minimally two assertions per function on average

should've been

  Assert your invariants

And this

  Do not use goto statements

is a controversial advice at best, because there are several well-established goto-based patterns that produce a much cleaner code.

Kurtz79 · on Nov 4, 2016

"An opinionated and needlessly specific list."

Opinionated, maybe (although it seems to me most rules are reasoned and a justification is given).

But something practical has to be specific.

It has happened time and again to come across something it appeared to me as an arbitrary an unnecessary rule, but with more experience I think it's better to have some "debatable" specific rules that something "open to interpretation", especially if many people will have to work on the same codebase.

4ad · on Nov 4, 2016

> An opinionated and needlessly specific list

Not really. Gerard Holzmann is in charge of software at NASA/JPL. Each rule comes from experience with a mission loss that would have been prevented if people had implemented that rule. They also help static analysers become more accurate.

These rules are specific to ultra-reliable embedded software, flight software or software for life-critical equipment. They are not really relevant for most normal programming.

adrianN · on Nov 4, 2016

That doesn't mean that the rules are perfect. There haven't been that many mission losses to have a large corpus of data and finding a root cause of an accident might not lead to the best rules. "The rocket wouldn't have blown up if you asserted this here" doesn't mean that adding a large number assertions necessarily leads to better code.

erikbye · on Nov 4, 2016

A lot of wisdom comes from simulations as well.

huhtenberg · on Nov 4, 2016

Sure and this list likely worked well in the NASA/JBL context. The point being is it needlessly specific as a generic advice.

At the risk of stating the obvious, a "goto exit" pattern is the defacto standard for function cleanup in Linux kernel, which may not be flying every space probe, but it's still pretty damn ultra-reliable piece of software.

naasking · on Nov 4, 2016

> [Linux is] still pretty damn ultra-reliable piece of software.

It's really not. I don't think you have a good sense of how reliable the systems in this space really are. Firmwares and other ultra-reliable systems typically use microkernels, and there are billions of units of verified L4 microkernels out there. Linux can't even dream of approaching this level of reliability.

unethical_ban · on Nov 4, 2016

If I wanted to learn more about open source, reliable microkernels, where would you point me?

naasking · on Nov 4, 2016

seL4 is the big one: https://sel4.systems/

Here's a general reference of open source microkernel operating systems: http://www.microkernel.info/

Here's a general reference for open source real time operating systems: http://www.osrtos.com/

JoachimSchipper · on Nov 4, 2016

seL4 is really sexy, but I don't think it's anywhere near the most-used one?

nickpsecurity · on Nov 5, 2016

OKL4 if we're going with the phone example. They stopped open-sourcing it by version 4.0 but someone should be able to dig up OKL4 3.0 somewhere.

4ad · on Nov 4, 2016

Talk about missing the point and the whole context around mission-critical software.

> a "goto exit" pattern is the defacto standard for function cleanup in Linux kernel

There are many ways to do error cleanup that do not involve goto, and there are many bad ways to use goto. A static analyser can't decide if you do "good goto" or "bad goto", but it's easy to ban goto outright and avoid all the "bad gotos". You pay the price in terms of programmer work (he has to do something different), but you gain better static analysis and eliminate a full class of bugs.

This tradeoff makes sense for JPL and other people who do mission-critical of life-critical software. It might not make sense for the Linux kernel or most C software.

Mission-critical is not the place where you want to rely on programmers to do their work correctly (goto fail anyone?). You rely on specific methodology, standards, and static analysis.

nanolith · on Nov 4, 2016

> A static analyser can't decide if you do "good goto" or "bad goto", but it's easy to ban goto outright and avoid all the "bad gotos".

That depends on the goto statement. A good static analyzer can certainly recognize a "goto exit" pattern. There is very little difference in a function graph between an if/then/else clause and a goto statement. In fact, most static analyzers perform a first pass that is pretty similar to a compiler so that it can build a function graph for performing data analysis along each of the potential branches. One this pass is performed, it doesn't matter if you used an if/then/else, a for/while/do-while, a switch statement, or a complex sequence of break or continue statements. A graph is a graph.

The more succinct question is, "What constitutes a 'good' goto statement?" From the perspective of a static analyzer, it would be applying the exact same analysis as it would for any other branch in the function graph. If it detects a memory leak, a use-after-free, or a use of data which isn't always initialized, it makes little difference whether this is because of a bad goto statement or a sloppy set of conditionals. Either the graph shows a problem, or it does not.

OskarS · on Nov 4, 2016

The Linux kernel is not mission critical? Many of those mission critical systems probably run off of a Linux kernel, you know.

I think the point is that while it is potentially a good rule for mission critical code, it is not essential. There are other ways as well.

pjmlp · on Nov 4, 2016

No it isn't.

Operating systems like INTEGRITY RTOS are.

http://www.ghs.com/products/rtos/integrity.html

Which is why GNU/Linux gets packed into his own little sandbox running on top of INTEGRITY.

kristoffer · on Nov 4, 2016

Well, SpaceX claims to be running their flight software on Linux so it is critical to their missions at least :)

4ad · on Nov 4, 2016

SpaceX uses VxWorks for their hard real-time requirements: http://blogs.windriver.com/vxworks/2010/12/vxworks-helping-c...

They use Linux for non-hard-real-time stuff. Their microcontrollers also do not use Linux.

Coincidentally, Tesla uses QNX.

kristoffer · on Nov 4, 2016

Yes for their microcontrollers they use VxWorks. But Linux is still used for the flight software (i.e. mission control).

4ad · on Nov 4, 2016

VxWorks does not run on microcontrollers.

kristoffer · on Nov 4, 2016

Don't know your definition of microcontroller but VxWorks runs on e.g. Cortex-M3/4, MIPS, 68K, SH, SparcV8, PowerPC thus including some devices I could microcontrollers. But there is no clear cut definition of microcontroller, microprocessor, SoC, whatnot.

4ad · on Nov 4, 2016

No, they don't. The Linux kernel is not a RTOS. Most JPL projects use VxWorks.

pjmlp · on Nov 4, 2016

> but it's still pretty damn ultra-reliable piece of software.

Are you sure?

http://arstechnica.com/security/2016/09/linux-kernel-securit...

http://www.cvedetails.com/product/47/Linux-Linux-Kernel.html...

yitchelle · on Nov 4, 2016

An ultra reliable software comes down to knowing its behaviour fully in its released form.

If the code has 100s of gotos and their paths are mapped out and its ramifications of the paths are clearly understood for all scenarios, then it is still reliable as its full behaviour is understood and is safe.

kccqzy · on Nov 4, 2016

I would also say perhaps

    avoid recursion

should be revised to

    avoid non-tail-call recursion

Because the justification to avoid recursion is basically avoid running into the limit of stack, and tail-call recursion uses no stack space at all. Of course you still might end up with situations where the function does not terminate, but simple loops can do that too. In relation to Rule No.2, it is also possible to prove an upper bound for recursive depths.

hvidgaard · on Nov 4, 2016

Recursion is recursion. Tail recursion is optimized away by the compiler except when it isn't (I've seen bugs with this exact setup more than once), but now you have to verify that the compiler actually does this. It's simpler to just avoid it all together unless you want to formally verify the recursion depth.

4ad · on Nov 4, 2016

Apart from that, TCO destroys information that is valuable for debugging (especially for non-recursive tail calls), exactly what you want to avoid in safety-critical software.

pja · on Nov 4, 2016

Manual conversion to a loop also destroys the same information, so it’s a wash either way on that front.

4ad · on Nov 4, 2016

You can't convert arbitrary tail calls to a loop. You can only convert recursive tail calls to a loop. The information loss associated with non-recursive tail calls is great. It's true that recursive tail calls lose less information, but it is not true that it's equivalent to the information lost in a loop. In a loop you always have access to {head, current, tail} while in a tail call you might (subject to algorithm and compiler optimization) only have access to {current, head|tail}, depending if the loop goes forwards or backwards.

spc476 · on Nov 5, 2016

But I've found that non-recursive tail calls are great in handling state machines. I wrote a TFTP client and server in Lua [1] and the code was both much cleaner and more reusable between both client and server. Granted, Lua does proper TCO (as long as you properly code the return statement) and the lack of stack trace wasn't much of an issue there.

These rules are mostly geared towards C, which doesn't guarantee TCO at all.

[1] Partly as way to learn Lua and TCO, and as a way to manage Cisco router config files in a version control system.

bartread · on Nov 4, 2016

Exactly. Tail recursion can always be rewritten as a loop. The code might be uglier but at least you have a better idea of how it's going to behave after compilation.

naasking · on Nov 4, 2016

> The code might be uglier but at least you have a better idea of how it's going to behave after compilation.

The fact that it's uglier suggests you'd have less of an idea how it's going to behave than simple tail recursion. That said, I agree with the idea that if you're after tail recursion, an annotation that the compiler checks that ensures the call is properly tail recursive is a good idea.

to3m · on Nov 4, 2016

Quite often, when time you suggest using a loop rather than a tail call, people pop up to tell you that the code is neater as a tail call. So I'm assuming the comment about ugliness is a way of heading off these people ;)

I'm happy to argue that in C, it's never neater to write a loop as a tail call. Tail calls aren't guaranteed to compile into a goto, so it can be difficult to say whether it's going to end up with a call or a goto - even on a case-by-case basis. And if you write a tail call expecting a goto, and you get a call, obviously you can't just give up and go home - you'll need to rewrite it as a loop, just like you could have done in the first place. This does not strike me as less ugly than just writing that loop.

I know people like to do things the roundabout way, by writing one thing, certain that the compiler will do something else, and then saying this code is more clear rather than less so - but I've never quite understood why. A better rule of thumb to my mind is that if you want something particular to happen: write the code that way.

(I'm not denying an annotation would help - because then at least you'd know statically when you need to rewrite your code as a loop, rather than having to wait until something bad happens. Though... maybe if nothing bad ever happens... it's fine after all...? Sheesh. Ask a philosopher. Too much for me!)

heisenbit · on Nov 4, 2016

And during the maintenance phase someone modifies the code and what was pure tail recursion becomes something else and literally blows up the rocket.

The whole point of these rules is defensive programming. Programming not what is correct but what is robust i.e. if anything breaks it still produces possibly suboptimal but not catastrophic results.

qznc · on Nov 4, 2016

In safety critical projects people often disable compiler optimizations. No tail call elimination.

krylon · on Nov 4, 2016

> tail-call recursion uses no stack space at all

Not all compilers optimize tail-call recursion into - effectively - loops. And even those that can might only do so when certain compiler flags are active.

Kurtz79 · on Nov 4, 2016

Nice.

These are reasoned, common sense measures, most of which I have seen in one form or another in the coding guidelines of the companies I have worked for as an embedded SW engineer.

Some of them are easy to enforce (no dynamic memory allocation, no recursion), other less so (using asserts, limit function size).

The worst is adding your rules-compliant code to messy and bloated legacy code :)

Tepix · on Nov 4, 2016

It's interesting that testing is not among the rules and not among the comments so far.

(The details for rule 5 and 8 mention testing)

atsaloli · on Nov 4, 2016

See also my write-up on Dr. Holzmann's presentation at USENIX Hot Topics in System Dependability mini-conf 2012, http://www.verticalsysadmin.com/making_robust_software/ and the full presentation video at https://www.usenix.org/conference/hotdep12/workshop-program/...

swah · on Nov 4, 2016

What have you guys been using to track requirements? I had a bad experience adapting Redmine for a project, and wouldn't want to repeat that in a next project.

ti32x · on Nov 4, 2016

I have heard that Polarion is an ok requirements management tool for software, but it is expensive.

DOORS is good for tracking and tracing requirements, but from what I can see doesn't do code parsing. This may have changed. However, my opinion of IBM software is irreparably broken so I don't have a huge amount of trust in it.

We use Reqtify to glue requirement documents to code. It also sucks, but I think it's still powerful. You just need to set up some regexes which can recognize your requirement identifier style and you're good to go.

Whoever comes up with a solution to glue DOORS requirements to Jira issues to code seamlessly is going to be fairly rich.

rubidium · on Nov 4, 2016

I've done a bit a research on this. Company uses DOORS and hates it. We've evaluated JAMA and Polarion (both expensive) and both are better than DOORS. JAMA has a nice JIRA integration. Polarion can be integrated it seems but untested.

VinzO · on Nov 4, 2016

We use DOORS. I don't have experience with other tools, so I can't compare but I don't find it a nice experience either.

svec · on Nov 4, 2016

Funny timing: I just suggested those guidelines, and several related ones, in a post over here: http://embedded.fm/blog/2016/11/3/resources-for-software-bes...

(Check out "Rules for Defensive C Programming" by Dinu Madau for starters.)

aetherspawn · on Nov 4, 2016

I wrote my capstone project a week ago on rolling out Simulink for automatically generating SC code en masse and minimising hand written code and manual translation.

I referenced the paper mentioned in my literature review

I would not link it here to the world to see as it is just an undergraduate project and my research did not yield a highly quantitive result, but I am applying the technique on a project right now with good success

PaulKeeble · on Nov 4, 2016

10 Rules isn't going to cut it really, its a lot more detailed than that. Here is the Spark reference manual (a standard for Ada that has been the industrial standard for decades): http://docs.adacore.com/spark2014-docs/html/lrm/

4ad · on Nov 4, 2016

Yeah, the point of these rules (and the 31 JPL rules) is concision. Your programmers will read a 10 page coding standard, and might remember it. Nobody is going to remember hundreds of rules and hundreds of pages. They won't even read it.

In his talks Holzmann always makes the point that a coding standard is a hard sell for most people, and JPL hadn't had any success until they could come up with something simple and concise.

TickleSteve · on Nov 4, 2016

These are fairly typical modern embedded s/w guidelines, safety critical code is much more rigorous specifically on the design & requirements side.

vbtemp · on Nov 4, 2016

Why not just call it "Rules for Writing Code"?

Who would ever, for example, consider checking the return value of some function but then think "ah, this isn't safety critical, I'm just skip writing a quick _if_ statement and then leave this as a place that mysterious bugs can arise"? Same thing with assertions.

pjc50 · on Nov 4, 2016

Have you met programmers?

Less snarkily, there are billions of lines of production code that don't check return values for malloc(), printf(), and so on because those failures are extremely rare and difficult to handle sensibly.

Banning dynamic allocation would ban large areas of programming language - Lisp, Python, Javascript, and so on.

kazinator · on Nov 4, 2016

Even if you check the return value of malloc, that doesn't help when virtual memory is overcommitted. Malloc gets a valid pointer from mmap, but accessing the memory later crashes.

A system OOM may actually show a soft behavior, whereby the whole system thrashes more and more until it grinds to a halt. Whatever your program does is futile and irrelevant.

To QA the program for how it handles null out of malloc, you need to tweak the setup: disable overcommit and simulate low memory conditions. (Or even just switch to an alternative allocator that simulates failures.)

Checking the result of malloc for null is of course good for maximal portability: your code can be reused where the check actually means something.

Checking for null and not testing that logic is almost no better than not testing at all (in keeping with the general hypothesis that untested code is junk).

closeparen · on Nov 4, 2016

1) Some languages have tail call optimization and sometimes recursion is the clearest way to express an idea.

2) Requiring a fixed upper bound when iterating a collection of objects is a direct contradiction of the ZOI rule[0]. Most programs should not place arbitrary boundaries on the number of objects they'll operate on. This is better adapted to systems with relatively fixed databases and a predictably low volume of operator and sensor input (i.e. flight management computers, train control systems) more than internet services.

7) "For this reason, most coding guidelines for safety critical software also forbid the use of all ansi standard headers like string.h, stdlib.h, stdio.h" should speak for itself.

9) Is pretty C/C++ specific.

[0] https://en.wikipedia.org/wiki/Zero_one_infinity_rule

pjc50 · on Nov 4, 2016

> Most programs should not place arbitrary boundaries on the number of objects they'll operate on

I think this can be argued both ways. The classic stateless pipe programs, sure, they can take as many lines as you want. User-facing programs? It can be worth imposing fairly small limits just so you don't have to worry about pathological behavior and security risks. (What happens if someone pastes a million lines of text into a small box, etc)

AnimalMuppet · on Nov 4, 2016

I'm writing a quick program that's supposed to compute something and print out the answer. I don't check the return value of printf(). Why not?

1. printf almost never fails. If it does, I'm just going to re-run the program. If that doesn't work, I'll reboot and then re-run. On average, I waste less time with that approach than with checking the return value of printf.

2. If printf fails, what am I going to do? Print out an error message? But that might fail, too. Log something? That could also fail. Should I check the return values of those attempts, too?

But safety-critical code is usually not calling printf - it may not have a conventional output device at all. (It should have some mechanism for alerting the operator if there are failures, though.)

4ad · on Nov 4, 2016

Because these rules have a cost, and sometimes the cost is not worth it for non-safety critical code. It would be a PITA for general purpose code to avoid all dynamic memory allocation.

vbtemp · on Nov 4, 2016

With the sole exception of item (3) in that list (pertaining to dynamic allocation), which others carry a cost?

jhomedall · on Nov 4, 2016

Not allowing recursion makes functional programming impossible.

Giving all loops a fixed upper bound comes at a cost to code readability (consider the example of traversing a linked list, as given in the article).

BerislavLopac · on Nov 4, 2016

Ask HN mods: How is this URL different from the one in https://news.ycombinator.com/item?id=12864032? I was under impression that submissions are unique...

detaro · on Nov 4, 2016

They are not, it's even explicitly allowed to repost links that didn't see activity as long as it is not overdone. I think currently the system only rejects perfectly identical URLs for a few hours.

triplesec · on Nov 4, 2016

This doesn't always work, for whatever reason. I've had this as well, but there's not much you can do. The Upvote game isn't that important, so don't lose sleep about it. Check the FAQ first, but send an email to HN support if it's a bug.