Rules like avoiding recursion and loop upper bounds are partly justified by helping code analyzers to prove executions are bounded.
Does anyone have experience performing formal static analysis on code? And if so, are there any open-source examples of the state-of-the-art tools being used on nontrivial problems?
I've been looking at this recently, but I've never seen a proof of reasonably standard function like, say, printf. Most of the attempts at formal proof I've seen seem quite complex for very simple functions [1] and the next step up seems too complicated for me to follow [2]. I'm familiar with heuristic-based tools like clang analyser, coverity and flexelint - but none of those claim to produce any sort of proof. I'm also aware of MISRA but again, that doesn't claim to provide any sort of proof.
Those of you who are using formal methods: Is what I've seen really the state of the art? Or are there powerful tools I've overlooked?
Take a look at dafny [1] and F* [2]. Both are open-source and it appears that both of them are able to prove code to be correct with considerably less proof code than traditional interactive theorem provers (such as Coq or Isabelle), mostly thanks to the use of an SMT solver (Z3, in both cases) to aid in proving.
Both languages are being used in Project Everest [3], which is building a formally verified HTTPS stack, including a formally verified TLS implementation [4].
Unfortunately, there isn't a lot of documentation / beginner guides about these languages out there, but with the little there is, I was still able to start proving non-trivial (but still relatively easy-to-prove) code with them, even though I have no formal verification background whatsoever.
Just as a tip, I found the Dafny tutorial to be a lot easier to follow than F-Star's, but from what I could gather, F* appears to be slightly more general/sophisticated in what it can prove.
"F star is an ML-like functional programming language aimed at program verification. Its type system includes polymorphism, dependent types, monadic effects, refinement types, and a weakest precondition calculus. Together, these features allow expressing precise and compact specifications for programs, including functional correctness and security properties. The F star type-checker aims to prove that programs meet their specifications using a combination of SMT solving and manual proofs. Programs written in F star can be translated to OCaml or F# for execution."
And you wonder why you're not getting any traction.
The field is hardly accessible due to the sheer variety of solvers. Getting an overview first seems more important then jumping right in and getting a feel for a particular solver.
Overall, one way to look at formal methods is that the tools range from fully
automated (e.g., abstract interpretation, model checking), to manual (e.g.,
interactive theorem proving). Along this same range, the complexity of
properties which can be proved also moves from simple (e.g., absence of buffer
overflows) to complex (termination of parameterized systems).
As a concrete example, interactive theorem provers can prove the termination of
Paxos for an arbitrary number of nodes (i.e., a system of `n` nodes will reach
consensus, where `n` is a positive integer). But, automatically generating such
a proof for something as complicated and parameterized as Paxos is an open
problem ([1] describes an automatic tool working toward that goal).
Another thing to keep in mind is the goal of the tool: verification versus bug
hunting. Verification aims to prove that a property will never be violated,
while bug hunting aims to find a property violation. Here's a list of some
types of tools off the top of my head.
Bounded model checkers such as CBMC [2] essentially search the entire
state-space of the program; you can conceptually think of this as searching
through a graph which has some start node, and you'd like to find a path from
the start to an error. This is a bug hunting technique since the
software/hardware may have an infinite state space (e.g., `while (true) ++i`),
hence the search space must be bounded (e.g., unroll loops). For finite-state
systems this can generate proofs of correctness. For infinite state systems, it
can generate proofs up to a certain bound (e.g., correct for some number of
clock cycles, correct for some number of loop iterations). The benefit of
bounded model checkers is that when they find an error they generate a
counter-example, which is essentially inputs to the program causing it to fail.
So, you could use the counter-example as a concrete test case to fix the issue.
Techniques such as counter-example guided abstraction refinement, and more
recently IC3 and property-directed reachability make use of these
counter-examples to generate proofs for infinite state systems.
A complementary technique is abstract interpretation, and numerical abstract
interpretation in particular. One successful tool is Astree, though closed
source, has been used to verify properties on Airbus planes. Abstract
interpretation is complementary to something like bounded-model checking since
it can be used to verify properties on infinite state systems. Numerical
abstract interpretation can be used to generate numerical proofs such as, "on
line 10, variable `x` is in the range `[-10, 10]`, or constraints between
variables such as `x - y >= 10`. A difficult part with abstract interpretation,
however, is that when it reports an error it may be the case that the error is
a false-alarm and it doesn't actually exist. You can think of numerical
abstract interpretation as typical "compiler algorithms" such as constant
propagation but keeping track of much more complicated information.
Symbolic/Concolic execution, such as KLEE [4], Pex and Moles [5], and their
successor IntelliTest [6], are sort of a middle ground between model checking,
and traditional unit testing (i.e., hand writing inputs to the program to test
it). The main goal is to automatically generate such tests in order to test
behavior in the program (e.g., generate a set of test inputs to have 100% code
coverage). The modern versions of these techniques make use of dynamic
information (i.e., they actually execute the program under test), and use this
information to make decisions about future executions. More concretely, you can
imagine the program executing past some branch `if (c)` which was not taken
(i.e., `c == false`) and then, the analysis asks the question: "what do the
program inputs need to be such that `c == true`. An answer to this question
generates new inputs and allows another test to be generated and explored.
Stateless model checking is another automated dynamic technique [7,8,9] used to
explore non-determinism in concurrent programs. Essentially, it automatically
generates "test cases" (i.e., new thread schedules) to efficiently explore the
state space caused by a non-deterministic scheduler.
And there's a bunch of other techniques out there too (e.g., symbolic model
checking). There's also a huge amount of work on practical test input
generation for hardware. On top of all these techniques, there are there
applications to solving specific problems, and heuristics to make them more
practical and scalable. Overall, most of the techniques are niche and not
widely used, but have been applied in various areas.
I work on software that goes to space. I believe we and NASA use the MISRA coding standard, which is effectively this but taken to extremes. If you actually want to create safety critical code I would recommend that you familiarize yourself with the MISRA ruleset and DO-178B.
If anyone is looking into this sort of stuff, you can check out the relatively new DO-178C as well, which adds some more details about tool qualification, formal methods, and model-based development. Signed, your friendly neighbourhood aerospace systems engineer.
Gerard Holzmann is the main author of the formal software verification tool "spin" as well as the owner of spinroot.com. I took a class on formal verification under him at Caltech.
As an anecdote, spin was used in Plan 9 to validate the kernel scheduler and the IL network protocol. It was also used to validate a very early version of the Go scheduler.
I work for your friendly Aviation megacorp producing avionics-related widgets. We are currently transitioning to DO-178C. Fun anecdote: I took a training class on Peach Fuzzer last year & got to hear people from MS, Mozilla, etc. talk about the testing they do on their software. I asked them if they ever had to prove things like worst-case execution time, worst-case stack utilization, perform MC/DC analysis, perform source-to-object analysis, etc., etc. They did not ... and that, my friends, is why avionics software is so damned expensive compared to other software :)
EDIT: And to those of you debating whether all of these rules are necessary and good -- they are; however, they are a subset. We use an internal coding standard that resembles MISRA C but does have some modifications. The only rule of his that we don't do nearly the same way is the 60 line limit. We use code complexity metrics rather than a strict line limit, but it's the same idea.
I would add "use static type checker from day 1". It's amazing how much stuff Coverity discovers and it can help with the clutter of rechecking invariants of nested functions.
Rules like this are much more useful if they're enforced. That's why efforts like the C Secure Coding Rules[1], which were designed with the capabilities of reasonably state of the art analyzers in mind, are a good idea. Like any standard designed by a committee, there are some unfortunate compromises.
Many commercial (and open source) static analyzers will have checks that are hard to codify in rules that are digestible by humans. For example, most analyzers that use sophisticated interprocedural analysis will not be easily described as a guideline for humans, unless those humans are compiler engineers. This is especially true for analyzers that have heuristics built in to minimize false positives, which is most often a real-world requirement.
Opinionated, maybe (although it seems to me most rules are reasoned and a justification is given).
But something practical has to be specific.
It has happened time and again to come across something it appeared to me as an arbitrary an unnecessary rule, but with more experience I think it's better to have some "debatable" specific rules that something "open to interpretation", especially if many people will have to work on the same codebase.
Not really. Gerard Holzmann is in charge of software at NASA/JPL. Each rule comes from experience with a mission loss that would have been prevented if people had implemented that rule. They also help static analysers become more accurate.
These rules are specific to ultra-reliable embedded software, flight software or software for life-critical equipment. They are not really relevant for most normal programming.
That doesn't mean that the rules are perfect. There haven't been that many mission losses to have a large corpus of data and finding a root cause of an accident might not lead to the best rules. "The rocket wouldn't have blown up if you asserted this here" doesn't mean that adding a large number assertions necessarily leads to better code.
Sure and this list likely worked well in the NASA/JBL context. The point being is it needlessly specific as a generic advice.
At the risk of stating the obvious, a "goto exit" pattern is the defacto standard for function cleanup in Linux kernel, which may not be flying every space probe, but it's still pretty damn ultra-reliable piece of software.
> [Linux is] still pretty damn ultra-reliable piece of software.
It's really not. I don't think you have a good sense of how reliable the systems in this space really are. Firmwares and other ultra-reliable systems typically use microkernels, and there are billions of units of verified L4 microkernels out there. Linux can't even dream of approaching this level of reliability.
Talk about missing the point and the whole context around mission-critical software.
> a "goto exit" pattern is the defacto standard for function cleanup in Linux kernel
There are many ways to do error cleanup that do not involve goto, and there are many bad ways to use goto. A static analyser can't decide if you do "good goto" or "bad goto", but it's easy to ban goto outright and avoid all the "bad gotos". You pay the price in terms of programmer work (he has to do something different), but you gain better static analysis and eliminate a full class of bugs.
This tradeoff makes sense for JPL and other people who do mission-critical of life-critical software. It might not make sense for the Linux kernel or most C software.
Mission-critical is not the place where you want to rely on programmers to do their work correctly (goto fail anyone?). You rely on specific methodology, standards, and static analysis.
> A static analyser can't decide if you do "good goto" or "bad goto", but it's easy to ban goto outright and avoid all the "bad gotos".
That depends on the goto statement. A good static analyzer can certainly recognize a "goto exit" pattern. There is very little difference in a function graph between an if/then/else clause and a goto statement. In fact, most static analyzers perform a first pass that is pretty similar to a compiler so that it can build a function graph for performing data analysis along each of the potential branches. One this pass is performed, it doesn't matter if you used an if/then/else, a for/while/do-while, a switch statement, or a complex sequence of break or continue statements. A graph is a graph.
The more succinct question is, "What constitutes a 'good' goto statement?" From the perspective of a static analyzer, it would be applying the exact same analysis as it would for any other branch in the function graph. If it detects a memory leak, a use-after-free, or a use of data which isn't always initialized, it makes little difference whether this is because of a bad goto statement or a sloppy set of conditionals. Either the graph shows a problem, or it does not.
Don't know your definition of microcontroller but VxWorks runs on e.g. Cortex-M3/4, MIPS, 68K, SH, SparcV8, PowerPC thus including some devices I could microcontrollers. But there is no clear cut definition of microcontroller, microprocessor, SoC, whatnot.
An ultra reliable software comes down to knowing its behaviour fully in its released form.
If the code has 100s of gotos and their paths are mapped out and its ramifications of the paths are clearly understood for all scenarios, then it is still reliable as its full behaviour is understood and is safe.
Because the justification to avoid recursion is basically avoid running into the limit of stack, and tail-call recursion uses no stack space at all. Of course you still might end up with situations where the function does not terminate, but simple loops can do that too. In relation to Rule No.2, it is also possible to prove an upper bound for recursive depths.
Recursion is recursion. Tail recursion is optimized away by the compiler except when it isn't (I've seen bugs with this exact setup more than once), but now you have to verify that the compiler actually does this. It's simpler to just avoid it all together unless you want to formally verify the recursion depth.
Apart from that, TCO destroys information that is valuable for debugging (especially for non-recursive tail calls), exactly what you want to avoid in safety-critical software.
You can't convert arbitrary tail calls to a loop. You can only convert recursive tail calls to a loop. The information loss associated with non-recursive tail calls is great. It's true that recursive tail calls lose less information, but it is not true that it's equivalent to the information lost in a loop. In a loop you always have access to {head, current, tail} while in a tail call you might (subject to algorithm and compiler optimization) only have access to {current, head|tail}, depending if the loop goes forwards or backwards.
But I've found that non-recursive tail calls are great in handling state machines. I wrote a TFTP client and server in Lua [1] and the code was both much cleaner and more reusable between both client and server. Granted, Lua does proper TCO (as long as you properly code the return statement) and the lack of stack trace wasn't much of an issue there.
These rules are mostly geared towards C, which doesn't guarantee TCO at all.
[1] Partly as way to learn Lua and TCO, and as a way to manage Cisco router config files in a version control system.
Exactly. Tail recursion can always be rewritten as a loop. The code might be uglier but at least you have a better idea of how it's going to behave after compilation.
> The code might be uglier but at least you have a better idea of how it's going to behave after compilation.
The fact that it's uglier suggests you'd have less of an idea how it's going to behave than simple tail recursion. That said, I agree with the idea that if you're after tail recursion, an annotation that the compiler checks that ensures the call is properly tail recursive is a good idea.
Quite often, when time you suggest using a loop rather than a tail call, people pop up to tell you that the code is neater as a tail call. So I'm assuming the comment about ugliness is a way of heading off these people ;)
I'm happy to argue that in C, it's never neater to write a loop as a tail call. Tail calls aren't guaranteed to compile into a goto, so it can be difficult to say whether it's going to end up with a call or a goto - even on a case-by-case basis. And if you write a tail call expecting a goto, and you get a call, obviously you can't just give up and go home - you'll need to rewrite it as a loop, just like you could have done in the first place. This does not strike me as less ugly than just writing that loop.
I know people like to do things the roundabout way, by writing one thing, certain that the compiler will do something else, and then saying this code is more clear rather than less so - but I've never quite understood why. A better rule of thumb to my mind is that if you want something particular to happen: write the code that way.
(I'm not denying an annotation would help - because then at least you'd know statically when you need to rewrite your code as a loop, rather than having to wait until something bad happens. Though... maybe if nothing bad ever happens... it's fine after all...? Sheesh. Ask a philosopher. Too much for me!)
And during the maintenance phase someone modifies the code and what was pure tail recursion becomes something else and literally blows up the rocket.
The whole point of these rules is defensive programming. Programming not what is correct but what is robust i.e. if anything breaks it still produces possibly suboptimal but not catastrophic results.
Not all compilers optimize tail-call recursion into - effectively - loops. And even those that can might only do so when certain compiler flags are active.
These are reasoned, common sense measures, most of which I have seen in one form or another in the coding guidelines of the companies I have worked for as an embedded SW engineer.
Some of them are easy to enforce (no dynamic memory allocation, no recursion), other less so (using asserts, limit function size).
The worst is adding your rules-compliant code to messy and bloated legacy code :)
What have you guys been using to track requirements? I had a bad experience adapting Redmine for a project, and wouldn't want to repeat that in a next project.
I have heard that Polarion is an ok requirements management tool for software, but it is expensive.
DOORS is good for tracking and tracing requirements, but from what I can see doesn't do code parsing. This may have changed. However, my opinion of IBM software is irreparably broken so I don't have a huge amount of trust in it.
We use Reqtify to glue requirement documents to code. It also sucks, but I think it's still powerful. You just need to set up some regexes which can recognize your requirement identifier style and you're good to go.
Whoever comes up with a solution to glue DOORS requirements to Jira issues to code seamlessly is going to be fairly rich.
I've done a bit a research on this. Company uses DOORS and hates it. We've evaluated JAMA and Polarion (both expensive) and both are better than DOORS. JAMA has a nice JIRA integration. Polarion can be integrated it seems but untested.
I wrote my capstone project a week ago on rolling out Simulink for automatically generating SC code en masse and minimising hand written code and manual translation.
I referenced the paper mentioned in my literature review
I would not link it here to the world to see as it is just an undergraduate project and my research did not yield a highly quantitive result, but I am applying the technique on a project right now with good success
10 Rules isn't going to cut it really, its a lot more detailed than that. Here is the Spark reference manual (a standard for Ada that has been the industrial standard for decades): http://docs.adacore.com/spark2014-docs/html/lrm/
Yeah, the point of these rules (and the 31 JPL rules) is concision. Your programmers will read a 10 page coding standard, and might remember it. Nobody is going to remember hundreds of rules and hundreds of pages. They won't even read it.
In his talks Holzmann always makes the point that a coding standard is a hard sell for most people, and JPL hadn't had any success until they could come up with something simple and concise.
Who would ever, for example, consider checking the return value of some function but then think "ah, this isn't safety critical, I'm just skip writing a quick _if_ statement and then leave this as a place that mysterious bugs can arise"? Same thing with assertions.
Less snarkily, there are billions of lines of production code that don't check return values for malloc(), printf(), and so on because those failures are extremely rare and difficult to handle sensibly.
Banning dynamic allocation would ban large areas of programming language - Lisp, Python, Javascript, and so on.
Even if you check the return value of malloc, that doesn't help when virtual memory is overcommitted. Malloc gets a valid pointer from mmap, but accessing the memory later crashes.
A system OOM may actually show a soft behavior, whereby the whole system thrashes more and more until it grinds to a halt. Whatever your program does is futile and irrelevant.
To QA the program for how it handles null out of malloc, you need to tweak the setup: disable overcommit and simulate low memory conditions. (Or even just switch to an alternative allocator that simulates failures.)
Checking the result of malloc for null is of course good for maximal portability: your code can be reused where the check actually means something.
Checking for null and not testing that logic is almost no better than not testing at all (in keeping with the general hypothesis that untested code is junk).
1) Some languages have tail call optimization and sometimes recursion is the clearest way to express an idea.
2) Requiring a fixed upper bound when iterating a collection of objects is a direct contradiction of the ZOI rule[0]. Most programs should not place arbitrary boundaries on the number of objects they'll operate on. This is better adapted to systems with relatively fixed databases and a predictably low volume of operator and sensor input (i.e. flight management computers, train control systems) more than internet services.
7) "For this reason, most coding guidelines for safety critical software also forbid the use of all ansi standard headers like string.h, stdlib.h, stdio.h" should speak for itself.
> Most programs should not place arbitrary boundaries on the number of objects they'll operate on
I think this can be argued both ways. The classic stateless pipe programs, sure, they can take as many lines as you want. User-facing programs? It can be worth imposing fairly small limits just so you don't have to worry about pathological behavior and security risks. (What happens if someone pastes a million lines of text into a small box, etc)
I'm writing a quick program that's supposed to compute something and print out the answer. I don't check the return value of printf(). Why not?
1. printf almost never fails. If it does, I'm just going to re-run the program. If that doesn't work, I'll reboot and then re-run. On average, I waste less time with that approach than with checking the return value of printf.
2. If printf fails, what am I going to do? Print out an error message? But that might fail, too. Log something? That could also fail. Should I check the return values of those attempts, too?
But safety-critical code is usually not calling printf - it may not have a conventional output device at all. (It should have some mechanism for alerting the operator if there are failures, though.)
Because these rules have a cost, and sometimes the cost is not worth it for non-safety critical code. It would be a PITA for general purpose code to avoid all dynamic memory allocation.
They are not, it's even explicitly allowed to repost links that didn't see activity as long as it is not overdone. I think currently the system only rejects perfectly identical URLs for a few hours.
This doesn't always work, for whatever reason. I've had this as well, but there's not much you can do. The Upvote game isn't that important, so don't lose sleep about it. Check the FAQ first, but send an email to HN support if it's a bug.
Does anyone have experience performing formal static analysis on code? And if so, are there any open-source examples of the state-of-the-art tools being used on nontrivial problems?
I've been looking at this recently, but I've never seen a proof of reasonably standard function like, say, printf. Most of the attempts at formal proof I've seen seem quite complex for very simple functions [1] and the next step up seems too complicated for me to follow [2]. I'm familiar with heuristic-based tools like clang analyser, coverity and flexelint - but none of those claim to produce any sort of proof. I'm also aware of MISRA but again, that doesn't claim to provide any sort of proof.
Those of you who are using formal methods: Is what I've seen really the state of the art? Or are there powerful tools I've overlooked?
[1] https://github.com/Beatgodes/klibc_framac_wp/blob/master/src... [2] https://github.com/seL4/l4v/blob/master/lib/Monad_WP/Strengt...