I'm one of the researchers at Galois who's working on the quadcopter platform for HACMS. Let me first say that this headline makes me cringe just as much as anyone. I'd also like to give folks a pointer to some of the work we've developed on the project–it's all open source.
First we have Ivory, the DSL for doing safe systems programming that takes buffer overflows and other memory-safety bugs off the table: http://ivorylang.org/ [1]
Second, the flight control software we wrote for the 3DR Iris+: http://smaccmpilot.org/ The adventurous among you who fly with a Pixhawk can build it and try it out yourselves, and we'd love to hear your thoughts.
[1]: When we began work on HACMS, Rust was around but nowhere near mature enough to base a proposal on. These days, it covers a lot of the same bases, although currently Ivory is more suited for formal analysis with SMT solvers and model checkers.
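For a rough flavor of what Ivory code looks like, here's a minimal sketch based on the published examples (exact combinator names and required extensions may differ slightly between versions):

    {-# LANGUAGE DataKinds, TypeOperators #-}
    import Ivory.Language

    -- A tiny Ivory procedure: the Haskell type says it takes two signed
    -- 32-bit values and returns one; the EDSL generates the corresponding C.
    add32 :: Def ('[Sint32, Sint32] :-> Sint32)
    add32 = proc "add32" $ \a b -> body $ ret (a + b)

Procedures can also carry pre/postcondition contracts, which is where the "augment with assertions" workflow mentioned below comes in.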
The goals of Idris and similar languages are different from the goals we had with Ivory, though. We sometimes describe Ivory as "medium-assurance" as opposed to the high assurance one can get from a language with a more expressive type system.
It makes sense for some systems or components to be formally verified, for example the seL4 verified microkernel, which is also used on our copter. However, we simply would not have been able to formally verify (or even get to typecheck with fancier dependent types) something on the scale of an autopilot, given the resources we had on the project. Instead we rely on getting quite a bit of bang for the buck from the memory-safety features, and then augment with assertions.
We ended up with a working autopilot (and board-support package with drivers) in only ~4 engineer-years, so we think the tradeoff is working well so far :)
This is very interesting work! Do you have axiomatic semantics for Ivory and Tower, and/or specifications for SMACCMPilot? If so, it might be interesting to show specification preservation from Ivory to the Verified Software Toolchain (I'm just assuming that Ivory targets some C subset compatible with Verifiable C). With the ARM backend of CompCert this would give high assurance right down to the machine code.
In any case, I found your experience report to be very well written and would love to hear more about this project.
If you're interested in semantics, you might find our more recent Haskell Symposium paper useful (this is also a good reminder for us to update the webpage with a link): https://www.cs.indiana.edu/~lepike/pubs/ivory.pdf
Essentially, we built an Isabelle/HOL model of a Core Ivory language, and then used those semantics to prove type safety via the usual progress and preservation theorems.
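For readers who haven't run into them, the two lemmas have the standard shape (stated roughly):

    \text{Progress:}\quad \vdash e : \tau \;\Longrightarrow\; e \in \mathit{Values} \;\lor\; \exists e'.\; e \longrightarrow e'
    \text{Preservation:}\quad \vdash e : \tau \;\land\; e \longrightarrow e' \;\Longrightarrow\; \vdash e' : \tau

Together they imply that a well-typed Core Ivory program never gets stuck, which is essentially what the memory-safety claim rests on.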
We don't have specifications of the higher-level properties of SMACCMPilot at the Ivory/Tower level. Instead our partners at Rockwell Collins are developing AADL specification and analysis tools to prove system-level properties at the architecture level.
Hello!
What are the reasons one would want to write C code in a Haskell DSL instead of supplying one's C code with annotations and proving those annotations semi-automatically (as with Frama-C)?
Most people I know who have tried to write large-scale software with annotation techniques have stories that scare me away from ever wanting that experience myself. As I mention in another comment, we sometimes describe our goal with Ivory as "medium assurance". In this sense, we are not precisely modeling the functional properties of the code, but rather using the combination of memory safety and occasional automatically-proved assertions to rule out the large classes of software defects/vulnerabilities that plague typical embedded software.
Another reason is simply productivity. Since the DSL is hosted within Haskell, we get the benefit of a fully-featured, expressive language for metaprogramming and abstraction of Ivory/Tower code. This has been crucial for letting us produce a functional autopilot (and board-support package/drivers) in only ~4 engineer-years.
We also have a stand-alone concrete syntax for Ivory that is intended to be more familiar to C/C++ programmers. It's being used heavily by our partners at Boeing on their Unmanned Little Bird. While they aren't reaping as many of the abstraction benefits as users of the EDSL, the fact that they can mostly focus on writing type-correct code vs. figuring out correct annotations makes them quite productive still.
Quick reminder that verified code is only hacker-proof provided that:
a) The specification correctly describes all correct behavior and correctly proscribes all incorrect behavior (basically, you trade a huge codebase for a smaller one in the form of the spec, which is a massive improvement, but as long as humans write the spec it is not clear it can ever be fully hacker-proof)
b) The trusted computing base works as modeled in the spec. Even at the limit, that means that the physical properties of the hardware are as we expect them to be and that our understanding of the laws of physics is correct. Slightly below this threshold there are things like Row Hammer attacks (https://en.wikipedia.org/wiki/Row_hammer ). And before even that: microcode bugs, proof-verifier bugs, etc.
That said, formal methods do provide a huge improvement over the state of the art of completely unspecified software, and have been shown to work in practice to greatly reduce the number of vulnerabilities in important large systems such as compilers, kernels and now remote-controlled helicopters. They might not be appropriate for every piece of software quite yet, but I would greatly like to see this stuff for things like self-driving cars, airplane autopilots, smart grids, smart factories, and in general all of the rapidly approaching "heavy" IoT applications. Also, some verified properties of commodity OSes would be very nice to have (say, container isolation in a cloud environment, or non-leaking of keys for a smart-wallet process).
The specs are often more abstract than the implementation, and rarely larger. When they're at that size or larger, it's usually one of two things: a deficient spec tool that takes way too much syntax to express the problem, or, more often, a spec that makes explicit the contextual information that usually just lives in the developer's head. You want that stuff explicit when analyzing security or correctness, especially corner cases or integration errors.
You're confusing the specification with the equivalence proof. A spec is likely going to be about as simple as an implementation, with the benefit that it can be read more consistently.
All the specs for real world things I've seen are long and use fuzzy language. Making them formal enough to provide a basis for verification will only make them longer.
I wouldn't be so pessimistic about the possibility of producing useful specifications of real-world systems.
One recent successful large-scale verification is the CompCert C compiler. The compiler is verified relative to a formal specification of a large subset of C. This specification is on the order of 2,000 lines of Coq code (see http://gallium.inria.fr/~xleroy/publi/Clight.pdf ). So it is an example of a specification that is much smaller than an implementation and small enough to be reviewed manually.
> The striking thing about our CompCert results is that the middle-end bugs we found in all other compilers are absent. As of early 2011, the under-development version of CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task.
2. ExpressOS. Uses C# and the verifiable Dafny language, with annotations that feed into a prover. The annotations were 2.8% of the code.
3. seL4. This is the kind of project that makes it seem like specs and proofs are huge. That's because people forget what they're specifying: an abstract CPU + imperative programming + kernel + safety/security properties + equivalence. Writing all that in code would also take a lot of lines, but they didn't have to write it in C beyond the kernel itself. Anyway, this one is (a) for the executable spec, 5,700 loc of Haskell + 13,000 loc of Isabelle, and (b) for the implementation, 8,700 loc of C/ASM + 15,000 loc of Isabelle. They went for full proof instead of code review. That added around 200,000 lines of proof steps, with a good chunk automated.
4. VCC on Microsoft Hyper-V. A similar problem to 3, since this also manages a CPU/machine. Total size was around 100,000 loc last I checked. Verifying 20% of it (about 20,000 loc) took 13,500 loc of annotations. Verification was automated.
5. Verdi protocol verifier. Spec/code reported is: Key-value store 41/138; Primary-backup 20/134; Raft algorithm 170/520; Verdi itself 148/220. The specs are consistently smaller than the code.
So the specs range from tiny to merely smaller than the code in most cases, with the heaviest, hardest efforts requiring the most specification. Proofs range from automated to manual.
> “We’re not claiming we’re going to prove an entire system is correct, 100 percent reliable in every bit, down to the circuit level,” Wing said. “That’s ridiculous to make those claims. We are much more clear about what we can and cannot do.”
This is an important statement that should be highlighted more prominently in the article (whose title I disagree with).
Formal methods are promising, and I'm glad that the researcher acknowledged that, when your project relies on so many other abstractions and systems, it doesn't matter how rigorously developed your solution is if a hacker can find a way to accomplish their objective through one of your dependencies.
Specifically, hardware-level exploits are wholly disregarded by formal verification. This means bit-banging could still work. In many systems I'd also be wary of the OS itself.
They're disregarded by software verification. There's more formal verification deployed in hardware than in software. There are also techniques to verify hardware and software together. The hardware problems you see are often deliberately left in there, either due to corner cutting or backward compatibility with bad designs of the past. Both inspired by the desire to see the number next to net income continue to rise. ;)
Has anyone successfully brought formal methods to an area of code that is newly being developed?
I have successfully convinced management to allow me to use formal methods on portions of very old code (~17 years old at the time) that were being rewritten, but I've never been able to get management on board with the time required for new development with formal methods. The planning time up-front is more than what a lot of companies are willing to trade off against time-to-market.
A while back I was exploring the use of 'soft' formal methods to help in evaluating correctness of some new modules we had to develop.
I had come upon Alloy, which seems like a gentler introduction to formal methods (unlike Z or Coq).
I've used it several times, but the big problem with it is that once you have a nice, correct model that has been checked (and no counterexamples found), there's no way to translate it to code. This means that you're basically back at square one in terms of implementation.
I really like (and use quite often) the ASM (Abstract State Machines) method instead.
Can we get an actual example of a formally verified piece of code? If I have a function adding a and b, how does it look when it's "formally verified"?
The good news is that most of them abstract this stuff away for common operations. They'll let you annotate code that's transformed into a set of verification conditions that the prover automatically discharges. SPARK is an example:
Let's say that work isn't done. Then you have to hand-model the properties of what you're doing. Arithmetic, expressed in bits, is harder than it looks to describe in a way that covers all its properties. Here's one on SmartMIPS:
"With formal methods, we can get proven-ish correct implementation, now we need bug-free specifications.... this in itself will be the next challenge."
That's much better. See, before you needed to (a) understand a vague specification, (b) get your concrete version of it right, and (c) implement that specification. Now, you just trust a precise specification. The fact that it's precise, especially if it's in Z or something, lets you check it against the people giving you the vague stuff, too, by asking them questions. Solves lots of problems.
Altran/Praxis Correct-by-Construction provides an illustration of how that works:
Aside from the obvious point that software can only be as secure as the hardware it runs on, isn't Gödel's incompleteness theorem a problem with scaling from smallish problems to biggish ones?
Granted it has been a long time since I understood it even moderately well, but my main recollection is that a complete system of logic can't be consistent, and a consistent system can't be complete. That would seem to be an issue with extending formal methods to any computer system of broad general use.
Gödel's incompleteness theorems are more of a theoretical limitation than a practical one.
Roughly speaking, Gödel's incompleteness theorems say that all proof systems for number theory are limited in some way. For example, first-order Peano arithmetic is such a system, and one of its limitations is that it doesn't support transfinite induction. (It doesn't matter if you don't know what it is.) In other words, if you want to translate a mathematical proof to Peano arithmetic, you have to come up with a way to do it without transfinite induction. Sometimes, such as in the case of Goodstein's theorem, this is impossible. To prove Goodstein's theorem, you have to choose a stronger proof system to begin with.
So, Gödel's theorems guarantee that no matter how strong your proof system is, there are always number-theoretic statements that are beyond its reach. But for reasons that are not currently completely understood, this doesn't really happen in practice. "Naturally occurring" examples of number-theoretic statements almost always turn out to be provable in surprisingly weak systems.
Instead, you run into practical problems: the theorem is provable in the system, but actually writing out the proof is utterly inconvenient. As an analogy, there are Turing-complete programming languages that don't have the concept of functions. In theory, they are capable of all kinds of computations, but in practice you don't want to use them.
And if, instead of mathematics, we concentrate on proofs of correctness, then this is even less of a practical problem. To quote Leslie Lamport, proofs of correctness "are seldom deep, but usually have considerable detail." The proofs may be long and complicated, but as long as they don't use any kind of ridiculously abstract techniques, they are just the kind of proofs where computers can have an advantage over humans.
Static type systems were explicitly designed to get around various inconsistency problems with standard formulations of mathematics (ZFC, etc.). Formal verification generally exploits such type systems. In a nutshell, the incompleteness theorem (and related problems) only apply to sufficiently powerful logical systems (namely, those that can model Robinson arithmetic). Less powerful (or, in some cases, incomparable) axiom schemes exist that are capable of being both consistent and complete. Usually, the cost is that (1) they are not capable of expressing all programs (but many people argue that they are capable of expressing all useful programs), and (2) some programs that are correct in those type systems may require exponentially larger proofs (types) than a less stringent system would require in order to accept the same term (again, many people argue that this does not matter in practice).
I think that the difference is that Gödel's theorem relates to the general case, and not the specific case.
For example, the Halting Problem (which is, I admit, what I first thought of when I read this) proves that there is no general algorithm that can state whether any program will halt for any input.
However, it says nothing about whether you can prove that a specific program will halt for any input. You can easily make programs that are guaranteed to halt, provably.
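As a tiny illustration (my own example), any function defined by structural recursion on a finite input provably halts; in Haskell:

    -- Each recursive call peels one constructor off a finite list, so
    -- termination follows by induction on the structure of the list.
    len :: [a] -> Int
    len []       = 0
    len (_ : xs) = 1 + len xs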
I guess my thought was that if they are moving beyond very limited use cases and into broader ones (e.g. communication, which almost certainly involves very broad allowable inputs) then the existence of provably correct programs for small problems is less relevant.
But as I'm starting to understand, it may be that this is still a very limited case compared to the theoretical general case, even if it is significantly more general than e.g. proving a given program will halt.
I have always been interested in this area of programming. One thing I have always wanted is the ability to enforce restrictions on data types beyond memory. By that I mean: say I am using an int, and I want to restrict it to, say, 1-55. You can solve this with run-time error handling, but wouldn't it be better if we had static checks, so that upstream we could see an unbounded int being passed into this method and throw a compile error?
I think you could do that, but as long as you allow the usual operations on bounded variables (say, multiplication), you would have to ensure that the other operand is also bounded, or else you get an unbounded result nonetheless. Meaning, all variables used in any such expression would eventually have to be bounded. There's also the problem that the compiler would have to infer that [1-10] - [1-2] can be negative, but doesn't have to be. So with every expression on bounded variables, you widen their bounds, likely eroding the system's benefits.
I agree it's super complex, and these bounds wouldn't be limited to integer-based variables. I would also argue there are very narrow instances where you would want to bound a variable and also do arithmetic on it. The language should make bounding optional and keep unbounded variables possible.
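For what it's worth, here is a minimal Haskell sketch of the smart-constructor version of this idea (all names are mine). It gives you compile-time documentation of the bounds plus a single run-time checkpoint, but it does not attempt the bound arithmetic discussed above:

    {-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
    import GHC.TypeLits (Nat, KnownNat, natVal)
    import Data.Proxy (Proxy (..))

    -- The bounds live in the type; the Int inside is checked once, at construction.
    newtype Ranged (lo :: Nat) (hi :: Nat) = Ranged Int
      deriving Show

    -- Smart constructor: the only way to obtain a Ranged value.
    mkRanged :: forall lo hi. (KnownNat lo, KnownNat hi) => Int -> Maybe (Ranged lo hi)
    mkRanged n
      | n >= fromIntegral (natVal (Proxy :: Proxy lo)) &&
        n <= fromIntegral (natVal (Proxy :: Proxy hi)) = Just (Ranged n)
      | otherwise                                      = Nothing

    -- Downstream code can demand the restricted range in its signature.
    useSmallInt :: Ranged 1 55 -> Int
    useSmallInt (Ranged n) = n * 2

Getting arithmetic to propagate bounds ([1-10] - [1-2] and so on) would need dedicated type-level machinery or a language built for it; this is roughly what range types in Ada/SPARK give you.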
I think rigorous application of formal methods plus deployment techniques like unikernels have the potential to improve the quality of a lot of software.
I'm more of a pessimist. If you can't get C programmers to handle memory buffers correctly, how are you going to get them to use actual formal methods or annotations that rely on understanding substantially more math than the average programmer has?
Make the infrastructure you run things on provide what appears to be formal methods for interactions, but also make it appear inconsistent over a given regular timeframe. Abusers of the system will find lots of issues with it, because it will break unpredictably at random times if they are hammering away. Happy users of the system will have it break every now and again, but will likely avoid making things overly complex as a result of it increasing the chances it will break over a given time period.
Expecting infrastructure to be completely reliable is a fallacy and should be embraced instead of chased.
Suffering on the part of the former "programmer" (hacker) will be paid while obtaining results of various runs to exploit the system from the outside (external). Suffering on the part of the latter "programmer" (coder) will be paid internally while working on more efficient solutions.
We'll need to code up the two types of suffering into stores of value, and tie them to cryptocurrencies, but that should be fairly straightforward to implement.
People learned 4GLs, BASIC, Scratch, COBOL, etc. pretty easily. That means the solution is to make it higher-level, with the compiler, type system, protocol generators, etc. doing most of the work. They just need a set of tools that let them express themselves in a way constrained enough to let the tooling produce correct code. Eiffel and Ada/SPARK results are also very promising for systematically eliminating errors from large applications. Experienced users will be able to do more than the basic 4GL or whatever.
You might regard unit tests and formal proof as opposite ends of the same scale, with model-driven testing somewhere in between.
You'd be well advised to pick according to what you need. Formally proving the behaviour of your whole web app would likely take an order of magnitude longer than just writing it.
I think the biggest issue is that, from a business perspective, the formal requirements needed for this type of development (software development as an engineering discipline) are too slow to produce and not as open to change as agile (craft) methods tend to be in software.
Exactly... anything dealing with critical hardware will/should be developed as an engineering discipline, where TFA's practices would be a better fit. That isn't the majority of software development, which is one-off line-of-business apps.
This is a big win for Haskell, since government contracts and approved vendors tend to have long arcs of doing future business. The fact that an EDSL in Haskell was used greases the skids for any future Haskell acceptance on jobs. Good luck!
I don't get the idea of a formal method. You're attempting to prevent bugs in code by writing code that describes the final product. In any reasonably complex system the descriptions will be large, complex, and bug-prone. How is a "formal method" any better than one of the following:
- Unit testing: Provide every test case possible
- Pure Functions: Use small methods that cannot error and always return sane values
- Fuzzing Testing: To find if the implementation is actually correct or if it is correct by chance
What is the benefit of a formal method? It's just inconceivable for me to see "Hacker-Proof code" and not laugh at whoever wrote this.
Even if your program only takes a 32 bit integer as an input you'd have to provide 4294967296 test cases. Most programs' inputs are a bit larger than that.
> - Pure Functions: Use small methods that cannot error and always return sane values
"cannot error" is exactly what formal verification is about. Pureness doesn't prove correctness.
> - Fuzzing Testing: To find if the implementation is actually correct or if it is correct by chance
Fuzz testing is good for finding buffer overflows and such, and those bugs can be ruled out even without formal verification methods (e.g. in Rust). Otherwise you'd need to specify every possible test output anyway, and you're back to point 1.
> Even if your program only takes a 32 bit integer as an input you'd have to provide 4294967296 test cases. Most programs' inputs are a bit larger than that.
How do I go about defining a test using a formal method?
Also, you know as well as I do that you don't need to test every input, only a representative subset if you use unit tests and use a fuzzing testing library.
Using fuzzing, unit testing, pure functions and liberal preconditions throughout your project during development, you'll likely come up with a product that does exactly what you are promising. Add a linter on top of that and for free you've got the safety of healthy paranoia.
Do you have an example of a project that uses "formal methods" so I can see what they are? I'd also like to see something written that has a comprehensive security audit.
Again, from what I'm seeing you're writing code to verify code and calling it "bug proof" just because you've written double the code. That's insanely misguided in my book.
> Also, you know as well as I do that you don't need to test every input, only a representative subset if you use unit tests and use a fuzzing testing library.
The thing is that you can never be sure that you chose a "representative subset". Unless you test every conceivable input you've verified that your program works for the subset you chose to test, nothing more. For example, how sure are you that your unit tests cover every conceivable edge case?
> How do I go about defining a test using a formal method?
You don't, it's a proof. You don't need tests.
Some links from another comment [0].
I'd also point you to Idris [1] for a friendly dependently typed language if you are interested in learning more.
Ok well what does it look like? Can you show me an example of this proof and how it is tested? You can't just say it's proven and remove the need for you to prove you've implemented it correctly. How is the proof validated and how is the source code checked against this?
Also by the act of checking the source code you've essentially just created a test case. So we are back to my initial phrasing.
What I'd like is for someone to show me how I go about proving something. Could you, if possible, walk through "proving" an implementation of a Fahrenheit and Celsius converter or something? What do you do? Where do you start? What does this look like?
That's what's been driving me crazy about this whole "formal methods" thing. People say "it's proven", and when you ask "how do you check your code, or how do you verify that what you did was correct?" they say "we made a proof, it's proven". No information is exchanged other than the assertion that "what I did was right".
Well, an F to C converter is not a terribly interesting case, because it would be trivial to implement. When you think about it, how would you test this in your favourite language of choice? By computing the expected value using the same formula you used to compute the value you are testing in the first place.
But here's another example that came up recently. Let's say you want to write a function that takes a string and returns its first character. Now, it would be an error to call this function with an empty string, because then what would it return?
In a normal language, you'd probably check if the string was empty and then return null or something. When testing the function, you'd have to remember the empty string case and write an unit test for it (now I don't know about you, but I can't say I'd always remember to write a test case for this particular edge case; normally I'd forget).
Can we do better? It turns out we can.
We can encode invariants about our program on the type level. Types can be more than just strings, ints and chars. They can also be propositions, i.e. statements about our program. In this particular case we can construct a type
NonEmptyString
that can only be constructed by a function like this (using Haskell syntax):
neString :: String -> Maybe NonEmptyString
(This means that neString takes a String and returns a Maybe NonEmptyString). But what is this Maybe (also called Option, etc)?
It's like a union type - it can either contain some value, or it can be empty. Now the cool thing about Maybe is that we can't just take out the potential value contained inside it - we have to pattern match on it! What this means is that whenever you want to see what's inside a Maybe, the compiler forces you to tell it what to do in both cases - when there's something inside and when there's nothing inside.
Maybe you can see where this is going. This is how our takeFirstChar function would look like:
takeFirstChar :: NonEmptyString -> Char
Notice - we say that it only works on NonEmptyStrings! If we try to pass a regular string to it, that would be a compiler error. But now we want to read something from the command line and return its first char. Let's say we have a read function:
read :: String
Every time you call this function it gives you a string read from the terminal. Ok, let's put our program together now:
a = read
print (takeFirstChar a)
But wait! We can't do that - as I said above, the compiler complains, because we try to pass a String to a function that expects a NonEmptyString. Instead we have to:
a = read
case (neString a) of
  Just st -> print (takeFirstChar st)
  Nothing -> print "String was empty"
Notice how we pattern match on the result of (neString a). If the string was empty, we'd match the Nothing clause. In all other cases we'd print the first char.
What we did is prove that the program will behave correctly in all cases - no tests needed! We can't put the program together in any other way, because it would be a type error. In a sense, the fact that the program exists is a proof of its type (programs have types too!)
Now this is a pretty trivial invariant to encode on the type level, but in a dependently typed language you could encode whatever you'd like, i.e. for example that a list is sorted or that a number is the GCD of two other numbers, etc.
This is often not trivial to do, and at some point you start seeing diminishing returns, because the time it takes to rigorously encode a proof is longer than any possible return on investment from proving that there are no bugs. But it doesn't have to be an either-or situation. You can encode as many invariants as you want in your code. I think Haskell strikes a good balance in that regard, although Idris is a very interesting language too.
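To make the sketch above concrete, here it is as one small compilable Haskell program (reading from the terminal is really getLine, which lives in IO; the constructor is kept local for brevity):

    module Main where

    newtype NonEmptyString = NonEmptyString String

    -- The only way to obtain a NonEmptyString is to pass this check.
    neString :: String -> Maybe NonEmptyString
    neString "" = Nothing
    neString s  = Just (NonEmptyString s)

    -- Callers can only hand us strings that already passed the check.
    takeFirstChar :: NonEmptyString -> Char
    takeFirstChar (NonEmptyString (c:_)) = c
    takeFirstChar (NonEmptyString [])    = error "unreachable: ruled out by neString"

    main :: IO ()
    main = do
      a <- getLine                      -- the "read" from the sketch above
      case neString a of
        Just st -> print (takeFirstChar st)
        Nothing -> putStrLn "String was empty"

(In plain Haskell the wrapped String can still technically be empty, so GHC asks for that second, unreachable pattern; in a dependently typed or length-indexed setting that case is impossible by construction, which is where the stronger guarantees come from.)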
> But here's another example that came up recently. Let's say you want to write a function that takes a string and returns its first character. Now, it would be an error to call this function with an empty string, because then what would it return?
This is wrong because it depends on the language. The behavior depends on the implementation, but if you are meant to return the first character of a string and the string has no characters, then you should return an empty string.
From there, if given null you should return null back or throw an exception depending on the language.
My function would, abstractly, look like this:

function get_first_letter(s):
    if s == null:
        return null
    if s == "":
        return ""
    return s[0]

value = get_first_letter(read_string())
switch (value):
    case null:
        print("No string entered")
        return
    default:
        print(value)
        return
Is this code "proven" just because it handles every case? What makes this a proof, or proven? This is basically just good practice: handle every case that a function can create. This is something that we have all been doing. What makes a formal method "proven" or different?
> What we did is prove that the program will behave correctly in all cases - no tests needed! We can't put the program together in any other way, because it would be a type error. In a sense, the fact that the program exists is a proof of its type (programs have types too!)
This seems no different from any other language. You've just spread the code you had to write over "neString" and "takeFirstChar" and created a new type for it. Why couldn't you have just used the underlying functionality of String to make those checks when needed? For instance, "neString" must be using some sort of string method to check if a String is a NonEmptyString. Why not just put that code where it needs to be?
This just seems like a way to force your hand into using more types that have the same underlying implementation, and I just see that as a way to confuse someone trying to come work on your project. When I look at the source and say "wait a minute, they have 10 types defined for strings and all they do is something that String could do beforehand" I'm a little irked. What makes the implementation that you have created "proven"? Just that the compiler said that all of the types line up? That doesn't make you any safer than before.
If you make even a single cast, use a single unsafe block, or anything like that, then you need to test the entire system. Your house of type-verifying cards falls down once you hit even the smallest section of unverifiable code. Just making types for every one of your billion special cases doesn't mean that you don't need to test the code.
For instance, in Rust the standard library often uses unsafe {} blocks. This means that any code that goes into or comes out of such a block can't be verified by the types. Just building it out of defined types doesn't mean the programmer has correctly implemented anything.
Also, you are assuming the implementation of neString is correct. What if that has a bug? Again, just because you're passing your data around the correct way doesn't mean the correct things are happening with it.
You can't just make more types and call it all good.
> This is wrong because it depends on the language. The behavior depends on the implementation, but if you are meant to return the first character of a string and the string has no characters, then you should return an empty string.
How? The type is:
takeFirstChar :: String -> Char
There's no such thing as an empty char. Your get_first_letter essentially has the type:
get_first_letter :: Either Null String -> Either Null String
Which doesn't guarantee anything and is arguably a lot worse than just:
get_first_letter :: NonEmptyString -> Char
This is why you can't have proven code in a lot of languages, because they permit you to lie about the actual types. From there, any proof is invalid because it is based on invalid propositions.
Dependently typed and other languages do not allow you to lie about the types (for example, a value can't be null unless this is reflected in the type; you can't throw exceptions or do IO unless this is reflected in the type, etc), hence a well typed program in such a language is a valid proof. You can not implicitly cast between types, i.e. a function like:
cast :: a -> b
doesn't exist.
A function like print actually has the type:
print :: String -> IO ()
Which tells you that it is doing IO. You can't call such a function from a pure function for example, the types wouldn't permit it.
> Also, you are assuming the implementation of neString is correct. What if that has a bug? Again, just because you're passing your data around the correct way doesn't mean the correct things are happening with it.
I was simplifying, but neString would actually have a type similar to:
neString :: String n -> Maybe (String (n != 0))
where n is the length of the string. Notice, the length is encoded in the type! A wrong implementation (one that returns, e.g., Maybe (String 0)) will not type check.
But yes, you can have errors in your proof, no doubt. But these are specification errors, which there's no way to get around. What a proof guarantees is that there are no errors in the implementation of your specification.
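To make the length-in-the-type idea concrete, here is the standard length-indexed vector trick in Haskell (GADTs). The "non-empty" requirement is part of the type, so there is no empty case to forget and no run-time check to write:

    {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

    data Nat = Z | S Nat

    -- A list whose length is part of its type.
    data Vec (n :: Nat) a where
      VNil  :: Vec 'Z a
      VCons :: a -> Vec n a -> Vec ('S n) a

    -- Total head: the type only admits vectors of length (S n), i.e. at least 1,
    -- so the VNil case cannot occur and the compiler knows it.
    vhead :: Vec ('S n) a -> a
    vhead (VCons x _) = x

An implementation that claimed this type but tried to take the head of a possibly-empty vector simply would not type check, which is the point about Maybe (String (n != 0)) above.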
OK, so if you are given a specification, you use formal methods throughout, and the result doesn't conform to the specification you were provided, then you have made an error. The specification is not at fault here. What you are doing is claiming that the types going into and out of functions decide whether they are functioning correctly. This is not the case at all.
I think that most bugs aren't because the type you were expecting was wrong but the data produced by a function was wrong.
Implementations cannot be judged correct just because they accept and return the correct types.
For instance, suppose I am told to write a function, `get_lottery_number`, that is supposed to predict winning lottery numbers.
If I call `get_lottery_number` by your logic it will always provide me with the real winning lottery number since the types check out. That is absolutely incorrect. Types can't prove functionality. They do provide protection from a set of bugs but those bugs are also avoided easily by handling every case that a function can return, which was already being done before formal methods were an idea.
You can't say a program will have no bugs because the types check out. Is that what you are saying? Your code's types match in all uses -> this means there are no bugs -> this means I don't need to test my code?
> What you are doing is claiming that the types going into and out of functions decide whether they are functioning correctly. This is not the case at all.
When I say proof, I mean it in the mathematical sense. The Curry–Howard isomorphism [0] formalizes the relationship between (mathematical) logic and type theory. Unless you are saying mathematical logic is inconsistent (and thus every mathematical proof is invalid) this is absolutely the case.
The same way you can construct a mathematical proof that e.g. n is prime, you can construct a type Prime n that proves that n is prime [1].
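A toy instance of that correspondence (my own example): read the type below as the proposition "A and B implies A"; the program is its proof, and the type checker is the proof checker.

    -- Pairs are conjunction, arrows are implication: (A ∧ B) → A.
    projLeft :: (a, b) -> a
    projLeft (x, _) = x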
> I think that most bugs aren't because the type you were expecting was wrong but the data produced by a function was wrong.
This is where your misunderstanding stems from. Continuing with the example from above, if you have a function:
makePrime :: n -> Prime n
the function can not construct (Prime n) if n is not a prime.
> If I call `get_lottery_number` by your logic it will always provide me with the real winning lottery number since the types check out. That is absolutely incorrect. Types can't prove functionality.
If you give me a specification that constructs predicted lottery numbers I could absolutely turn it into a type :)
> They do provide protection from a set of bugs but those bugs are also avoided easily by handling every case
But you don't have proof that you are handling every case :)
> You can't say a program will have no bugs because the types check out. Is that what you are saying?
Yes, if the types correspond to the specification then there are no bugs. The specification can still have bugs though, but that's unavoidable. The thing is that you've proven the program does what you told it to do.
Once you stop thinking about types in the Java sense and start thinking about them as logical propositions it will all make sense. I provided you with a couple of links in the other comments and here if you are still interested.
Formal verification is a bit like using a super-high-level strict/static/safe programming language (often in conjunction with a classic programming language providing the actual run-time description, which is what gets formally checked). Think C# attributes or Python decorators, but where they take up 80% of the code and are a language unto themselves. It basically does all of the things you described in your list through automatic formal mathematical proofs, rather than relying on chance or on the programmer directly (obviously, if you write a backdoor into a formal requirement it'll make it past the proving step).
- No need to unit test specific cases when all the cases have been formally proven.
- Formal proving methods typically work best (or at least most easily) on pure functions for the reasons you described.
- No need to fuzz random values when the entire range has been formally proven.
If you ever took a theoretical computer science course where the teacher had you write the formal induction proof for a loop, it's like that, where every possible case must be covered mathematically. We've simply gotten the computer to deduce the proofs for us automatically.
For example, if there is a global integer that can be between 0 and 700, or the value 1000, then the formal verification system will make sure nothing in the program can write any other value to that memory. It will never increment or decrement when a 1000 is in there, it will never subtract 20 when a value less than 20 is in there, etc. Including by buffer overflows, or by pointer arithmetic in some random function, or by a runaway parsing algorithm. And then we do that for every piece of memory, every subroutine call, and every possible input. And then we prove we covered every case and never violated our spec.
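Written down as a predicate (my own sketch), the invariant in that example is just:

    -- The verifier's job is to prove that every write in the program preserves
    -- this, over all reachable states and inputs, rather than to test it on a
    -- sample of values.
    validValue :: Int -> Bool
    validValue x = (x >= 0 && x <= 700) || x == 1000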
And we can add implications to our spec. For example the only way a command to launch a missile can happen is if the launch missile function is called, no other function no matter the state of the program can cause it to happen. This can prove that there aren't exploits to a complex program that might cause unwanted behavior. Can you prove that sort of thing to me about your software? Like can you prove to me your software is unbreakable? This solution can (with a much smaller error rate than "well I tried 17 things and none of them worked." because it can say "I tried literally every possible thing.").
It's a proactive solution. We are proving that it's mathematically impossible to break (engineering still being the weak point). That's an order of magnitude, if not more, better than what we have now.
So is it just writing a bunch of preconditions for a method? This was practice long before "formal methods" became a "trend".
I still haven't had anyone show me how this works.
If you, or anyone reading, could do the following, I think it would really help all of us plebs who aren't much involved with academic computer science: write, from start to finish, a formally defined application, documenting what you are doing and why as you go. For instance, an F to C calculator. Could you make one of those and document your thoughts as you create it?
It's more like writing preconditions and postconditions, and proving that the implementation of the function will always satisfy the postconditions. And then also proving that no other part of your program can call the function in a way that violates its preconditions. The proofs are checked automatically.
I can't point you to a simple example. But you could check out the book "Software Foundations" (available for free at https://www.cis.upenn.edu/~bcpierce/sf/current/index.html). If you have time to read the first few chapters it may clarify things.
So then how is it any different from what most people already do? We already use asserts liberally in development, specify everything in documentation, and then use linting systems to tell us if we violate our rules.
This just sounds like another name used by people to sound like they are doing something that was already being done in the profession.
Here [0] is a concrete example of using a formal verification system on top of C (and the blog contains many other examples), along with an excellent dialog around the example.
First, a caveat: this formal verification system is rather simple, and only proves code within the domain of the C standard. Using such a system with a buggy C compiler (pretty much all of them), or running the result on subtly incorrect hardware (like most modern hardware), can still leave the formally verified code wrong in practice. But it is an order of magnitude better than writing out pre-post conditions in comments, fuzzing, or automated tests. And slightly better than pure functions:
* Pre-post conditions are code and actually enforced at compile time.
* All possible cases are tested for correct behavior, no need to stochastically test ranges (fuzzing) when the entire range is proven or specific test cases (unit tests) when every corner case has been proven.
* Pure functions can't deal with state. Even monads still require a trap door; this system can still prove properties through those trap doors rather effectively.
Now for the example in question:
Note how the formal verification system points out a problem when the pointers match. Could a linter notice that? Maybe, but probably not; at best it would flag a "potential problem with equal pointers". It would have to actually understand the source code, and the intent of the programmer, like a formal verification system does, to know whether it's an actual problem. The formal verification system knows it's a problem because it deduced it from the rules of the C standard and the code.
So coding standards ("Always check for equal pointers when expecting different ones." or "Never call this function with equal pointers.") are a decent workaround. But an order of magnitude better is: "We've proven the code never violates our expectations." Accidentally violating those expectations is the purpose of the coding standards in the first place; we've fixed the root problem so the decent workaround is no longer required.
1) Functional programming is a subset of mathematical proofs. Isabelle syntax is somewhat similar to what is offered in the ML family of functional languages. Unlike functional languages, you can use more advanced constructs than bijections (you can model superposition etc.) and it is easier to state things over sets.
2) Imperative code is a subset of this too, with an added order of operations. The language is somewhat different, though Isabelle has a module with the machinery needed to verify imperative programs.
Just thinking out loud: if a program can check the correctness of another program against a formal spec, can't it just write the checked program from the spec?
You've just invented the field of program synthesis, which is a super active research area right now in programming languages! Here's the paper list from a class at UW focused on this area: http://courses.cs.washington.edu/courses/cse599a2/15wi/ . The Sumit Gulwani paper from the first week is a really good introduction to the field.
Aside from program synthesis, there is the more restricted form of code extraction, which turns definitions inside a theorem prover into code in a more traditional, runnable language such as OCaml. But you'll still want to do one or more refinement steps inside the theorem prover where you turn your spec into more efficient definitions (or even definitions that are computable at all - how do you evaluate an unbounded universal quantifier?)
As others have already replied, sometimes it can. But sometimes, and particularly if we care about efficiency, this is so difficult that we're not even close to being able to automate it.
For example, here is my "formal definition" of a primality checker:
IsPrime(Int n) = n > 1 and not(exists a, b in Int: a > 1 and b > 1 and a * b == n)
It is not directly executable, because it uses the "exists" quantifier over all integers. A clever code extractor should be able to somehow convert this to a finite computation. But would it be able to come up with the polynomial-time AKS primality test? [1] I highly doubt it.
Unless, of course, there is a special case for recognizing this particular definition. But I don't think that really counts, because I'm only using primality checking as an example. You can't have a special case for everything.
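For contrast, a directly executable (but naive) refinement of that spec looks like this; it mirrors the declarative definition by checking every candidate divisor, and is nowhere near AKS in efficiency:

    -- Trial division: an executable counterpart of the quantifier-based spec above.
    isPrime :: Integer -> Bool
    isPrime n = n > 1 && and [ n `mod` a /= 0 | a <- [2 .. n - 1] ]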
Actually, Isabelle has methods that check isomorphism of proofs in a brute force way. It is pretty slow, so it is almost always better to just say what specific proof you want to use.
In order to use formal verification you have to provide a complete and correct specification of what you want the software to do. We already have formal languages for doing this called programming languages.
How is this any different from a programming language?
Programming languages have to be (efficiently) executable, specifications can be logical/declarative. E.g. in a specification I can state things like "forall functions, with uncomputable property foo, we have bar". Plenty of things are uncomputable (e.g. quantifiers over natural numbers), but useful in specifications.
There are good and bad specifications, same as for programs. For the semantics of a programming language you could essentially transcribe an interpreter in the form of a structural operational semantics and these are relatively error prone. Instead, you could give an axiomatic semantics, which is a lot more high-level. An equivalence proof between the two gives you a high level of assurance that the operational interpretation you had in mind while writing your interpreter actually means what you think it means.
A recent example, the formal specification of the weak memory model of C11 turned out to be wrong (in the sense that it forbids common compiler optimizations, because programs have access to a time machine), but this was discovered when trying to develop a program logic for C11.
In practice, most broken specifications I have seen were written by people who never really worked in formal verification. I am not aware of a single instance where a piece of formally verified code was broken because of a broken specification. There are cases where the specification had to be extended. E.g. CompCert was initially developed as a whole program compiler and the spec had to be extended for separate compilation. This broke the alias analysis.
At the low end you have things like Python or Ruby, where you can write a loop over a type that can't be iterated, and it will crash at that line of code.
In the middle you have languages like Java and C#, and at the mid-high end even Haskell and F#. These languages enforce type safety at varying levels throughout the code, by first inferring the types at each point throughout the code, and then seeing whether they are consistent. You would not be able to loop over a non-iterable object here.
At the really high end you get into the realm of languages like Idris, Agda, and Ivory, and "formal verification" techniques. At this level you can write not just a loop, but a loop that can be proven to terminate regardless of its input. This is done by setting constraints, and proving those constraints all the way through the code - the compiler will infer a lot of this, but it is still more work.
This is a very simplified view of it, but essentially the more formal you get in the specification, the less chance that the program can do something unintentional, crash, or be hacked. The vast majority of the code written today is not that formally specified.
To save others time, I've copied some of the biggest points of the article. My comments are in [brackets], except these^^ first few sentences.
"Between the lines it takes to write both the specification and the extra annotations needed to help the programming software reason about the code, a program that includes its formal verification information can be five times as long as a traditional program that was written to achieve the same end."
"Jeannette Wing, corporate vice president at Microsoft Research. “Any natural language is inherently ambiguous. In a formal specification you’re writing down a precise specification based on mathematics to explain what it is you want the program to do.”"
"“We’re not claiming we’re going to prove an entire system is correct, 100 percent reliable in every bit, down to the circuit level,” Wing said. “That’s ridiculous to make those claims. We are much more clear about what we can and cannot do.”"
[In the experiment the article is mostly based on]
"The hackers were mathematically guaranteed to get stuck. “They proved in a machine-checked way that the Red Team would not be able to break out of the partition, so it’s not surprising” that they couldn’t, Fisher said. “It’s consistent with the theorem, but it’s good to check.”"
"In a return to the spirit that animated early verification efforts in the 1970s, the DeepSpec collaboration led by Appel (who also worked on HACMS) is attempting to build a fully verified end-to-end system like a web server. If successful, the effort, which is funded by a $10 million grant from the National Science Foundation, would stitch together many of the smaller-scale verification successes of the last decade. Researchers have built a number of provably secure components, such as the core, or kernel, of an operating system. “What hadn’t been done, and is the challenge DeepSpec is focusing on, is how to connect those components together at specification interfaces,” Appel said."
Such 'hacker-proof code' can only be as secure as the underlying hardware. If such a cure already existed, we wouldn't be subjected to the current hacking/phishing epidemic.
"these malfunctions could be as simple as a buffer overflow"
No amount of 'formal verification' will detect or prevent buffer overflows, which are a defect in how the hardware allocates memory. The cure must also be found in the hardware.
"Back in the 20th century, if a program had a bug, that was bad, the program might crash, so be it,”
It must have been on a different planet, because I can remember using networked computers in college way back in 1984, and the first Internet worm occurred in 1988.
Hardware bugs are NOT the major source of widely-exploited vulnerabilities in practice. They're a tiny fraction of exploits being used in the wild, and certainly of the ones being used to target the general population.
Further, a buffer overflow is absolutely not a defect in how hardware allocates memory. It's a logical flaw in the software that reads/writes beyond the limits of the memory the programmer intended to access.
It's also worth noting that some critical hardware components (e.g. CPUs since the infamous division bug) are formally verified as well, so manufacturing defects should be the largest concern for them.
Only a tiny fraction of the CPU is formally verified. There are frequent errata. And plenty of undefined behaviour as well as races you can hit if you really try.
This article was just wildly misleading; the title, well, it speaks for itself.
Interesting language, but rather than being 'formally proven' as the article wildly claims, it is a strongly typed system a la Haskell. So yep, it looks like a great idea for secure systems, but formally proven? Bah! :-)
Has anyone here actually formally proved a non-trivial program, say a couple of hundred lines of NCSS?
How about tens of thousands, hundreds of thousands, or millions of lines of code? The proof would be so insanely complicated that the level of expertise needed to define the requirements would be (if it is within the limits of human endeavour at all) held by maybe only a handful of people. And do those people make good business analysts, or hackers, or entrepreneurs, or even just communicators?
> Has anyone here actually formally proved a non-trivial program, say a couple of hundred lines of NCSS?
I don't know what NCSS is, so I'm going to answer as if your sentence stopped two words earlier.
How about the CompCert C compiler [1]? Or CakeML [2]?
And let's not forget seL4 [3], which has a formal proof that the implementation in C conforms to the specification, and a further formal proof that the binary code is a correct translation of the C code.