Probably. I used to do formal proof of correctness work and headed a project to build a verifier.[1] That stuff is very hard.
The partially initialized array thing is an issue of expressive power. You can't talk about that in Rust yet. This is a classic issue. The three big headaches in C around memory safety are "how big is it", "who owns it", and "who locks it". The language lacks the syntax to even talk about those issues. Rust can talk about those, which is a huge step forward.
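For what it's worth, here's a minimal Rust sketch of how those three questions become expressible in the type system. The function names are mine, purely for illustration:

```rust
use std::sync::Mutex;

// "How big is it": a slice carries its length with it.
fn how_big(buf: &[u8]) -> usize {
    buf.len()
}

// "Who owns it": Box<T> owns its allocation; &T merely borrows.
fn who_owns(owned: Box<u8>, borrowed: &u8) -> u8 {
    *owned + *borrowed // `owned` is dropped at the end of this scope; `borrowed` is not
}

// "Who locks it": Mutex<T> ties the lock to the data it guards,
// so the data is unreachable without taking the lock.
fn who_locks(shared: &Mutex<u32>) -> u32 {
    *shared.lock().unwrap()
}
```

None of this is possible to even state in C's type system; there a `char *` answers none of the three questions.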
Before you can even consider verifying something, you have to be able to talk about it in some formal language. Preferably the one you're programming in. Having to do formal specifications in a separate language is a huge headache. Been there, done that.
There are a few standard trouble spots. I've listed two of them. Most other unsafe code comes from
1) Foreign functions, which can be expected to decline over time as more libraries are implemented in Rust. (How's SSL/TLS in Rust coming along?)
2) "Optimization", which may be premature. This usually consists of bypassing subscript checks. I'd rather have the subscript checks on all the time, and see effort put into hoisting subscript checks out of inner loops. (Subscript checks that aren't in inner loops usually aren't significant overhead items.)
3) Replicating C/C++ code in Rust. (An early attempt was a transliteration of Doom into Rust, with lots of pointer arithmetic.)
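The hoisting mentioned in (2) can be sketched in safe Rust, no "optimization" via unsafe required. Whether the per-iteration check is actually elided depends on the optimizer, so treat this as illustrative; the function name is mine:

```rust
// One explicit subscript check up front, done by slicing...
fn sum_first_n(a: &[u64], n: usize) -> u64 {
    let head = &a[..n]; // panics here, once, if n > a.len()
    let mut total = 0;
    // ...so the inner loop indexes a slice whose bound is already
    // established, which the optimizer can typically make check-free.
    for i in 0..head.len() {
        total += head[i];
    }
    total
}
```

The point is that the check stays on; it just runs once instead of n times.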
Remember, it can blow at any seam. It only takes one buffer overflow to allow an exploit.
> Before you can even consider verifying something, you have to be able to talk about it in some formal language. Preferably the one you're programming in.
I'd personally be excited to see a modern language implement this. I saw the potential for this type of verification in my (hobbyist) dabbling with Haskell, which subsequently inspired me to relearn math, including a great book on proofs recommended on HN that really changed the way I viewed and approached math.
The best use-case analogies for formal verification can probably be drawn from automated testing and TDD, which is another "optional" part of programming with varying degrees of adoption, although with a lower barrier to entry.
Types/proofs seem to have a positive influence on the "best practice" side as well, since they promote a programming style that forces you to really consider the implementations you're coding. Very similar to testing.
At the moment, formal proof tools seem to me like a rabbit hole with questionable practical ROI, so I've been hesitant to try out the current state of the art.
Assuming it does get built into the language, even users who never write proofs themselves will likely benefit, just by being able to build on lower layers that were proven in the standard/popular libraries. I felt a similar assurance when building on top of well-typed Haskell libraries.
We built verification into the language in Pascal-F, 30 years ago.[1] That's rarely been done since in real-world imperative languages. It's much easier to keep the verification statements correct if they're in the same file and the same language as the program.
But Pascal was a small language. Getting this into today's bloated languages is tough. Back then, we looked at Ada, sized the project, and realized it was comparable to building an optimizing compiler for the language.
> We built verification into the language in Pascal-F, 30 years ago.
You sound like a very interesting person to buy a beer for, assuming you'd be patient enough for my questions :p. I'll check out the paper instead when I get the time.
> It's much easier to keep the verification statements correct if they're in the same file and the same language as the program.
Agreed. Hell, even using Dialyzer in Erlang/Elixir for static type checking (which I make the effort to do often in my daily side-project hacking) just doesn't feel right, even though the specs are still appended to function definitions in the original files.
This is one of those things that needs to be a core part of the language design, IMO, not just a tool built on top, whether third-party, from the core team, or otherwise. That said, such tooling is still superior to the complete absence of it.
Maybe you can answer a question I've been struggling to find the answer to via Google. Do you know the name of the popular older language/tool used by Microsoft for doing formal verification of C/C++-style code? It sounds like tkk or something similar? I can't seem to find it.
My other question: do you think the testing analogy I drew in my comment above applies here, or do you see it as an entirely different paradigm? Basically, changing how you program rather than adding a tool/skillset.
> Before you can even consider verifying something, you have to be able to talk about it in some formal language.
Sure, but it doesn't have to be a programming language.
> Preferably the one you're programming in.
Who says so? Programming languages (justifiably) optimize for the ability to express computation, not the ability to express proof.
In spite of Curry-Howard, there exist important differences between proofs and programs:
(0) Proofs are primarily for humans to understand. Programs are primarily for computers to execute.
(1) Proofs are sometimes allowed to be non-constructive. Even when they aren't, an easily understandable proof beats one that corresponds to an efficient program.
(2) Allowing non-terminating programs is actually a good thing: it allows for proofs of correctness that use techniques not anticipated by the language designer. OTOH, if your logic lets you prove a contradiction, it's the end of the world (for your logic).
Proofs and specifications are different things. Users need to be able to read specifications. For example, if you want to talk about definedness for arrays, a predicate
defined(arrayname, lowbound, highbound)
is helpful. That goes in assert statements. It's run-time checkable, if you have some extra state in the form of "is defined" flags. But you'd rather prove it once so the run-time checks are unnecessary. With a few simple theorems, such as
defined(a,i,j) and defined(a[j+1]) implies defined(a,i,j+1)
and a simple automated prover, you can deal with most of the issues around partially defined arrays.
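A run-time version of that predicate is straightforward to sketch in Rust. I'm assuming, purely for illustration, that the "is defined" flags are modeled with Option; a real implementation might use a separate flag array:

```rust
// defined(a, lo, hi): every element of a[lo..=hi] has been initialized.
// Out-of-range indices count as undefined.
fn defined<T>(a: &[Option<T>], lo: usize, hi: usize) -> bool {
    (lo..=hi).all(|i| a.get(i).map_or(false, |x| x.is_some()))
}
```

The theorem above is then the inductive step an automated prover applies as an initialization loop walks up the array: if `defined(a, i, j)` holds and element `j+1` is defined, then `defined(a, i, j+1)` holds.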
Think of proof support as being an extension to optimization of assertions. Most assertions in programs can be proven easily with an automated prover. Users need never see those proofs. Some will be hard and require more proof support.
defined(a,i,j) and defined(a[j+1]) implies defined(a,i,j+1)
is a theorem. It looks like this in Boyer-Moore theory:
(PROVE-LEMMA arraytrue-extend-upward-rule (REWRITE)
  (IMPLIES (AND (EQUAL (arraytrue A I J) T)
                (EQUAL (alltrue (selecta A (ADD1 J))) T))
           (EQUAL (arraytrue A I (ADD1 J)) T)))
Name the conjecture *1.
We will try to prove it by induction. There are three plausible
inductions. They merge into two likely candidate inductions. However, only
one is unflawed. We will induct according to the following scheme:
(AND (IMPLIES (LESSP J I) (p A I J))
(IMPLIES (AND (NOT (LESSP J I))
(p A (ADD1 I) J))
(p A I J))).
Linear arithmetic informs us that the measure (DIFFERENCE (ADD1 J) I)
decreases according to the well-founded relation LESSP in each induction step
of the scheme. The above induction scheme leads to three new formulas:
Case 3. (IMPLIES (AND (LESSP J I)
(ARRAYTRUE A I J)
(EQUAL (ALLTRUE (SELECTA A (ADD1 J)))
T))
(ARRAYTRUE A I (ADD1 J))),
which simplifies, rewriting with ARRAYTRUE-VOID-RULE and SUB1-ADD1, and
opening up ARRAYTRUE and LESSP, to the following eight new conjectures:
...
That finishes the proof of *1. Q.E.D.
"arraytrue" is defined recursively:
(DEFN arraytrue (A I J)
  (IF (LESSP J I)
      T                                      ; the null case is true
      (AND (EQUAL (alltrue (selecta A I)) T) ; next element is true
           (arraytrue A (ADD1 I) J))))       ; and rest of array is alltrue
And yes, there's a machine proof that this terminates.
> defined(a,i,j) and defined(a[j+1]) implies defined(a,i,j+1) is a theorem. (proof)
Sure. My point is just that a theorem is a proposition equipped with a proof. If the user just enters a proposition into the system, then they aren't entering a theorem. They're entering a proposition that the system can turn into a theorem.
> > But you'd rather prove it once so the run-time checks are unnecessary.
> No. I'd rather prove it to rule out the program being wrong. The runtime check is totally besides the point.
That's the same thing. It's just a matter of it being automated.
> > Think of proof support as being an extension to optimization of assertions.
> I never use runtime-checked assertions, so I never need to optimize them away.
If you've ever used a language with bounds-checked array access, you have. IIRC, you've used Rust some, so you've used bounds-checked array access, except where Rust was able to prove that out-of-bounds access was impossible and elide the check.
> > Users need never see those proofs.
> But, you see, I want to see the proofs. How am I supposed to maintain a program I don't understand?
There's a difference between need and ability. Just because it's stated that you don't need to see something doesn't mean you can't. If you want to see a proof, you dig in and find it, or find where it's accepted canon that we don't need to reproduce it. (Do we need proofs for integer addition? That seems of limited use to me, but maybe a language formally proved as far as possible, all the way down to that level, has some interesting benefits.)
> That's the same thing. It's just a matter of it being automated.
Re-read what he said. He presented runtime checks as an acceptable but non-optimal scenario for performance reasons. My reply was that runtime checks don't prevent a wrong program from being wrong, and it's this wrongness itself that I consider unacceptable.
> IIRC, you've used Rust some, so you've used bounds checked array access, except where Rust was able to prove that out of bounds access was impossible and elide the code that checks.
Runtime bounds-checked array manipulation is literally the single thing I hate the most about the languages I like the most (ML and Rust). It introduces a control flow path that shouldn't be reachable in a correct program, and hence shouldn't exist. This is particularly painful because beautiful array-manipulating algorithms have existed since, like, forever, yet in 2017 I still can't express them elegantly.
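To be fair, for forward scans the iterator style does let you write the loop with no reachable bounds-check path at all; a small sketch, with a function name of my choosing:

```rust
// Dot product with no subscript checks: zip stops at the shorter slice,
// so no index can ever be out of bounds and no panic path exists.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}
```

But the moment the access pattern is anything cleverer than a forward scan (binary search, in-place permutation), you're back to indices and checked accesses, which is exactly the pain being described.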
> If you want to see a proof, you dig in and find it, or find where it's accepted canon that we don't need to reproduce (do we need proofs for integer addition)?
I don't need to rebuild arithmetic from scratch again, because I've already done it at some point in time, and once is enough as long as you understand the process.
On the other hand, when I'm first confronted with an already existing program, I don't understand why the program is correct (if it is even correct in the first place), so I do have to set some time aside to properly study it.
[1] https://github.com/John-Nagle/pasv