Can someone explain give an introductory run down on model checking vs theorem p...

yodsanklai · on July 3, 2014

Here's my understanding:

In model-checking, you describe an abstracted version of your system and you can automatically check some properties. Your model has to be small enough (in term of number of states) to be processed by the tool, and the class of formulas you can express may be limited. Typically, you can check safety properties (such as assert statement in your code).

On the other hand, using a theorem prover (I think "proof assistant" is a better term, you can work on more accurate representations of your system and express any properties, but most proofs have to be done manually which requires time and expertise.

As for dependently typed language, I think most of them are not expressive enough to express complex properties such as "the system will eventually reach a consensus if no more than half of the process fail". And those that are are more akin to theorem provers.

nmrm · on July 3, 2014

Since others have described the high-level difference between checking and proving, I will point out that you might not want to use a dependently typed language even if you plan on doing some proving.

A large cost of formal verification is designing and implementing specifications. For this reason, even if you plan on proofs for some components, it still makes sense to use a tool which supports both model checking and theorem proving using a single specification. Especially if your interests are non-academic.

Type-theoretic proof assistants typically don't support model checking (or, it's at least fair to say it's not in the culture even where supported). So to use model checking, you have to re-implement the specification.

oggy · on July 3, 2014

I'll give it a try, briefly. For both, you first need to give a mathematical meaning to your system, i.e. represent the system as some kind of well-defined mathematical object.

A typical choice (especially for concurrent/distributed systems) are transition systems. To create a transition system, you first need to describe the state of your (entire) system. This could, for instance, be the current values of all variables used in all processes of your distributed system and the set of all messages currently in the network. Next you need to model transitions; ways for the state to evolve. These would look something like "process p_i increments its variable x from 0 to 1, and sends a message to process p_j".

Given such a transition system you can now ask different, well-defined questions about it. The simplest kind are "safety" properties, where you ask whether your system can reach a state which is "bad" by some criterion. In a distributed consensus algorithm for instance, you could ask "can it be that two replicas decide on a different value?", or "can it be that my processes are all in the waiting state (deadlock)?". You need to phrase this question in some kind of formal language the tool can understand.

Model checking and theorem proving then go about different ways to answer the question. Model checking, roughly, tries to use brute force to answer the question and requires no human interaction in doing so. You could imagine it feeding every possible input to every process, choosing every possible interleaving of messages and, for every state reachable in such a manner, checking whether the bad thing happens.

In theorem proving, you try to provide the rationale of why things can't go wrong in form of theorems. This often follows the informal reasoning. E.g. suppose we're in the "earliest" bad state s, then previously X must've happened, which must've been caused by Y previously happening, which must've happened in some state s' which was also bad but earlier than s, which is a contradiction. However, you also have to convince the theorem prover that your reasoning is sound. So first you need to understand what methods of reasoning you are using precisely (e.g. my example essentially relied on induction, which might not be clear to most people), and you also need to somewhat understand the way of how the prover "ticks" and what kinds of reasoning steps it can perform automatically.

So theorem proving requires more training and effort than model checking. On the flip side, model checking typically works only for finite systems. So your model (transition system) could only have, say, 3 or 4 processes, and the range of the variables would be restricted (say my variable x could only range between 0 and 4), as would be the number of messages in the network. Obviously, this doesn't give you such strong guarantees - maybe your system works fine for 4 processes, but does something wrong when you have 5 of them.

Depending (hah!) on the type system used in your dependently typed language of choice, those languages are actually a kind of theorem provers! This includes Coq, but also Idris and Agda. They exploit the so-called Curry-Howard correspondence, which equates mathematical propositions to (dependent) types, and proofs of the said propositions to terms (programs) of the given type.