One of the contributors wrote a memoir of his time spent working on this project as a PhD student and gives an interesting perspective on the challenges encountered:
I use it - I am writing a subset-of-yaml parser, and I've littered the code with a bunch of abort()/assert() statements for things I don't think should be possible. Running klee over my code spits out ~100K test cases (I let it use any 10 byte YAML input), and so while I don't inspect the output of any of the cases - I do make sure that none of them crash.
Similarly, if I make a change which I don't expect to change the output of the parser, then I can use this test suite to be quite sure that my change hasn't changed behaviour.
One thing I am unclear about is what it means for klee to be "done" on my binary. When it has generated all the test cases, if none of them crash my binary, then does that mean it is impossible to crash my binary in 10 bytes of input?
Writing test harnesses specific to the different tools (so far, just AFL and KLEE) has not been particularly difficult - all of them fit easily on one screen. So I'm not sure that deepstate really brings much to the table for what I'm doing.
if your program calls system call or random function which KLEE couldn't handle, some parts of code may not be tested.
KLEE uses its special libc (uclibc) to deal with that cases in glibc. To solve this problem perfectly, we usually uses S2E instead of KLEE in large and complex softwares (http://s2e.systems/)
Right - but being a parser, I don't have those problems. My problem domain is quite well specified and there isn't any place for random numbers of system calls.
There's a large amount of work building on KLEE. Similarly, it builds on LLVM that has its own huge ecosystem. The potential of all that together is what interests me about it.
> KLEE [runs] a program considering its input(or some other variables) to be symbols instead of concrete values like 100 or “cacho”. In very few words, a symbolic execution runs through the code propagating symbols and conditions; forking execution at symbol dependant branches and asking the companion SMT solver for path feasibility or counter-examples.
KLEE is a symbolic execution engine. You can use it to analyse your program. A symbolic execution considers all possible values for all variables. E.g.:
int x = readIntFromIO(); // -2147483648 to 2147483647
On branches constraints are added.
if(x < 3) {
// -2147483648 to 2
} else {
// 3 to 2147483647
}
In this way all possible paths of the program are considered. When the execution terminates, e.g. on reaching a abort() or fail() statemant, a constraint solver calculates the input values to reach this statement. This values could be considers as test cases.
You can add assert() statements to your code for expressions (e.g. assert(x != 3) ) that you assume to be an inconsistent states and solver will give you test cases for this.
http://pgbovine.net/PhD-memoir.htm