It starts with an invalid .jpg (literally a text file containing "hello"), and b...

ulber · on Nov 7, 2014

>In this particular case, the fuzzer is going beyond just throwing random input, as it considers which changes to the input trigger new code paths in the target binary, and therefore should have a higher success rate in triggering bugs compared to just trying random stuff.

To expand on this, techniques like this are called whitebox fuzzing (or maybe graybox in afl's case). In their extreme whitebox fuzzers even incorporate constraint solvers to directly solve inputs that take the program to previously unexplored paths. One very impressive project is the SAGE whitebox fuzzer [1,2,3] that's in production use at Microsoft (an internal project sadly). I work in the related field of automated test generation, but all my tools are very much research-grade. However, in SAGE they've done all the work of figuring out how 24/7 whitebox fuzzing can be integrated into the development process. I am somewhat envious of the researchers getting to work in an environment where that is possible. If you're interested I very much recommend reading the papers on SAGE.

[1] Poster about SAGE: http://research.microsoft.com/en-us/um/people/pg/public_psfi...

[2] An approachable article on SAGE: http://research.microsoft.com/en-us/um/people/pg/public_psfi...

[3] The paper with all the details: http://research.microsoft.com/en-us/projects/atg/ndss2008.pd...

f- · on Nov 8, 2014

The main problem with SAGE is that at least outside Microsoft, it exists just as a series of (very enthusiastic) papers :-)

So, while I suspect it's very cool, it's also a bit of a no-op for everybody else. It's also impossible to independently evaluate the benefits: for example, its performance cost, the amount of fine-tuning and configuration required for each target, the relative gains compared to less sophisticated instrumented fuzzing strategies, etc.

contingencies · on Nov 7, 2014

Great links, thanks.

3rd3 · on Nov 7, 2014

Why does it use fuzzed input in the first place? Couldn’t one just use random input from the beginning instead? It would be effectively equivalent but fuzzing of a "hello" string seems to be roundabout.

0x0 · on Nov 7, 2014

Well, "hello" is pretty random. :) It was probably just used for dramatic effect in the demo, and you have to start with something - of course even a 0 byte file would be enough.

You could also have a started with a valid .jpg with lots of complicated embedded exif metadata sections etc, and have a good chance of triggering bugs in those code paths without having to "discover exif" first.

sp332 · on Nov 7, 2014

From the article: it works without any special preparation: there is nothing special about the "hello" string.

Houshalter · on Nov 8, 2014

He said it took a day to find good jpg images. If you started the program with a valid input, then it would take much less time to explore the other code paths.

Zikes · on Nov 7, 2014

In this case "hello" was just a pseudorandom starter to seed the fuzzer.