I would strongly recommend fuzzing for testing any software. If the software hasn't been fuzzed before, you are likely to find interesting things quickly.
I fuzzed early clang for segfaults, just for fun. I found dozens of bugs just by taking a large preprocessed C++ file and deleting a few lines; when I found a segfault, I reduced it by deleting lines and tokens until I reached a minimal example of the same error (make sure you are still crashing in the same place, or you will keep rediscovering the same bug). A sketch of the reduction loop is below.
Reported some great fun minimal crashes, like:
:)
:{}
namespace
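For what it's worth, here is a minimal Python sketch of that reduction loop. The compiler invocation and the crash check are my assumptions, not the original setup: a real harness would compare stack traces, and tools like C-Reduce do the token-level part far better.

    import subprocess

    def crashes_in_same_place(source: str) -> bool:
        """Hypothetical check: recompile and see whether the compiler
        still dies the same way (killed by a signal, or the same
        assertion text; a real harness would compare stack traces)."""
        proc = subprocess.run(
            ["clang++", "-fsyntax-only", "-x", "c++", "-"],
            input=source, capture_output=True, text=True)
        return proc.returncode < 0 or "Assertion" in proc.stderr

    def reduce_lines(source: str) -> str:
        """Greedy line-level reduction: keep deleting any line whose
        removal preserves the crash, until no single deletion works.
        (The same loop can then be rerun over tokens instead of lines.)"""
        lines = source.splitlines()
        progress = True
        while progress:
            progress = False
            i = 0
            while i < len(lines):
                candidate = lines[:i] + lines[i + 1:]
                if crashes_in_same_place("\n".join(candidate)):
                    lines = candidate   # still crashes: keep the cut
                    progress = True
                else:
                    i += 1              # this line is needed; move on
        return "\n".join(lines)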
I also fuzz in my day job. There we want to fuzz valid inputs: we have a bunch of ways of generating two inputs which should produce the same output, so we generate a pair of inputs, compare the outputs, and repeat. It has found hundreds of bugs, and (famous last words) almost none have been reported in releases.
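A generic sketch of that loop, with toy stand-ins: the pair generator and run() below are made up for illustration, and the real value is in the domain-specific transformations that produce equivalent inputs.

    import random

    def make_equivalent_pair(rng: random.Random) -> tuple[str, str]:
        """Hypothetical pair generator: two expressions that any correct
        evaluator must agree on (the second just adds redundant parens)."""
        a, b, c = (rng.randint(0, 99) for _ in range(3))
        return f"{a} + {b} * {c}", f"{a} + ({b} * {c})"

    def run(expr: str) -> int:
        return eval(expr)   # stand-in for the real system under test

    def fuzz(iterations: int = 100_000) -> None:
        rng = random.Random(0)
        for _ in range(iterations):
            x, y = make_equivalent_pair(rng)
            if run(x) != run(y):
                print("mismatch:", x, "vs", y)   # save/report the pair
                return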
Seconded. I reported bugs in GNU Awk, found via fuzzing, and discovered that fingerprinting bogus SSH keys could result in null-pointer dereferences. Of course I also discovered bugs in my own software!
(In my case I fuzzed a simple virtual machine and the display mode of a console-based email client.)
It is disappointing how often it is trivial to find bugs via fuzzing, which I guess just underlines the point that we SHOULD be fuzzing more often, even on programs/libraries that are "old".
> I found dozens of bugs by just taking a large preprocessed C++ file, deleting a few lines, and then when finding a segfault, reducing by deleting lines and tokens until I reached a minimal example of the same error
To run with this idea: this process could be automated and applied to any number of different compilers.
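For instance, something along these lines would do it. The compiler list and the crude mutation step are just placeholders.

    import random
    import subprocess

    COMPILERS = [["clang++", "-fsyntax-only", "-x", "c++", "-"],
                 ["g++", "-fsyntax-only", "-x", "c++", "-"]]

    def mutate(source: str, rng: random.Random) -> str:
        """Crude mutation in the spirit of the post: drop a few lines."""
        lines = source.splitlines()
        for _ in range(rng.randint(1, 3)):
            if lines:
                del lines[rng.randrange(len(lines))]
        return "\n".join(lines)

    def check(source: str) -> None:
        for cmd in COMPILERS:
            proc = subprocess.run(cmd, input=source,
                                  capture_output=True, text=True)
            if proc.returncode < 0 or "internal compiler error" in proc.stderr:
                print(cmd[0], "crashed; save this test case")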
Out of curiosity, did these bugs tend to be things like buffer overflows? How many of the bugs you reported do you think would have been avoided had the compiler been written in a 'safe' language rather than C++?
Actually, many of the bugs were asserts triggering (so "segfault" isn't quite right). In the case of C++ there are just so many weird states the compiler can find itself in, particularly with clang, which tries hard to produce good errors and so keeps parsing even after an error has been found.
I would love it if it were easier to do "external checks" (for example, network fuzzing). Something that decouples the "submit payload" part from the "tortured piece of software" would be really, really nice.
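Something like this is easy to sketch for TCP, at least: the fuzzer below only knows how to mutate a (non-empty) seed and submit it over a socket, while the target runs as a completely separate process under whatever crash monitoring you like. Host, port, and the mutator are placeholders.

    import random
    import socket

    def mutate(data: bytes, rng: random.Random) -> bytes:
        buf = bytearray(data)
        for _ in range(rng.randint(1, 8)):
            buf[rng.randrange(len(buf))] = rng.randrange(256)
        return bytes(buf)

    def fuzz(seed: bytes, host: str = "127.0.0.1", port: int = 8080) -> None:
        rng = random.Random(0)
        for i in range(10_000):
            payload = mutate(seed, rng)
            try:
                with socket.create_connection((host, port), timeout=2) as s:
                    s.sendall(payload)
            except OSError:
                # Refused/reset connections usually mean the target died;
                # save the payload so the crash can be reproduced.
                print(f"iteration {i}: target unreachable, saving payload")
                break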
> I have never had such a pleasant bug-reporting experience. Within a day of reporting a new bug, somebody [...] would run the test case and mark the bug as confirmed as well as bisect it using their huge library of precompiled gcc binaries to find the exact revision where the bug was introduced.
That's a really cool setup. Precompiled binaries to quickly git bisect a change.
How often did you find you couldn't use git bisect because unrelated code changes got in the way of the bug you were hunting?
Quickly being able to reproduce and find the commit for a bug is tremendously useful.
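git bisect can even drive that automatically via `git bisect run`. A sketch of a check script against a directory of precompiled builds might look like this; the paths are made up, but the exit-code convention (0 = good, 1 = bad, 125 = skip) is how `git bisect run` really works.

    #!/usr/bin/env python3
    import subprocess
    import sys

    BUILDS = "/srv/gcc-builds"   # hypothetical: one prebuilt gcc per commit
    TESTCASE = "crash.c"

    def main() -> int:
        rev = subprocess.run(["git", "rev-parse", "HEAD"],
                             capture_output=True, text=True).stdout.strip()
        try:
            proc = subprocess.run(
                [f"{BUILDS}/{rev}/bin/gcc", "-c", TESTCASE, "-o", "/dev/null"],
                capture_output=True, timeout=60)
        except FileNotFoundError:
            return 125   # no build for this revision: tell bisect to skip it
        return 1 if proc.returncode < 0 else 0   # crashed = bad

    if __name__ == "__main__":
        sys.exit(main())

Then `git bisect start <bad> <good>` followed by `git bisect run ./check.py` walks straight to the offending commit.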
A little off-topic, but what is the interest for a large company like Oracle in paying employees for this kind of research effort? What is the added value for the company? A real product? Or keeping links with advanced research and maintaining a high level of knowledge within the company?
Oracle owns Java, so it has a direct reason to be interested in compiler tech, and other parts of its software suite would also benefit from similar kinds of fuzzing (e.g. the query processing in the database).
But apart from the direct value, there is of course huge value in keeping highly paid staff motivated and sharpening their skills, on top of the goodwill it creates among potential hires.
I wrote a toy compiler/VM and decided to fuzz-test it with radamsa. The language is quite Forth-like, so there is minimal syntax, and every program is valid as long as the words used are defined and the stack is balanced, which makes it a perfect subject for fuzzing. After finding some low-hanging fruit almost immediately (segfaults), I let it run for another couple of hours.
Then the computer started swapping like hell and became unresponsive, and it didn't settle down until ten minutes after I had killed the process. Looking at the case radamsa had generated, it had found a billion-laughs attack vector: macros in my language can be defined recursively, and the code is stored in an array that gets reallocated and grown, without bound, whenever the code no longer fits. Radamsa had created an initial macro and then redefined it over and over, such that each definition referred to the previous one twice.
I was optimistic about fuzzing, but I never really had any expectations of it finding anything other than stack smashing and segfaults.
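For what it's worth, a couple of lines of harness would have contained that runaway case. Radamsa really does just print a mutation of its input file to stdout, so you can cap the VM's memory and wall-clock time per run; the VM binary and seed file below are placeholders.

    import resource
    import subprocess

    def limit_memory() -> None:
        # Cap the child's address space at 512 MiB, so runaway macro
        # expansion gets an allocation failure instead of swapping the
        # whole machine.
        resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))

    def fuzz_once(seed: str = "seed.fs") -> None:
        mutated = subprocess.run(["radamsa", seed], capture_output=True).stdout
        try:
            proc = subprocess.run(["./myvm"], input=mutated, timeout=5,
                                  preexec_fn=limit_memory,
                                  capture_output=True)
        except subprocess.TimeoutExpired:
            print("hang or runaway: save the input")
            return
        if proc.returncode < 0:
            print(f"crash (signal {-proc.returncode}): save the input")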
For the Mill project, we ran a large corpus of test programs and csmith-generated programs, comparing native output against our simulator. We found lots of bugs in the various stages of our toolchain and in the simulator, but so far none in LLVM itself.
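For anyone wanting to try the same shape of test, here is a rough sketch. Csmith is real and prints a random, well-defined C program that emits a checksum, so comparing stdout is enough; the `simulate` command and the csmith include path stand in for project-specific bits.

    import subprocess

    def one_round(seed: int) -> None:
        prog = subprocess.run(["csmith", "--seed", str(seed)],
                              capture_output=True, text=True).stdout
        with open("test.c", "w") as f:
            f.write(prog)
        # Csmith programs need the csmith runtime headers; the include
        # path varies by installation.
        subprocess.run(["gcc", "-I/usr/include/csmith", "test.c",
                        "-o", "native"], check=True)
        native = subprocess.run(["./native"], capture_output=True,
                                text=True, timeout=30).stdout
        simulated = subprocess.run(["simulate", "test.c"],  # placeholder
                                   capture_output=True, text=True,
                                   timeout=300).stdout
        if native != simulated:
            print(f"seed {seed}: outputs differ, save the test case")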
Not really. I took a look at some of the reported bugs, and all of the test cases seemed to be invalid code in some way, though often due to type or semantic errors rather than bad syntax. But the fuzzer is looking for crashes or internal compiler errors, which represent a bug regardless of how bogus the input to the compiler was.
You're quite right. I used to work on fuzzers/fault-injectors in a previous job, so I was interested because I thought they had figured out a way to find bugs by generating grammatically valid inputs. Not putting down what they've done, though.