Compiler fuzzing, part 1

bufferoverflow · on June 24, 2018

This is a beautiful bug:

    inline namespace {
            namespace {}
    }
    
    namespace {}

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84707

CJefferson · on June 25, 2018

I would strongly recommend fuzzing for testing any software. You are likely to find interesting things quickly, if the software hasn't been fuzzed before.

I fuzzed early clang for segfaults, just for fun. I found dozens of bugs by just taking a large preprocessed C++ file, deleting a few lines, and then when finding a segfault, reducing by deleting lines and tokens until I reached a minimal example of the same error (make sure you are crashing in the same place, or you will probably find you keep rediscovering the same bug).

Reported some great fun minimal crashes, like:

:)

:{}

namespace

I also fuzz in my day job. There we want to fuzz valid inputs -- we have a bunch of ways of generating two inputs which should produce the same output -- generate a pair of inputs, test outputs, repeat. It has found hundreds of bugs, and almost no bugs (famous last words) have been reported in releases.

stevekemp · on June 25, 2018

Seconded. I reported bugs in GNU Awk, found via fuzzing, and discovered that fingerprinting bogus SSH keys could result in null-pointer dereferences. Of course I also discovered bugs in my own software!

(In my case I fuzzed a simple virtual machine and the display-mode of a console-based email-client.)

It is disappointing how often it is trivial to find bugs via fuzzing, which I guess just underlines the point that we SHOULD be fuzzing more often. Even on programs/libraries that are "old".

jwilk · on June 25, 2018

Minimizing testcases can (and should!) be automated.

For C(-like) code, there's venerable Delta (http://delta.tigris.org/) and more powerful C-Reduce (https://embed.cs.utah.edu/creduce/).

For binary formats, there's afl-tmin, part of AFL (http://lcamtuf.coredump.cx/afl/).

CJefferson · on June 25, 2018

Don't worry, I did automate! However, I did it myself, for the experience. Some horrible bash and awk scripts did sufficiently for my purposes.

MaxBarraclough · on June 25, 2018

You're doing fine work!

> I found dozens of bugs by just taking a large preprocessed C++ file, deleting a few lines, and then when finding a segfault, reducing by deleting lines and tokens until I reached a minimal example of the same error

To run with this idea: this process could be automated, and applied to any number of different compilers.

Out of curiosity, did these bugs tend to be things like buffer-overflows? How many of the bugs you reported do you think would have been avoided had the compiler been written in a 'safe' language rather than C++?

CJefferson · on June 25, 2018

Actually, many of the bugs were asserts triggering (segfault isn't quite right). In the case of C++, there are just so many weird states the compiler can find itself in, particularly in the case of clang which tries hard to produce good errors, so keeps parsing even after an error has been found.

MaxBarraclough · on June 25, 2018

> many of the bugs were asserts triggering

I'd say that's the best kind of bug. Reflects well on the codebase.

C++ parsing is an infamously untidy problem.

kakwa_ · on June 25, 2018

Fuzzing is really helpful. Even trying really hard to check for buffer overflow, it's nearly impossible to catch it all.

Kudos for the AFL creator http://lcamtuf.coredump.cx/afl/

I would love if it was easier to do "external checks" (for example network fuzzing). Something like a dissociation of the "submit payload" and the "tortured piece of software" would be really really nice.

bhaak · on June 25, 2018

> I have never had such a pleasant bug-reporting experience. Within a day of reporting a new bug, somebody [...] would run the test case and mark the bug as confirmed as well as bisect it using their huge library of precompiled gcc binaries to find the exact revision where the bug was introduced.

That's a really cool setup. Precompiled binaries to quickly git bisect a change.

How often did you run into not being able to use git bisect because you encountered problems due to code changes that were unrelated to the bug you were hunting?

Quickly being able to reproduce and find the commit for a bug is tremendously useful.

sligor · on June 25, 2018

A little off-topic but... what is the interest for large company like Oracle to pay employees on such research effort ? What is the value added for the company ? real product ? or keeping links with advanced research and maintaining high level of knowledge in the company?

vidarh · on June 25, 2018

Oracle owns Java, so has a direct reason to be interested in compiler tech, and other parts of the software suite also would benefit from similar types of fuzzing (e.g. their query processing for the database).

But apart from the direct value, there's a huge value in keeping high paid staff motivated and sharpening their skills of course, on top of the good will it creates among potential hires.

boomlinde · on June 25, 2018

I wrote a toy compiler/VM and decided to fuzz test with radamsa. The language is quite forth-like, so there is minimal syntax and every program is valid so long as the words that are used are defined and the stack is balanced, which makes it a perfect subject for fuzzing. After finding some low hanging fruit almost immediately (segfaults), I let it run for another couple of hours.

Then, the computer started swapping like hell and became unresponsive, which didn't settle for another 10 minutes after I had shot down the process. Looking at the case that radamsa had generated, it had found a billion laughs attack vector. Macros in my language can be defined recursively, and the code is stored in an array that gets reallocated and grown when the code no longer fits, unboundedly. Radamsa had created an initial macro and then redefined it over and over such that it always referred to the previous definition twice.

I was optimistic about fuzzing, but I never really had any expectations of it finding anything other than stack smashing and segfaults.

willvarfar · on June 25, 2018

For the Mill project, we tested a large corpus of test programs and csmith-generated programs with native output against our simulator. We found lots of bugs in the various stages of our tool chain and simulator, but so far none in llvm itself.

jwilk · on June 25, 2018

Link to the project? The name is not very googlable...

ksk · on June 25, 2018

Interesting, but I suppose you still need a human to verify whether its valid or invalid grammar.

comex · on June 25, 2018

Not really. I took a look at some of the reported bugs, and all of the test cases seemed to be invalid code in some way, though often due to type or semantic errors rather than bad syntax. But the fuzzer is looking for crashes or internal compiler errors, which represent a bug regardless of how bogus the input to the compiler was.

ksk · on June 25, 2018

You're quite right. I used to work on fuzzers/fault-injectors in a previous job so I was interested because I thought they had figured out a way to find bugs by generating valid grammar. Not putting down what they've done though..

bjoli · on June 25, 2018

When the syntax is invalid, you should get an error saying that the syntax is invalid. Not a compiler crash and definitely not a segfault.

ksk · on June 25, 2018

Agreed.

berti · on June 25, 2018

And be careful that you don't break real code trying to fix something obscure that's never likely to be seen in real code.

dang · on June 24, 2018

Url changed from https://www.phoronix.com/scan.php?page=news_item&px=Prog-Fuz..., which is just a short paragraph that points to this.

In such a case where there's a much better original source, could you please post that instead? This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.