
> Fuzz testing has enjoyed great success at discovering security critical bugs in real software. Recently, researchers have devoted significant effort to devising new fuzzing techniques, strategies, and algorithms. Such new ideas are primarily evaluated experimentally so an important question is: What experimental setup is needed to produce trustworthy results? We surveyed the recent research literature and assessed the experimental evaluations carried out by 32 fuzzing papers. We found problems in every evaluation we considered. We then performed our own extensive experimental evaluation using an existing fuzzer. Our results showed that the general problems we found in existing experimental evaluations can indeed translate to actual wrong or misleading assessments. We conclude with some guidelines that we hope will help improve experimental evaluations of fuzz testing algorithms, making reported results more robust.
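For anyone who hasn't read the paper: the concrete takeaway is to stop comparing single runs. Something like the sketch below (my own, not code from the paper; the trial counts and bug numbers are made-up placeholders) is the shape of evaluation the authors argue for, i.e. many independent trials per fuzzer plus a statistical test such as Mann-Whitney U:

    # Sketch only, assuming per-trial bug counts have already been collected;
    # the numbers below are made-up placeholders, not results from the paper.
    from scipy.stats import mannwhitneyu

    def compare_fuzzers(trials_new, trials_baseline, alpha=0.05):
        # Each argument: distinct ground-truth bugs found in each independent
        # trial (the paper suggests many long trials, e.g. 24h each).
        stat, p = mannwhitneyu(trials_new, trials_baseline, alternative="two-sided")
        return stat, p, p < alpha

    # Hypothetical trial results for a new fuzzer vs. an AFL baseline:
    new_fuzzer = [7, 5, 8, 6, 6, 7, 5, 6, 8, 7]
    afl        = [6, 5, 6, 5, 7, 6, 4, 5, 6, 6]
    stat, p, significant = compare_fuzzers(new_fuzzer, afl)
    print(f"U={stat}, p={p:.3f}, significant={significant}")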

Oh, I've been looking for an overview like this!

There's a ton of fuzzing papers, and they all claim some speedup over AFL using some intuitively reasonable optimization, but if you start stacking them you lose AFL's main draw: its simplicity. And the community maintaining AFL++ seems skeptical about most of these optimizations. So an overview of the ecosystem is very welcome.

EDIT: Oh, it's from 2018. That's depressing. I don't think the ecosystem has improved much since this paper was published.




Further reading (published this year) along similar lines: https://mschloegel.me/paper/schloegel2024sokfuzzevals.pdf

Building fuzzers that are as efficient and generic as AFL++ (with cmplog) is a truly difficult undertaking.
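For context, cmplog is AFL++'s take on Redqueen-style "input-to-state correspondence": log the operands of comparisons the target executes, then copy the expected operand straight into the input instead of waiting for random mutation to guess it. A toy sketch of the idea (mine, not AFL++'s actual implementation):

    # Toy illustration of the cmplog idea; real cmplog works via compiler
    # instrumentation and a shared-memory table, not a Python list.
    import struct

    cmp_log = []  # (observed, expected) operand pairs seen during one run

    def logged_eq(observed, expected):
        cmp_log.append((observed, expected))
        return observed == expected

    def target(data: bytes) -> bool:
        # Blind mutation has a ~1-in-2**32 chance of hitting this magic value.
        (magic,) = struct.unpack("<I", data[:4])
        return logged_eq(magic, 0xDEADBEEF)

    def splice_expected_operands(data: bytes) -> bytes:
        # Replace the bytes that produced each observed operand with the
        # operand the target compared against.
        for observed, expected in cmp_log:
            needle = struct.pack("<I", observed)
            idx = data.find(needle)
            if idx != -1:
                data = data[:idx] + struct.pack("<I", expected) + data[idx + 4:]
        return data

    seed = b"\x00\x00\x00\x00 rest of input"
    target(seed)                                   # first run only records comparisons
    assert target(splice_expected_operands(seed))  # now the magic-value check passes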



