"We find that typ3c automatically converted 67.9% of pointers in our benchmark programs to checked types, which improves on the 48.4% inferred by unification-style algorithms used in prior work. boun3c was able to infer bounds for 77.3% of pointers that required them."
That's reasonably good. The real goal is to convert all pointers to either run-time checked pointers (slower) or compile-time checked pointers. It's already possible to convert everything to "fat pointers"; GCC used to have that as an option. But the goal is to eliminate the need for run-time checking in code that's executed frequently, that is, in inner loops.
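For the unfamiliar, here's a rough sketch of that distinction in Checked C syntax (illustrative only, not taken from the paper):

    void sketch(void) {
      int x = 0;
      _Ptr<int> p = &x;                      /* compile-time checked: no arithmetic allowed, so no per-use run-time cost */
      int buf _Checked[10] = {0};            /* checked array */
      _Array_ptr<int> a : count(10) = buf;   /* carries bounds; dereferences get run-time bounds checks
                                                unless the compiler can prove them redundant */
      int *legacy = &x;                      /* unchecked "plain C" pointer, unchanged semantics */
      *p = a[3] + *legacy;
    }

A _Ptr never needs a bounds check because it can't be used in pointer arithmetic; an _Array_ptr pays for checks at each access unless the compiler can discharge them statically, which is exactly the inner-loop concern.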
It's good to see activity in this area.
The output code is kind of clunky looking, but that could be fixed.
At Correct Computation (https://correctcomputation.com/, we are hiring), we are developing a tool called 3C to provide automated assistance for converting C code into Checked C. A completely automated approach is impractical: it would require making lots of changes to the target program's code and adding lots of overhead. But we have found that a "best effort" approach works well.
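To give a flavor of the rewriting involved, here is a hand-written before/after sketch of the kind of change 3C aims to automate (illustrative, not actual tool output):

    /* before: plain C */
    int sum(int *a, int n) {
      int s = 0;
      for (int i = 0; i < n; i++)
        s += a[i];
      return s;
    }

    /* after: the parameter becomes a checked array pointer with declared bounds */
    int sum(_Array_ptr<int> a : count(n), int n) {
      int s = 0;
      for (int i = 0; i < n; i++)
        s += a[i];
      return s;
    }

The hard part is inferring which pointers can become checked and what their bounds should be, which is the typ3c/boun3c split in the quote above.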
Hopefully soon, “moving in that direction” can be done by slowly porting to Checked C, while always retaining an executable artifact. https://github.com/Microsoft/checkedc
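What makes that incremental story workable is that Checked C lets converted and unconverted code coexist in one program. A rough sketch (syntax as I recall it from the spec; treat details as approximate):

    /* already-ported code sits in a checked region and gets full checking */
    void copy(_Array_ptr<int> dst : count(n), _Array_ptr<int> src : count(n), int n) {
      _Checked {
        for (int i = 0; i < n; i++)
          dst[i] = src[i];
      }
    }

    /* untouched legacy code keeps compiling and running as plain C */
    void not_ported_yet(int *p) {
      *p = 0;
    }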
The last comment is not justified by the paper. The paper shows one case where the dumb thing finds tons of crashes, but several other cases in the same figure where it finds zero crashes. And that figure covers just one target program. More experiments on this issue might be interesting.
One other thing to point out: Finding crashes and finding bugs are not the same thing, as section 7 carefully argues. It could well be that many of those crashes are duplicates. The blog post I linked above also summarizes this point.
The paper does not make strong statements of the form "fuzzers are/do X" or "all prior papers' claims are bogus." Rather, we say that the standard of evidence should be higher, and we demonstrate why failing to reach that standard could result in bogus claims. It's quite possible that a paper's idea really is an improvement. But it's also possible that additional evidence would cast doubt on it, or add nuance. For example, our experiments show that AFLFast probably does improve on AFL, though perhaps not as much as that paper made out (note that we didn't do enough experiments ourselves to make that a definitive statement).
You can do a lot with load balancing on web architectures, but some things remain problematic. If you read the introduction of the Kitsune paper, linked from the kitsune-dsu.com site, you'll see some argumentation about this.
To add: The number of changes tends to be very small, as reported in the paper. We are talking 100-300 LOC even for applications that are 100 KLOC. And these changes are robust in the sense that once you retrofit the application to include them, you rarely need to make further changes of that sort -- new versions will just work.
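For a sense of what those changes look like: the typical retrofit adds an update point to the program's long-running loop and marks the state to carry across. A very rough sketch below; the API names (kitsune.h, kitsune_update, kitsune_is_updating, MIGRATE_GLOBAL) are from my memory of the Kitsune paper, and the helper and variable names are made up for illustration.

    #include <kitsune.h>

    long request_count;                  /* long-lived state worth preserving across updates */
    void handle_one_request(void);       /* stand-in for the application's real work */

    void event_loop(void) {
      if (kitsune_is_updating())         /* resuming in the new version after an update */
        MIGRATE_GLOBAL(request_count);   /* carry the old version's state across */
      while (1) {
        kitsune_update("event_loop");    /* update point: a queued new version takes over here */
        handle_one_request();
      }
    }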
If such states are preserved while the process that used them is still running, then there is nothing to do: they will still be available to the updated program.
https://dl.acm.org/doi/abs/10.1145/3527322