Tangentially, one of the cleverest things about eBPF, IMO, is that it pushes register allocation out to user space and merely verifies that the allocation (or, more precisely, the full data flow, including how the registers are used) doesn't break the VM's guarantees. The JIT then just applies a fixed translation from eBPF registers to machine registers.
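For anyone curious what that fixed translation looks like, here is a rough sketch (in Python, just as a lookup table) of approximately the mapping the Linux x86-64 JIT uses; the authoritative table lives in arch/x86/net/bpf_jit_comp.c, so treat the exact register choices below as illustrative:

    # Illustrative sketch: the JIT's "register allocation" is a fixed table.
    # Approximates the Linux x86-64 eBPF JIT; see arch/x86/net/bpf_jit_comp.c
    # for the authoritative mapping.
    EBPF_TO_X86_64 = {
        "r0": "rax",   # return value
        "r1": "rdi",   # arg 1  -- r1..r5 line up with the SysV calling
        "r2": "rsi",   # arg 2     convention, so helper calls need no shuffling
        "r3": "rdx",   # arg 3
        "r4": "rcx",   # arg 4
        "r5": "r8",    # arg 5
        "r6": "rbx",   # callee-saved
        "r7": "r13",   # callee-saved
        "r8": "r14",   # callee-saved
        "r9": "r15",   # callee-saved
        "r10": "rbp",  # read-only frame pointer
    }

    def jit_operand(ebpf_reg: str) -> str:
        # Translating an operand is a table lookup, not an allocation problem.
        return EBPF_TO_X86_64[ebpf_reg]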
Even CompCert (the formally verified compiler with gcc -O2-competitive code generation) essentially shells out to unverified ML for register allocation and then validates the result with a proven checker, in contrast to the rest of the compiler design.
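The general shape of that "run untrusted code, then check its output" pattern (translation validation) is roughly the following; this is just a sketch in Python with made-up names, not CompCert's actual Coq/OCaml code, where the checker itself carries the correctness proof:

    # Sketch of translation validation (names made up for illustration).
    # The allocator can be arbitrarily clever and unproven; only the
    # checker has to be trusted (in CompCert's case, proved in Coq).
    class ValidationError(Exception):
        pass

    def allocate_and_validate(ir_function, untrusted_allocate, verified_check):
        allocated = untrusted_allocate(ir_function)      # unverified code
        if not verified_check(ir_function, allocated):   # small, proven-correct check
            raise ValidationError("allocator produced an invalid assignment")
        return allocated

The nice consequence is that a bug in the allocator can only cause a detectable compile-time failure, never a silent miscompile.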
Both parts (compiler backends & fuzzing) are immensely complicated on their own. I can only imagine the trickery and head-scratching that comes with combining them.
I’ve worked on simpler approximations of both [0][1] before and can attest that even the simple parts are hard to get working reliably.
>> Is this a hard problem? In fact, it is not only hard in a colloquial sense, but NP-complete
I'm probably mistaken, but I thought some years ago this got reduced to polynomial time via transforming the program to SSA form and doing some other magic.
Just to clarify (and as the abstract says), that paper's result is restricted not to programs in SSA form but to structured programs.
The chordal interference graphs that arise from (not necessarily structured) programs in SSA form are optimally colorable in linear time. That approach is a solid basis for a register allocator but it's worth noting that it's not solving the same problem because the introduced phi nodes split the original live ranges at CFG join points, so you're not coloring the same interference graph the program had prior to SSA conversion. In practice a good register allocator already needs to have well-tuned heuristics for spill/fill/split/coalesce placement regardless of whether it uses SSA-based coloring, so the extra splits introduced by the SSA conversion are optimized as part of that process.
In the special case where your SSA program is structured and the code is presented to the backend in its natural order, the perfect elimination order for the chordal interference graph is just backward code order. As a result you can do optimal register coloring for such programs in a single backward pass (which can be integrated into a backward code generation pass if you're trying to go fast).
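To make the single-backward-pass idea concrete, here is a minimal sketch for the straight-line special case, over a toy IR of (dest, sources) tuples made up for illustration: a variable gets a register at its last use (where it becomes live during the backward scan) and gives it back at its definition. Structured control flow needs more bookkeeping than this, and spilling isn't handled at all:

    def backward_alloc(instrs, num_regs):
        # instrs: list of (dest, [srcs]) tuples in program order, SSA-style
        # (each dest defined exactly once). Returns {var: register index}.
        free = list(range(num_regs - 1, -1, -1))  # stack of free registers
        assignment = {}                           # var -> register index
        live = set()
        for dest, srcs in reversed(instrs):
            if dest in live:                      # dest's range ends at its def:
                live.discard(dest)                # release its register
                free.append(assignment[dest])
            for src in srcs:                      # first (backward) sighting of a
                if src not in live:               # source is its last use: it
                    if not free:                  # becomes live and needs a register
                        raise RuntimeError("would need to spill")
                    assignment[src] = free.pop()
                    live.add(src)
        return assignment

    # e.g. backward_alloc([("t0", ["x"]), ("t1", ["t0", "x"]), ("r", ["t1"])], 2)
    # -> {"t1": 0, "t0": 0, "x": 1}

For straight-line code this never uses more registers than the maximum number of simultaneously live variables, which is exactly the clique number of the (interval) interference graph.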
Since it's too late to edit my post: it's not just the phi nodes that correspond to splitting. A straight-line program can produce a non-chordal interference graph if variables are assigned multiple times. During SSA conversion, each time a given variable is assigned in a straight-line program, its original live range is split at that point.
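To make that concrete, here is a small made-up straight-line example (with the interference edges computed in Python) where one variable assigned twice yields a chordless 4-cycle:

    # Toy example: 'a' is assigned twice; classic (pre-SSA) allocation treats
    # both of its live segments as one node in the interference graph.
    #
    #   1: a = ...        a live over [1,3] and [8,10]
    #   2: b = ...        b live over [2,5]
    #   3: use a
    #   4: c = ...        c live over [4,7]
    #   5: use b
    #   6: d = ...        d live over [6,9]
    #   7: use c
    #   8: a = ...        (second assignment to a)
    #   9: use d
    #  10: use a

    segments = {
        "a": [(1, 3), (8, 10)],   # two segments, one node
        "b": [(2, 5)],
        "c": [(4, 7)],
        "d": [(6, 9)],
    }

    def interferes(u, v):
        return any(s1 <= e2 and s2 <= e1
                   for (s1, e1) in segments[u] for (s2, e2) in segments[v])

    edges = {(u, v) for u in segments for v in segments if u < v and interferes(u, v)}
    # -> a-b, a-d, b-c, c-d: a 4-cycle with neither chord (a-c and b-d are
    #    absent), so the graph is not chordal. Splitting 'a' into a1/a2, as
    #    SSA conversion does, breaks the cycle and restores chordality.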
Indeed; footnote 3 on the sentence you quoted links to exactly that paper :-) This came up in the Reddit discussion on the post a few days ago. Specifically it works for reducible control flow. In the general case (irreducible control flow) I think it is still NP-hard, but I could be mistaken. (Regardless, the algorithms are nontrivial, hence the need for fuzzing!)
Lovely article. The symbolic checker approach is awesome. It seems trivial/simple in retrospect, but what a powerful testing approach for this problem!