Klint: Compile-time detection of atomic context violations for kernel Rust code

PaulDavisThe1st · on March 18, 2023

This is a fantastic example of the sort of real world problems encountered when writing "system software" that are not addressed by a language change. Yes, klint offers assistance, and it does draw in some limited sense on properties offered by Rust. But in essence it is a standalone tool that could also be implemented for C or other languages (it relies on annotations, just like a C version would).

This is not to deny the things that Rust does bring to the table, but it's cool to see how it cannot really address the central problem:

> How RCU is implemented in the Linux kernel lifts sleep in atomic context from "it's bad because it might cause deadlock" to "it's bad because it can cause use-after-free".

Klint doesn't provide a way to prevent code from doing the wrong thing, it only acts a checker that things have been done right (probably). That's cool (and very useful), but quite different from e.g. "you can't make this mistake in Rust".

tialaramex · on March 19, 2023

Rust offers at least one way to prevent this which would work today (sleep tokens) in the language, which was rejected by the person doing this work mostly because it seems like they thought it'd be a bother, I'd want to see that much better explored before I'm convinced - but then who am I to insist.

You can't attempt sleep tokens in C or C++ because their type systems have the off-by-one error, they assume the smallest type has at least two values and requires at least one byte of storage, whereas Rust acknowledges that types might have zero or one value, and so either be unrepresentable or require no storage at all. The latter makes a sleep token possible.

avgcorrection · on March 19, 2023

Zero-sized types really is an elegant concept.

mastax · on March 18, 2023

Yeah, the bespoke ways the kernel does things makes me wonder if a linux-kernel-specific language could be a good idea. Almost certainly not, but I think that Rust/Nim/Zig/Carbon/etc. have shown that it's not insurmountably hard get a new language up to standard nowadays, mostly thanks to LLVM.

> Klint doesn't provide a way to prevent code from doing the wrong thing, it only acts a checker that things have been done right (probably). That's cool (and very useful), but quite different from e.g. "you can't make this mistake in Rust".

I don't disagree, really, but I'm trying to get a more precise idea about the distinction. Let's pretend that klint makes all the improvements listed in "future work" and all kernel developers `alias rustc=klint`. Is it any worse than the borrow checker or the type system with respect to "only [acting as] a checker that things have been done right"?

PaulDavisThe1st · on March 19, 2023

I don't know of any bespoke solutions in the kernel that belong in a language.

RCU does not belong in a language. Spinlocks do not belong in a language. etc. etc.

These are things that belong in libraries/modules, which leads me to think that there's absolutely zero need for a (linux) kernel specific language.

tialaramex · on March 19, 2023

The kernel's extended BPF is in fact an entire virtual machine and so necessarily has a virtual machine language. Arguably it would be better if that software was written in a language intended to be translated into eBPF rather than in C, C++ or Rust compiled for the eBPF virtual machine.

EtA: Also, while the kernel does have some mechanisms which exist elsewhere, the kernel also has its own memory model, distinct from the C++ (and eventually C) memory model, because Linux has been here a long time and needed a model before C++ had one. So this means there's a conflict between what the ISO document says (based on its memory model) and how Linux works (based on its memory model).

So it might be easier on folks if the Linux memory model wasn't just a different memory model for the C language which already has a memory model. That applies just as well for Rust.

avgcorrection · on March 18, 2023

It’s cool to see? If it’s the same solution as what one would do in C, how is it “cool to see”?

Is having the same meal two times in the same day also “cool”?

PaulDavisThe1st · on March 18, 2023

It's cool (to me) because it highlights a point some people (including me) have made about the claimed benefits of using new languages.

JonChesterfield · on March 18, 2023

I didn't follow the transition from the type system to linter here. I've read that rust implements reference counting as a mixture of compile time and residual code for runtime, which suggests it should be able to do the reference counting the linter is doing from within the type system, raising an error if it would have to emit runtime counters for some control flow.

Why can't rust do this in the type system?

bonzini · on March 18, 2023

It's here: "People familiar with paradigms in Rust might also wonder if a token type, or some possible context and capabilities extension might help with this, but unfortunately it would not help this scenario. You can't do negative reasoning with token types thus a token-based approach would require almost all functions to carry tokens in their signatures".

In other words, arguments and the type system can be used to tell a callee what it can do, but not what it cannot do.

tialaramex · on March 18, 2023

I'm not convinced that "almost all functions carry tokens in their signatures" is a disaster given the desired result. I'd like to read more about why this way forward was rejected.

If I am trying to call X and I need a CanSleep token but I haven't got one, this prompts me to investigate why I haven't got one. Oh! I gave my CanSleep token to this Spin Lock. Why did the Spin Lock need my CanSleep token? Ah, because sleeping while holding a spin lock is death, that makes sense. I can get my CanSleep token back but only by releasing the spin lock, however I need this spin lock. Hmm. Aha! I should re-factor this code so that I just remember we need to call X, and I ensure I do it right after I have released the Spin Lock and retrieved my CanSleep token. Excellent.

bonzini · on March 18, 2023

It's not a disaster, it's quite inconvenient though and the effect would be the same as the lint that the article discusses.

tialaramex · on March 19, 2023

Can the inconvenience be mostly dissolved by a proc macro though?

Leveraging the type system scales better than ad hoc lints. Maybe later we need to keep track of several tokens, Rust has no problem making a Vec<CanSleep> and will optimise that exactly the way you expect (it's just a counter, the CanSleeps don't need storing anywhere) likewise a [Option<CanSleep>; 4] is just four bytes to track whether each element is None or Some(CanSleep).

the8472 · on March 18, 2023

> These kinds of mistakes are easy to make and hard to debug, especially when the sleepable call is deeply nested.

This means you either have to pass around permission-to-sleep tokens or you need additional function colors to make sure that it is enforced on the entire function call tree. The former might be possible with the lifetime trickery that chostcell does (I'm not certain, advanced type system hacks aren't my forte) but it's cumbersome. The latter would require an effects system which rust currently doesn't have.

JonChesterfield · on March 18, 2023

That helped a little, thanks. I can see how having an argument passed down the call tree is equivalent to colouring. Make sleep forms take the token, no token, no sleep. Maybe generic over counter embedded in the token, that's kind of ugly either way.

"Effects system" seems to mean different things in different contexts. I found a few rust specific blog posts but they didn't obviously apply to this. Probably distinct from dependent types.

The rust compiler has the counting machinery to do this, perhaps it's not surfaced to the language outside of built-in types (arc?). In fairness I can't think of many languages which are up for that sort of hooking into the compiler.

It looks to me like a callgraph walk accumulating some state on known function calls plus error on taking address-of the mutator functions would work for sufficiently static graphs, which is probably what you have for careful concurrent code.

Semantic linters running across llvm IR are probably useful things to have. Not really related to rust at that point though.

iudqnolq · on March 18, 2023

I think you were confused by someone who described that "counting machinery" at too high a level of abstraction. Compile-time counting is implemented in the compiler, runtime counting requires you to opt-in by wrapping the type in a special wrapper type.

This wrapper type requires two special language features: a way to run code when a type is "dropped" (think destroyed), and a way to have a type forward method calls to another type. But any regular code can use these features.

That lets you write a straightforward wrapper that forwards calls and manages the reference counting (increment on clone, decrement on drop). There's a bit more cleverness in the math, but it's all regular rust code. See <https://doc.rust-lang.org/src/alloc/rc.rs.html#15896>

This lack of special standard-library-only magic is a key part of rust's appeal to me. Those same features let the crate parking_lot implement a better mutex than the standard library, for example.

eqvinox · on March 18, 2023

> This means you either have to pass around permission-to-sleep or you need additional function colors

I wonder if it's possible to do this better. Somehow applying the type system onto the "execution context" feels like it would be the "ideal" solution here. From just 2 minutes of superficial thought, it would really be quite similar to permission-to-something tokens. Just a bunch of syntactic sugar on how it's done, plus making it easier to extend for additional tokens/colors/...?

ameliaquining · on March 18, 2023

I think you want something like https://tmandry.gitlab.io/blog/posts/2021-12-21-context-capa.... The Rust maintainers have expressed interest in this possibility, but I don't think there's anything concrete yet.

JonChesterfield · on March 18, 2023

That ^ is interesting in that it's (a syntax heavy version of) dynamic scoping with an implicit restriction that the call graph must be static. I.e. as described that does not work with function pointers (there is a todo reference to 'dyn', which looks like it means vtable dispatch, i.e. a special case of a function pointer call).

This suggests a general pattern given the above on memory allocation - some things are expressible on static call graphs. On dynamic call graphs, they need to either fail (could be at compile time) or induce a runtime cost.

yx827ha · on March 18, 2023

This sounds like something the Keywords Generic Initiative could help with. Maybe create a ?sleep keyword. https://blog.rust-lang.org/inside-rust/2023/02/23/keyword-ge...