How do these Rust kernel modules handle out of bounds access in an array? In a C...

tialaramex · on Sept 13, 2022

The Rust for Linux implementation converts a Rust panic into a Linux kernel BUG macro call. I believe this will expand to an invalid CPU instruction (at least on popular architectures), and if you're a kernel thread you die immediately with the kernel reporting the state where this happened. Obviously in some cases this is fatal and the report might only be seen via say, a serial port hooked up to a device in a test lab or whatever.

So, it's not a kernel panic, but it's extremely bad, which seems appropriate because your code is definitely wrong. If you're not sure whether the index is correct you can use get() or get_mut() to have an Option, which will be None if your index was wrong (or of course you could ask about the length of the array since Rust remembers how long arrays are).
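A minimal sketch of that get() / get_mut() style in plain Rust. The helper names `read_reg` and `write_reg` are made up for illustration and are not from the Rust for Linux tree:

```rust
/// Hypothetical helper: read a value by index, reporting a bad index as an
/// error instead of panicking.
fn read_reg(regs: &[u32], idx: usize) -> Result<u32, &'static str> {
    // `get` returns Option<&u32>: Some(&value) when in bounds, None otherwise.
    regs.get(idx).copied().ok_or("index out of bounds")
}

/// Hypothetical helper: write through `get_mut`, which returns Option<&mut u32>.
fn write_reg(regs: &mut [u32], idx: usize, value: u32) -> Result<(), &'static str> {
    match regs.get_mut(idx) {
        Some(slot) => {
            *slot = value;
            Ok(())
        }
        None => Err("index out of bounds"),
    }
}
```

Neither function can panic on a bad index; the caller decides what an out-of-range request means.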

megous · on Sept 13, 2022

BUG() will panic the kernel.

https://elixir.bootlin.com/linux/latest/source/include/asm-g...

I guess that's quite drastic for a checked out of bound access, when there's no actual memory safety issue and the compiler can simply return an error from the function, or do something else less drastic.

couchand · on Sept 13, 2022

In Rust code, if you're not able to locally reason that an array index is valid, it should be written with .get() and the None case handled appropriately.

It's impossible to claim there's "no actual memory safety issue" when a program's invariants have been broken: all bets are off at that point.

AshamedCaptain · on Sept 13, 2022

> It's impossible to claim there's "no actual memory safety issue" when a program's invariants have been broken: all bets are off at that point.

When the underlying _runtime's_ invariants have been broken, not when the program invariants have been broken. i.e. you can recover from almost everything save for a VM error in a VM language like Java, since there's no way for the program to mess up the VM's data structures in a way that they cannot be brought back to a defined state.

couchand · on Sept 13, 2022

    let innocent_var = operation_that_panics_but_returns_something_random_instead();

    unsafe {
        do_something_assuming_validity(innocent_var);
    }

megous · on Sept 13, 2022

Why allow indexed access at all if the compiler is emitting a conditional check anyway?

connicpu · on Sept 13, 2022

Because in the common case you assume that, if your code is correct, all of your indexing will be in bounds, but for memory safety reasons we need a bug to be reported if memory safety would have been violated. So we allow direct indexing with a panic on out of bounds because it's the most ergonomic for that common case

couchand · on Sept 13, 2022

I've come to believe ergonomics is a siren song here, mostly because recently I've been considering panics as forbidden as memory unsafety is... it's never okay for your embedded system or web server to panic, so don't act like that style is somehow preferable.

If you "know" the index is in bounds, you can get_unchecked. Otherwise you should get. Either would be a sane choice for the index operator.
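For what it's worth, a small sketch of the two styles side by side. The function names are hypothetical, and the unsafe variant is only sound because of the emptiness check right above it:

```rust
/// Checked access: the caller gets None for an empty slice.
fn first_checked(data: &[u32]) -> Option<u32> {
    data.get(0).copied()
}

/// Unchecked access after a local proof that index 0 is valid.
fn first_or_zero(data: &[u32]) -> u32 {
    if data.is_empty() {
        return 0;
    }
    // SAFETY: the slice is non-empty (checked above), so index 0 is in bounds.
    unsafe { *data.get_unchecked(0) }
}
```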

megous · on Sept 13, 2022

The bug is not just reported here, the whole computer shuts down and all your unsaved work gets lost. That's not very ergonomic either.

nextaccountic · on Sept 14, 2022

Because it's convenient and familiar to most programmers. Not providing bounds-checked indexing makes some kinds of code very hard to write.

But note this problem also happens with integer division.

In Rust, a[x] on an array or Vec is roughly shorthand for a.get(x).unwrap() (with a different error message).

Likewise, a / b on integers is a kind of shorthand for a.checked_div(b).unwrap().
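A rough sketch of those two equivalences, assuming ordinary userspace Rust. `index_like` and `div_like` are illustrative names, and the real operators produce different panic messages:

```rust
/// Roughly what `a[x]` does on a slice: panic if the index is out of bounds.
fn index_like(a: &[i32], x: usize) -> i32 {
    *a.get(x).expect("index out of bounds")
}

/// Roughly what `a / b` does on integers: panic on a zero divisor
/// (or on the one overflowing case, i32::MIN / -1).
fn div_like(a: i32, b: i32) -> i32 {
    a.checked_div(b).expect("division by zero or overflow")
}

/// The non-panicking alternative: hand the None back to the caller.
fn safe_ratio(a: i32, b: i32) -> Option<i32> {
    a.checked_div(b)
}
```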

The thing is, if the index ever is out of bounds, or if the denominator is zero, the program has a bug, 100% of the time. And if you catch a bug using an assertion there is seldom anything better than interrupting the execution (the only thing I can think of is restarting the program or the subsystem). If you continue execution past a programming error, you may sometimes corrupt data structures or introduce bizarre, hard to debug situations.

Doing a pattern match on a.get(x) doesn't help because if it's ever None (and your program logic expects that x is in bounds) then you are kind of forced to bail.

The downside here is that we aren't catching this bug at compile time. And it's true that sometimes we can rewrite the program to not have an indexing operation, usually using iterators (eliding the bounds check will make the program run faster, too). But in general this is not possible, at least not without bringing in formal methods. But that's what tests are for, to ensure the correctness of stuff type errors can't catch.

Now, there are some crates like https://github.com/dtolnay/no-panic or https://github.com/facebookexperimental/MIRAI that will check that your code is panic free. The first one is based on the fact that llvm optimizations can often remove dead code and thus remove the panic from a[x] or a / b - if it doesn't, then compilation fails. The second one employs formal methods to mathematically prove that there is no panic. I guess those techniques will eventually be ported to the kernel even if panics happen differently there (by hooking on the BUG mechanism or whatever)

vgel · on Sept 13, 2022

Some people have argued that indexed access is a wart, but it would be quite heavyweight to always have to unwrap an option when accessing a known-good index:

    let foo = [0_u8, 1, 2];
    foo[0].unwrap(); // really?

Instead, indexing on arrays / vecs is (essentially) sugar for .get(index).unwrap(). If you don't want the unwrap behavior, use get. This is very similar to Python, though Python throws an exception, which obviously isn't available to Rust.

couchand · on Sept 13, 2022

But in real code you never want unwrap, so why provide sugar for it?

sealeck · on Sept 13, 2022

In which case you can call the get method (which returns an Option, i.e. either Some(value) or None) rather than indexing, and return an error value.

throwaway894345 · on Sept 13, 2022

Would a kernel module be written as a normal Linux-targeting Rust program, or would it be more like a bare metal target with its own (user-provided) panic handler?

loeg · on Sept 13, 2022

More like the latter. Kernel modules don’t run in userspace.

nicoburns · on Sept 13, 2022

Worth noting that one doesn't need to use raw array accessing in Rust nearly so much as in C because you have things like iterators and for..in loops that will ensure correct access.
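To illustrate, here is a small sketch with made-up function names. The first version indexes by hand, the other two let iterators guarantee every access is in range:

```rust
/// C-style loop: each `data[i]` is a bounds-checked access
/// (the optimizer may or may not remove the check).
fn sum_indexed(data: &[u32]) -> u32 {
    let mut total = 0;
    for i in 0..data.len() {
        total += data[i];
    }
    total
}

/// Iterator version: there is no index, so nothing can be out of bounds.
fn sum_iter(data: &[u32]) -> u32 {
    data.iter().sum()
}

/// `zip` stops at the shorter input, so mismatched lengths cannot overrun.
fn pairwise_max(a: &[u32], b: &[u32]) -> Vec<u32> {
    a.iter().zip(b).map(|(x, y)| *x.max(y)).collect()
}
```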

But I would assume it would be a kernel panic. It definitely won't be UB.

scoutt · on Sept 14, 2022

How does the Rust compiler check bounds and ensure nothing bad can happen with a dynamic array that is passed to a Rust module from the kernel, another C module, or a C userspace program?

If an array and a length can be passed to a Rust module, I can just lie about its size and, unless there is a runtime bounds check (which can be slow), I guess bad things can happen too.
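A sketch of what that boundary tends to look like in plain Rust, with hypothetical names (`checksum`, `sum_first`) and no claim about how the kernel bindings actually do it. The pointer and length have to be trusted inside an unsafe block. After that, all bounds checks are against the claimed length, so a caller that lies about the length can still cause bad things:

```rust
use std::slice;

/// Hypothetical entry point a C caller might invoke with a buffer and a length.
///
/// # Safety
/// `ptr` must point to at least `len` readable bytes. Rust cannot verify this;
/// if the caller lies about `len`, the behavior is undefined.
pub unsafe extern "C" fn checksum(ptr: *const u8, len: usize) -> u32 {
    if ptr.is_null() {
        return 0;
    }
    // The only unchecked step: trusting the (ptr, len) pair from the caller.
    let buf = unsafe { slice::from_raw_parts(ptr, len) };
    sum_first(buf, 4)
}

/// Safe code from here on: accesses are bounded by the slice's length.
fn sum_first(buf: &[u8], n: usize) -> u32 {
    buf.iter().take(n).map(|&b| u32::from(b)).sum()
}
```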

dcomp · on Sept 13, 2022

From what I understand, a Rust panic will just call BUG(). There is no support for unwinding as such.

Most likely you would have to use .get() which returns an Option rather than [] array index which panics.

ogoffart · on Sept 13, 2022

Exactly. A rust panic will call the panic_handler, implemented there: https://github.com/Rust-for-Linux/linux/blob/459035ab65c0ebb...

So accessing an array out of bound will have a runtime check that will call the panic handler, and that panic handler calls BUG() which means kernel panic.

hiimkeks · on Sept 13, 2022

You can also use .get(idx), which gives you either Some(data) or None in case of out-of-bounds access.

EugeneOZ · on Sept 13, 2022

You can catch the panic. It will panic, but I don't know if the driver will catch it. I hope so :)

kzrdude · on Sept 13, 2022

Can you, in kernel context?

EugeneOZ · on Sept 13, 2022

After comments below, I'm not so sure. I was talking about regular Rust, I didn't know that Linux Rust is patched. Sorry.
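For reference, this is the regular-Rust behavior I had in mind, as a minimal sketch. It assumes std and the default panic=unwind setting, so it says nothing about kernel code, and `try_index` is just an illustrative name:

```rust
use std::panic;

/// Run an indexing operation and turn a panic into None.
/// Only works where unwinding is available (not with panic=abort).
fn try_index(data: Vec<u8>, idx: usize) -> Option<u8> {
    panic::catch_unwind(move || data[idx]).ok()
}
```

The default panic hook still prints the panic message to stderr before the unwind is caught.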

ncmncm · on Sept 13, 2022

This is not actually substantively different from throwing an exception.

davidatbu · on Sept 13, 2022

I believe Rust in linux was made so that it never panics. Here's a patch that removes panicking allocations for example: https://lore.kernel.org/lkml/20210704202756.29107-1-ojeda@ke... (but I think all other instances of panicking were removed as well).

EDIT: Look at replies. "Linus considers panics acceptable in some cases".

tialaramex · on Sept 13, 2022

Linus considers it acceptable to panic for some programming mistakes, since after all the C code also blows up if you make some programming mistakes.

One I ran into (in the sense of read about, not experienced) was if I flatten a Vec of arrays, it's theoretically possible that the flattened structure has too many items in it to represent as a machine word integer. If this happens the flatten operation will panic.

This can't happen in a language like C (or C++) because their smallest type has size 1, so all the arrays can't possibly be bigger in total size than the amount of memory, that's nonsense. But Rust has two smaller sizes than this. The relevant one here is the Zero Size Type, Rust has no problem with the idea of an array of empty tuples, such an array could have say, a billion empty tuples in it, yet on a 32-bit system it just needs 4 bytes (to remember how many empty tuples are in it).

We can see that flattening a Vec of arrays of empty tuples is a pretty wild thing to choose to do, and nevertheless even if we do it, it only panics when the total amount of empty tuples won't fit in the integers of our native word size. But the function could be asked to do this, and so it might panic.
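A small sketch of the arithmetic involved. `flattened_len` is a hypothetical stand-in for the length computation, not the standard library's actual code:

```rust
use std::mem::size_of;

/// Sizes of the empty tuple and of a huge array of empty tuples.
fn zst_sizes() -> (usize, usize) {
    (size_of::<()>(), size_of::<[(); 1_000_000_000]>())
}

/// Hypothetical stand-in for the length a flatten has to compute:
/// outer length times inner array length, or None if that overflows usize.
fn flattened_len(outer_len: usize, inner_len: usize) -> Option<usize> {
    outer_len.checked_mul(inner_len)
}
```

With sized elements the memory runs out long before the multiplication can overflow; with zero-size elements nothing stops the lengths from getting that large.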

[ You might be wondering how can there be two sizes smaller than C's single byte types in Rust. The answer is the Never type ! and its pseudonym Infallible. The Never type is Empty, no values of this type are possible, so not only does Rust never need to store this type, it doesn't even need to emit machine code to handle this type - the code could never run. This makes sense in Generic programming, we can write Generic error handling code, but it evaporates when the error type was Infallible ]
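A minimal sketch of that evaporating error handling, using `std::convert::Infallible`. The helper name `unwrap_infallible` is made up here:

```rust
use std::convert::Infallible;

/// Generic "unwrap" for results whose error type has no values.
/// The Err arm needs no code: an empty match on an empty type is exhaustive.
fn unwrap_infallible<T>(r: Result<T, Infallible>) -> T {
    match r {
        Ok(v) => v,
        Err(never) => match never {},
    }
}

/// Parsing a &str into a String cannot fail, so its error type is Infallible.
fn to_string_via_parse(s: &str) -> String {
    unwrap_infallible(s.parse::<String>())
}
```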

davidatbu · on Sept 13, 2022

This is exactly the kind of stuff I come on HN for. Thank you!

ncmncm · on Sept 13, 2022

I presume it will still Oops...

davidatbu · on Sept 13, 2022

I'm not sure what you mean by "it will still Oops".

dogleash · on Sept 13, 2022

https://en.wikipedia.org/wiki/Linux_kernel_oops

davidatbu · on Sept 13, 2022

Gotcha. Thanks!

EugeneOZ · on Sept 13, 2022

Indeed, but it's much better than undefined behavior :)