>In 2017 alone, 434 Linux kernel exploits were found, and as you have seen in this post, kernel exploits can be devastating for containerized environments. This is because containers share the same kernel as the host, thus trusting the built-in protection mechanisms alone isn’t sufficient.
More than one kernel exploit _per day_. Exploiting Linux is just a matter of finding one such vulnerability and using it. This can be done in a single day.
There's just no fixing megabytes of buggy kernel code.
It really drives home the need for a proper OS based on a verified, capability-enabled microkernel such as seL4.
I'll surely get a lot of flak for this, but these kinds of bugs would be trivial to avoid in C++. All you need is to make the pointer arguments to syscalls be some other data type (say, user_ptr<T>) that performs an access check upon conversion to a raw pointer. Then the compiler simply wouldn't let you bypass the access check, so you could not forget to do it. That's the fundamental difference between C++ and C: one of them actually lets you write code that cannot contain many classes of mistakes, and the other, well, doesn't. For the life of me I don't understand the stubbornness behind sticking to the same languages and tools from decades ago.
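To make the idea concrete, here is a minimal sketch of what such a wrapper might look like. Everything in it is hypothetical: user_ptr<T> is not an existing kernel type, and kUserSpaceLimit and range_is_user() are made-up stand-ins for whatever check the real kernel would perform on a 64-bit machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical user/kernel split for a 64-bit address space.
// A real kernel derives this from the architecture; this value is an assumption.
constexpr std::uintptr_t kUserSpaceLimit = 0x0000800000000000ULL;

// Returns true if [addr, addr + size) lies entirely below the limit.
// Written to avoid overflow in addr + size.
inline bool range_is_user(std::uintptr_t addr, std::size_t size) {
    return size <= kUserSpaceLimit && addr <= kUserSpaceLimit - size;
}

// Wrapper for an untrusted address passed in from user space.
// It has no operator*, no operator-> and no implicit conversion to T*,
// so the only way to obtain a raw pointer is through get_checked().
template <typename T>
class user_ptr {
public:
    explicit user_ptr(std::uintptr_t addr) : addr_(addr) {}

    // Performs the access check, then hands out the raw pointer.
    // Returns nullptr for a null address or a range outside user space.
    T* get_checked() const {
        if (addr_ == 0 || !range_is_user(addr_, sizeof(T))) {
            return nullptr;
        }
        return reinterpret_cast<T*>(addr_);
    }

private:
    std::uintptr_t addr_;  // never exposed unchecked
};
```

With this, a syscall handler that takes user_ptr<siginfo> instead of siginfo* has to call get_checked() before it can write anything. It does not stop someone from deliberately adding an unchecked accessor, which is the concern raised in the reply below.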
Well, the exploited code used unsafe interfaces in an unsafe manner. It is effectively equivalent to calling something like reinterpret_cast<T*>(user_ptr.get()) to bypass the safety provided by the compiler. How do you avoid that with C++ alone? I guess you will need some external static analyzer. The Linux kernel does have one: Sparse. IIRC, it can report casts that strip the __user annotation from a pointer.
As for stubbornness, C can be used safely with proper discipline. Kernel development does require a certain amount of experience and discipline, so arguably C can be used by kernel developers in a mostly safe manner. That's why some view it as a feature: if you don't have the required discipline, then just stick to userspace development.
Thanks for the explanation! __user and sparse seem to be almost exactly the kinds of tools I had in mind, with the caveat that __user would be the default for a pointer argument to a syscall, so that it wouldn't need to be specified.
I'm not sure I understand what interface you're referring to that was "unsafe" and subsequently "used in an unsafe manner". What is the "unsafe interface" here that was being used in an unsafe manner? It seems to me that the problem was that the pointer was not marked as __user? Which is awful, because shouldn't __user be the implicit default behavior for a pointer argument to a syscall? Why should the default behavior be the unsafe one you pretty much never want?
System call pointer arguments are usually marked with __user annotations (I cannot recall whether some odd calls take a kernel pointer; none should need one, but there may be a legacy call that does). In particular, the infop argument to waitid() is marked as a user-space pointer [1].
Before using a pointer to user-space, one should validate it with access_ok(). The usual safe interfaces (copy_{to,from}_user(), put_user(), get_user()) always perform this check and fail with an error if the pointer is not an okay user-space pointer.
The commit that introduced the vulnerability [2] replaced the safe interface with unsafe ones, possibly for performance improvements. The code used the put_user() function to set individual fields of a struct. Multiple calls to put_user() were replaced with multiple calls to unsafe_put_user(), which does not perform the access_ok() check every time. A check for a NULL pointer was added before the stores, but no access_ok() call. unsafe_put_user() still checks whether the address points to an actually mapped memory location, but does not verify whether the location is in user-space.
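The difference is easy to see in a toy model. The sketch below is not kernel code: it simulates an address space as a 128-byte array in which offsets below 64 play the role of user memory and the rest plays the role of kernel memory. All names (ToyMemory, toy_access_ok(), toy_put_user(), toy_unsafe_put_user(), fill_fields_buggy(), fill_fields_fixed()) are made up for illustration and only mimic the behavior described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kUserEnd = 64;   // offsets [0, 64) model user memory
constexpr std::size_t kMemSize = 128;  // offsets [64, 128) model kernel memory
constexpr int kEFAULT = 14;

struct ToyMemory {
    unsigned char bytes[kMemSize] = {};
};

// Model of access_ok(): is [addr, addr + size) entirely in the user part?
inline bool toy_access_ok(std::size_t addr, std::size_t size) {
    return size <= kUserEnd && addr <= kUserEnd - size;
}

// Model of unsafe_put_user(): fails only if the address is not "mapped"
// at all. It does NOT check that the address is in the user part.
inline int toy_unsafe_put_user(ToyMemory& m, std::size_t addr, std::uint32_t v) {
    if (addr > kMemSize - sizeof(v)) return -kEFAULT;
    std::memcpy(m.bytes + addr, &v, sizeof(v));
    return 0;
}

// Model of put_user(): performs the access check on every call.
inline int toy_put_user(ToyMemory& m, std::size_t addr, std::uint32_t v) {
    if (!toy_access_ok(addr, sizeof(v))) return -kEFAULT;
    return toy_unsafe_put_user(m, addr, v);
}

// Buggy pattern: only a NULL check (offset 0 stands in for NULL here)
// before a series of unchecked stores.
inline int fill_fields_buggy(ToyMemory& m, std::size_t addr) {
    if (addr == 0) return 0;
    if (toy_unsafe_put_user(m, addr, 0xAAAAAAAAu) != 0) return -kEFAULT;
    return toy_unsafe_put_user(m, addr + 4, 0xBBBBBBBBu);
}

// Safe pattern: one range check for the whole struct, then the cheap stores.
inline int fill_fields_fixed(ToyMemory& m, std::size_t addr) {
    if (addr == 0) return 0;
    if (!toy_access_ok(addr, 8)) return -kEFAULT;
    if (toy_unsafe_put_user(m, addr, 0xAAAAAAAAu) != 0) return -kEFAULT;
    return toy_unsafe_put_user(m, addr + 4, 0xBBBBBBBBu);
}
```

Passing a "kernel" offset such as 100 to fill_fields_buggy() succeeds and overwrites bytes the caller should never be able to touch, while fill_fields_fixed() and toy_put_user() reject the same offset with -kEFAULT.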
The commit was not really discussed in depth on LKML [3], as it came from Al Viro, who should know better, is one of the Sparse maintainers, etc. Some projects require a human justification for any use of unsafe interfaces during code review (like flagging a review with 'needs-check' or something that requires a sign-off by another human that the unsafe thing is actually safe). This may have been a case where that would have mattered, since a static analysis tool should not produce bogus warnings for interfaces that are designed to do unsafe things. Though it may also be useful to add a check to Sparse that verifies that unsafe_{get,put}_user() calls are preceded by an access_ok() call in the same function.
According to Linus, programmers who prefer C++ are so bad that he would have chosen C solely to avoid dealing with their "total and utter crap" code, and C++ is only good for kernel development if you limit yourself to a C-like subset anyway [1].
Only if you're lucky. Most of these exploits probably took weeks to find and analyze properly; it's not like one person found more than one a day. They're found because whole teams are working with the Linux kernel at the same time and either happen upon them or actively look for them.