I haven't yet read the paper proper but this statement surprised me a little:
> In particular, ownership has forced us to avoid concurrency in the kernel and include more code than we would have liked in the TCB.
One of the strong points of Rust and its ownership system is that it allows writing concurrent code with greater confidence. I would have hoped that it'd encourage rather than discourage using concurrency.
I was surprised by this as well, so I checked their paper:
"When multiple threads need to share a resource, they can
pass ownership through channels. However, as we discuss in the rest
of this paper, it does not work well in systems that use non threadbased
concurrency models and when resources must be shared [...]
While hardware interrupts
are asynchronous, and therefore run concurrently with other kernel
code, in our operating system interrupt handlers enqueue tasks to
be run in the main scheduler loop, which is single-threaded. As a
result, on_receive and send can never run concurrently and no
data race is possible.
We overcome both issues by declaring most resources as static
variables and borrowing them with ’static lifetime. [...]
Being able to mutably borrow static variables multiple times
means our operating system can allow multiple drivers to reference
hardware resources or each other. However, doing so means we lose
the ability to detect real data races at compile time, for example,
if the kernel passes a shared resource to an interrupt handler that
may run concurrently. Instead, we resort to an overly conservative
convention in which interrupt handlers perform no work except
posting to the main scheduler’s task queue. An ideal mechanism
would, instead, allow for shared state between components that will
never run concurrently, but disallow sharing when they may."
> Instead, we resort to an overly conservative convention in which interrupt handlers perform no work except posting to the main scheduler’s task queue
This doesn't seem so bad. OS X does exactly this in order to minimise the time spent in interrupt mode, and so reduce scheduling jitter:
Rust's rules here aren't purely for multithreading, though. For example, even in a single-threaded context, disallowing mutable aliasing prevents things like iterator invalidation.
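For instance, this classic iterator-invalidation bug is caught at compile time, with no threads anywhere in sight (rustc rejects it with error E0502):

    fn main() {
        let mut v = vec![1, 2, 3];
        for x in &v {
            // The loop holds a shared borrow of `v`, so this mutation
            // (which could reallocate the buffer out from under the
            // iterator) fails to compile.
            v.push(*x);
        }
    }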
At the same time, I noticed this paper late last night, and just woke up, so haven't fully let it all sink in yet.
It's understandable that they're not proficient in Rust, given how new the language still is. :) In particular, they weren't aware of the Cell type in the stdlib, which allows zero-cost shared mutability for any copyable type and may have addressed some of their problems. You're right that this means we need more teaching materials, which is an active area of effort!
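For reference, a toy example of what Cell buys you: two shared references to the same value, both able to mutate it, with no locks or runtime checks, because get/set copy the value in and out instead of handing out aliasing &muts:

    use std::cell::Cell;

    fn main() {
        let counter = Cell::new(0u32);
        let a = &counter; // two shared references to the same Cell
        let b = &counter;
        a.set(a.get() + 1);
        b.set(b.get() + 1);
        assert_eq!(counter.get(), 2);
    }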
The "new hardware protection mechanism called MPU available on some new ARM Cortex-M microcontrollers" is a device which lets the operating system define up to eight memory regions (sixteen on the M7), optionally with holes in them. In each region, permissions can be set for read, write, and execute, and the policy for cacheing and sharing between cores can be controlled (although this is presumably not really relevant for single-core microcontrollers).
They're used in lots of smartcards and embedded designs going way back. I designed, but didn't implement, similar extensions because the virtual memory model was a complex mess I couldn't secure. Segments were simple and had been used for effective security before, combined with the isolation aspect of MMUs. MPUs were the logical step from that. Easy to implement with highly assured methods, too, due to their simplicity. Not sure how far back they go, but here's an analysis of issues done in 2004 that references MPU advantages:
The documents from Gemalto, Infineon, etc. for their EAL5+ or EAL6+ smartcards give details on how they do the high-security variants of MPUs. Rockwell Collins' ACL2-based method can also be used, given they built a whole processor (AAMP7G) with it.
* Wants const size_of for closures (for statically allocating that many bytes). We just need RFC #1245 to get implemented (it has been accepted), and for someone to mark the size_of intrinsic as a const fn. However, this might get into the weeds pretty badly, since "size" is handled by trans (LLVM, even). Dunno those details.
* Shared mutability of hardware registers should be handleable soundly by Cell.
This leaves the big issue:
The kernel has been architected as a single "main" thread, and a bunch of interrupt handling threads. All the interrupt-handling threads do is enqueue a message for the main thread to handle. The main thread then drives all the drivers off of this queue, so they all run on one thread. However they want to do some shared mutable stuff. In particular, I think they want closures that close over the same value mutably? RefCell isn't particularly acceptable because crashing is Super Bad, and they would like to avoid runtime checks anyway. They just found out about Cell, so it might work, but they seem to think this wouldn't be right.
You can make Cells pretty fine-grained. You can also consider /u/SimonSapin's proposed extension to Cell, which adds a replace method for non-Copy types, so you could replace (say) an arbitrary Cell<Option<T>> with None temporarily through a shared reference.
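Roughly, the pattern that extension enables (the replace method sketched here later landed in std as Cell::replace, so this compiles on modern Rust): move a non-Copy value out through a shared reference, leave None in its place, and put it back when done:

    use std::cell::Cell;

    fn main() {
        // A non-Copy value shared through a Cell.
        let slot: Cell<Option<String>> = Cell::new(Some(String::from("packet")));

        // Temporarily take the value out through a *shared* reference...
        if let Some(s) = slot.replace(None) {
            // ...work on `s` with full ownership...
            assert_eq!(s, "packet");
            // ...then put it back.
            slot.replace(Some(s));
        }
    }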
Alternatively, it might be possible to manually thread the shared mutable state, since it's all being driven by some top-level event loop (as far as I can tell). This was actually an old std pattern: https://doc.rust-lang.org/0.11.0/std/collections/hashmap/str... (the A is the "shared" closed-over state).
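A minimal sketch of that manual threading, with made-up event and state types: the top-level loop owns the state and lends it mutably to one handler at a time, so no Cell or RefCell is needed:

    // Invented types, just to show the shape of the pattern.
    enum Event { Tick, Packet(u8) }
    struct KernelState { ticks: u64, last_byte: u8 }

    fn handle(event: Event, state: &mut KernelState) {
        match event {
            Event::Tick => state.ticks += 1,
            Event::Packet(b) => state.last_byte = b,
        }
    }

    fn main() {
        let mut state = KernelState { ticks: 0, last_byte: 0 };
        let queue = vec![Event::Tick, Event::Packet(42), Event::Tick];
        // The loop is the only owner; each handler borrows &mut state for
        // the duration of one event, which the borrow checker is happy with.
        for event in queue {
            handle(event, &mut state);
        }
        assert_eq!(state.ticks, 2);
    }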
Interested to hear more about the finer details of the shared mutable state!
As someone who works on the innards of a cooperative multitasking embedded OS for a living, I am sick and tired of C and C++.
Closures are a huge one. Everything is async in HW land: an interrupt fires and calls you when data is ready, so doing a read looks like this:
    void MyOpInitialFunction()
    {
        // Business logic
        AsyncReadFromFlash(sDataBuffer, sizeOfBuff, sizeToRead, MyOpFirstCallback, contextVar);
    }

    void MyOpFirstCallback(byte* dataBuf, size_t sizeRead, void* context)
    {
        // Business logic: use the data that was read somehow
        AsyncReadFromFlash(sDataBuffer, sizeOfBuff, sizeToRead, MyOpSecondCallback, contextVar);
    }

    void MyOpSecondCallback(byte* dataBuf, size_t sizeRead, void* context)
    // ...and so on and so forth
It gets boring really fast. The system has a nice DMA controller that we use a lot, so a similar pattern exists for copying large chunks of data around.
You can actually implement a variant of closures in C using macros, and right now the entire team wishes we had done that! (It isn't hard: in embedded land you have complete memory control, so you can just memcpy the entire stack down to a given known location and memcpy it back to resume!)
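For what it's worth, here's roughly how that chain could look with closures, sketched in Rust with a stand-in for AsyncReadFromFlash (the stub just calls the callback synchronously so the example runs; a real driver would fire it from a completion interrupt, and heap allocation like this would of course be off the table in our world):

    // Stand-in for AsyncReadFromFlash: invokes the callback immediately.
    fn async_read_from_flash(len: usize, cb: impl FnOnce(&[u8])) {
        let fake_data = vec![0u8; len]; // a real driver would DMA into this
        cb(&fake_data);
    }

    fn main() {
        let context = String::from("per-operation state");
        async_read_from_flash(16, move |data| {
            // Business logic with `data`; `context` is captured for free.
            println!("first read: {} bytes, ctx = {}", data.len(), context);
            async_read_from_flash(16, move |data| {
                // ...and so on, with no hand-threaded void* context.
                println!("second read: {} bytes, ctx = {}", data.len(), context);
            });
        });
    }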
Rust has a lot of features that are really useful for embedded; the problem is that it is going to be a long time, if ever, before the embedded world switches over.
Embedded systems tend to have very little code space, so efficiency of generated code is a must. LLVM and even GCC aren't as efficient as ARM's (horribly overpriced, feature-lacking) compilers. The one thing ARM's compilers do well, though, is minimize code space, and they do a very good job at it.
The other issue is that any sort of language runtime is questionable. Even the C runtime is typically not included in embedded (teams opt out of using it); if you want printf, you write whatever subset of printf functionality you need yourself!
Oddly enough, memory safety is not a real issue, though slightly better buffer-overrun protection than what we get from static analysis would be nice!
Consumers hand over a destination buffer to Producers, who return it in a callback. Because of the above async pattern, the consumer isn't even running until the data is ready.
That's solved 99% of our issues. :) (If you want someone else's data, you ask for it and get it passed in a buffer you provide!)
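In Rust terms that hand-off is literally ownership transfer, which is part of why it fits embedded so well. A toy sketch with an invented Producer type (synchronous here, purely for illustration):

    struct Producer;

    impl Producer {
        // The consumer gives up ownership of `buf`; the producer returns it,
        // filled, through the callback when the data is ready.
        fn request(&self, mut buf: Vec<u8>, done: impl FnOnce(Vec<u8>)) {
            buf.extend_from_slice(b"payload"); // pretend this is the DMA fill
            done(buf); // ownership goes back to the consumer
        }
    }

    fn main() {
        let producer = Producer;
        let buf = Vec::with_capacity(64); // consumer-provided destination
        producer.request(buf, |buf| {
            // The consumer only runs once the data is ready, owning `buf` again.
            println!("got {} bytes back", buf.len());
        });
    }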
(This is a world where v-tables are still a concern; they take up way too much code space if inheritance is allowed to get out of hand.)
Allocation is basically verboten. This line of code is very real in embedded:

    #define malloc assert
Back to being on topic: the current C++ template-and-futures style is horrible and may cause eye cancer; Microsoft's final code is obviously cleaner. (The counter proposal is much better written, though; they cover a lot more information.)
Now, if I am in happy sappy land where I get my ideal languages, it'd look something like this:
    CopyBuffers(/* ... assume stuff is passed in */)
    {
        bytesRead = 0;
        while (bytesRead < sizeOfFile)
        {
            error, bytesThisLoop = AsyncRead(fileHandle, &buffer, bufferSize);
            if (error)
            { /* do stuff */ }
            if (bytesThisLoop == 0) // something has gone wrong, error should have caught it
            { /* Something is wrong */ }
            bytesRead += bytesThisLoop;

            error, bytesWritten = AsyncWrite(destHandle, &buffer, bytesThisLoop);
            if (error)
            { /* do stuff */ }
            if (bytesWritten != bytesThisLoop)
            { /* Something may be wrong, depending on our write contract */ }
        }
    }
Someplace in AsyncWrite my entire state is captured and handed off to someone. The counter proposal suggests I handle the state myself, which is fine (but not as cool looking!).
Now on to said competing paper:
"Memory usage is exactly that required to represent the resumable function’s maximum stack depth,"
Which, for a cooperative multitasking system, is pretty well defined: you have a while(1) somewhere dispatching tasks! (Or an equivalent thread/fiber manager.)
I agree that the "up and out" model leads to more boilerplate. I actually like requiring "await" to indicate that calling this function is going to pause the thread, but I'd like it used purely as an FYI to the programmer, enforced by the compiler. (I'm not a fan of surprises.)
They have an idea of a resumable object, which I think they take too long explaining; all coroutine systems have this, since you have to at least copy the stack off to somewhere.
"Resumable objects are non-copyable and non-moveable"
Sort of sad; I can think of some very fun abuses, but since 99% of them would result in a crash... ;)
So overall, and this is the part I am not clear about, they want to restrict this to resumable expressions. From what I grok, that seems to mesh with my request for "await" being enforced (though here it seems needed).
The final syntax at the end is a bit ugly; I am not a fan of out parameters. Ignoring that, it may be less work overall on the programmer to get the base case up and running. I don't want to have to wrap all my async functions in adapter classes; it is nice if the compiler does it for me.
The more complex cases get ugly though. I'd like to see a really simple non-template example of just how to read from a file from the consumer's perspective.
IMHO that is 90% of the problem space for async. "IO is slow, get back to me when you're done."
In some ways embedded is easier, in others much harder. Doing it all on your own is hard, but if you have a good team backing you, I think it is actually easier than trying to write a large desktop app in native code.
No OS to get in your way. No decades of legacy APIs to clog things up. Because the entire world is understandable, no need to be concerned about someone else's thread ruining your day.
All your I/O channels are well measured and have known latency and throughput. If you are lucky enough to work with SRAM, then you even get a nice constant access speed to memory.
Interrupt routines are a fancy way to say "callback", and Direct Memory Access is a nice way to say "memcpy without CPU overhead".
That said, the initial getting up and running part is hard, that is why lots of pre-existing systems with their own little runtime exist.
Re: ripping out libc, the folks at https://gandro.github.io/2015/09/27/rust-on-rumprun/ seem to have had success in replacing Rust's libc usage with their own restricted implementation, and have expressed interest in providing a fork of libstd that bypasses libc entirely and just does raw system calls (which itself probably wouldn't be useful for your purposes, but at least proves the concept).
Indeed, and while I think their specific proposal is quite unlikely to end up in the language proper (it's a lot of machinery for a very specific purpose, and still isn't quite "safe" by Rust's definition), it's great food for thought for future enhancements to the language (such as making size_of a const function so that it can be evaluated at compile time).
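As a sketch of the kind of usage that would unlock (the driver type is invented, and this compiles on current Rust now that size_of is in fact a const fn, which postdates this discussion):

    use std::mem::size_of;

    // Hypothetical driver state, purely for illustration.
    struct SpiDriverState {
        busy: bool,
        pending: u32,
    }

    // With size_of usable in constant expressions, backing storage can be
    // reserved statically instead of coming from a heap.
    static DRIVER_STORAGE: [u8; size_of::<SpiDriverState>()] =
        [0; size_of::<SpiDriverState>()];

    fn main() {
        println!("statically reserved {} bytes", DRIVER_STORAGE.len());
    }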
There's been quite a bit of work at securing embedded systems with safer languages (esp. Java) or MPUs (esp. smartcards). Gemalto's EAL7 JavaCard work, MULTOS, IBM's Caernarvon... all made stuff to enforce strong protection with similarly rough constraints. Usually no MPU, although some smartcards had them, with some public data on design or proper usage. I recommend the authors dig up stuff on systems like that to find strategies for countering various issues.
Most that I saw handle resource and security decisions via the type system, safe transformations to bytecode, and safe handling of that. Fast paths are sometimes implemented in assembler, with safety put into the HLL or VM interface to that... if the vendor felt like it. (shakes head) Where this didn't work, there were coding rules or frameworks for carefully designing and decomposing the system to manually prevent/catch issues. Most do all of this on the development machine, with the target system just blindly accepting or trusted-booting a signed binary representing the system or apps. Cuz what else can you do with such constraints... (shrugs)
Far as specific hardware or Rust issues, I can't help as I don't work with those.
Basic question: when you're writing a kernel in Rust with its own threading implementation, how does the Rust compiler even know what threads are? It has to identify thread boundaries for Send/Sync, right?
The compiler has no concept of what a thread is anywhere, even in the standard library. The safety is imposed by careful bounds on exactly the right functions: those that create new threads or interact across them, e.g. std::thread::spawn has the signature:
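    pub fn spawn<F, T>(f: F) -> JoinHandle<T>
        where F: FnOnce() -> T,
              F: Send + 'static,
              T: Send + 'static

The F: Send + 'static bound is the entire mechanism: anything the new thread closes over must be safe to send to it.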
Other than implementations of `Send`/`Sync` themselves, this is essentially the only function in std that needs to be bounded by them.
The same approach can be used in a kernel context, e.g. place bounds on the functions that create (or, at least, initialise interaction between) concurrently executing pieces of code.
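A minimal sketch of what that looks like in a kernel, with invented types (this is the shape of the idea, not any real OS's API):

    pub struct KernelThread;

    // The one function that moves work onto another execution context
    // carries the Send bound; the compiler then checks everything that
    // ever crosses over against it.
    pub fn spawn_on_core<F>(core_id: usize, entry: F) -> KernelThread
    where
        F: FnOnce() + Send + 'static,
    {
        // Arch-specific context setup would go here; the signature alone
        // is what provides the safety guarantee.
        let _ = (core_id, entry);
        KernelThread
    }

    fn main() {
        let counter = 0u32; // Copy + Send: fine to move across
        spawn_on_core(1, move || {
            let _ = counter + 1;
        });
        // Something non-Send, like an Rc<T>, would be rejected at compile time.
    }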