The Jury Is In: Monolithic OS Design Is Flawed [pdf]

pweissbrod · on Aug 15, 2018

I wonder if there exists a parallel dimension where linux is microkernel design and folks are pushing for monolothic citing the driver friendliness and performance

andrewstuart2 · on Aug 15, 2018

If parallel dimensions exist, then it's most certainly one of the closest dimensions to ours.

If I'm sure of one thing, it's that as soon as we decided to build everything as Microkernels, we'd have the same squeaky wheels touting the massive benefits of Monotlithic OS design. We're hilariously cyclical in our preferences.

"Think of all the runtime efficiencies of the shared memory, and how much easier it would be to develop on the shared codebase!"

KMag · on Aug 15, 2018

The dirty secret is that we take a microkernel, rename the syscalls as "upcalls", throw in some hardware emulation code as a less efficient and uglier API for software that can't be bothered to be ported, and call it a hypervisor.

Also, your phone's baseband processor almost certainly runs a microkernel, and likely so does your car.

mrsteveman1 · on Aug 15, 2018

Any recent iPhone/iPad, Apple Watch, the iMac Pro, and Macbook Pro touch bar, all contain Apple's "Secure Enclave", which runs the L4 microkernel:

> The Secure Enclave runs an Apple-customized version of the L4 microkernel. This microkernel is signed by Apple, verified as part of the iOS secure boot chain, and updated through a personalized software update process.

https://www.apple.com/business/docs/iOS_Security_Guide.pdf

monocasa · on Aug 15, 2018

I wonder which L4 they use. L4 is more a family than a single kernel. The different variants are essentially totally different kernels from the same school of thought.

namibj · on Aug 15, 2018

For that security I'd assume seL4, though I did not see anyone obviously apple on the devel mailing list...

monocasa · on Aug 15, 2018

Looking into it more, it looks like a heavily modified fork of L4Ka::Pistachio. That's pretty fun, I may have to go exploit hunting this weekend.

https://www.blackhat.com/docs/us-16/materials/us-16-Mandt-De...

bryanbuckley · on Aug 16, 2018

Same for many Android phones and AMD chips with PSP (trustonic's l4 forked OS).

pjmlp · on Aug 15, 2018

And what many seem to be unaware, is that project Treble made Android Linux into a microkernel where drivers use Android IPC to talk with the kernel.

https://source.android.com/devices/architecture/kernel/modul...

https://source.android.com/devices/architecture/hidl/

So everyone running Android Oreo or newer on their phones, not only has a microkernel on their basebase radio, they also have a Linux tamed into a microkernel on their main ARM SOC.

ysleepy · on Aug 15, 2018

I don't see a microkernel architecture here. Requiring drivers to be loadable modules is just arriving in the 1990s of monolithic kernels. Also, only new devices with SoCs introduced for Android 9 are required to use kernel modules and kernels newer than 3.18. [1]

This means most (updated) Android 8 devices in the field are not Project Treble and are running old-style module-less kernels and don't have A/B system partitions.

HALs only seem to be there to abstract over the APIs of different hardware implementations and access to them. Not necessarily their drivers.

I would love to be wrong about this, so I would really appreciate evidence to the contrary here.

[1] https://source.android.com/devices/architecture/kernel/modul...

pjmlp · on Aug 15, 2018

Because you just stopped at the kernel modules part and haven't spent time reading how HAL in Android actually works, starting at the HIDL link.

There are two HALs in Android.

The old HAL, previous to Treble, which was hardly used by OEMs.

And the new HAL, which is enforced by Treble.

On the new HAL, drivers are implemented as Android services using Android IPC via interfaces defined in HIDL, or by using the new shared memory APIs introduced in Oreo as well.

https://source.android.com/devices/architecture/hidl/service...

https://source.android.com/devices/architecture/hidl/binder-...

https://source.android.com/devices/architecture/hidl/memoryb...

https://source.android.com/devices/architecture/configstore/

ysleepy · on Aug 15, 2018

I actually looked into the HAL/HIDL docs before posting and found no evidence that they are there to implement drivers in userspace.

As I understand it HALs and HIDL are used to provide a standardized way to implement new device features in a compatible way. So a vendor introduces a odor-sensor and can define an HIDL interface for the userspace to call the device driver provided in a kernel module.

I do not see a requirement to implement drivers in userspace, nor provisions for it, like interrupt handling, i2c/spi access or similar.

pjmlp · on Aug 15, 2018

HIDL is the basis of Android IPC between processes, known as Binder.

"Binderized HALs. HALs expressed in HAL interface definition language (HIDL). These HALs replace both conventional and legacy HALs used in earlier versions of Android. In a Binderized HAL, the Android framework and HALs communicate with each other using binder inter-process communication (IPC) calls. All devices launching with Android 8.0 or later must support binderized HALs only.

Passthrough HALs. A HIDL-wrapped conventional or legacy HAL. These HALs wrap existing HALs and can serve the HAL in binderized and same-process (passthrough) modes. Devices upgrading to Android 8.0 can use passthrough HALs. "

https://source.android.com/devices/architecture/hal-types

Here are the user space driver APIs for i2c/spi access on Android Things.

https://developer.android.com/reference/com/google/android/t...

ysleepy · on Aug 17, 2018

Android Things is not included in normal Android as it seems. Android Treble/9 is just not even close to a microkernel.

It if was, there would be documentation on how to write a userspace device driver, but there isn't.

monocasa · on Aug 16, 2018

Binder allows for endpoints in kernel space.

pjmlp · on Aug 17, 2018

Of course, otherwise how would they support passthrough legacy HALs for devices upgrading to Oreo as described on the documentation?

Interprocess RPC is only for Oreo and newer.

monocasa · on Aug 17, 2018

The current HAL works via dynamic libraries that get loaded into your process and talk to kernel space with a platform specific API. The passthrough support is just opening both sides in the same process, loading those legacy libraries, and wrapping the libraries in the treble API. So it actually uses IPC where both sides are in user space. That's what they mean when they describe it as 'in process' in the documentation.

Going forward, I expect the vendors to modify their kernel drivers to export the treble API over binder IPC directly while in kernel space.

So I wouldn't be surprised if it ends up that interprocess RPC only ebds up getting used in systems before Oreo.

scintill76 · on Aug 15, 2018

I'll admit to not being steeped in the terminology, so feel free to educate me.. but some quick Googling suggests nobody else has called Treble a "microkernel." There are hits about Fuchsia (a totally separate OS), and some Android forks on microkernels.

I'm guessing that the Treble modules expose the traditional /dev and /sys interfaces like before, and these new HALs talk to devices through those, right? Is that not still a runtime-linked monolith, with fairly-thin userspace services between the kernel and the framework?

The bulk of driver code logic (e.g. interfacing with a touchscreen controller) is probably still running in the kernel - show me if I'm wrong. I would think this would be how you define a microkernel. One exception may be the GPU -- those drivers are often very thin in the kernel because of GPL, with a fat userspace library that they can legally keep closed-source.

pjmlp · on Aug 15, 2018

They use Android IPC, which is a kind of RPC between processes.

https://source.android.com/devices/architecture/hal-types

ysleepy · on Aug 17, 2018

This is exactly how I understand it.

snaky · on Aug 16, 2018

Recently committed Gasket (Google ASIC Software, Kernel Extensions, and Tools) kernel framework is going even further in that direction.

https://lwn.net/Articles/758745/

> Could allow for APK installable kernel drivers, allowing modules to be upgraded (even live)

https://twitter.com/MishaalRahman/status/1029064974805688320

pjmlp · on Aug 16, 2018

Interesting thanks for sharing. Looking forward for when it pops up on AOSP.

snaky · on Aug 16, 2018

If it pops up on AOSP.

bitwize · on Aug 15, 2018

So, MkLinux 2?

agumonkey · on Aug 15, 2018

and according to someone, your Nintendo console too

It's quite crazy to count all the layers

    cpu microcode
    cpu isa
    hypervisor
    os kernel
    userspace api
    browser runtime
    js vm
    application
    your mouse

I want my 386 back

pjmlp · on Aug 15, 2018

If you mean the Switch, yes it is a microkernel, so fast enough to play all those flashy Vulkan/NVN games.

https://media.ccc.de/v/34c3-8941-console_security_-_switch

etatoby · on Aug 16, 2018

Soon it will be:

    ...
    webassembly vm
    (insert variable height custom stack)
    application
    ...

Sad times.

agumonkey · on Aug 16, 2018

Oh I forgot about wasm. Maybe one day someone will just collapse the whole thing and embed the internet in silicon.

mpax · on Aug 16, 2018

"This, Jen, is the internet."

https://youtu.be/iDbyYGrswtg

msla · on Aug 15, 2018

https://utcc.utoronto.ca/~cks/space/blog/tech/HypervisorVsMi...

> Microkernels are intended to create a minimal set of low-level operations that would be used to build an operating system. While it's popular to slap a monolithic kernel on top of your microkernel, this is not how microkernel based OSes are supposed to be; a real microkernel OS should have lots of separate pieces that used the microkernel services to work with each other. Using a microkernel as not much more than an overgrown MMU and task switching abstraction layer for someone's monolithic kernel is a cheap hack driven by the needs of academic research, not how they are supposed to be.

> By contrast, hypervisors virtualize and emulate hardware at various levels of abstraction. This involves providing some of the same things that microkernels do (eg memory isolation, scheduling), but people interact with hypervisors in very different ways than they interact with microkernels. Even with 'cooperative' hypervisors, where the guest OSes must be guest-aware and make explicit calls to the hypervisor, the guests are far more independent, self-contained, and isolated than they would be in a microkernel. With typical 'hardware emulating' hypervisors this is even more extremely so because much or all of the interaction with the hypervisor is indirect, done by manipulating emulated hardware and then having the hypervisor reverse engineer your manipulations. As a consequence, something like guest to guest communication delays are likely to be several orders of magnitude worse than IPC between processes in a microkernel.

People never seem to want to admit that sometimes, technology just dies, or at least becomes obscure, and so will invent bizarre "connections" between what we used to be working on and what we have now.

nickpsecurity · on Aug 15, 2018

OKL4 was both a microkernel and hypervisor. They coined term microhypervisor to differentiate that.

https://microkerneldude.wordpress.com/2008/04/03/microkernel...

NOVA is an open-source micro-hypervisor available in GenodeOS that similarly combines hypervisor functionality with a microkernel-like design:

http://hypervisor.org

The Xen hypervisor also got many of its design elements from the Nemesis OS, which had a lightweight kernel. There's plenty of overlap possible. A number of research and production systems are also both hypervisors and microkernel-like, too. So, it's more an existing concept than a bizarre, hypothetical connection. For OKL4, also deployed in over a billion phones they claim.

msla · on Aug 16, 2018

The fact people disagreed by downvoting rather negates your points.

KMag · on Aug 16, 2018

Your second paragraph is exactly what I meant by adding hardware emulation code to create a less efficient and uglier API.

The hardware-like interface to the hypervisor is both inefficient and not very abstract. For instance, a guest kernel attempts to perform a series of manipulations of the page table, each of which traps to the hypervisor, or else it performs an equivalent upcall to manipulate the page tables.

The hardware emulation code ads a lot of potential bugs to the hypervisor. I hope our current hypervisors are an evolutionary transitional form and in the future we'll interact with hypervisors solely through upcalls.

bena · on Aug 15, 2018

I think it's just a matter of being deep in the weeds on whatever thing we're currently invested in.

When we choose a direction, we see all the difficulties that path provides and kind of forget about all of the things it gives us.

And when looking at the other option, we see all the things that would overcome our difficulties while assuming that we get to keep everything good from the current path.

Everything is a trade off.

gregmac · on Aug 15, 2018

This isn't specific to software: The grass is always greener on the other side.

ngrilly · on Aug 16, 2018

Reminds me of the TV show "Counterpart"!

lainga · on Aug 15, 2018

the Hurd-verse lives! Richard Stallman is a clean-shaven, foulmouthed autocrat, and Linus is composing folk songs about joining hands with Intel and Nvidia!

tomcam · on Aug 15, 2018

I think you mean Gnu/Linus

theandrewbailey · on Aug 15, 2018

Or, as I've recently taken to calling him, GNU plus Linus.

stcredzero · on Aug 15, 2018

Who is dressing up as a monk?

hinkley · on Aug 15, 2018

Corey Doctorow. And Eric Raymond wears horn-rimmed glasses.

SkyMarshal · on Aug 15, 2018

It would be the dimension where Andrew Tannenbaum licensed Minix3 under a Free license back before Linus hacked up his own monolithic kernel version.

pjmlp · on Aug 15, 2018

Lack of performance on microkernels is a myth nowadays.

QNX and many embedded OS, some of which driving high integrity software, are all microkernel based.

Including the one most likely handling the real time communication of this mobile radio.

monocasa · on Aug 15, 2018

It's not a myth. Real time doesn't mean fast, it just means deterministic.

And I'll throw out there that many times the the term microkernel in a lot of embedded OSs has been contorted by marketing speak into something unrecognizable. Basically if you have multiple threads in the kernel, structure the kernel code into modules (but perhaps don't even allow dynamic loading of modules), and can communicate through async queues, then it's called a microkernel. For instance VXWorks until recently didn't have memory protection between tasks, but still called itself a microkernel.

naasking · on Aug 15, 2018

> Real time doesn't mean fast, it just means deterministic.

Strictly speaking, "hard realtime" means strictly bounded latency. Full determinism is overkill.

dkersten · on Aug 15, 2018

> It's not a myth.

As was posted elsewhere, they seem to be fast enough for games on the Nintendo Switch: https://news.ycombinator.com/item?id=17768537

monocasa · on Aug 15, 2018

I mean, on the switch, the filesystem is incredibly slow, and the GPU drivers use a ported version of Linux's Nvidia DRM drivers that runs most of the driver in the same process as the user code. So the only place where they've really optimized for performance between subsystems, they took a more exokernel like model.

snaky · on Aug 16, 2018

Quote from Linus, 2003

>><rant>

>>Why do the file systems have to be so tightly integrated in the "ring0" core? This is one subsystem that screams for standard callouts and "ring1" level.

>></rant off>

> Because only naive people think you can do it efficiently any other way.

> Face it, microkernels and message passing on that level died a long time ago, and that's a GOOD THING.

> Most of the serious processing happens outside the filesystem (ie the VFS layer keeps track of name caches, stat caches, content caches etc), and all of those data structures are totally filesystem-independent (in a well-designed system) and are used heavily by things like memory management. Think mmap - the content caches are exposed to user space etc. But that's not the only thing - the name cache is used extensively to allow people to see where their data comes from (think "pwd", but on steroids), and none of this is anything that the low-level filesystem should ever care about.

> At the same time, all those (ring0 - core) filesystem data structures HAVE TO BE MADE AVAILABLE to the low-level filesystem for any kind of efficient processing. If you think we're going to copy file contents around, you're just crazy. In other words, the filesystem has to be able to directly access the name cache, and the content caches. Which in turn means that it has to be ring0 (core) too.

> If you don't care about performance, you can add call-outs and copy-in and copy-out etc crap. I'm telling you that you would be crazy to do it, but judging from some of the people in academic OS research, you wouldn't be alone in your own delusional world of crap.

> Sorry to burst your bubble.

JdeBP · on Aug 15, 2018

vezzy-fnord had a lot to say on that subject.

* https://blog.darknedgy.net/technology/2016/01/01/0/ (https://news.ycombinator.com/item?id=10824382)

* https://news.ycombinator.com/item?id=10301375

monocasa · on Aug 15, 2018

Wow, that's a gish gallop if I've every seen one.

But hey, it's lunch and why not.

1) Microkernels are defined by their small size

> In summary, the microkernel provides mechanisms corresponding to hardware features. It doesn’t provide services, just fundamental mechanisms. In particular, it does not duplicate OS services. This misunderstanding was one of the causes for the failure of first-generation microkernels. The OS community understands this.

I'd argue that exokernels fit this model much more effectively. I love XOK's kernel level support for what we'd now call DPDK with BPF (they had user space multiplexed packet queues with an in kernel vm doing filtering). And it's capability based file system cache was an amazing way to address something that no microkernel has tried to, and I've never seen anyone try to replicate anything like that yet. In fact exokernels became so good at multiplexing hardware that Xen came out of the exokernel research.

2) Microkernels are unperformant

He does what most microkernel fans do, which is regurgitate a bunch of papers from the early 90s about how if you use a 486 or early RISC (where trapping to kernel space is only as costly as a branch mispredict) then it's not that expensive. Well guess what, it's not 1992 anymore, trapping to kernel space is on the order of 1000 cycles these days instead of 5, and even outside of microkernels, that bridge has gotten so expensive that you see people not even using the kernel anymore, but instead running drivers inside the user process (see DPDK).

And yes there's a lot of hand waving about FUSE, but nobody claims that FUSE is performant solution just that the slow model that can't crash the kernel is useful in a lot fo cases.

The QNX stuff is a little disingenuous as it's essentially marketing documentation. Yes, if you push everything through a pipe on unix, you won't get the best speed. Let's compare QNX to the Unices performant solutions (ie. mmap and shared memory). Yes they made some microbenchmarks faster, but most of that is viewed as control plane rather than data plane stuff on Unix.

I'll extend a peace offering here and say that a lot of the microkernel is unperformant was a reflection of gen 1 and 2 ukernels like mach that had to do a permission check on every port access. More modern system (specifically thinking about the L4 variants here) have a capability based model that needs far fewer checks at runtime. However what I said about how traps into the kernel have gotten more expensive still hold true.

3) Microkernels are a diversion, because a userland server failure will be just as catastrophic as a kernel failure anyway

Monolithic kernels typically have the same layering internally that Minix calls out. It's about as easy to add layers to linux block device structures as it is on minix, including all of the checksumming, etc.

And neither linux nor minix protects you from logical bugs in filesystems screwing up anything and everything. We like to think of the kernel in a ukernel as being totally separate and pure from the FS and block device drivers, but it had to be loaded from somewhere...

IMO, the monolithic/microkernel dichotomy here is a false one. Neither really address the issue per se, we need to move to type safe machine checkable languages. A recent example of microkernels screwing this stuff up was the original exploit chain that opened the nintendo switch.

4) Microkernels turn well understood memory-protection problems into poorly understood communication problems

He just spends the whole time here handwaving away how difficult distributed systems are because the internet works a lot of the time. Huge red flag for me.

5) Microkernels are a bourgeois plot to undermine free software

I used to write off what he's arguing against here, but I'm really coming around to the idea. Google's Zircon in Fuschia seems specifically designed to remove GPLv2 requirements from systems like ChromeOS and Android. Seems like a net loss for software freedom.

6) Hah, those stupid fucks are running Linux on top of their microkernel! What happened to microkernels being so great, fags?

(Ignoring the slur for the purposes of this discussion) he has a decent argument here, but I'd like to see the argument compared to containers where spinning up new OS spaces is even cheaper than a whole VM for a personality. I think a blend of the techniques a la the NtPicoProcess stuff that makes WSL here is going to be the winner here more than current ukernel design where that code runs in a different address space.

7) If microkernels are so great, why is nobody using them?

This is where he hits that disingenuousness I bring up in my parent post.

TRON isn't a microkernel, hell it doesn't even require an MMU.

OKL4 loves bringing up how basebands they're running on, but how many codebases is that?

Same with MINIX.

Done for now, haver to get back to work

naasking · on Aug 15, 2018

> 3) Microkernels are a diversion, because a userland server failure will be just as catastrophic as a kernel failure anyway

Yeah, no. A kernel has access to all of the data on the machine. A compromised userland is much more strictly limited.

Certainly if a compromised disk driver exposes a lot more data, not so much if you compromise your video driver. Compromising either in a monolithic kernel provides the same level of access. Strong isolation boundaries strictly limit risk.

> 6) Hah, those stupid fucks are running Linux on top of their microkernel! What happened to microkernels being so great, fags?

Driver support is always an issue. I don't see the problem.

monocasa · on Aug 15, 2018

So throwing out there that the numbered items are copied directly out of what I'm replying to.

> Certainly if a compromised disk driver exposes a lot more data, not so much if you compromise your video driver. Compromising either in a monolithic kernel provides the same level of access. Strong isolation boundaries strictly limit risk.

About the only part of the video drivers that runs in kernel space even on monolithic kernels these days are the drivers for the GPU's MMU. You screw that up and you're right back to where you started, corrupting random memory without regard for protection boundaries regardless of where that code lives. Everything else runs in user space, but not in an isolated process like on a ukernel, but directly in user process code, more like an exokernel.

> Driver support is always an issue. I don't see the problem.

My point underneath was comparing to other paravirtualized kernels was a best case comparison for ukernels, and zones/jails/containers has added a new more performant option for them that doesn't look as good for ukernels.

nickpsecurity · on Aug 15, 2018

Blackberry Playbook was QNX-based. Performed better than iPad:

https://www.youtube.com/watch?v=PYZDl4RNEVE

Maybe they had some stuff in kernel mode in there. They still show you can start with a microkernel as the baseline then optimize from there retaining the architecture with its benefits in average case or most cases.

ahartmetz · on Aug 15, 2018

The BB10 phones were just as amazing as the Playbook - the UI was oh so smooth and responsive. I'd still be using my Q10 if it had more native apps or better Android compatibility.

nickpsecurity · on Aug 16, 2018

One problem with Android is little delays that happen while typing. I thought a QNX-based platform could avoid that if they made sure UI parts were given adequate time slices. Was there any typing lag in those products?

ahartmetz · on Aug 16, 2018

I'd have to try it to say for sure, but it felt like everything was 60 fps all the time. Additionally, the navigation scheme was very quick and efficient, too.

Vogtinator · on Aug 15, 2018

"Fast enough design" does not equal "faster design".

KMag · on Aug 15, 2018

Particularly if you're doing kernel bypass for performance reasons, the smaller cache footprint of a microkernel is an advantage.

When I was in college, the first assignment was to run a provided benchmark program that jumped all over a gradually growing buffer and graphed average latency vs. buffer size and we had to identify the sizes of thle various caches from the generated graphs. I was tripple-booting Linux Windows, and QNX at the time and did the same assignment on each OS (identical hardware). QNX had more sudden transitions between latency levels due to delaying transition later due to smaller cache footprint.

If your userspace NIC driver and TCP/IP stack are implementee as a library and you are running your high performance application on a dual NIC box, your application can be the driver for one of the NICs and the other NIC can handle everything else. It's hard to beat zero copy/zero context switch I/O. Of course, putting the application in the kernel would yet be faster, at least if in both cases the kernek's cache footpint is tiny.

tropo · on Aug 15, 2018

Cache footprint of the OS shouldn't matter because the system should be otherwise idle and benchmark shouldn't be interacting much with the OS. The OS mostly doesn't run during the benchmark.

Assuming you didn't have those mistakes, the difference was likely due to memory allocator choices. Scattering the pages of memory is very different from laying them all out one after another. Scattered pages will cause benchmark results to be less predictable, with evictions seeming to happen at random. QNX probably made one great big allocation, in both the virtual and physical address spaces.

KMag · on Aug 16, 2018

The benchmarks weren't particularly well written, using lots of syscalls for timing instead of RDTSC. The impact was small compared to cache size, and the benchmark worked perfectly fine for education.

I noticed a later filesystem simulation benchmark for the same class ran faster when I moved my mouse. After a few minutes of investigation, I submitted a patch to prefer /dev/urandom to /dev/random on hosts that had /dev/random.

roblabla · on Aug 15, 2018

No, but fast enough, more secure, more flexible, easier to understand design tends to equate better design.

Sporktacular · on Aug 16, 2018

And in that dimension there would be debate about whether we should cut corners by reducing security and reliability to slightly improve speed and mollify lazy programmers.

That debate would be short and the answer would be "nope, we shouldn't because that would be stupid".

snaky · on Aug 16, 2018

https://yarchive.net/comp/microkernels.html

edit added

> Guys, there is a _reason_ why microkernels suck. This is an example of how things are _not_ "independent". The filesystems depend on the VM, and the VM depends on the filesystem. You can't just split them up as if they were two separate things (or rather: you _can_ split them up, but they still very much need to know about each other in very intimate ways).

https://yarchive.net/comp/linux/user_space_filesystems.html

MisterTea · on Aug 15, 2018

You mean a parallel universe where GNU/HURD was actually finished.

nickpsecurity · on Aug 15, 2018

I dont know haha. I do know most microkernels in commercial space support POSIX or Linux in user mode. OKL4 and L4Linux also supported using a minimal version of Linux just to get its device drivers. Then, a native app or other VM use them with virtual drivers.

JdeBP · on Aug 15, 2018

This would be the world of Windows NT in between version 3.5 and 4.0.

jacquesm · on Aug 15, 2018

The jury was in in the mid 1990's, but Linus Torvalds doesn't know when he's wrong and to listen to his betters. Linux succeeded because of its community, not because of its architecture. QnX has shown the strength of microkernels for decades, they are far more stable and much easier to work on than monoliths. The (small) speed penalty should be well worth the price of admission.

iamcreasy · on Aug 16, 2018

> QnX has shown the strength of microkernels for decades

Could you elaborate on this one please?

jacquesm · on Aug 16, 2018

QnX is so common that if you removed it from the industrial world the world would literally grind to a halt. Machinery and vehicles would stop moving, factories would stop producing, chemical plants would (hopefully!) shut down and airplanes would no longer take off (or would have a much harder time trying to land without the usual guidance systems), boats would drift and messages would stop being sent through many systems that you probably would never have heard of.

For one example of such a system: A computerized way to communicate the availability of cargo and space to brokers all over the world, on a good Monday morning several million such messages are sent with a guaranteed maximum time between the first and the last such message sent (to ensure a fair market).

QnX is extremely pervasive and it - and other OS's like it - are so reliable that people tend to forget the systems it powers.

Blackberry made a pretty smart move with their acquisition of QnX, pity that it did not end with QnX being open sourced, that would have been very nice.

zug_zug · on Aug 16, 2018

Just as a casual reader of this comment, a few tips:

- The rhetoric about the world grinding to a halt isn't really pertinent and makes you look unreliable. Also, no, not "literally." I think what you're getting at is that it's chosen for high-reliability systems, with the implication being that it's chosen for its superior reliability(?)

- But I say this because it sounds like you have some knowledge as to how this obscure microkernal is used and sharing that is valuable to the community. It's just a shame if the message gets lost in the delivery.

jacquesm · on Aug 16, 2018

The world really would literally grind to a halt, you are severely under-estimating how common this particular OS is. It has been the embedded system of choice for a very large variety of applications, it is anything but obscure it is just obscure to those who spend their time working on web stuff but in industry you would come across it (and RTOS: https://rtos.com/) very often.

Neither of those names will ring a bell for people who spend their days with your usual web toolsets.

nickpsecurity · on Aug 17, 2018

It really wouldnt. They'd keep using the OS and tools to sustain their systems until they got a replacement vendor supporting QNX or replaced the systems with non-QNX software. They might also sue original supplier.

So, rather grind to a halt, it would all continue running in legacy mode, users would spend more money, and it gets supported or replaced. Lots of precedents for that sort of thing.

Between acquisition price and QNX's revenues, it's unlikely support will be terminated any time soon. We wont get to see the hypothetical tested unless RIM bankrupts with nobody acquiring and supporting their assets. That a losing company acquired QNX did worry me a bit, though.

iamcreasy · on Aug 17, 2018

Is there an open source alternative to QnX?

jacquesm · on Aug 17, 2018

It's sitting on my hard drive, never released. And I probably never will.

enitihas · on Aug 15, 2018

Reminds me of the famous Torvalds Tanenbaum debate. https://groups.google.com/forum/m/#!topic/comp.os.minix/wlhw...

CGamesPlay · on Aug 15, 2018

Ah, this was great!

> Linus "my first, and hopefully last flamefest" Torvalds

travbrack · on Aug 16, 2018

>I also agree that linux takes the non-portability to an extreme: I got my 386 last January, and linux was partly a project to teach me about it.

Times sure have changed

thsowers · on Aug 16, 2018

Thanks so much for the share, great gems in here, I still found myself surprised at this:

> True, linux is monolithic, and I agree that microkernels are nicer.

It's hard to believe that he was only 23 when he wrote this

tinco · on Aug 16, 2018

I think thinking about operating systems is precisely the sort of thing 23 year old college students do. I know a few in my class did, I was hyped up by MS's Singularity OS back then.

Unfortunately they were not aware yet that Windows was going to lose the Server OS war and they never pulled Singularity out of research.

Operating Systems is a 2nd year course in most universities.

pjmlp · on Aug 16, 2018

Why? In Portugal we finish our 5 year engineering university degrees at the age of 23.

acd · on Aug 15, 2018

There is a reason why kernel code run in privileged mode, speed! If you run more kernel code in privileged mode then you do not need to copy as much data between the kernel and user space. Vs a micro kernel you will have to copy more data up to user space. Copying data to user space causes context switches and gives less performance.

Larger mono kernels: Speed

Micro kernels have advantages such as: smaller privileged attack surface and thus more secure, more crash proof as you can restart user land processes for example device drivers

https://en.wikipedia.org/wiki/Microkernel

naasking · on Aug 15, 2018

> If you run more kernel code in privileged mode then you do not need to copy as much data between the kernel and user space.

Microkernels don't copy data into the kernel address space, they copy data between userland address spaces. Which still happens in monolithic systems anyway when you're doing IPC. These are typically short messages, often only data passed in registers.

And if copying is going to be a bottleneck, then you negotiate a shared address space just like in Unix, and no more copying.

Sporktacular · on Aug 16, 2018

In a world of constant security threats and 2 GHz CPUs dedicated to cat videos - needing speed can no longer be the excuse for poor design.

And "embedded CPUs need every precious cycle" is not an argument either. As the paper says modern microkernals have a negligible speed penalty while IoT/networked industrial controllers are a security backwater.

pradn · on Aug 15, 2018

Do you really need to do a full copy? What if you had a shared page and notified the user process when the data in the page was available at a certain offset.

Take a look at this paper (FlexSC: Flexible System Call Scheduling with Exception-Less System Calls):

https://www.usenix.org/legacy/event/osdi10/tech/full_papers/...

tlb · on Aug 15, 2018

Sharing pages between user processes and kernel is extremely bug-prone, because threads in the user process can mutate the data while the kernel is reading it, leading to all kinds of race conditions. You can't depend on user processes respecting mutexes for security.

You can make this work by removing the page from the user process before making it available to the kernel, but the synchronization overhead of doing this (especially on a NUMA system) is probably worse than copying moderate amounts of data.

pradn · on Aug 15, 2018

Hmm. Perhaps temporarily switching the TLB entry to read-only mode for that page during kernel accesses? There might be some interesting software-hardware co-design solutions here.

hinkley · on Aug 15, 2018

Code doesn't run faster because it's in the kernel. The speed you're talking about comes from avoiding transitions in and out of a given space. If you stay out or stay in the results are pretty similar.

Except for your tooling. Cloudflare article from a couple years ago on why they don't use user-space network stack: https://blog.cloudflare.com/why-we-use-the-linux-kernels-tcp... and the tl:dr is a profound lack of feature parity. They use everything from iptables to tcpdump. If someone else worked on feature parity (they say it's too expensive for too small a gain for them), I expect they'd change their tune.

Dylan16807 · on Aug 16, 2018

> If you stay out or stay in the results are pretty similar.

Which is impossible when service A and service B are both in user space, because they are in separate spaces.

Ring 0 isn't special, but you need to be monolithic if you want to avoid transitions.

imglorp · on Aug 15, 2018

The title is missing "from a security standpoint". Of course, everything is a tradeoff. TLDR:

> We have presented what is, to the best of our knowledge, the first quantitative empirical assessment of the security implications of operating system structure, i.e. monolithic vs microkernel-based design.

> Our results provide very strong evidence that operating- system structure has a strong effect on security. 96% of crit- ical Linux exploits would not reach critical severity in a microkernel-based system, 57% would be reduced to low severity, the majority of which would be eliminated alto- gether if the system was based on a verified microkernel. Even without verification, a microkernel-based design alone would completely prevent 29% of exploits.

> Given the limited number of documented exploits, we have to assume our results to have a statistical uncertainty of about nine percentage points. Taking this into account, the results remain strong. The conclusion is inevitable:

> From the security point of view, the monolithic OS design is flawed and a root cause of the majority of compromises. It is time for the world to move to an OS structure appropriate for 21st century security requirements

gjm11 · on Aug 15, 2018

So, they've looked at a sample of exploits that were critical on Linux and established that most wouldn't have been critical on a hypothetical otherwise-similar microkernel system.

But they haven't looked at a sample of exploits that were critical on an actual microkernel OS and seen how many would have been less serious (or not arisen) on a hypothetical otherwise-similar monolithic-kernel system.

It reminds me of a nice observation in "Surely you're joking, Mr Feynman". Feynman developed some nonstandard ways of solving mathematical problems. Other people came to him and he repeatedly solved problems they'd been stuck on. "He must be much smarter than us!" But the problems they brought to him were selected as ones they couldn't do, so of course he'd look better than them on those. They never bothered asking him problems he couldn't do but they could, because they'd already done them.

Now, maybe the authors of the paper are confident that there's no way that a microkernel design could encourage or exacerbate vulnerabilities. But so far as I can see they don't offer any actual argument for that proposition.

nickpsecurity · on Aug 15, 2018

"But they haven't looked at a sample of exploits that were critical on an actual microkernel OS and seen how many would have been less serious (or not arisen) on a hypothetical otherwise-similar monolithic-kernel system."

Cuz they don't exist so far. Microkernel-based systems will have same kinds of bugs as monoliths if coded in same language for same hardware. From there, the microkernel architecture leads less bugs in number (less code), less in severity (more isolation), and sometimes less difficulty in patching or recovery. If looking for microkernel-based bugs, I'd look for errors in concurrency and passing data over IPC. Monolithic system are using more concurrency and middleware than ever now, though. Even more than microkernels from what I see if talking about all the strategies and their implementations vs a few, standardized primitives. So, even those areas central to microkernel design seem like problems shared with modern monoliths.

So, the status quo is that the monoliths mostly add problems and increase their severity. Vice versa, the microkernels mostly subtract them in number and/or severity. The field evidence shows this with most of the data on bugs and vulnerability coming from monolith users. From there, someone might want to try to see if the opposite is true. Burden of proof is on them, though, with the status quo being quite reasonable. And that investigation, as I said, might find "microkernel" problems that hold in how modern monoliths are used (esp service and web architectures). Still worth attempting since they might surprise us with what they find. :)

To be clear, that's all about the architectural patterns. I think combining all the potential benefits of microkernels in a system vs a simple monolith could lead to more bugs in microkernel. Most of the problems in software will come from complexity and QA level regardless of architectural style. So, my post is written with assumption that we're talking about large, complex systems done with one style or the other.

da_chicken · on Aug 15, 2018

> The title is missing "from a security standpoint".

I mean, kind of, but since maintaining system security and integrity is a core function of the OS -- in fact, it is the primary gatekeeper in terms of all system security -- it means that "being secure" and "being correct" are often synonymous terms for an operating system.

After all, if we don't care about security at all we can all run CP/M or run everything as root.

Now, sure, you can say that the whole thing is bullshit because verified microkernels are so difficult to design that the end result would be an unusable system, but all that suggests is that when you design your kernel you should aim for a hybrid and compromise more on the side of a microkernel where you can.

digi_owl · on Aug 15, 2018

The age old joke about the computer encased in concrete at the bottom of the ocean does indeed come to mind.

fulafel · on Aug 16, 2018

It would be more credible if the authors were able to distinguish between exploit and vulnerability.

Steko · on Aug 15, 2018

I examined all fatal car crashes in the United States between pi day and Bloomsday in 2015 and assigned them a Mitigation Score based on the hypothetical that the people involved were instead walking. 98.3% of fatalities would have been prevented. The jury is in: ban all horseless carriages.

nasoieu · on Aug 15, 2018

That chart with the growth of the Linux kernel discredits everything. The Linux kernel continues to grow because they are obsessed with keeping all drivers in mainline instead of having a stable API for them as any sane project would.

rhencke · on Aug 15, 2018

I believe one of the reasons Linux continues to be successful today is _because_ they keep the drivers in mainline _without_ a stable API.

It is a key motivator to ensure drivers remain available and supported well into the future of Linux.

pjmlp · on Aug 15, 2018

Which given the way legacy AMD drivers story has been handled on Ubuntu, it is quite far from actually being so.

My Asus Netbook no longer gets all the acceleration options that it used to have pre 16.04.

Linux is successful because it is a free beer UNIX clone.

monocasa · on Aug 15, 2018

You still have the source to the old drivers. It's just not a priority for others to support anymore, but you have the means to go add that support yourself, or pay a third party to do it for you.

Contrast with when windows changed the video driver architecture in Vista, and the only option you had was to pound sand.

bunderbunder · on Aug 15, 2018

Seems like a pretty easy cost/benefit analysis to me:

I can either spend thousands of dollars worth of my time on updating the driver, or I can pay someone else thousands of dollars to do it. Or I can spend a fraction of that on a new video card.

Seems like the only solution for Vista users is the only solution I'd ever have gone with, anyway.

monocasa · on Aug 15, 2018

Sure the cost/benefit works out the same for you (which is why no one has done the work), but that's different from the full set of options available to you in both cases.

bunderbunder · on Aug 15, 2018

How expansive do we want to be about what the full set of options is, though?

I tend to fall on the side of the root poster - if having a stable driver API means Linux gets more commercial driver support, then that's the one that has the highest cost/benefit in my book.

In practical terms, being able to hack on open-source drivers isn't particularly useful to me, but having an easier user experience with Linux on the desktop (and, more generally, having Linux on the desktop be something more than an also-ran) would benefit me immensely.

pjmlp · on Aug 15, 2018

I skipped Vista, however my XP drivers were fully supported on the upgrade to 7.

iforgotpassword · on Aug 15, 2018

Some xp drivers work on 7, most don't. Also unlike user space, you cannot run 32 bit drivers on 64 bit windows, so unless you were running 32 bit win7 you needed 64 bit xp drivers, which were rare. Are you sure you explicitly installed xp drivers and didn't just happen to have hardware where windows shipped matching drivers or could download them from the windows driver database? (Or downloaded an installer bundle from the manufacturer's website that just contained drivers for all supported platforms..)

core-questions · on Aug 15, 2018

Womp womp. Most of those netbooks are unusably slow anyhow, and were when they were brand new too. Run an older kernel on there and call it a day.

pjmlp · on Aug 15, 2018

Sorry but a DirectX 11 class dual core APU is good enough for general purpose gaming.

And runnig legacy kernels? No thanks.

monocasa · on Aug 15, 2018

Which GPU is it specifically?

pjmlp · on Aug 15, 2018

AMD Fusion Brazos APU, Radeon HD 6250

monocasa · on Aug 15, 2018

Certainly looks supported by /dev/radeon, just not /dev/amdgpu.

pjmlp · on Aug 15, 2018

I didn't said it wasn't supported, I said:

> My Asus Netbook no longer gets all the acceleration options that it used to have pre 16.04.

Which holds true, because /dev/radeon no longer offers hardware video decoding support it once did unless I force enable it, and even then it usually leads to random X crashes when watching videos.

And then there is this,

"For one, AMD users can’t use applications that require OpenGL 4.3 or later without the fglrx/Catalyst drivers."

https://www.omgubuntu.co.uk/2016/03/ubuntu-drops-amd-catalys...

monocasa · on Aug 15, 2018

OK, that calls out X.org as the reason why the drivers aren't being supported rather than Linux. You can use fglrx with newer kernels just fine, it's just that user space went out of it's way to break support.

I really fail to see how that has anything to do with Linux's unstable kernel driver API.

AnIdiotOnTheNet · on Aug 16, 2018

> OK, that calls out X.org as the reason why the drivers aren't being supported rather than Linux.

Ah, the Linux evangelist blame-game. It's a big advantage of a system being so haphazardly thrown together from separately developed components. Start by blaming the choice of distro, end up at "Linux is just a kernel".

monocasa · on Aug 16, 2018

I mean, I feel like that's valid in the context of this discussion. That being 'Linux's unstable driver API designed to push drivers as source into mainline causes ISVs headaches'.

How does that apply here when Linux didn't change and you can still use the same kernel driver, but some other component decided to not work with the driver anymore?

pjmlp · on Aug 15, 2018

Given that the radeon driver doesn't provide the same feature set as fglrx used to provide, with the same stability, it is surely a driver issue.

monocasa · on Aug 15, 2018

... a brand new, mainline kernel will run fglrx just fine. It's user space (specifically the x srever) that decided to break fglrx. So how is that the fault of the kernel's unstable API again?

pjmlp · on Aug 15, 2018

Graphics drivers run on the kernel.

monocasa · on Aug 15, 2018

I mean, everything runs on the kernel... but I assume you meant in the kernel.

Graphics drivers are split into three main pieces these days.

1) Kernel space that mainly sets up the GPU's MMU, and adds a context to the GPU's hardware thread scheduler. There'll be some modesetting too, but that piece is really simple (and fglrx never used the main kernel API for that anyway, so it doesn't really matter if it was stable or not. But it actually was pretty stable over the time frame we're talking about).

This piece still works fine on fglrx with a modern kernel.

2) A userspace component that lives with the window manager setting up the actual display output, and accelerating compositing. Stuff like EGL works by making IPC calls to this layer.

This is the piece that broke in your case, and only because the x server decided to change.

3) Another piece that runs in userspace and is ultimately what gets called by graphics APIs and is linked into every process making graphics calls. This is the vast majority of the driver.

nasoieu · on Aug 15, 2018

Linux on the desktop would have succeeded with a stable driver API. Having hardware with closed source drivers supported could have being a walk in the park for OEMs. Bad 3D, wireless, etc. support is what killed Linux. OEMs would have probably made more and better drivers if they didn't have to make changes to them every few months or go through the effort of releasing and mainlining the source.

btilly · on Aug 15, 2018

This is your opinion. The opinion of informed observers who are actually doing the work is the opposite - OEMs produce buggy, incomplete drivers and don't care to fix the bugs. In fact, given the choice, they don't want to allow anyone else to dig into their drivers because doing so can only lead to embarrassment. Plus exploits that are discovered are likely to be cross-platform exploits.

Any operating system that wishes stability therefore has to put barriers between themselves and the driver. And, preferably, should use drivers that they can audit themselves rather than trusting OEMs.

As an example of this outside of the open source world, the biggest reason why Windows used to have a reputation for BSODs is that they were dependent on third party drivers. Windows made it harder for third parties to take down their OS, shipped tools to audit drivers for bugs, and forced OEMs to clean up their act. (Which they didn't do voluntarily.)

hinkley · on Aug 16, 2018

Three words:

Creative Fucking Labs.

Someone told me back then that the sound blaster drivers were responsible for more crashes than the next several causes combined, so I started watching and I’ll be damned if I didn’t see the same thing.

renox · on Aug 16, 2018

If I remember well, there was a sound card which talked on the PCI bus after it should have stopped talking, these kind of bugs can break any OS, microkernel or not..

nasoieu · on Aug 15, 2018

It's better to have buggy, incomplete drivers than no drivers at all.

Dylan16807 · on Aug 16, 2018

If you control the desktops and take strong measures to stop horribly-buggy drivers from working, 95% of manufacturers will fix the drivers because they want to sell the product.

In that tradeoff, I'll easily pick the marginally smaller market with vastly better drivers.

iforgotpassword · on Aug 15, 2018

This is your opinion. I prefer using hardware with stable drivers, working reliably.

jonny_eh · on Aug 15, 2018

That's why I use a Mac. Seems like a good solution is to have the OS and HW manufacturer to be one and the same.

rleigh · on Aug 15, 2018

On the flip side, there are vastly fewer drivers. I can plug a random USB device into a Linux system and the chances are it just works. Same with Windows, though it might require the driver installing first. MacOS might not have any driver at all.

nasoieu · on Aug 15, 2018

That's not what I said. Would you rather have no hardware or hardware that BSODs once a month?

btilly · on Aug 15, 2018

That's a false dilemma.

The actual choice is whether to have limited hardware choices or hardware that crashes regularly. Put that way, it is obvious to me that limited hardware is the right answer.

Why? Because what I care about is having a system that works well for me. Limited choices are fine as long as I can determine in advance whether the system that I'm considering will work. (I usually can.) So now my choice boils down to, "Do I want to be able to buy a reliable system, or be forced to put up with a buggy one?"

Put that way, who wants to be forced to put up with bugs?

AnIdiotOnTheNet · on Aug 16, 2018

> Linux on the desktop would have succeeded with a stable driver API.

I doubt it. Linux desktop's problems run far, far deeper than that one questionable choice.

agumonkey · on Aug 15, 2018

So we should have two numbers, actual kernel, whole tree.

I am quite convinced that if they kept it that way it was because some unplanned property caused worse driver status with an API; too many delays, improper usage I don't know.

If there's no solid reason, then I'd be happy to have split codebases.

drewg123 · on Aug 15, 2018

RHEL does provide a stable kernel ABI (kABI) that can be and is used by vendors to ship binary drivers. See https://elrepo.org/tiki/FAQ

When I worked for a NIC hardware vendor, we would ship our driver in 4 forms:

1) source tarball

2) upstream kernel

3) RHEL/Centos kABI compliant source and binary rpms

4) Debian pkg using dkms

The upstream kernel driver wasn't good enough for a variety of reasons. For example, on Ubuntu LTS and RHEL, the in-tree driver was often based on a kernel that was several years old and which lacked support for recent hardware or features.

Jweb_Guru · on Aug 15, 2018

It's not misleading because most Linux kernel drivers run in kernel space; hence compromising them indeed potentially compromises the whole system, which is exactly the article's point. The fact that they're often buggy and poorly supported, unlike the "real" kernel, makes things worse and doesn't invalidate anything.

hinkley · on Aug 16, 2018

Nobody seems to understand that it’s possible to have one source tree and multiple binaries.

You can have hybrid systems where the code runs in isolation but the pull requests are still self contained instead of split into multiple pieces that have to be coordinated.

naasking · on Aug 15, 2018

> That chart with the growth of the Linux kernel discredits everything.

What claim does it discredit exactly?

AnIdiotOnTheNet · on Aug 16, 2018

If you insist that all drivers must be a part of the kernel, then it is perfectly fair to count all driver code as part of the kernel code.

digi_owl · on Aug 15, 2018

While equally obsessed with maintaining a stable API towards userspace. Sadly much of userspace is an unstable churn of API changes.

gnufx · on Aug 16, 2018

Contrary to what's written about microkernel speed: Systems I used in the '80s to great effect used at least the moral equivalent of a microkernel ("Nucleus"). They were fast (compared with VAXen etc.) for interactive use and supported real-time processes. (Some visitors thought context switching was a bit slow, assuming "microseconds" meant "milliseconds".) The filesystem was fast enough to dispel the assumption a "database" was always required for speedy experimental data access rather than a file per spectrum.

https://en.wikipedia.org/wiki/OS4000

The performance wasn't just because Nucleus initially was in hard/firmware; two later software implementations were performant (on faster hardware). Also, as the article is about security: at least the original Nucleus also supported an A1/B3-level secure OS.

mikkergp · on Aug 15, 2018

Well, the obvious solution is to design our Kernel's on Kubernetes.

steveklabnik · on Aug 15, 2018

https://www.qubes-os.org/

snug · on Aug 15, 2018

A reasonably secure os?

I think that's one thing I don't want just reasonable

steveklabnik · on Aug 15, 2018

It's kinda self-deprecating humor; the (now ex) Tor developers I know use it. Nothing is ever 100% secure.

przmk · on Aug 15, 2018

It's definitely better than software claiming to have military grade encryption.

yjftsjthsd-h · on Aug 15, 2018

It's like the lesswrong folks; step one was accepting that there are too many flaws to fix everything, and then step two was to do our best anyways.

qu4z-2 · on Aug 17, 2018

Also mutt, with "All mail clients suck. This one just sucks less."

wpdev_63 · on Aug 17, 2018

It's not secure if the hardware compromised. Not much is.

nwmcsween · on Aug 15, 2018

IMO a microkernel isn't a design worth pursuing as there will always be overhead. Instead an exokernel with a simple monolithic 'multiplexing' kernel or a language that has 100% safety (not really possible).

77pt77 · on Aug 15, 2018

Still waiting for GNU hurd.

xj9 · on Aug 15, 2018

hurd isn't even interesting. seL4 is faster, has production applications, and is formally verified. you can even use seL4 as a base platform in genode[0].

[0]: https://genode.org/documentation/platforms/index

htor · on Aug 15, 2018

gnu hurd was released in 2015, but very few people are working on it i'm afraid. https://www.gnu.org/software/hurd/news/2015-04-29-debian_gnu...

swiley · on Aug 15, 2018

In a perfect wrold microkernel OSes would be perfect but then it's all pointless.

In real life there are certain parts of the OS that have to work or the whole device stops working. Furthermore: the isolation of dynamic and less tested application code from these parts is generally a good idea, that's why monolithic OSes are so popular; they're simply less demanding.

ysleepy · on Aug 15, 2018

Nobody claims faults of core functions must be survivable in microkernel designs. The restarting of driver processes is a nice trick, but in the end the isolation and clean interfacing are particularly useful in an imperfect world.

The goal is to minimize the trusted computing base (TCB) so that it is, at least to some degree, verifiably correct. Then tack on features using isolated components.

Sporktacular · on Aug 15, 2018

Some super low quality commentary here. Straight to the Monolithic/Microkernal tribalism, talk of parallel dimensions, as long as we ignore the content of the paper - which remains far more empirical and convincing than any rebuttal seen here.

nine_k · on Aug 15, 2018

I'd hazard to say that every design is "flawed" in some regards: there's no way to achieve all desirable qualities and none of undesirable qualities. For one, some desirable qualities contradict each other.

So "${thing} is flawed" is not precise enough; an interesting statement would be "${thing} is not the best choice for ${conditions}". A monolithic OS is not the best choice for a high-reliability system on unreliable hardware. A microkernel OS that widely uses hardware memory protection is not the best choice for a controller with 4KB of RAM. A unikernel setup is not the best choice for a desktop system where the user is expected to constantly install new software. Etc, etc.

In other words, the ancient concept of "right tool for the job" still applies.

loup-vaillant · on Aug 15, 2018

The main point remains: server and desktop OS kernels are a critical piece of infrastructure, worthy of full machine checked verification.

In this context, it should be pretty obvious that a design that fails to minimise the trusted computing base is flawed. Even if a kernel vulnerability is unlikely to kill any given user, the sum of all the crashes, hacks, patching effort… is huge.

I bet my hat rewriting the entire kernel for all popular OSes would be far cheaper worldwide than not doing it. (Of course, this won't happen any time soon, because path dependence and network effects. Unless maybe someone works seriously on the 30 million lines problem described by Casey Muratory.)

> In other words, the ancient concept of "right tool for the job" still applies.

Absolutely. Monolithic kernels are clearly the wrong tool for the job.

zokula · on Aug 17, 2018

> Monolithic kernels are clearly the wrong tool for the job.

Says who exactly?

loup-vaillant · on Aug 17, 2018

Says Simon Biggs, Damon Lee, and Gernot Heiser. Have you even read the abstract?

> The security benefits of keeping a system’s trusted computing base (TCB) small has long been accepted as a truism, as has the use of internal protection boundaries for limiting the damage caused by exploits. Applied to the operating system, this argues for a small microkernel as the core of the TCB, with OS services separated into mutually-protected components (servers) – in contrast to “monolithic” designs such as Linux, Windows or MacOS. While intuitive, the benefits of the small TCB have not been quantified to date. We address this by a study of critical Linux CVEs, where we examine whether they would be prevented or mitigated by a microkernel-based design. We find that almost all exploits are at least mitigated to less than critical severity, and 40% completely eliminated by an OS design based on a verified microkernel, such as seL4.

If the effect is that huge, of course security trumps pretty much all other considerations. These are consumer OS kernels we're talking about. One that fails to more or less maximise security is obviously the wrong tool for the job. As the title of the paper suggests, in case you failed to read that as well.

Sporktacular · on Aug 16, 2018

If we can't make reasonably secure controllers with only 4KB of RAM, perhaps we shouldn't be making controllers with only 4KB of RAM.

Tech and market pressure will make more capable processors affordable if security is prioritised.

Dylan16807 · on Aug 16, 2018

Bah, don't conflate 'microkernel' with 'secure'.

For a small system like that, the shining ideal is probably a formally verified single program with no real OS to speak of.

Sporktacular · on Aug 16, 2018

I don't need to - the paper just did that by conflating monolithic kernels with security failures that were avoidable.

Dylan16807 · on Aug 16, 2018

That doesn't mean they are the only way to get security.

Sporktacular · on Aug 17, 2018

If you can afford formal verification and application programmers also fluent in hardware level coding, you can probably afford more RAM.

I'd want to see a solid paper comparing the security of the approach you mention to a microkernel+app. Until then we have this.

But agreed, there are other ways, of varying practicality, to achieving security.

naasking · on Aug 15, 2018

> For one, some desirable qualities contradict each other.

Such as?

badpun · on Aug 15, 2018

Security, performance, ease of use.

naasking · on Aug 15, 2018

I don't see why any of those are mutually exclusive. Capability based operating systems provide least privilege security, experiments with capability-based UIs have shown they are quite intuitive and secure because they align user actions with explicit access grants, and they don't perform any worse than monolithic or microkernel operating systems.

The problems are really the flawed mental models people insist on bringing to these problems, despite decades of research showing these models are irreparably flawed.

nine_k · on Aug 15, 2018

Capability chains are relatively more expensive to check than a bitmask, or just nothing (as e.g. in an embedded system). Also, performance asks for shortest code paths and out-of-order execution, security asks for prevention of timing attacks and Spectre-like attacks.

Security requires to identify yourself with hard-to-fake means, it takes time and effort (recollecting and typing a password, fumbling with a 2FA token); ease of use asks for trusting and immediate response (light switch, tv, etc).

Performance asks for uninterrupted execution and scheduling of tasks based on throughput; ease of use asks for maximum resources given to the interactive application, and scheduling based on lowest interactive latency.

Also feature set vs source code observability, simplicity vs configurability, performance vs modularity, build time vs code size vs code speed, etc.

naasking · on Aug 16, 2018

I'm not sure you and I are referring to the same thing by "capability". There is no chain of capabilities that needs to be checked, the reference you hold is necessary and sufficient for the operations it authorizes. In existing capability operating systems, this is a purely local operation, requiring only two memory load operations. Hardly expensive.

Every OS is vulnerable to the hardware upon which it runs, but capability security at least makes side channel attacks somewhat more difficult because of least privilege and limiting access to non-deterministic operations, like the clock.

Identity-based security built on an authorization-based model like capabilities also ensures it's difficult to make promises you can't keep. Access list models let you easily claim unenforceable properties and then people are surprised when they are easily violated.

> ease of use asks for maximum resources given to the interactive application, and scheduling based on lowest interactive latency.

That's unnecessary. You need a low latency upper bound, not a "lowest latency". This does not necessarily conflict with throughput, and furthermore, and install requiring high throughput and interactivity rarely overlap.

I'm not even sure what the rest of the properties are supposed to be about. I don't think most of those are mutually exclusive either.

amelius · on Aug 15, 2018

> In other words, the ancient concept of "right tool for the job" still applies.

Or use a multi-tool ;)

SteveNuts · on Aug 15, 2018

The trade-off with a multi-tool is additional complexity, more difficult maintenance, it's heavier, takes up more space, etc.

beat · on Aug 15, 2018

Multi-tools - for when you need to do a mediocre-to-bad job with lots of different things!

renox · on Aug 16, 2018

OK so seL4 is safer than Linux, that's not really news.. I have questions about seL4 though: is it able to manage several multicore CPU efficiently? What about power management, does it work?

pjmlp · on Aug 16, 2018

Given that it is responsible for powering many embedded systems deployed into production, yes.

monocasa · on Aug 16, 2018

They only added multicore support in the past year, and it contains no power management code beyond sleeping in the idle loop.

pjmlp · on Aug 17, 2018

So I wonder how do those devices manage themselves.

monocasa · on Aug 17, 2018

They dont. You don't need power management for a lot of embedded systems that are high assurance. Power is generally plentiful in high assurance situations for a lot of reasons.

matachuan · on Aug 15, 2018

Of course it's about Gernot Heiser's verified ukernel...

star-trek-fleet · on Aug 15, 2018

Now it's a good time to revive microkernel for prime as server less offers the abstraction for making transition transparent to users and application developers.

Anyone tried that?

nostrademons · on Aug 15, 2018

Much of the cloud/virtualization industry is actually built on exokernels; a hypervisor is basically the commercially-viable version of an exokernel. The unikernel movement is an attempt to revive the LibOS and use exokernels as they were originally designed (i.e. move OS abstractions into user-space and not have a whole OS running on top of an OS), but there've been several pragmatic barriers that've kept them from gaining all that much adoption.

star-trek-fleet · on Aug 16, 2018

I think I am mostly want to say to the advocate of a certain on machine first-level software infrastructure above hardware to feel more free to try to prove their technology is the best.

I am not too much about particular alternatives.

brewski · on Aug 15, 2018

This is like saying Linux is more secure than Windows because not all of Window's critical vulnerabilities appear in Linux.

Jweb_Guru · on Aug 15, 2018

What classes of vulnerabilities are you referring to that are likely to show up in a microkernel-based design but not in a monolithic one? Do you think adding those classes of vulnerabilities to the picture will result in more, about the same, or fewer critical security issues between the systems overall, and why? Please be specific.

paulie_a · on Aug 16, 2018

True but in the case of windows gdi has been a never ending pinata of critical bugs.

throw7 · on Aug 15, 2018

The Jury Is In: Microkernel OS Design Is Flawed.

See I can make clickbait titled papers also.

Jweb_Guru · on Aug 15, 2018

Great, now all you have to do is build a secure (preferably formally verified) monolithic kernel, exhaustively, thoroughly research all the vulnerabilities in popular micokernels, find evidence to support your thesis, write up the paper, and get it peer reviewed, and you're all set!

flurrything · on Aug 15, 2018

Can you? Where is that paper? I only see a title.

nicc · on Aug 15, 2018

Linux is the most popular OS in the world (of course, counting Android), so I guess this means it doesn't really matter is monolithic is flawed, or it is not flawed enough.

JdeBP · on Aug 15, 2018

Actually, MINIX, the operating system used in the ME in modern Intel chipsets, is possibly now one of the most widely deployed operating systems on the planet.

* https://news.ycombinator.com/item?id=15642116

* https://news.ycombinator.com/item?id=15634014

* https://news.ycombinator.com/item?id=15697888

* https://news.ycombinator.com/item?id=15641592

pjmlp · on Aug 15, 2018

Sure it matters, because Android Oreo has actually forked Linux into a microkernel like design, where drivers run as separate processes and use Android IPC to talk with the kernel.

Symmetry · on Aug 15, 2018

And Fuchsia, if it ends up replacing Android, is a microkernel design from the bottom up.

gnulinux · on Aug 15, 2018

Isn't GNU/Linux (so, no Android) alone most popular OS too, thanks to servers and cloud? Any data on this?

nicc · on Aug 15, 2018

Ah, possible--but not consumer.

gnulinux · on Aug 17, 2018

Well, by using the internet, pretty much everyone is a consumer, even though they don't necessarily know of it.

m3mpp · on Aug 15, 2018

What popularity has to do with OS design?

0xdeadbeefbabe · on Aug 15, 2018

> We find that almost all exploits are at least mitigated to less than critical severity, and 40% completely eliminated

This probably sounds trollish, but security isn't that important. An OS architecture has to take into account other things too.

guy98238710 · on Aug 15, 2018

It's kind of pointless to have a faster server at the cost of seeing that server owned by someone else.