Oh, it's a paper I am (rather peripherally) involved with. If you have questions then ask away, although it's possible I might not have all the answers ...
Architecturally, microkernels and unikernels are direct opposites.
Unikernels strive to minimize communication complexity (and size and footprint but that is not relevant to this discussion) by putting everything in the same address space. This gives them many advantages among which performance is often mentioned but ease of development is IMHO equally important.
However, the two are not mutually exclusive. Unikernels often run on top of microkernels or hypervisors.
While a unikernel might be structured to some extent internally into modules, it's basically all linked into a single blob and running in a single address space. Some programming languages may support some kind of PL-level separation, but there is no hardware enforcement. In the case of what Ali is doing because all the code is written in C (it's all Linux, glibc and memcached) there is neither software nor hardware separation internally.
To give a list of these PL-level separation mechanisms, and their languages, these are the kernels I know of that work in this fashion.
Spin OS (Modula-3), Singularity (Sing#), Midori (M#), TockOS (safe Rust).
As far as I know, TockOS is the only one with some form of both PL-level and hardware-enforced separation, albeit on an MPU rather than a full MMU: PL-level for kernel modules, and an MPU-protected userspace.
I at least think it is worth addressing that none of these separation mechanisms are actually mutually exclusive.
Long ago, an OS I worked inside had a tool which took shared library indirect calls and wired them out, so you did not pay the cost of a nested indirect function call but went direct from the call to the backend, without the pass-through via the null function mapping into the specific .so library. It made things faster by one layer of the onion skin.
Strip, as a UNIX tool removes the textual crap in a debuggable binary, to leave behind only the bits you needed.
It feels to me like if you can do runtime code call checks, and confirm which actual calls you make, then stripping the bits of libc and associated libraries out, being left with only the strictly required calls, and then by extension the syscalls, and then by extension the kernel elements, is actually possible a lot of the time.
So, library -> reduced library -> reduced calls -> reduced syscalls -> reduced kernel state is a sequence or set or something, of applied minimisations which can be done, if you can predict all the call paths in your code and their dependencies.
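The chain above is basically a reachability pass over a call graph. Here is a minimal sketch in Python, assuming a made-up call graph: every function name below is hypothetical and not taken from any real libc or kernel.

```python
from collections import deque

# Hypothetical call graph: caller -> set of callees. Names are illustrative only.
CALL_GRAPH = {
    "app_main": {"libc_printf", "libc_recv"},
    "libc_printf": {"sys_write"},
    "libc_recv": {"sys_recvfrom"},
    "libc_fork": {"sys_clone"},
    "sys_write": {"kernel_vfs_write"},
    "sys_recvfrom": {"kernel_net_rx"},
    "sys_clone": {"kernel_copy_process"},
    "kernel_vfs_write": set(),
    "kernel_net_rx": set(),
    "kernel_copy_process": set(),
}


def reachable(graph, entry):
    """Return every function reachable from entry, using breadth-first search."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        caller = queue.popleft()
        for callee in graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


keep = reachable(CALL_GRAPH, "app_main")
strip = set(CALL_GRAPH) - keep  # library, syscall and kernel code nobody calls
print("strip:", sorted(strip))
```

In this toy graph the fork path falls out at all three layers at once. The catch is the "if you can predict all the call paths" part: a real graph has indirect calls through function pointers, and any edge the analysis misses turns into code that gets stripped but is still needed.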
But then it's not a generic OS any more: it's an application-specific binding to a general-purpose CPU.
Why not go all the way, and work into the ALU and remove the bits you don't need? And then go into the micro code, and the associated FPGA, and everything else..
You have to make sure your test cases exercise every possible code path. German demogroup .theprodukkt once made a demo, .kkrieger, a first-person shooter game in 96 KB. They used a technique like this to strip every line of source code that was not hit during a playtest. But the playtester didn't press the up-arrow in the main menu, so that doesn't work. More importantly, they didn't get hit by any bullets, so in the final release the player is invulnerable because the code that handles damage was never compiled.
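To make the failure mode concrete, here is a rough Python sketch of stripping driven by a recorded run. The game functions are invented for illustration, and `sys.setprofile` is only a stand-in for whatever instrumentation a real tool would use.

```python
import sys


# Hypothetical game code; none of this comes from a real project.
def start_game():
    return "started"


def open_options():
    return "options"


def take_damage(health, amount):
    return health - amount


def playtest():
    """A playtest that never opens the options menu and never gets hit."""
    start_game()


def functions_hit(run, candidates):
    """Record which candidate functions are entered while run() executes."""
    names_by_code = {f.__code__: f.__name__ for f in candidates}
    hit = set()

    def profiler(frame, event, arg):
        # "call" is reported each time a Python function is entered.
        if event == "call" and frame.f_code in names_by_code:
            hit.add(names_by_code[frame.f_code])

    sys.setprofile(profiler)
    try:
        run()
    finally:
        sys.setprofile(None)
    return hit


CANDIDATES = [start_game, open_options, take_damage]
hit = functions_hit(playtest, CANDIDATES)
stripped = {f.__name__ for f in CANDIDATES} - hit
print("kept:", sorted(hit))
print("stripped:", sorted(stripped))
```

The damage handler lands in the stripped set simply because the recorded run never exercised it. That is the difference from a static reachability analysis, which would keep anything that could be called.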
Yes. And it's almost impossible to do this without a huge investment in time and effort, except: if you use languages with FP, and are rigorous, then I believe (possibly wrongly) it's actually somewhat simpler to understand, because the style of coding in FP exposes much of this "better" than classic imperative coding does.
So yes, I think you're right: great dream, hugely hard to do, if not actually impossible in most cases near as damnit, but there are places where program proofs take you to this. Military/Space/Medical needs to know the code calls don't have unexpected outcomes. And language choices, which shape how you express code, can also help. Probably not enough.
The halting problem states that it's impossible to know if computation will terminate in the general case, but there are lots of specific cases where we can prove that it will terminate. The Idris language compiler (think Haskell with dependent types) will actually warn/error if you've written a function that it can't prove will terminate. It's actually really cool! If you're interested in learning more, I'd check out any talks by Edwin Brady or pick up his book[0]
I'm not sure it's relevant. All real machines have limitations that we accept when we write programs and feed them data. Anyway just because you can't prove that all possible programs will or won't run correctly doesn't mean you can't prove this for a useful subset of programs.
Well, the alternative is that you limit the set of things you can do, such that you no longer have a Turing machine. But that's certainly less fun, isn't it?
FPGA designs are usually created with high-level synthesis tools, where the input is not that much different from writing "software". See Chisel, and other such languages.
The designs for some RISC-V CPUs are available as open-source, so one could in theory take all that, and make it available as a library that you can compose as you wish. Slight downside is that now you have to be knowledgeable at FPGAs, Operating Systems and Programming Languages to build something coherent. If the components are well documented, simple and easy to use then perhaps the resulting system would be of less complexity than your typical system today.
It would be interesting to see how one could go in one direction to create a minimal self-contained system with such an architecture, i.e. the VPRI STEPS projects, but including FPGAs this time.
On the other hand there is tremendous value in using some well-defined standards along the way so you can benefit from all the debugging, analysis and verification tools that already exist, so such hardware/software co-design is always a balance.
>Why not go all the way, and work into the ALU and remove the bits you don't need? And then go into the micro code, and the associated FPGA, and everything else..
There is a lazy binding optimization in ELF shared libraries on GNU/Linux. The first call through the "PLT stub" of a shared library routine invokes the dynamic linker, which resolves the symbol and patches that routine's entry in the global offset table, so that the next call goes directly to the destination without having to resolve it again.
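One way to picture lazy binding is as a table slot that starts out pointing at a resolver and gets overwritten on first use. This is a toy model in Python, not how the dynamic linker is implemented: `SYMBOLS`, `GOT`, `plt_call` and `real_puts` are made-up names for illustration.

```python
# Toy model of lazy binding: each GOT slot initially points at a resolver,
# which looks the symbol up once and then patches the slot.

resolve_count = 0


def real_puts(text):
    """Stand-in for the shared-library routine being called."""
    return len(text)


SYMBOLS = {"puts": real_puts}  # hypothetical symbol table of the loaded library
GOT = {}                       # hypothetical offset table: name -> callable


def make_resolver(name):
    def resolver(*args):
        global resolve_count
        resolve_count += 1
        target = SYMBOLS[name]  # the expensive lookup happens only here
        GOT[name] = target      # patch the slot so later calls skip the resolver
        return target(*args)
    return resolver


def plt_call(name, *args):
    """Model of a PLT stub: always an indirect jump through the GOT slot."""
    return GOT[name](*args)


GOT["puts"] = make_resolver("puts")
first = plt_call("puts", "hello")    # goes through the resolver
second = plt_call("puts", "world!")  # goes straight to real_puts
print(first, second, resolve_count)
```

Note that the stub still costs one indirect jump per call even after the slot is patched, which is presumably the layer the tool mentioned earlier in the thread was wiring out.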
You can also do things like profile-guided optimization and link-time optimization.
Profile-guided optimization: get a runtime profile, then optimize the code based on how it will actually run.
Link-time optimization: the compiler outputs bytecode files instead of final machine code, then the optimization happens at the link step, globally across all objects.
The article is Tl;Wbmloc;Dr (too long, way beyond my level of comprehension, didn't read), so forgive any stupid questions. Could the unikernel technology be used to build extremely small Linux distros that contain just the bare minimum to satisfy all dependencies for running, say, a single program or a small group of them? I'm not interested in virtual machines for network related services but rather in small embedded systems where the user might need to use just one single application for the entire board lifetime. Security wouldn't be a concern there, assuming the user already has complete access to the hardware; that layer could be moved outside of the box by inspecting and firewalling the network traffic.
If yes, does anyone know if there is/will be any automated way to analyze a piece of software (for example, a .deb package containing a user application), build a list of dependencies, and create a bootable image for the requested architecture which will run just that software?
It doesn't really make much difference in terms of size in itself, I suspect. You're still carrying along the whole kernel and the application code, they're just linked together into a single binary. You can certainly then trim things down a lot by stripping the kernel to its bones by removing drivers etc., but you can do that even without linking it together, and you can go a lot further than what the normal kernel configuration mechanism allows if you're prepared to strip out functionality you know you'll never need, but I'm hard pressed to think of approaches like that which you can apply to the kernel that will only work if you move your application code to kernel space.
In the past I've brought up Linux on embedded boards with 4MB RAM and 4MB flash for both the kernel and the application, and we gained most by linking statically to a smaller libc on the user side and stripping out drivers on the kernel side, ditching bash for ash (currently dash) and using a stripped down barebones init. We could have ditched ash and init as well and booted straight into the app, but ash made debugging and testing slightly easier and we could fit it in. 4MB at the time was not even a challenge.
I think the gap between the space where you can fit a Linux kernel and an app mashed together into a unikernel and the space where you can run them separately will be very small, so in most instances, if you can make it work as a unikernel, odds are you can produce a nearly as small solution without it.
Going full on unikernel potentially costs you a lot of resilience against failure, it's not just about security. The main reason for wanting to do it is to reduce the amount of context switches and data copying to reduce latency.
The linking stage automatically removes all parts of Linux (eg. drivers) that are not needed. We can also apply various link-time and whole process profile-guided optimizations. Apparently most of the performance improvement comes from getting rid of system call overhead although we still have a student measuring these things more precisely.
> The linking stage automatically removes all parts of Linux (eg. drivers) that are not needed.
So does the normal kernel build process once you've specified which drivers you need compiled in. That's in no way unique to a unikernel approach. There'll be extremely little code that can be stripped from a unikernel that can not be stripped from a kernel built for the purpose of running a specific application.
It's just not where the benefits of a unikernel are. As I said, and you reiterated, it's really cutting latency from syscalls that is the big benefit of a unikernel.
That's not how most distro kernels are built, but sure you can build a custom kernel with everything statically linked in. What you can't do is link your application in as well and do whole-program optimizations on the whole binary, which is what Ali is trying to do here. We're also hoping that the eventual solution will be distro friendly enough that we can pack the "kernel library" into a future Fedora which you can use to link your own servers into unikernels.
Of course it's not, but the comment I replied to was about whether using it to build binaries for extremely small embedded systems made sense. In that scenario a distro kernel from one of the desktop distros is a total non-starter.
If a few MB of space matters, then you'll want a suitably configured kernel that disables a ton of stuff that is totally irrelevant for such a system, and you'll need to take that configuration into account whether or not you're building a normal kernel or a unikernel, as it involves not just excluding drivers and the like, but disabling functionality that if left enabled will not get removed by the linker because there will be calls into it from code paths that are reachable.
Do that, and you'll find it is pretty straightforward to get a Linux kernel down under 1MB.
Now, it's possible your tooling makes that easier, but my point is that it is equally possible to disable those parts of the kernel for a non-unikernel system as well, and you're not going to make significant additional savings. And if you want to, then ditching glibc for one of the smaller libcs would be a better starting point.
This is not a criticism of this project - size is simply not what most people look to unikernels to address, especially not one built on a general purpose kernel.
Linux as a library is unlikely since GPL enforces a boundary between Linux and its users. Otherwise it'd've become just another statically-linkable asset copied into a bunch of corporate codebases long ago. glibc uses the same strategy too w/ LGPL. Folks wanting to overcome the latencies and bloat these barriers create, should consider the problem might be more about politics than engineering.
It probably helps security a ton if you are the only binary: there is no shell, no other executables, and you can even disable fork() and exec() system calls.
Alpine Linux is a reasonably small (20 MB) Linux distribution. I've seen it run on bare metal, but it's more common to see it in Docker environments.
Building a minimal bootable image to be very small is possible, but is difficult in comparison to an apt install. I imagine that, for the most part, porting Docker container configs to a bootable OS might be the best approach for small-to-medium projects.
This is brought up quite a lot and isn't answered correctly enough in my opinion.
The problem with this approach is that you can make it as small as you want but it's still Linux. At a certain point are you going to start patching things out like support for users? Support for management of multiple processes? There's a non-trivial set of syscalls and data structures designed solely for these constructs. You can't just seccomp it and call it a day.
For us it's not about the size (that's nice of course) but it's more about the performance and security.
I'm not suggesting it's a good idea, but it's there. I'm sure there's more minimal, and less minimal options available.
I don't think there's any security impacts with using alpine Linux specifically, aside from default credentials in a bunch of containers a few months back.
This is how embedded linux systems are built already, AFAIK. There are vendors, such as MontaVista, that provide a toolchain to build images in this manner. There are free software alternatives too to do that, such as Embedded Gentoo.
"Unikernel" is an overly generic term. Technically the Linux kernel is already a "Unikernel". It's just that most of the time it's discussed as a negative, not a positive, and the term used is "monolith".
The project you want is "Yocto", a complete toolchain to build customized embedded OS images.
You would have to build your own analyzer. Grepping the dependencies from the makefile/build data or just parsing the output of dpkg and translating that into yocto build specs is not unreasonable.
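As a rough idea of what such an analyzer could start from, here is a Python sketch that parses Debian-style "Depends:" fields and walks them transitively. The package database is invented sample data rather than real dpkg output, and taking the first alternative of each `a | b` clause is a simplifying assumption.

```python
import re

# Hypothetical package database: name -> contents of its "Depends:" field.
PACKAGES = {
    "myapp": "libc6 (>= 2.17), libssl3 | libssl1.1, zlib1g",
    "libssl3": "libc6 (>= 2.34)",
    "zlib1g": "libc6",
    "libc6": "",
}


def parse_depends(field):
    """Return package names from a Depends field, taking the first alternative."""
    names = []
    for clause in field.split(","):
        first_alt = clause.split("|")[0].strip()
        if not first_alt:
            continue
        # Drop version constraints like "(>= 1.0)" and qualifiers like ":any".
        name = re.sub(r"\s*\(.*?\)", "", first_alt).split(":")[0].strip()
        names.append(name)
    return names


def closure(pkg, db):
    """Collect pkg and all of its transitive dependencies, depth-first."""
    seen = []

    def visit(name):
        if name in seen:
            return
        seen.append(name)
        for dep in parse_depends(db.get(name, "")):
            visit(dep)

    visit(pkg)
    return seen


image_packages = closure("myapp", PACKAGES)
print(image_packages)
```

Mapping the resulting Debian package names onto Yocto recipe names is a separate step that this sketch does not attempt.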
> Unikernels have demonstrated enormous advantages over Linux in many important domains, causing some to propose that the days of Linux's dominance may be coming to an end.
I don't think that this statement describes the reality we live in.
To me, unikernels feel quite a bit like statically linked server binaries running under an unprivileged UID - but you're choosing not to trust Linux' (or any other kernel's) user separation facilities, but your hypervisor's domU separation facilities instead. In exchange, you lose virtually all of your existing OS's amazing debugging and performance analysis/tuning tools. It's not a tradeoff I'd readily consider.
True, it's a different picture. It's about how to create a really efficient computing mesh. If unikernels are faster (they start faster, then they run faster), and simpler, and you offload security and topology to kubernetes (or some other orchestration system), then you have a superior system, both in terms of performance and in terms of control.
When I first read about unikernels I thought it was a great way to remove a layer forced upon you by cloud providers.
Most public clouds run some form of hypervisor, on top of which you run your VM and applications. I only want to run my application, and I don't get to choose to remove the hypervisor, thus I can only remove the kernel: a unikernel is like running an application on a hypervisor:
your hypervisor is your "OS kernel" and your unikernel is your application ... and we gained back what we lost compared to a bare-metal deployment.
That is a simplified view, and there are other advantages as well: you probably don't need to exit to your hypervisor quite as often as you do with syscalls, so you benefit from reduced overhead.
One disadvantage is that memory allocation is almost static (unless you do ballooning which is tricky to get right), so you "waste" some memory compared to a bare-metal deployment where unused memory can be used for the page cache instead.
I think this kind of misses the mark. A unikernel is both the kernel (just the things you need) + app together. It’s built for one thing: whatever the app is doing.
Tooling/Perf/etc is not needed when running (do you really want to debug in production) but tooling can be used in the process of development.
Why consider unikernels? 1) They’re stupid fast. You have only the things you need (booting in nanoseconds? yepp). 2) Small attack surface (because you only have your app, that’s the only attack surface; you don’t have cruft that builds up in your OS/kernel over years). 3) Light resource usage (you could run thousands of these on the same physical machine). 4) True isolation via the hypervisor.
definitely worth keeping an eye on the developments in this space
> Tooling/Perf/etc is not needed when running (do you really want to debug in production) but tooling can be used in the process of development.
Whether or not you want to debug in production, reality often means that you will see things in a live environment that you will not see in other environments.
Unikernels are very interesting and have a number of compelling attributes, but let's not pretend that the current state of available tooling for troubleshooting, instrumentation, and general debugging isn't a challenge.
You're moving the goal posts and invoking a straw man. No one is advocating for "pretending", and the anti-unikernel argument is that the inherent cost of unikernels in general is the loss of kernel debugging tools; not simply that "right now the unikernel debugging experience is subpar".
in general right now, running on a full-featured monolithic kernel, the debugging experience is really pretty bad. especially in the target environment of horizontal clustered services or lots of little micro services.
so I actually believe there is an opportunity here to focus on the important pieces (network messages, control flow tracing, memory footprints, etc) after ejecting a huge amount of irrelevant stuff
Totally agree, there's definitely an opportunity there if the surface area of what the system is doing gets smaller. The big difference today is that it's a lot easier to compose a debugging suite using additional tools on top of a traditional host-based runtime for now.
Definitely excited to see how the technology evolves over the next few years. It hasn't moved as fast as I'd have expected over the last 2-3 years but I'd love to see that accelerate.
No, I'm not moving any goal posts. I was reacting to the statement "Tooling/Perf/etc is not needed when running". I'm not sure where you'd get the idea that I'm anti-unikernel, I just wanted to not disregard that challenge since it does matter.
I took care not to say that you were anti-unikernel; I was articulating the anti-unikernel position. I'm not sure what you actually meant, so I'll take you at your word that you weren't moving goal posts--in whatever case, the OP is correct that unikernels aren't inherently less debuggable even though mature debug tooling may not exist for them today.
> Tooling/Perf/etc is not needed when running (do you really want to debug in production)
Do I want to debug in production? No.
Do I have to do it anyways? All the fucking time. I'm not perfect, I sometimes ship bugs, and when they show up in production, I need to diagnose, determine if rolling back will solve them, or whether I need to fix forward and how. Not being able to debug in production is simply unacceptable.
I want to ship perfect code. But I don't. So, instead, I debug. If the issue shows up in production, I debug in production.
Of course I'm already in trouble. That's why I'm debugging. I tend not to debug things that are working perfectly.
If I can't attach a profiler to find, for example, the code that contains a regex that's doing too much backtracking with production data, then I'm dead in the water. If I can't get a sample out of a system that's received a query of death, then I'm dead in the water. If I can't attach dtrace (or the equivalent) and get stats on how much I/O a system is doing in response to various events, I'm dead in the water.
Being able to dive into the depths of a system is the #1 criterion for confidently being able to put it into production.
> Tooling/Perf/etc is not needed when running (do you really want to debug in production) but tooling can be used in the process of development.
I don’t follow. Are you saying that development should build normal user space binaries, and you should only build the unikernel for production?
You will inevitably run into a behavior difference between the unikernel and user space outputs. Even if you’re not debugging in production (shudder) you need to be able to debug the unikernel.
Unikernels also run on bare metal, so it's not about swapping one sort of protection for another. Besides the differences in protection and separate processes, unikernels can do anything an OS can do. Some unikernels will run application software with no changes (but obviously things like fork don't work anymore).
The point (for me at least) is discarding features that you don't need in return for performance and simplicity gains. Those gains are not insubstantial when you're scaling your system and you don't want to spend your money on admin staff.
Except that those amazing debugging capabilities are also possible, as anyone that used Erlang, Java or .NET debugging capabilities in production is aware of.
Prometheus is great, but it won't tell me why that container gets "no route to host" when talking to that service. That's when I need a shell for running traceroute, tcpdump etc.
You still retain all of the hypervisor monitoring and tuning tools. And you can tune the unikernel using different build options. Debugging is probably not as fleshed out, but it does not seem to be an impossible task.
To be fair that's not exactly what the paper says. It says that unikernels have demonstrated "enormous" advantages (a bit of an exaggeration), but they are not widely used right now and no one is claiming that.
Why would you do such a thing? Creating masses of crippled, non-general operating systems out of Linux?! For the actually unheard-of performance.
Basically, you've taken the whole /sbin/init link out of the OS init chain, and replaced it with... A single binary you may remember.
To do this took a little extra work. Your glibc and any other dependency ended up in kernel space.
Single-use virtual machines obviously have the biggest boost, but other uses exist. Your kerberos-daemon isolated into its own vm on the network. Someone's gotta serve network filesystems. Apache servers for various websites you may host - the possibilities are endless.
The processor itself has a Massive timesink for isolating kernel-space from user-space (even root is just another user at this level, I'm afraid). Even Server-side admins have simply gotten used to it - every application, server, program Ever runs on that user-space.
So you have a bunch of single use VMs... and that much "context-switching" between virtual-drivers and memory allocation on one side, and the actual server the vm is for on the other. Now these VMs are running on an OS's hypervisor with the same context-switching performance problem!
So, we package each vm as a unikernel - shove each virtualized server into the kernel. We avoid that context switching.
Build that kernel lean enough, and sometimes the unikernel VM can perform almost at baremetal speeds.
What if we did the same for the Host OS - built a hypervisor Unikernel to delegate hardware and nothing else? zomg folks are squeeing with excitement about a new way to frame building a lean system...
But it gets us compiling in-house again instead of using COTS (commercial, off-the-shelf software). Got an old Gentoo-user hanging around that can debug kernel code, harden the kernel against attack, and compile it all lean?
That guy is who we're gonna need to figure out what goes wrong with any of this. Because Murphy's law is unreliably unreliable, yet ultimately absolute.
> In this case the authors themselves put the pdf on their website.
OT: This should honestly be a criminal offense, at least if done before a year passes. The publisher does valuable work in vetting contributions and checking the paper before publishing. They shouldn't be cheated out of their fair share of money that comes with due process just because authors go rogue.
Can't really say if this is sarcasm or not, but journals are paid to do that checking, it's not free, and they use unpaid people for peer review, so I'd say they aren't really losing money.
But determining whether or not authors have "gone rogue" depends on their agreement with the publisher. You do not have a basis for claiming they have done so unless you know whether or not they have obtained permission or whether or not they never handed over rights in the first place.
This is really cool, though it looks like the security impact is nuanced.
Unikernel plus: A lot of code, including code with vulnerabilities, can disappear. So some vulnerabilities disappear.
Unikernel minus: The unikernel and application don't have any security isolation, since the kernel is essentially linked into the appl. As a result, the application has all the privileges of its virtual machine (not the subset that would normally be imposed by the kernel), which would normally be more privileges than it really needs. So any remaining vulnerabilities can have a more devastating effect.
That trade-off in terms of security is really hard to evaluate.
But the kinds of performance improvement shown here, with relatively modest changes, is a really really big deal. So I'd expect a lot of people to investigate this further; it certainly seems promising.
If you write your unikernel in a high-level language then you can mitigate some of your security concerns. The unikernel should only have access to the data it needs to, so doing multi-tenancy or isolation inside the unikernel is probably not what you want: just spin up a new unikernel for each tenant/thing you want to isolate. The hypervisor would provide isolation in this case.
There is a downside, if there is a vulnerability the exploit can probably make hypercalls straight to the hypervisor, but a hypervisor can have less of an attack surface than a full OS.
All unikernels so far suck a lot regarding security. I wouldn't recommend running any of them in a production environment unless you want to lose all of your data.
NCC group did a good article on unikernels and how crap they are if you want to know.
A better idea, as already suggested, would be to create an OS which builds itself according to a target application and system, to host that application on that system specifically. That would remove about as much code, but keep potential security mechanisms like ASLR, stack protection, user/kernel separation etc. intact.
(Right now kernels/OSes are built to target a system, but not an application. An application only uses a subset of the kernel, so the kernel can be built for the target application, reducing it to what's needed.)
don't try to be cheap for performance and skip security, we're not in the damned 80s anymore.
I'd argue the Mirage Unikernel (built almost wholly in OCaml) is one of the most robust platforms out there. The NCC paper you talk about looks at two rather old fashioned unikernels in isolation. I don't think the idea of unikernels should be discarded because the current implementations are slightly lacklustre -- it just shows that there's a fair way to go yet.
>[..] A better would be to host that application and reduce the kernel to whats needed
The authors of the NCC paper are evaluating MirageOS as well. IIRC from listening to their talk on the paper and ongoing research https://www.youtube.com/watch?v=b68VFuB_y5M it's got more of the problems other unikernels do than I'd have assumed. I'm pretty ignorant, but the paper gave me the impression that there's a long (rather than fair) way to go yet, especially relative to seemingly widespread assumption that unikernels are inherently more secure.
If you read that article, they tested two unikernels and the key point was that unikernels don't implement security mechanisms present in a normal OS. This Linux Unikernel, otoh, does not delete any of that security functionality. The boot up code is the same, it just calls a specific code instead of bringing up the general userspace. Unikernels and security don't have to be mutually exclusive.
Genuinely curious: Wouldn't the teeny-tiny "attack surface" at least mitigate attacks, and should one be successful on say a memcache server running inside a unikernel... what exactly can you do on a "rooted" machine that has no userspace tooling or libraries? Assuming it's running in isolation, it wouldn't be much different than breaking into a container through an unprivileged process, right?
I was actually planning a unikernel-based deployment of Memcached (OSv) as an L2 cache for our application servers. I thought that user/kernel separation didn’t matter that much when only a single application is run in a unikernel, and the data exfiltration risk would be the same as if we were running it on a typical Linux box.
We literally just talked about this in a video done yesterday for application security weekly - https://www.youtube.com/watch?v=Hob1iLjIgWE - I previously wrote my thoughts on it here as well:
I'd rather see unikernels written in Rust or any other memory-safe language. If we're going to start from scratch with this, let's do it right this time.
70%-90% of bugs seem to be memory-related. Let's end that with a little bit of effort upfront for much fewer headaches in the long-term.
The entire point of this paper is not to start over from scratch, but to reuse existing software (Linux and memcached in this case), and fiddle with the linker command line and a little bit of glue to link them into a single binary. If you want to start over from scratch using a safe language then see MirageOS.
Edit: I don't mean to say you have to use C for the "userspace" part of this. It should be possible -- in future -- to use any language for that. However at the end of the day you'll still be linking that with Linux (written in C) and glibc into a single binary that runs in one address space.
We need Rust to mature a bit before we can do that.
Everything looks great on surface but as soon as you start doing low level coding a lot of issues pop up and you need the nightly compiler and xargo and some undocumented library for the boot sequence and whatnot.
Choose any "let's write an OS in Rust" project that has recently been featured on HN. Then try to follow it using the stable compiler (sudo apt install rustc) and see what happens.
Maybe "work in progress" better describes the situation. We need to wait for the language to stabilize, then the tooling and finally the libraries before starting a huge project like writing an OS.
Edit: In case you can't read the paper there's a copy here: https://www.cs.bu.edu/~jappavoo/Resources/Papers/unikernel-h... (Thanks anonymousDan in the comments below for linking to it)