IncludeOS roadmap: tiny Node.js-style web services in highly efficient C++

kator · on Dec 16, 2015

I love watching the pendulum swing back and forth in tech.

From mainframes to pc's ... to thin clients ... to server rendered HTML ... to client side frameworks ... to microservices

I'm sure I could come up with more examples. I was just thinking of re-architecting a heavy apache web head app into unikernel super light micro services. It just makes me chuckle as we swing back and forth and try to find the right balance. Each generation sort of saying "Hey wait there's a CPU over here on this side of the diagram and it could do X/Y/Z".

Recently I've been working on Raspberry Pi's so much that when I logged into a nice medium sized EC2 instance I was "shocked at how responsive these are". LOL. I started on machines with 4k of memory and built systems that consume TB of ram across many nodes and data centers. It's fun to go back to my roots trying to cram something into an ESP8266 nodemcu device and thinking about how long I can run it on a LiPo battery. I think this back and forth is good for us, but I encourage everyone to spend time on micro devices to remember just how simple you can solve a problem when confronted with the reality of 20k of ram and 4M of flash storage.

Gladdyu · on Dec 16, 2015

I wrote something similar a while ago, combining the easy-to-use asynchronous programming model of Node.js with the speed of C++.

http://gladdy.github.io/c++/2015/07/26/NodeDemystified-pt1.h...

source code: https://github.com/Gladdy/Mininode

nickpsecurity · on Dec 16, 2015

The example looks really clean. Amazing how far modern C++ has gotten in that area.

pcwalton · on Dec 16, 2015

Losing memory safety is a pretty big deal though.

In particular, the capturing semantics with [&] are going to result in a big surprise (use after free) if you try to close over and capture variables like you do with node.js.

jbandela1 · on Dec 16, 2015

I agree, that it can be a big deal and one must think about ownership and lifetimes whereas in Javascript there is no need to. Especially in code that will exit the surrounding function before the callback is executed, I do not use &, and explicitly specify the captured variables so I can think about what type of capture.

However, C++ is improving in this regard. I think lambda capture initializers with C++14 help in this regard so that unique_ptr can be used. In addition, the memory safety tooling is improving - see Herb Sutter's CppCon 2015 keynote. Finally, C++17 should have co-routines which will make this type of callback code less used and allow one to use "normal" logic without callbacks that will also be non-blocking.

pcwalton · on Dec 16, 2015

> I think lambda capture initializers with C++14 help in this regard so that unique_ptr can be used.

They're still opt-in—the burden is on the programmer to get it right.

> In addition, the memory safety tooling is improving - see Herb Sutter's CppCon 2015 keynote.

Sure. But it's not here yet, it requires a lot of annotations, and there are several key unanswered questions that I didn't see an answer to (how the static analysis deals with pointers derived from shared_ptr when it relies on all pointer function parameters being unaliased, for example).

> Finally, C++17 should have co-routines which will make this type of callback code less used and allow one to use "normal" logic without callbacks that will also be non-blocking.

And when the coroutines suspend/yield, you have the exact same issues regarding dangling pointers and use-after-free.

nickpsecurity · on Dec 16, 2015

Got no expertise there. I always assume it with C or anything derived from it. But C++ commenters here keep telling me modern C++ and its libraries have all kinds of ways to prevent common issues with memory. They've demonstrated a few. So, I'd be curious to see a comparison of what happened in practice between the two.

That said, typical C++ applications might be easier to get right than Javascript runtimes, JIT's, or whatever. A lot of eyes go into the JS foundations but so do a lot of complexity and problems.

pjmlp · on Dec 16, 2015

To add my input to pcwalton.

It is true that modern C++ with the proper tools can mitigate many of the issues that come from C heritage.

The sad reality is that despite the tools that C++ offers for memory safety, the majority of enterprise developers don't buy into them and the code looks more like C with classes than anything else.

If you look at CppCon 2015 when Herb was presenting the Core Guidelines and the new VSC++ static analyzers, only 1% of the audience said they ever used one.

seren · on Dec 16, 2015

From what I have seen, the issue is that in big risk-adverse entreprise, the mentality of "don't change anything if it works" goes a bit too far.

I feel a bit ashamed, but to give you some rough idea of the age of the toolchain, the c++ compiler we are using does not even support namespace. We are slowly fazing out old platforms but this is really a long drawn struggle. Basically you have to wait that the underlying HW is not produced anymore for 10 years, before maybe thinking the SW side should be upgraded.

I would be very eager to use some nice feature of Cpp11 but it won't be in this company...

pjmlp · on Dec 16, 2015

No reason to be ashamed, I have seen many companies that only use platform SDKs (e.g. aC++ on HP-UX) and IT has the last word over what tools are allowed on dev machines.

However the situation is no different in other languages.

I am aware of some companies still using Java 5 on production, just to cite one example.

nickpsecurity · on Dec 16, 2015

I know one major retailer whose terminals are all a DOS variant whose sales ended in 1999 and whose inventory is hard-tied to SCO UNIX. Got future-proof written all over those. ;)

Note: Many key operations, including the backend, are on AS/400 and mainframe. Much wiser choices. Needless to say, the employees tell me that part never goes down.

pcwalton · on Dec 16, 2015

> If you look at CppCon 2015 when Herb was presenting the Core Guidelines and the new VSC++ static analyzers, only 1% of the audience said they ever used one.

That's interesting.

It's especially interesting to note that those static analyzers work on unmodified code! The work-in-progress ISO Core C++ lifetime profile does not work on unmodified code and requires a lot of annotations…

pjmlp · on Dec 16, 2015

Here

CppCon 2015: Herb Sutter "Writing Good C++14... By Default" around 00:32:00.

https://youtu.be/hEx5DNLWGgA?t=2053

> The work-in-progress ISO Core C++ lifetime profile does not work on unmodified code and requires a lot of annotations…

Herb mentions that it still less than lifetimes in Rust given C++'s type system:

https://youtu.be/hEx5DNLWGgA?t=1653

https://github.com/isocpp/CppCoreGuidelines/raw/09aef9bd86d9...

pcwalton · on Dec 16, 2015

I've read the paper closely.

> Herb mentions that it still less than lifetimes in Rust given C++'s type system:

Yes, I know Herb claimed that. As far as any of us can tell, it's based on a misunderstanding of Rust's rules and is wrong :)

nickpsecurity · on Dec 16, 2015

That is a sad reality. Got tools to get the job done effectively and they collect digital dust.

pcwalton · on Dec 16, 2015

> That said, typical C++ applications might be easier to get right than Javascript runtimes, JIT's, or whatever. A lot of eyes go into the JS foundations but so do a lot of complexity and problems.

It is untrue that typical JavaScript applications are vulnerable to anywhere near as many memory safety issues as C++ applications are. Even "modern C++".

As a very relevant example, closures are modern C++, but neglecting the right annotation on the capture clause—the difference between writing one ampersand and not—can lead to use-after-free. In JavaScript, the runtime will manage the lifetimes of the closed-over variables for you, and you can't get it wrong unless there's a bug in the JS engine. Accidental use-after-free due to JS implementation bugs is so rare that I wouldn't be surprised if it's never happened in practice.

gpderetta · on Dec 16, 2015

>It is untrue that typical JavaScript applications are vulnerable to anywhere near as many memory safety issues as C++ applications are. Even "modern C++".

The OP was talking about JS runtimes and JITs.

nickpsecurity · on Dec 16, 2015

Exactly. There's attacks on them posted quite regularly.

pcwalton · on Dec 16, 2015

Note the emphasis on accidental memory safety violations.

There's an enormous difference between attackers running JavaScript that they control and attackers having to exploit memory safety issues in JavaScript that you wrote. Virtually all attacks against JavaScript JITs have been of the former type, in that they rely on perfectly crafted hostile JavaScript that nobody would write on purpose. But in a node.js type of server side scenario, attackers do not have the ability to run arbitrary JavaScript. They only have the ability to interact with the JavaScript that was already written to be non-malicious. So it's much, much harder for an attacker to exploit memory safety problems in the JS engine in a server side scenario—so much harder, in fact, that I wouldn't be surprised if it's never been done.

If your argument is that JS JITs are hard to write, well, sure. But I think that's totally irrelevant to this conversation. Clang and LLVM are also hard to write and are just as trusted in this scenario.

nickpsecurity · on Dec 16, 2015

"Note the emphasis on accidental memory safety violations."

Good points. Of course, I'm guessing you're saying it's still really hard to write memory-safe code with modern C++ constructs whereas others here suggest it's pretty easy. That point is tripping me up in these discussions. Did a quick Google on C++ safety and found you already said something about the topic:

https://news.ycombinator.com/item?id=7587978

My statement is predicated on the premise that modern C++ can be done safely without issues like that. If not, then it's false and we're back to me recommending safer, systems languages.

One thing to remember, though, is that there's tools to automatically transform C code into safe code. Softbound + CETS comes to mind. Might be able to use a C++-to-C compiler with one of those. There's also safer forms of C++ like Ironclad C++.

http://www.cs.grinnell.edu/~osera/papers/ironclad-cc.pdf

So, the situation might be worse than people looking into C++ were hoping but not as bad as you think.

hanez · on Dec 16, 2015

Sounds a little bit crazy to me at first but when thinking about it, it is a nice idea.

May you should take a look at Rump Kernels and build your stuff on top. Then you do not need to implement the OS stuff - It's done already. May I am wrong but it seems to be a similiar idea but the following project is currently at the OS level only but some applications like ngnix are working already. I was very confused when I first read about Rump Kernels and after reading a while and watching some conference talks a lot of stuff made sense to me even if I do not understand in detail what they are doing.

http://rumpkernel.org/

https://github.com/rumpkernel

semisight · on Dec 16, 2015

This seems more like MirageOS/Ling than a rumpkernel. From what I remember, rumpkernels are more general, at the cost of being less "unikernel-y" (i.e. lean and fast). I agree that it would be easier and faster to use a rumpkernel if the intention is use in production soon.

I can't tell if this is a research project, or intended to be production quality at some point.

vive-la-liberte · on Dec 16, 2015

>I can't tell if this is a research project, or intended to be production quality at some point.

Found this on their website [1]:

>IncludeOS is the result of a research project at Oslo and Akershus University College of Applied Science (hioa.no)

>IncludeOS is not production ready - but we're working hard to become so.

[1]: http://www.includeos.org/

semisight · on Dec 16, 2015

Thanks, I didn't catch that. I'll have to make some time to play with this, especially once it gets further along.

hanez · on Dec 16, 2015

I believe to get this production ready will take some years.

From what I understand in the projects FAQ page they want to implement the concept of a Unikernel and this is what the Rump Kernel at http://rumpkernel.org/ is intending too. May I am wrong because I am not to deep into it.

If it is a research project then they should go on... It looks like it is because it is beeing developed at a university in Oslo, Norway. http://www.hioa.no/eng/

anttiok · on Dec 16, 2015

No, a rump kernel is not an unikernel, check the FAQ linked from http://rumpkernel.org/

However, you can use rump kernels as a major component of a unikernel implementation. A rump kernel provides environment-agnostic drivers, meaning you can integrate them pretty much anywhere.

Now, what is a unikernel? From my perspective it's essentially: 1) application 2) config/orchestration 3) drivers 4) nibbly "OS" bits

So from the bottom, the nibbly bits include things such as bootstrap, interrupts, thread scheduler, etc. It's quite straightforward code, and a lot simpler that the counterpart you'd find e.g. in Linux. But you can't do much anything useful with the OS when that part is written.

Drivers are difficult because you need so many of them for the OS to be able to do much anything useful, and some drivers require incredible amounts of effort to make them real-world bug compatible. Just consider a TCP/IP stack -- you can write one from scratch in a weekend, but the result won't work on the internet for years. Then you may need to pile on a firewall, IPv6, IPsec, .... A rump kernel will provide componentized drivers for free. The policy of if you use those drivers in a unikernel or microkernel or whateverkernel is up to you, but I guess here we can assume unikernels.

The config/orchestration bits are actually quite an interesting topic currently, IMHO, at lot of opportunities to make great discoveries. Also, a lot of opportunities to use the rope in the wrong way.

The applications depend on what sort of interfaces your unikernel offers. If it offers a POSIX'y interface, you can run existing applications, otherwise you need to develop them for the unikernel.

Now putting rump kernels and unikernels together: the nibbly bits are straightforward, the drivers come for free via rump kernels, and those drivers provide POSIX syscall handlers, so POSIX'y applications just work. That leaves the config/orchestration stuff on the table. There's a rumpkernel-based unikernel called Rumprun available from repo.rumpkernel.org. It's essentially about solving the config/orchestration problems. Due to the rump kernel route, the other problems were already solved in a way which can be considered "good enough" for our purposes.

Hope that clarified the difference between rump kernels and unikernels.

(edit: minor formatting fix)

hanez · on Dec 16, 2015

That clarified the difference! Thank you.

I think I mixed rump kernel and rumprun. I saw a talk long time ago and am not very deep into that nowadays.

Therefore, double thanks!

timClicks · on Dec 16, 2015

There is actually an active unikernel project called IncludeOS (http://www.includeos.org/), which could be a good place to start.

hanez · on Dec 16, 2015

Exactly this is what the original post is about... ;)

nickpsecurity · on Dec 16, 2015

Maybe it's a bot designed to promote IncludeOS. Gives a stock response to anything on the Internet that has IncludeOS in its name. That would make more sense.

timClicks · on Dec 16, 2015

Urgh, completely thought that the name was clobbered by a second project. Silly me.

hanez · on Dec 16, 2015

Uh, he ist not a bot, he is not a bot, he is not a bot... :D

std_throwaway · on Dec 16, 2015

At first it seemed strange that a Node.js-replacement needs a full IP stack including DHCP.

As far as I understood this, they are building a whole operating system to be run in a VM like QEMU which then acts like a Node.js-instance - except that it's programmed in C++ and linked as one whole exeutable which also is a complete OS.

I wonder how this compares to Docker or Sandstorm where the existing kernel API is used in a virtualized container instead of emulating a whole machine.

kentonv · on Dec 16, 2015

> I wonder how this compares to Docker or Sandstorm where the existing kernel API is used in a virtualized container instead of emulating a whole machine.

Short answer is probably: Better at security, but less efficient.

Xen bugs notwithstanding, the VM boundary has a better security record than the kernel API boundary. (Though Sandstorm -- my project -- has been pretty successful at dodging kernel bugs through aggressive attack surface reduction.)

The problem with VMs is that they're heavy-weight. The hardware interfaces were never intended to be a clean abstraction boundary between software. For example, it's tricky to reclaim unused RAM back from a VM, because within the VM the guest would by default assume it has a fixed amount of RAM that is exclusive to it. Communication probably has to happen in the form of network packets, requiring setting up and traversing a whole IP network stack in the guest, which is a lot more expensive (in both CPU time and memory use) than pipes, unix sockets, or shared memory would be.

However, if you're designing a kernel specifically for use in a VM, you can probably do better on these things, e.g. you can define some virtual hardware interface by which the kernel marks RAM pages unused so that the host can take them back, or you can define a cleaner communication interface. But as you define these interfaces, your "virtual hardware API" ends up looking more and more like a kernel API -- and has greater risk of security bugs.

So one way (containers) you start out with a wide interface and high efficiency but a large attack surface, and you try to narrow the interface to improve security. The other way (VMs) you start with a narrow but inefficient interface which you try to widen to improve performance.

Eventually they will probably converge.

hanez · on Dec 16, 2015

What they want to do is to implement the concept of a unikernel (https://en.wikipedia.org/wiki/Unikernel). That's a very different approach then running a complete OS with their software stack on top. Take a look at the two links in my other comment.

alfanick · on Dec 16, 2015

I never been fun of Node.js (neither the hype of writting server-side in awful JS), so sometime ago I've written https://github.com/alfanick/rest-cpp - "tiny Rails-style web services in highly efficient C++" ;) Code has not been updating for some time, but it is near production-ready.

kentonv · on Dec 16, 2015

I like this idea, but I don't understand describing it as "node.js-style". I guess they think of Node as being a particularly ideal platform for implementing "microservices", but honestly I don't think there's any particular property of Node.js itself that makes it microservice-friendly. To me "Node.js" simply means "server-side Javascript", which obviously this isn't (it's C++).

rileymat1 · on Dec 16, 2015

ASP has had server-side "Javascript" for years. The magic sauce in Node.js is the combination of async, JS, and package management.

smithclay · on Dec 16, 2015

If you're interested in IncludeOS, runtime.js is also worth checking out: http://runtimejs.org/

ilaksh · on Dec 16, 2015

I thought web assembly was going to be the main part of Node++.

Anyway I would look into Nim or Go if you want a more efficient webserver.

jdc · on Dec 16, 2015

Anyone know how this compares to OSv?

BrainInAJar · on Dec 16, 2015

So... back to CGI programming then. Cool.

TheRubyist · on Dec 16, 2015

Just use Rust....