The original headline is preserved and clarified by the editorial note in square brackets. It would be great for HN to adopt this as a solution to the modified-headline problem, with the proviso that editorial comments must only be used for clarification.
* The ZeroVM site makes a big deal about application execution being completely deterministic. How does this interact with applications that require random numbers, such as crypto?
* Is ZeroVM capable of running unmodified Linux binaries? If not, what compiler toolchain is required to get it working? The main advantage of other lightweight virtualization solutions (OpenVZ, LXC) is that it's very easy to take regular binaries (e.g. postgresql) and drop them in a sandbox with minimal fuss.
- It is deterministic given its inputs. You would need to pass in a seed or read from an external source of randomness to get different values out of a PRNG (see the sketch below).
- Binaries need to be recompiled. There are two toolchains, a GCC-based one and an LLVM-based one. We can also compile within the ZeroVM container itself.
We expect that a lot of people will use existing language runtimes (Python, Lua, JS) to avoid compilation.
Over the long term, though, a lot of the power comes from composability. Think Unix pipes, in parallel, across the cloud.
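To make the determinism point concrete, here is a minimal Python sketch (my illustration, not ZeroVM-specific): a seeded PRNG is fully reproducible, so fresh entropy has to arrive as an explicit input.

    import random
    import sys

    # Deterministic: the same seed always produces the same sequence.
    rng = random.Random(42)
    print([rng.randint(0, 9) for _ in range(5)])  # identical on every run

    # To get different values per run, the entropy must arrive as an
    # explicit input (here, the first line of stdin, supplied by the host).
    seed = int(sys.stdin.readline() or "0")
    rng = random.Random(seed)
    print([rng.randint(0, 9) for _ in range(5)])  # varies with the input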
Does the hypervisor use multiple communicating CPUs? If so, how do the races inherent in concurrency not destroy the determinism? Is this a single CPU/thread/fiber hypervisor?
That's not true if the processes communicate, or contain communicating threads. In one run, an input queue looks like {A, B}. In another, {B, A}. The source of the non-determinism is just entropy bubbling up from the hardware.
EDIT: using completely synchronous I/O (mentioned below) is a very clever solution, but it requires a process to know its inputs ahead of time. This may also cause cluster scalability issues, as now each "round" of inputs is gated by the slowest of the source processes.
All reads from other sessions are blocking. There is no input queue; zerovm processes read and write directly to each other. This way determinism can be preserved even for clusters.
Assuming A and B are produced by separate input processes, they have to be submitted to a "gather" process that takes both of them. If the "gather" process uses select() or non-blocking I/O, saying basically "read from A or B, whichever becomes available first", you'll get them gathered in a nondeterministic order. OTOH, if the "gather" process uses synchronous blocking I/O, "read one message from A, then read one message from B", then you always get (A, B) order.
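A minimal illustration in Python (plain subprocesses as stand-in producers, not ZeroVM's actual API): with blocking reads in a fixed order, the gathered sequence is the same on every run, regardless of which producer finishes first.

    import subprocess

    # Two stand-in producer processes; each writes one message to stdout.
    a = subprocess.Popen(["echo", "A"], stdout=subprocess.PIPE)
    b = subprocess.Popen(["echo", "B"], stdout=subprocess.PIPE)

    # Deterministic gather: block on A first, then on B. Even if B's
    # process happens to finish earlier, the result is always [b'A', b'B'].
    messages = [a.stdout.readline().strip(), b.stdout.readline().strip()]
    a.wait()
    b.wait()
    print(messages)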
If the framework you're using requires all I/O to be synchronous, and there's no way for a program to tell time or tell when an action would cause a delay, then there's no way for nondeterminism to develop based on timing.
I don't have any idea if ZeroVM is like this, and a framework that only allows synchronous I/O would have its own problems (basically you'd have to worry a lot about deadlock).
EDIT: To expand on this, you might still be able to do a lot even if cyclic interprocess data flows are forbidden. This is particularly true of database style applications, which are where ZeroVM originated.
With the potential of starting a ZeroVM in 5ms, and running it for a very short amount of time, do you see yourselves starting to charge in smaller time increments (ms)?
I work a lot with smallish data, where queries/processing may take 20 minutes or a few hours. Often I can split this work up significantly, but when most places charge by the hour it's rarely worth it. PiCloud are excellent in this area, and Manta looks interesting (but possibly a bit more expensive) but I'm not aware of many others. I love the idea of being able to start, with little overhead, a large number of short lived jobs. Particularly if I can run them locally or my own cluster.
The platform itself produces accounting data with 1 ms accuracy. How to do accounting in a real-world public service is another question. But we are aiming at very short-lived (10 seconds) but widely horizontally spread (1000 machines) workloads.
Thanks for the reply, this sounds like just the kind of thing I'm interested in, and I'm happy to see more competition in the field of renting machines for short periods. I look forward to seeing this released and giving it a go.
Lua, Python and C/C++ are available right now.
C/C++ is compiled at run time with LLVM.
Porting an interpreter requires minimal effort, although porting some of the interpreter's libraries may not be as easy.
I compiled PHP just for fun, and it worked "out of the box".
If you have a legacy application that demands threads, we can emulate them using a coroutine approach.
http://en.wikipedia.org/wiki/Coroutine
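A rough sketch of that approach in Python (my illustration, not ZeroVM's actual shim): cooperative coroutines scheduled round-robin stand in for preemptive threads, as long as each "thread" yields wherever a real thread would block.

    from collections import deque

    def worker(name, steps):
        # Each "thread" is a coroutine that yields at the points where
        # a real thread would block (I/O, sleep, lock acquisition).
        for i in range(steps):
            print(f"{name}: step {i}")
            yield  # cooperative scheduling point

    # Round-robin scheduler: run each coroutine until its next yield.
    ready = deque([worker("t1", 3), worker("t2", 3)])
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)
        except StopIteration:
            pass

A side benefit is that the interleaving is fixed by the scheduler, which preserves determinism.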
What would be the impact of such a coroutine emulation when threading is used to leverage multi-core hardware for high-performance computing, as done by ATLAS [1], OpenBLAS [2] or MKL [3]? These libraries are tuned to maximize CPU cache hits. It seems to me that executing each thread's tasks sequentially using coroutines would probably break such optimizations.
Perhaps a bit off-topic, but I would like to get to the point where I can make intelligent comparisons between technologies like CoreOS and ZeroVM, and in general gain a better understanding of containerization, virtualization, etc. Can someone suggest a list of books that can get me started on that path?
CoreOS and ZeroVM are so young that there's not really much literature about them yet. That said, VMs and containers have been around for decades.
Lots of the recent activity is more about packaging and usability improvements, rather than theoretical improvements.
A quick overview:
CoreOS is a super-minimal Linux distribution designed to be used as a base for applications. It's essentially equivalent to the JEOS buzzword from five years ago. It would run inside of Xen or KVM or VMware.
Xen, KVM, VMware, and VirtualBox you probably already know about: they provide a virtual machine, and the operating system running inside it (theoretically) can't tell it's not on its own hardware. Xen uses a 'hypervisor', which is essentially a very tiny, custom kernel. KVM uses the Linux kernel as the hypervisor, which makes a lot of sense: you don't have to reimplement all the years of hardware support and scheduling work they've done. VMware and VirtualBox run as applications on whatever OS you provide. You lose out on some opportunities for clever performance hacks this way, but there are other advantages. VMware ESX[i] is more like Xen and KVM, but I don't really know that much about it.
Containers (BSD jails, Solaris zones, LXC, and of course, HN's lovechild Docker) let you provide VM-like isolation and resource management between processes or groups of processes, but you only run one kernel. This means much less duplicated effort and memory, and Docker's AUFS lets you deduplicate your storage too. There are slightly more security concerns about this approach than with full VMs: the Linux kernel (and others, but let's be honest about the target audience) has a long and ugly history of local privilege escalations.
ZeroVM is based on Google Chrome's NaCl, and that's about all I know about it. I would expect VM-like security (it validates machine code), and an environment that requires serious porting from POSIX. That said, if you use Python, Ruby, Mono-compatible .NET, or Go, the heavy lifting has already been done for you.
ESX and ESXi are bare-metal hypervisors, unlike KVM, which is more of a quasi-bare-metal hypervisor that happens to sit in the Linux kernel. ESX has been EOL'd; it relied on something called the "Console OS" to bootstrap itself until the vmkernel took over and started scheduling tasks. The Console OS was actually a modified Red Hat Advanced Server (later Enterprise Server) instance which, once the system was booted, would act as a kind of "privileged guest". You could log in to it and do sysadmin-y tasks like adding users, installing RPMs, etc.
ESXi, on the other hand, was written to do away with the Console OS entirely, but it still has a fairly rudimentary shell. Many of the utilities are based on BusyBox, and the idea is that it should be stripped down to only really minimal functionality. It also sports the Direct Console User Interface (DCUI), a curses-based interface for doing things like setting up an admin password, reviewing logs and changing security settings.
As far as I read it, you are wrong about CoreOS. It's meant to be run as a host OS, not as a guest. It provides a minimal Linux Hypervisor you can use to run containers built for docker.
Yes, CoreOS wants to be a host OS, but it's also a ripping good guest OS because of its minimalism. I'm expecting most people to basically stick their app and nothing else (rather than, say, a full Ubuntu environment) inside the container.
Hypervisor is the wrong word: that's what you'd call Xen, VMware ESX or the host KVM kernel.
That said, if you use [...] or Go, the heavy lifting has already been done for you.
Not really. Go used to have a NaCl port, but that was years ago. It was abandoned when the NaCl people decided to use a different method for isolating code.
Porting Go to use the new method would require writing another compiler, like {5,6,8}{c,g}.
For some reason I thought ZeroVM was based on vx32, not NaCl:
http://pdos.csail.mit.edu/~baford/vm/
Does anyone know if there is something similar based on vx32?
Based on "Why ZeroVM?" (http://zerovm.org/wiki/Why_ZeroVM), a large part of the motivation for ZeroVM is based on the premise that regular VMs require a full OS and are therefore unacceptably fat. However, there are multiple platforms for running unmodified applications directly on VMs without requiring a traditional OS, e.g. the work I've been involved with: https://github.com/anttikantee/rumpuser-xen/
Determinism, OTOH, sounds interesting at least on paper. Is there any experience from tests with real applications in real world scenarios?
LXC starts as a general-purpose Linux container with everything built in, and adds more isolation as development continues.
ZeroVM starts with no general-purpose Linux, and will add support as development continues.
I.e., LXC will work with what you have now; ZeroVM will eventually work with what you have now, but shims will have to be developed for everything, either in your code or in ZeroVM's.
IMO the future endpoint will have similar functionality in both projects, but LXC will see more testing and use /now/.
But, when you say tantalizing things like 'erlang-on-c', you raise the question: what does the clustering control plane look like?
One of the great things about erlang is that the cluster's got supervisors that receive execution-level messages (e.g. 'EXIT') and can then take whatever action they feel like. Is that control plane level exposed to ordinary containers?
And the other great thing about erlang is that the messaging model is either synchronous if you care (with return receipts) or asynchronous if you don't (fire and forget) -- and that richness turns out to have a bunch of good use cases. What's the ZeroVM story there?
And the other great thing about erlang is being able to trace out messages, especially when your synchronous architecture just took a dump on the sheets and is staring at you belligerently. Does ZeroVM have introspection figured out yet?
Thanks for the response. It'd be great if the wiki were fleshed out with an overview of how that works and for the other questions to be addressed as well, for those of us examining it from other backgrounds.
I think for stuff like your Rails app, you'd want to wait until there is support lower level in your stack.
But imagine that you're writing a commenting system, and want to sanitise stuff received from a user. Sanitising data is error prone, so you isolate the code in a new zerovm. If someone finds a way to exploit anything in your sanitising code, they might be able to write broken sanitised HTML out, but they won't be able to e.g. send queries to your database, or write to your disk, because the zerovm simply doesn't have permission.
And imagine the web server spawning a new zerovm for every request, that only has permission to talk to the inbound network connection and pass messages between that and a Rails zerovm for that request. If there's an exploit in the HTTP parser, that vm could be exploited, but it'd die at the end of the request, and would have no permissions to talk to the database server or write to disk.
And imagine the Rails zerovm similarly being split into pieces: request handling might be done in one; authentication in another.
The lower the startup costs, the more you can afford to chop the app into pieces, and the more you can leverage that for security (by reducing the privileges of each individual component) and scalability (by allowing distribution of the VMs across CPUs and across servers).
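As a conceptual sketch (plain subprocesses standing in for zerovm instances; sanitizer.py is a made-up name): the sanitiser runs in a child process that can only read the request on stdin and write a result to stdout.

    import subprocess

    def sanitize_in_sandbox(untrusted_html: bytes) -> bytes:
        # Stand-in for spawning an isolated instance: the child gets
        # stdin/stdout and nothing else (no database handle, no disk).
        proc = subprocess.run(
            ["python3", "sanitizer.py"],  # hypothetical sanitiser program
            input=untrusted_html,
            stdout=subprocess.PIPE,
            timeout=5,
        )
        return proc.stdout

    # Even if sanitizer.py is exploited, the worst it can do is emit bad
    # HTML on stdout; it holds no credentials and no open connections.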
How would ZeroVM instances talk to any server with persistent storage (e.g. a key/value store) in a deterministic way? get('top_stories') will change over time.
Unlike ZeroVM, Docker is not a security solution; it is only useful for managing administrative domains within a machine. (To preempt a massively pointless, ~20-year-old conversation, Google "chroot security" and "jail security" and suchlike to understand why.) ZeroVM, on the other hand, starts by statically verifying that any code that executes adheres to a fixed protocol, and that protocol only allows invoking a small set of rigorously defined service stubs.
This may sound vaguely similar to how Linux containers and the syscall interface work, but it involves orders of magnitude fewer LOC, written from the outset with a robust security design in mind. Compare that to the thousands of LOC of daily churn in the Linux kernel, often written by people too busy fighting with shitty hardware to think about how their driver ioctl might be accidentally exposed to UID 0 running in a container, and who, even if they notice, might not even care.
It looks like a hardened *nix process. It has no access to anything it's not permitted to access. And it has no notion of the network, time, or machine it's running on, although it can communicate with other instances (even on remote machines) via IPC. It can be suspended, resumed, relocated and so on without ever noticing.
Thank you. When you instantiate a zerovm instance you give it the associated code as well? And which IPC method can it use? Is zerovm the library you use, is there such a thing as a separate zerovm instance, or is it just the way we are used to talking about virtualization?
You get nothing but /dev/stdin, /dev/stdout, and /dev/stderr by default. You can optionally make other resources (network, files) available through a similar API.
When we instantiate, we give zerovm an executable image (a file) and any other files the executable will need (these can be arranged into a sort of "VM image", which is a regular tar file).
Sessions (instances) can communicate via Unix pipes.
Yes, we have a notion of an "instance": it is a running zerovm process. Each session runs in a separate process.
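In ordinary POSIX terms the wiring looks something like this (a Python sketch of the concept, not zerovm's actual manifest or API; Unix-only): the host creates a pipe and hands one end to each session.

    import os

    # Host-side sketch: connect a producer session to a consumer session
    # with a plain Unix pipe.
    read_fd, write_fd = os.pipe()

    if os.fork() == 0:  # child stands in for the producer session
        os.close(read_fd)
        os.write(write_fd, b"hello from session 1\n")
        os._exit(0)

    os.close(write_fd)
    print(os.read(read_fd, 64))  # consumer blocks until data arrives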
I cannot see what this has to do with security. At the end of the day, it is the data that attackers are after and the app needs to be able to access it whether it is virtualised or not.
Each part of your application needs access to some sub-part of your data, but if you isolate your app at the OS level and run a whole app server inside a VM, every part of your application can at least in theory access all your data.
If you sub-divide your app in separate zerovms, whether per-request, or split it up further into functional responsibilities, then you substantially reduce the attack surface by ensuring that an exploit against any one part of your application can only exploit the specific subsets of data it is allowed to work on.
You can do this without zerovm too, but the more you reduce the cost and difficulty of spawning a new vm or container, the more finely grained you can subdivide your application, and hence the fewer privileges each subset of your app will have.
This is wishful thinking at the moment. I understand perfectly well what that means but data is data. An application typically has access to all data and the fact that you run it through a VM doesn't change anything.
I can find this technology useful only in areas where you want untrusted 3rd-party code to run without worrying about what it will do.
Typically, yes, but we can change that. An application does not need to access all data; that happens because today any web application serves millions of requests and thousands of users, so it needs access to all the data of all users at any time. When using ZeroVM you can serve one user and one request with one VM instance. Then you may explicitly define what data is accessible to that one VM instance. Yes, you could implement such controls in your application yourself, but we are doing it for you, uniformly, at the infrastructure level (see the sketch below).
And about "3rd party code". If we have two developers, each works on a different module of the same application, isn't their code is "3rd party" to each other?
It is only "wishful thinking" is as much as people are usually lazy because the effort required to sandbox small pieces of code is prohibitive in most current platforms. But larger systems are already often layered in ways that layer access to data anyway (though often not intentionally for security).
Good luck with this approach. I am not saying you should throw this away (awesome technology, btw), but it is rather impractical in many ways. You will find very few use cases where you can apply this with direct benefit. In most cases this won't change a thing, except perhaps for highly specialised software.
Probably because it reduces the attack surface: instead of having to worry about a 0-day in ssh or similar, you can just worry about your application. And if someone else on the machine gets compromised, I assume you are isolated from that too, since the attacker is still contained inside their container.
Not only that. Each request is isolated in its own container. This way one user of your application cannot gain access to data of another user by simply exploiting an application bug.
That's assuming you lock down access at the level of the application instance. If you create an instance with permission to read all data, it doesn't matter that it's been created by a single request… it's still got permission to read everything. Not saying a single request instance isn't a win for security, but you'll have to build your app around this concept to get that win.
To make an application truly multi-tenant you will need to adopt the "share nothing" concept anyway. We just supply you with a "share nothing" infrastructure.
If you have to run a single database for a multi-tenant application you're in for some real pain. For example: how will you shard it? How will you load-balance it?
The ZeroVM approach is that "the cloud is the database". ZeroVM sessions have transactional qualities: they are deterministic, isolated, can be rolled back, etc. Essentially we integrate distributed storage with "stored procedures" and "triggers"; this is what the ZeroVM cloud looks like.
I've never understood people's fascination with replacing glibc. There's almost never anything to gain, and glibc has the advantage of being really, really well tested.
"I've never understood people's fascination about SpaceX. There's almost nothing to gain- and Russian Proton has the advantage of being really, really well tested."
On a serious note: the primary disadvantage of glibc is that it's really, really hard to change (and build times are slow). It works fine where it already runs, but sometimes you want to port it to a new platform or a new ABI, and then the adventure begins.