This is intended as a way to run computation close to data. Some databases embed Lua or pluggable languages for that; ZeroVM can run NaCl binaries (compiled with a special toolchain), verified in much the same manner as the JVM checks bytecode, in a very limited sandbox (just some pre-configured data channels). Besides the NaCl verification, they are enforcing functional programming: the program only has access to deterministic instructions and library calls.
It's people. I'm slightly blinded by their motivation. Something inside me wants to believe they're 14 year old mischievous scriptkiddies, but I bet they're really just guys and girls like you and me.
Is it right to equate functional programming and determinism though? Functional programming implies a certain programming style / type of language, whose output is guaranteed to be deterministic if only pure functions are used. But they're not the same thing.
ZeroVM invocation is a pure function.
ZeroVM cluster invocation is a pure function.
Functional programming, according to Wikipedia, is "a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data".
ZeroVM invocations satisfy that definition. With the existing Swift integration, all storage is immutable, and ZeroVM cannot hold any state by design.
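To make the "invocation is a pure function" claim concrete, here is a minimal Python sketch. The names `invoke` and `word_count` are made up for illustration and are not ZeroVM APIs; the only point is that the job sees nothing but its immutable input, so repeating the call yields the same bytes.

```python
def word_count(blob: bytes) -> bytes:
    """A job with no clock, randomness, or hidden state: output depends only on input."""
    return str(len(blob.split())).encode()


def invoke(job, input_blob: bytes) -> bytes:
    """Hypothetical stand-in for one sandboxed invocation: immutable bytes in, bytes out."""
    return job(bytes(input_blob))  # bytes() hands the job an immutable value


data = b"run computation close to data"
first = invoke(word_count, data)
second = invoke(word_count, data)
print(first, first == second)
```

This only demonstrates determinism of a single call. As noted earlier in the thread, that is not the same thing as a functional programming style inside the job itself.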
Google Chrome has a sandboxed VM called Native Client (NaCl) that runs at near full speed. It's very neat.
So they have taken that same VM and, instead of Chrome's Pepper API, they have a file-handle-based API and some message-passing between instances.
Now you can compile your C/C++/whatever program and run it on the cloud!
It seems an excellent building-block for big data and big crunching on the cloud, and it gets the benefit of Google's massive resources on security and performance fixes.
I'm surprised that Google doesn't explicitly talk more about this use case for Native Client. It could be the backbone for an AWS/Heroku competitor. The ability to run lightweight tasklets securely would enable a lot of interesting scenarios. While not very significant, Google actually already started doing this with their Exacycle program:
He works on Go at Google. He updated an earlier version of Go to run on an earlier version of NaCl, but it has since bitrotted, as NaCl's formats were changing at the time.
It gives a bare-bones environment for you to run your programs that is presumably very low overhead. Think of it as an embedded system where programs run without an OS. This is the environment a program running inside zerovm will see. All you have is libc and the zerovm-provided APIs. If you want more, you'll have to statically link your programs.
The thing is, you can run many, let's say thousands, of these little programs inside a single machine in such a way that each one can never see the other ones (as long as it's impossible to break out of the ZeroVM sandbox).
Such a technology would enable neat stuff, like renting a server for someone to run a single program for some period of time and have the results sent back. Nobody does this for unrestricted programs today, for many reasons, a very important one being the fact that it would be very hard to do this in a secure way.
The "run a C program for some period of time" thing would work kind of like the AWS dashboard, but instead of having to spin up a machine with Linux on it and run your program inside that, you would only upload your binary and a manifest file. Kind of like what App Engine does, but with fewer restrictions (you'll probably be able to do anything as long as you're able to compile a "safe" binary that does it).
Such a technology would enable neat stuff, like renting a server for someone to run a single program for some period of time and have the results sent back. Nobody does this for unrestricted programs today, for many reasons, a very important one being the fact that it would be very hard to do this in a secure way.
Well, almost nobody. NearlyFreeSpeech[1] lets you compile and run unrestricted C and C++ programs on their servers, or any other binary, if you compile it somewhere else. I've done some tests with a Go based webservice.
They do have a very restricted time limit until they're killed, but that's because their servers are designed for web applications, not data crunching.
Thanks for the info. Seriously I should have said "almost nobody" in the first place. In fact I thought I had said that :)
I didn't know nfshost was doing it. From what I see, they're probably using FreeBSD jails in this case, which is nice if they are.
Anyways, we have to agree that this space is largely unexplored. I never thought there was a need for this kind of service, but just after reading the ZeroVM pages I think it's a very good idea. With a "little" more initial effort, it would enable writing systems in a very interesting way: self-healing (when the other end has failed, make an API call to provision another copy of it), self-provisioning (when traffic is high, make an API call to provision another copy of a worker), etc. Of course we can already do this; it would just be more natural, and if you combine this with the idea of Mobile Agents, then the cloud suddenly becomes much "cloudier".
Actually, as long as you set up your permissions correctly, you can run any untrusted user-mode C code (non-privileged code). Just deny access to most of the system except the places you let it access. Chroot actually lets you emulate the system directories for the process. Set up the firewall correctly to restrict its network access.
The multi-user environment in Unix is the very old idea of letting untrusted code and untrusted programmers run wild on the same machine.
I think in practice there have been a few historical problems with this approach. A major one is that while in theory it's possible to secure a process to a chroot jail (either the Linux or the BSD variety) in practice it's hard to defend against arbitrary, possibly malicious programs in that jail.
Kernel exploits that rely on local access are uncommon but not incredibly rare. You could easily have a hosting service like this that ran fine for a few months, or even a few years, and then a 0-day kernel exploit comes out (that relies on having local access to the machine) and then you're SOL.
Another issue is that in practice some system calls are very expensive. With a little knowledge you can DOS a machine hosting your process if you can run arbitrary system calls -- for instance, if you can create some sort of resource that is expensive for the kernel to track, and then create a lot of that resource. Sort of related to this, at least in Linux there are a very large number of syscalls and new ones are added relatively frequently, and not all syscalls have had the same amount of security auditing.
Even if you're not worried about your own servers getting hacked, you need to convince your customers that other users aren't going to be able to intercept or modify their data or attack their VMs.
In my mind, one of the main innovations of a lot of the VM stuff in recent years, and in particular the NaCl approach of limiting what system calls can be invoked, is that you are fundamentally reducing the interface through which malicious programs can attack the hosting machine. If you can securely limit system calls to a small subset, you can more easily audit the paths those system calls take to ensure that they're secure and can't be used in a DoS scenario. In a virtualization environment like Xen or KVM, the amount of code that you have to audit is limited to the Xen/KVM code, which is much smaller than the rest of the code base.
Thanks for that, I feel the same way about process isolation. In theory it should be a solution, but in practice it doesn't work, for historical reasons.
Just one comment. ZEROVM IS NOT NACL. It uses NaCl; moreover, we explicitly refrained from touching the validator in order to remain under its proven security blanket (Google established hefty monetary prizes for each exploit found). However, apart from the validator, it is heavily refactored and rewritten.
MAIN DIFFERENCE:
NaCl has a "syscall firewalling" feature called Pepper. ZeroVM forbids all host syscalls. In fact, ZeroVM is a new virtual hardware architecture (a subset of x86 and a subset of ARM, with new ones in the future), so for code running inside there is no such concept as "host syscalls".
Thanks for the responses; however, most of the attacks on an OS process can also happen in ZeroVM.
- Kernel exploits can happen; so can exploits in ZeroVM.
- DoS attacks can hit the OS; so can DoS attacks on ZeroVM. I don't see any quota system or resource management in ZeroVM to mitigate them. At least the OS has better tools to deal with this: priorities, the scheduler, resource managers, memory partitioning, etc.
- Same thing for the customer worry. They could worry about other users attacking ZeroVM or the host system.
If you prefer a VM for security, there are a lot of mature VMs: JVM, .NET, Xen, or KVM.
My point is that so much research, work, and experience has gone into OS security that I would trust OS isolation over a new research system in terms of security. It's easy to have a locked-down OS that restricts access to most subsystems.
ZeroVM is not a lightweight container, so all of the above doesn't hold. ZeroVM doesn't use any "syscall firewalling" techniques. ZeroVM efficiently emulates a new hardware platform, just as Xen/KVM do.
Now let me address your concerns one by one:
1. (Host) kernel exploits cannot happen, as no syscalls to the host OS are allowed. A ZeroVM app doesn't even have such a concept as a host OS.
2. DoS attacks on the host OS are impossible, as there is no access to the host OS. The interface between a ZeroVM application and ZeroVM itself is specifically designed to be impossible to DoS. It currently consists of 4 functions. Setup and exit are callable only once during the lifetime of an app. Message queue read/write can be called repeatedly, but they are intentionally designed to be synchronous, so a throttling mechanism can be implemented transparently.
3. We intentionally haven't hacked the NaCl validator, as it is the only component in the system that guarantees security. Google established 5-6 digit monetary prizes for any exploits in Chrome/NaCl and invests heavily in security. For many customers that is enough.
4. No one asserted that ZeroVM is mature, and right now it is not! So this piece of advice is correct. If you need security now for production usage, KVM/Xen is the only way to go.
5. OS security... So much work was done to secure the OS from the outside... not from the inside. I would appreciate a description of how easy it is to lock down an OS process in the multi-tenant sense. Also, once you start "syscall firewalling" and draconian restrictions, it becomes very hard to program such a system. Just think for a moment. You take the whole syscall list, and for every syscall you decide on restrictions (I haven't found any document on the web on that). Now how do you work with such an API? How do you check yourself before you issue a syscall, and what do you do when third-party code causes violations? The syscall API is not built for such draconian capping...
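Point 2 above can be sketched roughly as follows. This is a toy Python model, not the real ZeroVM interface: the class name `TrapInterface`, its method names, and the `min_interval` parameter are all assumptions made up for illustration. It only shows the two properties described: setup and exit can each be called once, and because queue calls are synchronous the host can pace them without the guest noticing.

```python
import time
from collections import deque


class TrapInterface:
    """Hypothetical four-call interface: setup, exit, read, write (not the real ZeroVM API)."""

    def __init__(self, min_interval: float = 0.0):
        self._min_interval = min_interval  # host-chosen minimum gap between queue calls
        self._last_call = None
        self._setup_done = False
        self._exited = False
        self._queue = deque()

    def setup(self) -> None:
        if self._setup_done:
            raise RuntimeError("setup may be called only once")
        self._setup_done = True

    def exit(self) -> None:
        if self._exited:
            raise RuntimeError("exit may be called only once")
        self._exited = True

    def _enter_queue_call(self) -> None:
        if not self._setup_done or self._exited:
            raise RuntimeError("queue calls are only valid between setup and exit")
        # Synchronous call: the caller is blocked here, so the host sets the pace.
        if self._last_call is not None:
            wait = self._last_call + self._min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
        self._last_call = time.monotonic()

    def write(self, message: bytes) -> None:
        self._enter_queue_call()
        self._queue.append(message)

    def read(self) -> bytes:
        self._enter_queue_call()
        return self._queue.popleft() if self._queue else b""


iface = TrapInterface(min_interval=0.001)
iface.setup()
iface.write(b"hello")
print(iface.read())
iface.exit()
```

A guest spinning on `write` in a tight loop just spends its own time blocked in `_enter_queue_call`. Whether that is enough to rule out every DoS is exactly what the replies below argue about.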
1. So there is a 100% guarantee that ZeroVM does not and will not have an exploit? The OP's point was that the kernel can have exploits, so the kernel is inferior to ZeroVM. My point is that the kernel can have exploits, and so can ZeroVM. You just don't know about them yet.
2. A DoS on ZeroVM is indirectly a DoS on the host. There are so many ways to DoS a system. How do you handle an app running in a tight loop accessing all of its memory randomly? Queuing the max payload in a tight loop? Spawning new instances across the entire cluster in a tight loop? Claiming that DoS can happen in the kernel but is not possible in ZVM is just naive.
1) There is a $100K bounty on each Chrome/NaCl exploit, and we have only one ZeroVM 'syscall' that we allow, with a lot of attention put into making it easy to secure. The situation is not the same on Linux. First of all, kernel exploits by a local process are not really considered severe in Linux, and they are certainly not a top priority for anyone. Linux is built to be secure from the outside, not from the inside.
2) All of these are impossible in ZeroVM, except accessing memory randomly and thrashing the caches and TLBs. Hm... that could work, I guess. For the first time in this forum we are talking about a real vulnerability. However, I think the problem also exists in KVM/Xen (I will do proper research now; Googling "EC2 TLB thrashing" doesn't yield anything interesting), and it gives no access to other tenants' data, just a temporary slowdown of a specific processor chip.
It is a layered approach: first you would have to find an exploit in the "VM" (which is a sandbox really), then exploit the underlying OS. The VM has a much smaller attack surface, as there is less you can do, so it is easier to audit. NaCl, which is used here, has had minor flaws (http://arstechnica.com/open-source/news/2009/07/google-nacl-...) but nothing like a straight kernel. Sure, there are other approaches, e.g. see http://sandboxing.org/, such as using SELinux to constrain processes, but none are easy. There is some more recent work on more directly limiting syscalls for processes, which is another approach, so the OS provides an isolation service.
1. Filtering all resource accesses, letting some pass and denying others.
2. Enforcing a different abstraction, so that unwanted resource accesses become impossible because they are not even addressable.
Filtering is by definition less secure. As filtering gets more complicated, there will be false negatives and false positives. Both are harmful.
Enforcing a different abstraction is usually less efficient, as hardware devices need to be simulated. However, some devices have hardware support for virtualization, as with Intel CPUs and MR-IOV devices, and then enforcing the abstraction is free.
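The difference between the two approaches can be shown with a toy Python sketch. Everything here is invented for illustration (the `FILES` dictionary, the `DENIED_PREFIXES` policy, and the function names); it is not how NaCl, ZeroVM, or any real kernel filter works. It shows one filter false negative, and a channel-style interface where the same resource simply cannot be named.

```python
import posixpath

# Toy "host" file system; all contents are made up for illustration.
FILES = {"/etc/passwd": b"secret", "/tmp/data": b"public"}
DENIED_PREFIXES = ("/etc", "/proc")  # hypothetical filter policy


def filtered_open(path: str) -> bytes:
    """Approach 1: the program may name anything; a filter decides what passes."""
    if path.startswith(DENIED_PREFIXES):
        raise PermissionError(path)
    return FILES[posixpath.normpath(path)]  # the host resolves the path after the check


def run_with_channels(program, channels):
    """Approach 2: the program only receives numbered channels; paths are not addressable."""
    return program(tuple(channels))


def program(channels):
    # The only nameable resources are the channel indexes handed in by the host.
    return channels[0].upper()


leaked = filtered_open("/tmp/../etc/passwd")  # false negative: the filter misses this spelling
result = run_with_channels(program, [FILES["/tmp/data"]])
print(leaked, result)
```

The filter could of course be fixed to normalize the path first. The argument above is that each such fix makes the filter more complicated, while the channel version has no path syntax to get wrong.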
True, even the JVM verifier was flawed, maybe neither approach can succeed. It just seems to me we now have good reason to believe it's a dead end to try to sandbox native code in legacy instruction sets.
Every system might have weaknesses. What matters is:
1. Small attack surface. With NaCl it is all concentrated in a single tiny validator module. The model is also simple and mathematically proven to be secure.
2. Prior testing. This is especially hard for a security product. Establishing motivating prizes is a good way to ensure it is not easily breakable.
3. The speed with which a patch is made available.
4. Defense in depth: the ability to have multiple levels of defense cheaply.
I'm indeed not a native English speaker. And I apologize for not having diagrams; in fact I was working on this part right now... I had planned the HN exposure a little later, after more diagrams were added to the website, a VirtualBox image was ready with the whole environment to save time for folks who want to experiment, and a well-tested initial version was separated from the development branch.
However, it went out earlier, uncontrollably, while I'm travelling... and now I will be playing catch-up...
Well, sure, but there's no high level overview. Even something like "It's a virtualizing solution, like Xen/VMware/VirtualBox, that uses Chrome's Native Client".
tl;dr It's basically a lightweight sandbox to let you run untrusted code from un-managed languages (i.e. C/C++/Assembly/etc) in the native host format (i.e. in Linux i686 or Win32 or iOS).
My comment: In other quarters, people usually use an OS process plus permissions to isolate the running of an untrusted program. This is basically what ZeroVM is: a more restricted process.
One major downside of this approach is that you have to compile to different host formats to deploy to different host environments. You don't have the write-once, run-everywhere advantage of Java/C#/Python.
1. The ZeroVM abstraction is C, including full support for native inline assembly/intrinsics. Draw your own conclusions about portability with regard to platform. OS portability could/will be provided, and it is a piece of cake (with geeky over-optimism, of course).
2. I don't want to get into a JIT vs. C flamewar... but let's take a sorting example. In the samples directory you can find my hand-coded assembly code (thank you, Takeshi Yamamura, for the help) for sorting. A GB is sorted within seconds. I am sure no JIT comes even close, and I tried hard. I am myself an SCEA, a Javaist, and a JIT/LLVM enthusiast. If you need help compiling the sample, let me know and I'll gladly help... I know it is not ready for prime time, and it is very cryptic and badly commented too... which a sample should not be. So you are warned. But I have never in my life seen single-threaded sorting performance that comes even close. Let me know if I'm mistaken.
3. Why not just use process isolation... Well, this would be the right way to go, and zerovm would be a solution looking for a problem, if and only if an OS process were secure in the multi-tenant sense. However, there is no such OS. The only way is to use KVM/Xen, and this is just too heavy... Another issue: why do I use NaCl and not native hardware support for memory protection? The answer is twofold:
a) I was afraid of the amount of work needed to write my own complete OS from scratch (if I used existing stuff, it would have most of the shortcomings I am trying to solve in the first place).
b) It would be useless in its initial years, if not decades, as there would not be enough applications for it to justify dedicating complete hardware to zerovm. I thought about writing it as a lightweight real OS running under quickly spawned KVM processes, but I was still afraid that my team and I had no spare engineering capacity for that, and that acquiring the missing knowledge, while fun, could take too long; even then it would be heavily Linux-specific.
I don't understand why multi-tenant is a problem. Multi-user Unix (and others) has supported letting multiple untrusted users/programs run for a long time.
Edit: I don't mean to be a downer on your project. It's a cool project, but I think you would prefer constructive criticism to me just saying "awesome, carry on."
The answer is that multi-tenant is different from multi-user in so many respects.
Multi-user:
1. Usually a closed institution, or even a single group within a single institution, where everyone knows each other.
2. Usually a lot of shared data and applications.
3. The risk of malicious activity is low.
4. The consequences of malicious activity are not severe/fatal.
5. Auditing followed by identification and penal action is an effective strategy to stop malicious activity.
Multi-tenant:
1. Usually open to anyone without "background checks".
2. A single instance of malicious activity is severe; repeated malicious activity is fatal.
3. Auditing, identifying the intruder, and suing him in court is not an effective strategy to stop malicious activity.
What I was talking about is multi-user as a capability to run untrusted code. You were talking about it as policy and procedure. The multi-user capability allows one to use permissions to create a locked-down, low-privilege area and let anyone run in it. There have long been open Unix accounts out there that allow anyone to log in and play around. The multi-tenant problem claim is just not a valid one.
I read this as RPC with code instead of just data. If so, this is exactly what I've been looking for a long time, because traditional RPC roundtrip latency is often high - so high, that you need to create a more complicated API to avoid excess iteration.
Combine this with ZeroMQ and MessagePack, and you have some serious power at your fingertips.
Messages can execute at the destination, do iteration and API calls, and return only the needed part of the data and results.
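A rough Python sketch of why shipping code helps with round-trip latency. `RemoteStore`, `walk`, and the linked-chain data are all made up for illustration, there is no real networking or sandboxing here, and ZeroMQ/MessagePack are not used; the sketch only counts how many round trips each style needs.

```python
class RemoteStore:
    """Toy stand-in for a remote node; counts round trips instead of doing real networking."""

    def __init__(self, nodes):
        self._nodes = dict(nodes)
        self.round_trips = 0

    def get(self, key):
        """Classic RPC: one round trip per lookup."""
        self.round_trips += 1
        return self._nodes[key]

    def run(self, fn):
        """'RPC with code': ship fn once and run it next to the data (no sandbox in this sketch)."""
        self.round_trips += 1
        return fn(self._nodes)


def walk(lookup, start):
    """Follow 'next' links from start, summing values along the way."""
    total, key = 0, start
    while key is not None:
        node = lookup(key)
        total += node["value"]
        key = node["next"]
    return total


# A linked chain of 100 records stored remotely: 0 -> 1 -> ... -> 99.
chain = {i: {"value": i, "next": i + 1 if i < 99 else None} for i in range(100)}

client_side = RemoteStore(chain)
total_a = walk(client_side.get, 0)  # iteration happens on the client

server_side = RemoteStore(chain)
total_b = server_side.run(lambda nodes: walk(nodes.__getitem__, 0))  # iteration at the data

print(total_a, client_side.round_trips, total_b, server_side.round_trips)
```

Both calls compute the same sum, but the client-side walk pays for one round trip per record while the shipped function pays for one in total. The missing piece in this toy is the sandbox that would make `run` safe for untrusted callers.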
I would expect about an order of magnitude speed difference between an interpreted language and optimized machine code. I can recall a case where reducing an analytical request from 10 hours to 10 minutes changed the quality of the research the company was doing, since analysts were able to run more queries and select a better dataset for the report.
An order of magnitude in response time might also be go/no-go for interactive analytics.
In the case of clouds, where we can assume infinite resources, it can mean 1/10 of the cost.
For private clusters, it can be the difference between buying 10 machines (something common in Hadoop's world) and buying 100 machines, something very few groups can get.
In most cases within distributed computing you need to either move the data or move the code. There can also be something in between, as with a "request" or a "query": if they are simple they are surely data; if they are complex they look more like code.
OK, now if we agree that something should be moved, let's think about what makes more sense to move: the dataset or the code?
And my take is "it depends". Dataset size plays a role... security plays a role, etc. Think about what happens if the dataset cannot be moved for reasons other than size. In all these cases you need to move the code.
If the code is untrusted (suspected of being malicious, or just buggy), you will want some kind of sandbox.
ZeroVM uses both. Well, not now, but one contributor took the clustering work on himself and is implementing it right now with ZeroMQ, while another is developing a client library with a collection of serdes verified to compile and work on zerovm.
We decided on a micro-kernel design for zerovm, so all data-crunching libraries will go into the untrusted code part.
I'm slightly confused as to whether this has any use beyond "on-demand data access" use cases. I reviewed the site and I like the idea, but I'm confused as to what else this could be used for. Would something like a "Heroku"-style PaaS using PyPy or something that targets NaCl benefit from this in a process-separation sense? Anybody care to clarify?
Imagine that everyone in the office could be running your tests, compiles and such easily and transparently.
There are lots of grids to do this now; but this seems like a new lighter-weight and faster solution that, as it'll run well on Windows too (in that Chrome proves it does), is going to be great.
Imagine if you could scale up and down the amount of computing capacity you were paying for on a second by second (or even ms by ms) basis! That would be pretty cool :)
This is very interesting! It will be even more interesting to see if it can be compiled to run on my OS. This is exactly the kind of application I have been looking for. ESXi is too big :)
I can assure you it can easily be ported and used on any OS. We are just a few guys right now and don't have the capacity to test it on anything but Ubuntu. However, we designed it to be portable (and the NaCl/Chrome code is also portable, which helps a lot), even to run on bare hardware, so we tried to keep OS usage to a minimum. In fact, porting to architectures not natively supported by NaCl would be the more extensive effort. For example, zerovm on tilera-linux (MIPS variant) will be much more effort than FreeBSD on x86-linux.
As a side note, I am personally convinced that today's OSes are overkill for cloud-based number crunching (the prime case for zerovm) and waste resources. I am looking forward to a much lighter 'cloudware' in the future. Think of an 'opencompute' approach for OSes. zerovm is a humble experiment here.
The OpenMirage project is a similar-ish idea, providing numerous implementations of a std-lib for different targets (sample targets: Android, Linux OS, raw x86), and using this limited API. Sure, ZeroVM has its own "vm instructions" rather than "library calls," but both ideas reduce to building virtual machines for great glory and profit.
What is less known is that, when deployed in the cloud, Hadoop cannot access that enormous dataset locally due to security restrictions and is therefore screamingly inefficient compared to an on-premise Hadoop deployment.[1]
If you use EMR or just roll your own Hadoop in EC2 then:
1. Hadoop runs on EC2
2. Data is stored on S3
3. Intermediary results stored in EC2
4. Hadoop loads the data from S3 to EC2
5. EC2<->S3 bandwidth is not that fast or efficient (S3 proxy, network contention, TCP/IP processing)
Hypothetical MapReduce/ZeroVM/Swift scenario:
1. Data is stored on S3/Swift
2. Map and Reduce functions are run inside S3/Swift, secured by ZeroVM, in the majority of cases accessing data locally without networking/proxies getting in the way.
3. Intermediate and final results are also stored within S3/Swift.
4. Local data access is efficient, fast and predictable
5. Local networking within S3/Swift is more efficient, fast and predictable than S3<->EC2 / Swift<->Nova
Accelerated Hadoop scenario:
Exactly as in #1, except that Hadoop performs a "predicate pushdown optimization" into S3/Swift, secured by ZeroVM.
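As a rough illustration of what predicate pushdown buys in the scenarios above, here is a toy Python comparison. The function names, the row layout, and the `is_nz` predicate are invented for the example; "shipped" just counts rows that would have to cross from storage to compute, and nothing here talks to S3, Swift, or Hadoop.

```python
def scan_then_filter(rows, predicate):
    """Compute-side filtering: every row crosses the network, then gets filtered."""
    shipped = list(rows)
    return [r for r in shipped if predicate(r)], len(shipped)


def pushdown(rows, predicate):
    """Pushdown: the predicate runs next to storage; only matching rows are shipped."""
    shipped = [r for r in rows if predicate(r)]
    return shipped, len(shipped)


def is_nz(row):
    return row["country"] == "NZ"


# 1000 made-up rows, 10 of which match the predicate.
rows = [{"id": i, "country": "NZ" if i % 100 == 0 else "US"} for i in range(1000)]

result_a, shipped_a = scan_then_filter(rows, is_nz)
result_b, shipped_b = pushdown(rows, is_nz)
print(len(result_a), shipped_a, len(result_b), shipped_b)
```

Both paths return the same rows; only the amount of data moved differs. The catch, as discussed in this thread, is that `pushdown` means running user-supplied code inside the storage tier, which is why a sandbox is needed there.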
Regarding 'due to security restrictions': I meant that a cloud vendor would not let you run your own code in S3 or CloudFiles. Why? Because you could mess up other people's data and the storage system itself. Why not run in a VM inside S3? Well, I guess it would be impractical due to the long provisioning time of a conventional VM.
That criticism is specific to S3, not EC2 or Hadoop. It's perfectly feasible and probably preferable to have Hadoop work on local files in instance store volumes (or EBS if you're mad).
There is another issue with running Hadoop on EC2 (w/o S3). Instance storage is relatively small: about 3.6 TB on the largest instance and 1.5 TB on other "large" instances. In a typical Hadoop machine I would expect about 8 TB. So local storage is prohibitively expensive for big data tasks.
At the same time, if we use local storage we are losing elasticity: we have to run the cluster all the time, even when there are no jobs to run. That kills the main point of using Hadoop in the cloud, which is to pay for computational resources on demand.
But instance store is transient! You may argue that if you replicate three times across different availability zones then you are OK. Well, in this case it would be very costly, as you will end up with a constantly spinning EC2 cluster. Even if you don't do any computation, you must keep it all spinning. And see what happened to elasticity... you end up paying inflated cloud prices for a constantly spinning, fixed-size EC2 cluster! Instead, you want to be able to rapidly roll out a large cluster, do the computation, fold it back, and pay only for what you have used. Isn't that the true promise of the cloud?
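The elasticity argument is simple arithmetic, sketched here in Python. Every number is a made-up assumption (node count, price per node-hour, hours of real work per day); none of them are actual EC2 prices. The point is only the ratio between an always-on cluster and one that exists just while jobs run.

```python
def cluster_cost(nodes: int, hourly_rate: float, hours: float) -> float:
    """Cost of keeping `nodes` machines up for `hours` at a flat hourly rate."""
    return nodes * hourly_rate * hours


NODES = 10
HOURLY_RATE = 0.50           # made-up price per node-hour
HOURS_IN_MONTH = 24 * 30     # always-on cluster holding data in instance store
BUSY_HOURS = 2 * 30          # assume 2 hours of actual jobs per day

always_on = cluster_cost(NODES, HOURLY_RATE, HOURS_IN_MONTH)
on_demand = cluster_cost(NODES, HOURLY_RATE, BUSY_HOURS)
print(always_on, on_demand, always_on / on_demand)
```

With these assumptions the always-on cluster costs 12 times as much for the same work. This ignores storage charges on the on-demand side, so the real gap would be smaller.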
You got (2) wrong: the reason is that running code on S3 is insecure and would not be allowed, and spawning a VM inside S3 for some local calculation would be cumbersome at best, as it is too bulky for such acrobatics.
Well... this echoes ZeroVM's ideas, but so do PiCloud and a few others mentioned here. I think it goes without saying that current OSes and VMs are not best suited for cloud technologies. How could they be? They were designed for completely different requirements.
I assume by VM you mean ZeroVM. Well, ZeroVM currently doesn't allow self-modifying code at all. Nice try... We haven't touched the Google-provided validator in order not to break anything security-related. And if you think you have a good idea for a vulnerability, then you can claim some Google prizes.
If you meant more practical uses for it, then unfortunately modern JITs would be difficult to support efficiently, as they constantly recompile, and with ZeroVM it is not only recompilation but also validation. However, a JIT that recompiles only once, on loading, is easy to support. In fact, the next version of NaCl drops the GNU toolchain in favor of the JIT-like LLVM, but then recompilation happens only once.
I love how many high-level technologies are left to mature in the Java environment and then are reintroduced on a level that's a lot closer to the kernel and the metal.
ZeroMQ is another great example of the progression that started with AMQP.
Hadoop hopefully will see a similar fate. ZeroMQ combined with ZeroVM actually offers two important building blocks.
Containers/jails draw the isolation boundary around your processes, whilst this technology confines your code within a single process and completely isolates it from the system.
Beyond being a different abstraction, container technologies (at least in their current implementations of 'chroot on nukes') are not completely sealed or 'secure'. OpenVZ seems to be the most secure one out there; it requires kernel patching, and still... close but not 100% airtight. That is one of the reasons that many lightweight containers are used only as a secondary sandbox (as at Heroku), without allowing you to run arbitrary C/assembly inside your environment. So, practically, LXC always ends up as a secure Python environment or Ruby environment and so on... never as a secure x86 execution environment.
dotCloud (http://dotcloud.com) supports arbitrary code execution inside LXC containers (pre-2010 versions used OpenVZ, and very early versions were built on V-server). The main limitation is that the process runs under an unprivileged uid under a kernel managed and deployed by dotCloud.
I agree with the assessment that containers are not "completely secure" - I would not trust it to contain a root-privileged process. However an unprivileged process running inside an lxc container on a recent kernel will have an extremely hard time escaping.
What if I DoS attack some syscall? Or create zillions of 1-byte files, driving the file system crazy, or anything else?
The kernel is such a vast area vulnerable to attack that it is scary even to think about securing all of it without leaving a single weak point. Moreover, you will restrict your syscall API to the point that it becomes unusable. At the very least we need a standard for syscall capping, etc., so programmers will know what to expect.
And thanks for the link, will check them and what solution they use and whether they are happy with it.