I feel like the long-term architectural implications of virtual machines and now containers haven't quite sunk in. I'm not talking about the administrative advantages, which I think everyone is across these days. I mean the implications for the design of new applications.
As far as I am aware, folk aren't really writing distributable applications that target the VM up. You can get preconfigured stacks, or you can get standalone apps that you install in your environment.
But nobody's said: "Hey, if we control the app design from the OS up, we can make it much more intelligent, robust and at the same time sweep away a lot of unnecessary inner platform nonsense".
In terms of the slides, my approach is to reduce the NxN matrix by eliminating a lot of the choices. Why write your blog engine to support 5 different web servers when you can select and bundle the web server? Repeat for other components.
It gets better. Why write a thin, poorly-featured database abstraction layer when you can take serious advantage of a particular database's features?
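To make that concrete, here's a rough sketch (table names are hypothetical, and it assumes PostgreSQL with psycopg2) of leaning directly on a Postgres-specific feature, its built-in full-text search, instead of writing a generic abstraction over five databases:

    import psycopg2

    def search_posts(conn, query):
        """Full-text search using Postgres-specific operators (no abstraction layer)."""
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT id, title
                FROM posts
                WHERE to_tsvector('english', title || ' ' || body)
                      @@ plainto_tsquery('english', %s)
                ORDER BY ts_rank(to_tsvector('english', body),
                                 plainto_tsquery('english', %s)) DESC
                LIMIT 20
                """,
                (query, query),
            )
            return cur.fetchall()

    # conn = psycopg2.connect("dbname=blog")   # assumes a local 'blog' database
    # print(search_posts(conn, "containers"))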
You can't do this if you write under old shared-hosting assumptions. You can do this if you target the VM or container as the unit of design and deployment.
A step forward was made by RoomKey. You should read what their CTO wrote. At RoomKey, they made several radical decisions that gave them a very unusual architecture:
--------------
Decision One: I put relational data on one side and “static”, non-relational data on the other, with a big wall of verification process between them.
This led to Decision Two. Because the data set is small, we can “bake in” the entire content database into a version of our software. Yep, you read that right. We build our software with an embedded instance of Solr and we take the normalized, cleansed, non-relational database of hotel inventory, and jam that in as well, when we package up the application for deployment.
We earn several benefits from this unorthodox choice. First, we eliminate a significant point of failure - a mismatch between code and data. Any version of software is absolutely, positively known to work, even fetched off of disk years later, regardless of what godawful changes have been made to our content database in the meantime. Deployment and configuration management for differing environments becomes trivial.
Second, we achieve horizontal shared-nothing scalability in our user-facing layer. That’s kinda huge. Really huge.
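I can't speak to RoomKey's actual build, but the "bake the data into the release" idea looks roughly like this sketch (file names and formats are made up): a build step freezes a snapshot of the cleansed content next to the code, records its hash, and packages both into one artifact so they can never drift apart:

    import hashlib, json, shutil, zipfile
    from pathlib import Path

    def bake_release(app_dir, content_db, out_zip):
        """Package application code plus a frozen copy of the content data."""
        app = Path(app_dir)
        snapshot = app / "baked_content.json"
        shutil.copyfile(content_db, snapshot)                      # freeze the data
        digest = hashlib.sha256(snapshot.read_bytes()).hexdigest()
        (app / "RELEASE").write_text(json.dumps({"content_sha256": digest}))
        with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in app.rglob("*"):
                zf.write(path, path.relative_to(app))              # code + data, one artifact
        return digest

    # bake_release("./myapp", "./cleansed_hotels.json", "release-20130601.zip")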
But then, you get to take responsibility for that entire stack. This is a bad thing.
Remember PHP register globals? Rails 2.x? Perl 4? I'd bet a lot of lazy devs would still be using those if they could just wrap it all up into a container and say "run this!" That's what commercial products do. And they're much worse for security as a result.
Fundamentally, I'd say the solution is to automate testing and installation. Make it extremely easy for a dev to test app A against a matrix of language implementations B, C, D, databases E, F, G, and OS platforms H, I, J. Make it easy to make packages that install natively on each platform, with the built-in package management tools. FPM and similar help with this. Nearly every platform will allow you to create your own package repos. Better tools = better code = more flexibility = less ecosystem dependency
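Something like the sketch below is what I have in mind (image names are examples only, and it assumes the docker CLI is available): enumerate the matrix and run the suite in throwaway containers, so the combinations get tested mechanically rather than by hand:

    import itertools, os, subprocess

    RUNTIMES  = ["python:3.10-slim", "python:3.11-slim", "pypy:3.10"]   # language implementations
    DATABASES = ["postgres:15", "mysql:8", "mariadb:10"]                # databases
    # (an OS dimension could be added the same way, with different base images)

    def run_matrix():
        results = {}
        for runtime, db in itertools.product(RUNTIMES, DATABASES):
            proc = subprocess.run(
                ["docker", "run", "--rm",
                 "-v", f"{os.getcwd()}:/app", "-w", "/app",
                 "-e", f"TEST_DATABASE_IMAGE={db}",
                 runtime, "python", "-m", "pytest", "-q"],
                capture_output=True, text=True)
            results[(runtime, db)] = (proc.returncode == 0)
        return results

    # for combo, ok in run_matrix().items(): print(combo, "PASS" if ok else "FAIL")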
Containerization as a logical separation for security (à la chroot/jails before it) makes sense, but doing it so you can shove your whole OS fork in there and then fail to maintain it seems foolhardy and myopic.
What I notice about all your examples is that they are stack problems that an application could not, on its own, have fixed. In the current model, each part of the stack has an independent lifecycle, creating shear points and hidden security flaws.
If the application can control the whole stack, then the application author can fix it.
Automating test and install just puts you back where you started: with a gigantic test matrix that will impose non-trivial drag on the whole application's development.
And it's not necessary. It's just ... not. necessary.
> If the application can control the whole stack, then the application author can fix it.
You are right, but the other point is that it becomes the application author's responsibility to fix it.
If you're bundling apache httpd with your app, and there's a security flaw and a new version released, it becomes your responsibility to release a new version of your app with the new version of httpd.
If there are 1000 apps doing this, that's 1000 apps that need to release a new version. Instead of the current common situation, where you just count on the OS-specific package manager to release a new version.
Dozens of copies of httpd floating around, packaged in application-delivered VMs, means dozens of different upgrades the owner needs to make, after dozens of different app authors provide new versions. (And what if one app author doesn't? Because they are slow or too busy or no longer around? And how does the owner keep track of which of these app-delivered VMs even needs an upgrade?)
You're describing what you see as the advantages of the shared hosting scenario and in the blog post I linked, I explain why I think that business will be progressively squeezed out by VPSes and SaaS.
In any case, there's no difference in kind between relying on an upstream app developer and an upstream distribution. You still need to trigger the updates.
And you might have noticed that stuff is left alone to bitrot anyhow.
I am not talking about an app that is distributed to be installed "on" OS X, BSD, Illumos etc.
I am talking about an app that is packaged to run "on" Xen, VMWare, or maybe docker (LXC) for some cases. Or zones for others. Or jails. Whatever.
The point is that you, the application designer, ask yourself, "what happens if I have total architectural discretion over everything from the virtual hardware up?"
But, rather than the panacea you envision, what I think would actually happen is that you end up with a lot of people doing substandard OS release engineering jobs, neglecting security patches, etc.
Or...
Cargo culting around a small number of "thin OS distributions", which is substantially the same as what we have today.
Heck, "total architectural discretion over everything from the (virtual) hardware up" is pretty much the definition of an OS distribution. Am I missing the point here? Is there something about this other than the word "virtual" slapped on there that's unique from what we have now?
Consider that a lot of applications, when shipped in VMs or containers, need very, very thin slivers of a full OS. Especially in things like an LXC container, which can easily be set up to share a subset of the filesystem of the host.
E.g. many apps can throw away 90%+ of userland. So while they need to pay attention to security patches, the attack surface might already be substantially reduced.
And LXC can, if your app can handle it, execute single applications. There doesn't need to be a userland there at all other than your app.
Now, it brings its own challenges. But so does trusting users to set up their environments in anything remotely like a sane way.
> Is there something about this other than the word "virtual" slapped on there that's unique from what we have now?
Yes: virtual machines and VPS hosting make it possible to bypass shared hosting. That means you needn't write apps which have to aim for lowest common denominator.
Edit: I agree that the approach I'm advocating introduces new problems. But obviously I think that it's still better than the status quo, which is largely set by path dependency.
I think you just end up moving the work around. I'm not sure the current concentration of security at a few points (distros) has scaled. Most web application developers don't use much of a distro stack anyway. Most of the security issues in a distro apply to stuff you don't use, although it may be installed. Traditional Unix was a much more minimal thing.
> The application author can provide an updated container if there's a security problem.
Yes, but now it's their responsibility to stay on top of updates for the entire stack, and push out updated containers whenever any part of it changes. Whereas in the traditional model staying on top of updates to anything other than the application itself is the responsibility of the user or his/her sysadmin.
Not saying containerization of apps isn't a promising concept, just that it does require the app developer to take on a lot of additional responsibility.
But doesn't that ignore the move toward IaaS we are seeing? Where a customer is buying compute time, instead of access to install an app onto a managed OS?
It's getting to the point where very soon we will have complete clouds on demand - we pretty much already have them, but soon that will be a trivially selected level of granularity.
We can deploy OpenStack clusters extraordinarily easily now with Fuel. Having fully deployed app clouds is pretty much already here, if not just around the corner.
> "As far as I am aware, folk aren't really writing distributable applications that target the VM up. "
We are. You might want to check out Mirage [1] microkernels, which is an OS that targets the Xen Hypervisor. You write your application code and select the appropriate libs, then compile the whole thing into an 'appliance'. We have big plans for the kinds of systems we want to build using this and if you'd like to keep up with it, please drop me an email. The devs are presenting a Developer Preview at OSCON so there'll be more activity soon.
Agreed, sort of. I actually referred to Mirage in my honours proposal and it was one of my inspirations.
I had in mind to use the facilities of an existing OS and tools rather than reinvention. My other major inspiration was OK Webserver (and through it, "Staged Event-Driven Architecture").
I host blogs and one of my pain points is that slow plugins hold up rendering.
What if, for instance, rendering is a graph of pipelines, and there's known logic for failing to meet a rendering deadline? So if you have the blog page and the %#%^^ "Popular Posts" plugin is running slow again, it doesn't slow down the whole site. That <div> merely shows old content, or is excluded.
You can then use standard operating system facilities to ensure that, for example, the "posts" and "page" modules get top priority. Then "comments". Somewhere way down the list might be "complicated plugin that talks to five remote servers which crash half the time".
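As a rough sketch of the deadline idea only (fragment names and timings are invented, and the OS-level prioritisation isn't shown): render each fragment concurrently with a time budget, and when a plugin blows its budget, serve its last cached output instead of stalling the page:

    import asyncio

    CACHE = {"popular_posts": "<div>(stale) Popular Posts</div>"}   # last known-good fragments

    async def render_fragment(name, renderer, deadline):
        try:
            html = await asyncio.wait_for(renderer(), timeout=deadline)
            CACHE[name] = html                     # refresh the cache on success
            return html
        except asyncio.TimeoutError:
            return CACHE.get(name, "")             # serve old content, or drop the <div>

    async def slow_popular_posts():
        await asyncio.sleep(5)                     # pretend it talks to five flaky servers
        return "<div>Popular Posts</div>"

    async def post_body():
        return "<div>Post body</div>"

    async def render_page():
        post, plugin = await asyncio.gather(
            render_fragment("post", post_body, deadline=1.0),
            render_fragment("popular_posts", slow_popular_posts, deadline=0.2),
        )
        return post + plugin

    print(asyncio.run(render_page()))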
You can do some or all of this within a programming environment. For the common case, only some. But why not use operating system facilities? They're already there, they're battle-hardened, they enjoy universal coverage of the system and are closer to the metal, real or virtual.
Mirage is, I think, more of a programming-language environment. Some of the facilities I'm pointing out exist there. Some don't. I don't feel like writing all the missing bits from scratch when they're already available off the shelf.
I have been experimenting along these lines, by making a scripting interface to Linux that lets you do the basic stuff you need to bring up a VM/container/hardware [1]. This includes bringing up networks, configuring addresses and routing. There is a lot to do (still need iptables), but you end up with a script over Linux that configures it, using a scripting language and ffi, which you can compile into a single binary and e.g. run as your init process. Linux is a pretty decent API if you wrap some scripting around it rather than a lot of C libraries and shell scripts.
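For a flavour of the kind of bring-up involved (this is just shelling out to iproute2, not the ffi approach described above; it needs root, and the names and addresses are made up): create a namespace, a veth pair, addresses, and a default route:

    import subprocess

    def sh(*cmd):
        subprocess.run(cmd, check=True)

    def bring_up_container_net(ns="app0"):
        """Namespace + veth pair + addresses + default route for one container."""
        sh("ip", "netns", "add", ns)
        sh("ip", "link", "add", "veth-host", "type", "veth", "peer", "name", "veth-app")
        sh("ip", "link", "set", "veth-app", "netns", ns)
        sh("ip", "addr", "add", "10.0.42.1/24", "dev", "veth-host")
        sh("ip", "link", "set", "veth-host", "up")
        sh("ip", "netns", "exec", ns, "ip", "addr", "add", "10.0.42.2/24", "dev", "veth-app")
        sh("ip", "netns", "exec", ns, "ip", "link", "set", "veth-app", "up")
        sh("ip", "netns", "exec", ns, "ip", "route", "add", "default", "via", "10.0.42.1")

    # bring_up_container_net()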
You do not of course end up with a standardised interface from this, as it is dependent on the Linux environment you are running in (although a VM can standardise this). So I am also experimenting with userspace OS components, like the NetBSD rump kernel [2] in the same framework.
Also helping here is the gradual shift from applications (like web servers) to libraries that you can link into your application, or full scripting inside (like openresty for Nginx), which addresses your bundling issue. If you are building single-function applications that are then structured into larger distributed applications, this is much simpler.
Back when I was preparing to take an honours year, I wrote and circulated several project proposals. One was to explore the argument above with a constructive project -- writing a blog[1].
An example I gave for the inner platform effect was the Wordpress file wp-cron.php.
It gets called on every request made to Wordpress because WP has no other way to arrange for scheduled tasks to be carried out. So you get a performance hit, and your scheduling relies on stochastic sampling. Oh, and it stops working very well when (as inevitably happens) you slather Wordpress with caching.
In an OS-up design, you just delegate this to cron.
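For instance, something like this sketch (table names are invented, and storage is assumed to be SQLite just for brevity): the due-task runner is an ordinary standalone script, and a one-line crontab entry replaces the on-every-request hack:

    # crontab: */5 * * * *  blog  /usr/bin/python3 /opt/blog/run_due_tasks.py
    import sqlite3, time

    def run_due_tasks(db_path="/opt/blog/blog.db"):
        conn = sqlite3.connect(db_path)
        now = int(time.time())
        due = conn.execute(
            "SELECT id, name FROM scheduled_tasks WHERE next_run <= ?", (now,)
        ).fetchall()
        for task_id, name in due:
            print("running", name)                        # dispatch the real task here
            conn.execute("UPDATE scheduled_tasks SET next_run = ? WHERE id = ?",
                         (now + 3600, task_id))            # naive: reschedule an hour out
        conn.commit()

    if __name__ == "__main__":
        run_due_tasks()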
Or plugins. In a standard current design, these can't be isolated. In an OS-up design, you can make them standalone programs with separate accounts that can't reach into and interfere with the core code. No more broken sites from a PHP error in a hastily-installed plugin. Similarly, you can control their access to the database (instead of having a shared login that all code running in the application shares).
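A minimal sketch of that isolation (the account names, database roles and paths are hypothetical; the user= argument needs Python 3.9+ and root privileges to switch users): each plugin is a separate executable run under its own account with a hard timeout, so a hung or crashing plugin degrades to an empty fragment instead of a broken site:

    import subprocess

    def render_plugin(name, timeout=2.0):
        try:
            proc = subprocess.run(
                [f"/opt/blog/plugins/{name}/render"],
                user=f"blog-plugin-{name}",          # separate OS account per plugin
                env={"DB_ROLE": f"plugin_{name}"},   # separate, restricted database login
                capture_output=True, text=True, timeout=timeout, check=True)
            return proc.stdout
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
            return ""                                # failure = missing fragment, not a dead site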
And so on, and so forth.
[1] I'm happy to forward copies of the proposal. My email is in my profile.
Sounds like Erlang. (You ship a copy of the Erlang VM emulator ("ERTS") with your Erlang application "release", and upgrade the release as a whole, rather than upgrading just the application code.)
Of course, you could argue that the whole design of Smalltalk with its VM is already abstracting away the OS...
Indeed, one of the advantages of Java on the server side is arguably that it has its own VM, so that you don't target the OS platform but the Java platform.
Everywhere I've worked, the developers have needed to figure out the dependencies required to get the software working, sometimes with the assistance of a dev-ops or ops guy.
All this does is say "while you figure that out, put it in a script". First, it means we can test the dependencies easily by re-running the script to see that it actually accurately reflects what needs to be done.
Secondly, when you're done, you have a reproducible deployment environment that massively simplifies ops and dev: Ops can decide on upgrades, re-run the scripts, have QA run their tests and know the upgrade won't break stuff in production. Dev can make code changes and be confident that what they hand off will actually work in the production environment because they've test deployed it on VMs built from identical templates.
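As a rough sketch of what "put it in a script" looks like in practice (the package names are examples only, and it assumes a Debian-ish base): one idempotent, re-runnable recipe that dev, QA and production are all built from:

    import subprocess

    PACKAGES = ["nginx", "postgresql", "python3-venv"]   # the dependencies you figured out

    def sh(*cmd):
        subprocess.run(cmd, check=True)

    def provision():
        sh("apt-get", "update")
        sh("apt-get", "install", "-y", *PACKAGES)        # re-running is harmless if already installed
        sh("python3", "-m", "venv", "/opt/app/venv")
        sh("/opt/app/venv/bin/pip", "install", "-r", "/opt/app/requirements.txt")

    if __name__ == "__main__":
        provision()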
As long as your team can figure out how to deploy the software they write, they can do this. If they can't, you have bigger problems.
Here's a rather contrived example but it illustrates the idea I'm getting at: "Will the inventory agent software be able to login to audit this container environment when they're done building their release?"
The point is: PaaS gives you the advantages of shared hosting, but a good PaaS properly isolates all apps. It's definitely a good way to let developers focus on development and leave ops and its constraints to trained teams.
Yes, this is one of my bonnet-bees, since at least 2008: http://clubtroppo.com.au/2008/07/10/shared-hosting-is-doomed...