CoreOS (YC S13) Raises $12M to Bring Kubernetes to the Enterprise (techcrunch.com)
202 points by craneca0 on April 6, 2015 | 83 comments



Maybe it's just me, but I fail to see how all those projects make the developer/devops person's life less complex. I don't have the experience of having managed a cluster of 1000s of machines, but I HAVE managed a cluster of ~50 nodes with MySQL, ElasticSearch, RabbitMQ, ZeroMQ, PHP, Clojure and the whole gang at various points in time. I have yet to see a single project (Mesos, Kubernetes, Docker, whatever) that would DRAMATICALLY make my life so much better. Not trying to be overly negative here, just looking for answers and better solutions.

EDIT: I also continuously wonder how things like Ansible fit into the picture. Are they competitors? Are they supplemental to the likes of Mesos/Kubernetes? Are they orthogonal?


Disclaimer: lead engineer on kubernetes here...

It really depends on how you do deployment. Containers provide deployment (and more importantly rollback) that is better than other deployment tools like Puppet/Chef/... because they are atomic (they either work, or they fail; they don't get stuck in the middle) and they package up all of their dependencies within them, so that they don't have the "well, it worked on my machine" problem.
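
To make the atomic-with-rollback idea concrete, here is a minimal Go sketch (illustrative only; the Runner interface and image names are hypothetical, not a real Docker or Kubernetes API): either the new image passes its health check and takes over, or it is discarded and the old container keeps running untouched.

    // Illustrative sketch only: the Runner interface is hypothetical,
    // not a real Docker/Kubernetes API.
    package main

    import (
        "errors"
        "time"
    )

    type Runner interface {
        Start(image string) (id string, err error)
        Healthy(id string) bool
        Stop(id string) error
    }

    // Deploy either cuts over to the new image or leaves the old container
    // running untouched; there is no half-upgraded state to clean up by hand.
    func Deploy(r Runner, oldID, newImage string) (string, error) {
        newID, err := r.Start(newImage)
        if err != nil {
            return oldID, err // old container was never touched
        }
        deadline := time.Now().Add(30 * time.Second)
        for time.Now().Before(deadline) {
            if r.Healthy(newID) {
                _ = r.Stop(oldID) // cut over only once the new one is ready
                return newID, nil
            }
            time.Sleep(time.Second)
        }
        _ = r.Stop(newID) // roll back: discard the broken image entirely
        return oldID, errors.New("new image never became healthy; kept old container")
    }

    func main() {}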

Systems like Mesos and Kubernetes decouple applications from the individual machines (and the operating system on those machines), and are online systems with self-healing properties (so that they will fix themselves rather than waking you up in the middle of the night).

k8s and Mesos turn containers into an API that spans an entire fleet of machines, and enable you to dynamically use (and re-use) the fleet of machines for multiple different applications. No more dedicated boxes for mysql, mongo, etc. This in turn enables you to have an easier ops experience, because every single machine in your fleet is homogeneous (same OS, same patches, etc.). OS management is abstracted away from application management, so that they don't interfere with each other. Since things in the API are expressed in terms of applications, it's easy to add health checks and automatic restart to the system, and provide self-healing properties as well. Both Kubernetes and Mesos also make replication a first-order primitive, so that it is easy to scale in response to load.
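
As a rough illustration of what "replication as a first-order primitive" and self-healing look like, here is a toy reconciliation loop in Go (a conceptual sketch only, assuming a hypothetical Cluster interface; this is not Kubernetes' actual controller code):

    // Conceptual sketch of a self-healing control loop: keep driving actual
    // state toward desired state. The Cluster interface is hypothetical.
    package main

    import "time"

    type Cluster interface {
        RunningReplicas(app string) int
        StartReplica(app string)
        StopReplica(app string)
    }

    // Reconcile replaces crashed replicas and trims excess ones, so a dead
    // machine turns into a scheduling event instead of a 3am page.
    func Reconcile(c Cluster, app string, desired int) {
        for {
            actual := c.RunningReplicas(app)
            switch {
            case actual < desired:
                c.StartReplica(app) // heal: a container or machine died
            case actual > desired:
                c.StopReplica(app) // scale down after desired was lowered
            }
            time.Sleep(5 * time.Second)
        }
    }

    func main() {}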

Ansible is sort of orthogonal to systems like Kubernetes and Mesos. Kubernetes and Mesos are designed to be online, self-repairing systems. Ansible is a way to easily execute commands on a bunch of machines. I can see collaborative use cases, where you generally use Kubernetes for deployments, but use Ansible for querying some data while debugging, or somesuch.

Anyway, sorry for the extended response. There actually is way more that I could say about the topic ;)


Thanks for the detailed response. The point about atomic deploys is a good one. However, if we all agree that immutable deployments are a good thing, I've been wondering how that is fundamentally different from launching new instances via the AWS/Rackspace/Google API. Is there really a fundamental difference between shipping a Docker container to a set of servers, or just relaunching/rebuilding servers?

You also mention the "no dedicated MySQL, Mongo boxes" etc. I am 100% for that. However, how can you really make that work with databases/systems like MySQL, which were fundamentally designed to work on one machine? Scaling them across machines is usually very painful, or at least, let's say, not very "idiomatic" (if I can use that word here lol). I can see the auto scaling part working with distributed databases like Riak/Cassandra, but even there the solution is not clear-cut and "out of the box". It still feels like some manual orchestration work is needed - correct me if I'm wrong.

I can totally see the "online, self-repairing" point, but only for application servers that were designed from scratch to be easily scalable by just adding servers. Which is the case for most scripting languages PHP/Ruby/Python et al, and for well designed JVM/CLR/native/Go applications as well. However, I would argue scaling the app boxes/containers is the EASY part. Again, you can always scale up by "just" copying machines (I know, it's always harder than that). The hard part comes with managing the database servers, or your cluster of messaging queues, or some other stateful thing that has to persist data SAFELY. Does Kubernetes/Docker REALLY make my life easier with those kinds of things? Is the answer to use DynamoDB/BigTable/RDS/managed queues and forget about the hassle of managing a database or a queueing cluster? Looking for answers :) Thanks!


> I've been wondering how that is fundamentally different from launching new instances via the AWS/Rackspace/Google API. Is there really a fundamental difference between shipping a Docker container to a set of servers, or just relaunching/rebuilding servers?

There are two perspectives on this:

* From a user's perspective, ignoring performance implications, containers should be the same as VMs.

* From a hardware perspective, however, containers are much more lightweight in terms of CPU, RAM and disk space, since they all share at least the kernel.

Why should you care? Well, if something consumes fewer resources, it means that (a) you could run it for less than what you pay for VMs or (b) it can be sold to you for less. There are some additional benefits like fast "boots" and strict decoupling of persistent and non-persistent storage (which I find an advantage; restarting a container cleans up whatever you don't care for), but in the end I think it comes down to money.


To add to this: sure, you can spin up EC2 instances and you don't have to worry about atomicity. However, what happens if you want to run more than one service per VM? That's where containers allow you to take full advantage of the system.


> No more dedicated boxes for mysql, mongo, etc. This in turn enables you to have an easier ops experience, because every single machine in your fleet is homogeneous (same OS, same patches, etc.)

Is that a good thing? Granted, the AWS account I used to manage was 8-10 machines at most, which is probably well below the point where Kubernetes makes sense, but I remember that it was useful to configure different kinds of machines for MySQL than for Apache (disk needs were vastly different, for instance) and that the MySQL instance for the web server had different needs than the MySQL instance for offline data processing.


Especially if you're working with Postgres, which doesn't do multi-master...


Oh, since PostgreSQL 9.3 and up, plus repmgr(d), it's now way easier to scale and run PostgreSQL clusters, even in Docker.


You have to design your applications bearing in mind where they run (AWS, bare metal...), and now you also have to design them bearing in mind the "Datacenter OS", which is fine, but adapting solutions to new ways of doing things takes time.

To me, unless you have a big fleet of machines, these systems are total overkill... but I guess time will tell.


To me, unless you have a big fleet of machines, these systems are total overkill

I think that's an important point, and one which container vendors are not going to labour, as they want as many people as possible on their platform, even before they really need it. A lot of people are trying to use Docker or CoreOS who really don't need to, and as they're not the focus of containerisation efforts, they'll suffer as they find out these tools aren't really tailored to what they want to do, which is just to get their small web service running reliably with the minimum of fuss, and be sure they can rebuild it or move it between providers easily.

If you have 1-10 machines which don't change much, use Ansible or similar to get predictable (re)deployments and don't worry about using containers.

If you have > say 10 machines, this sort of stuff becomes more useful, because you are herding cattle, and need the infrastructure necessary to keep that herd going, even if a few die off from time to time - then you can scale to hundreds easily as your business grows, you can manage lots of workers reliably on one VM in containers etc, etc.

For probably 90% of websites out there, with a sane setup that's never even going to become an issue and they could run easily on just a few servers.


That's not true. What happens if one of your servers dies?

You either heal it back or you shoot it. Shooting is way faster, and Docker can help you with that.

Also, 1-10 servers could be a lot; it really depends on how much stuff you need.

Also, Docker adds "some" security. Docker isn't the perfect match, but on our side we run a mix of Ansible and Docker (without CoreOS) and are very happy.

In our AWS cloud we have another system which only uses fleet and CoreOS; the cluster upgrades itself, which is a big plus, but it doesn't work that well in our internal infrastructure with proxies, firewalls, etc.

We run 5 machines in aws.

And have like 6 vsphere esxi nodes.


If one of your servers dies, you deploy another in 5 minutes with Ansible; with a small number of servers that rarely happens, though.


A few ways in which this stack can simplify your life:

- Abstract away elements of failover, disaster recovery, clustering, load balancing

- Abstract your application services cleanly from underlying servers

- Give you more consistency across and within development, test and production environments

- Allow for immutable deployments

- Allow for canary releasing, A/B releasing, rollback etc

- Better isolation of processes and services

- Move away from general purpose operating systems to lighter weight single purpose OS such as CoreOS

Configuration management still has a role to play in this stack, but a somewhat smaller one than in the old world.
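
As a rough illustration of the canary releasing point above, here is a tiny Go sketch of weighted traffic splitting (illustrative only; in practice this lives in a load balancer or service proxy, and the backend names are made up). Rollback is just setting the canary weight back to zero.

    // Minimal sketch of weighted canary routing; the backend names are
    // hypothetical and real setups would do this in the load balancer.
    package main

    import (
        "fmt"
        "math/rand"
    )

    // pickBackend sends roughly canaryPercent of requests to the new version.
    func pickBackend(canaryPercent int) string {
        if rand.Intn(100) < canaryPercent {
            return "app-v2-canary"
        }
        return "app-v1-stable"
    }

    func main() {
        counts := map[string]int{}
        for i := 0; i < 1000; i++ {
            counts[pickBackend(5)]++ // 5% of traffic goes to the canary
        }
        fmt.Println(counts) // compare error rates/latency before promoting v2
    }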


It doesn't. As an average developer, it's my public/private cloud provider's job to take my images/containers and schedule them correctly on whatever underlying fleet of hardware they run.

I can only assume Tectonic and related tech are aimed at those developers who run hybrid or private clouds for the rest of their company.


This is also very true. The purpose of systems like Kubernetes/Mesos is to be set up once by a company, and for most developers to just interact with the CLI tool to run containers. Tools like Google Container Engine make this into a software-as-a-service product, where the cloud provider provides the API. In that world, as a developer, you never really think about the OS, just your app. But unlike in traditional PaaS, there is no framework that restricts what you can code (language, libraries, etc).


The main difference between a PaaS like Heroku and a general purpose cluster manager service like GKE is that the former can simplify by making certain assumptions about your workload.

For example if a service knows your container is serving a web application then it can sensibly provision load balancers, DNS, HTTPS, auto-scaling, static content caching, automatic QPS monitoring etc. to support your app with little explicit configuration. And with a commercial service you get an SLA for those things.

You can get much of this with a general purpose cluster as well, but of course you need to configure it yourself, and more importantly - debug it when it goes awry.

Disclaimer: I work for Google


Sadly in the Cedar stack, Heroku did away with Varnish or static asset caching from their "Dynos". Instead all assets bundled with your app are served using Ruby and dyno time. While a simple and clean architecture, this either raises the cost or forces you to place static assets on other servers (or use CDNs). And you'll still need to debug someone else's stack just as much -- when you get a service-specific error for exceeding memory limits or having requests take too long because of instance-level queues, etc.

At this point, I never pick PaaS because it's less work, I pick it simply because I've used it before and it's easy to get started with. Production apps are never hands-off if you're the one developing them ;-)


> But unlike in traditional PaaS, there is no framework that restricts what you can code (language, libraries, etc).

I'm not sure how traditional PaaSes are limited in that sense. Heroku accepts a wide variety of software through the buildpacks mechanism. Cloud Foundry supports both buildpacks and docker images, with clean extension points to add further mechanisms -- for example, .NET-on-Windows deployments, which is under development.

Disclaimer: I've worked on Cloud Foundry and I'm employed by Pivotal.


There are already a couple pretty good responses, but I would add that all of the technologies you mentioned (along with others in this space) are still evolving and maturing.

Docker popularized containers, and now we're seeing the supporting technologies that will help fulfill some of the promises of container-based deployment. Big companies have had this technology internally for years, and now it's starting to get productized.

I'm referring to things like schedulers, networking, monitoring, etcd/consul, and more. I personally don't think all of the pieces are there yet to make it as seamless and easy as it could be, but it's coming.


> we set out to build and deliver Google’s infrastructure to everyone else

This statement rings pretty true as Kubernetes (also known as k8s) has some Google biases. Not all cloud providers will have such an easy time providing all the infrastructure necessary to run CoreOS + k8s smoothly.

For example, Kubernetes assigns each Pod (k8s unit of computation) an IP address, which is only simple to do if your cloud provider supplies something like a /24 private block to your nodes. CoreOS came up with the VXLAN-based Flannel project to make this model more portable[0], but Layer 2 over Layer 3 isn't something I'd like to throw haphazardly into my production environments. Google Compute Engine conveniently provides this setup as an option.

Another example of Google favoritism is the strong preference for centralized storage--particularly GCEPersistentDisk. At first I was concerned about centralized storage by default, as we know disk locality is a Good Thing (TM), but after reading a paper that claimed networking is improving faster than disks are[2], I felt somewhat better about this. However, it's still pretty obvious that a Google Persistent Disk is the way to go with k8s[3].

That said, I'm really happy that Google has open-sourced this project because it is indeed a functioning, tested, and easy-to-use distributed system. I'm sure that the devs aren't aggressively shutting out other cloud providers and that these biases are probably just a side-effect of their resource allocation process and the problems that they intend to solve (e.g. GCEPersistentDisk used to be a core type instead of a module--it has since gotten better). It's still important to evaluate a technology's biases and potential evolution before throwing your product on it.

[0] https://github.com/coreos/flannel

[1] https://github.com/GoogleCloudPlatform/kubernetes/blob/maste...

[2] http://www.eecs.berkeley.edu/~ganesha/disk-irrelevant_hotos2...

[3] Do you see any other providers here? https://github.com/GoogleCloudPlatform/kubernetes/tree/maste...


I'm working on adding AWS support for Kubernetes. I just last week finished Load-Balancer (ELB) & Persistent Storage (EBS) support, and they're currently going through the pull-request review process. Once they merge (I'd guess a week or two?), AWS will be on-par with Google Cloud Engine feature-wise.

I have found the Kubernetes team to be nothing other than extremely supportive of efforts to support AWS & non-Google platforms. It takes a little longer to translate some of the Google-isms to other platforms, but I'm happy for the thinking behind those decisions, vs just adopting lowest-common denominator.


I also see quite a lot of IaaS providers here:

https://github.com/GoogleCloudPlatform/kubernetes/tree/maste...


> I'd guess a week or two?

I'm mostly concerned about feature lag and vendor lock-in, so I'm happy to hear that this will be out so soon. I'm excited to try it out.

> I have found the Kubernetes team to be nothing other than extremely supportive of efforts to support AWS & non-Google platforms.

I don't doubt it one bit; in my experience, people on the Kubernetes IRC channel have been always really helpful and supportive. I just tend to be a little more pessimistic when it comes to resource allocation: a Google team probably prioritizes support for Google platforms, and that's no one's fault or foul play.

Thanks for your work!


About disk locality -- I've read that paper and know that Google increasingly has the philosophy of disk locality being irrelevant.

However, I don't buy it for 2 reasons:

1. Highly available distributed services need to have geographical diversity, i.e. they should be "multihomed". This is true on AWS or in Google's internal data centers. That means you have WAN latency, in which case locality becomes again the primary design concern for performance.

Pre-Spanner, Google's solution was to use application-specific logic to be multihomed -- i.e. nearly rewrite your application, depending on how stateful it is. Spanner isn't a silver bullet either. You still have to solve latency problems, just within the ontology of Spanner rather than the application.

It's bad for your code to ignore latency within the data center, and then later add (incorrect) hacks to work around latency between data centers. If you pay attention to network boundaries from the beginning, it will be easier to multi-home.

2. A single machine is still your domain of failure. Even if it doesn't matter for performance, you still have to think about machines to handle failures.

The interfaces between machines should be idempotent to handle failures gracefully. And many distributed storage services have complicated performance vs. durability knobs for how many machines/processes have accepted writes.
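
To illustrate the idempotent-interface point, here is a small Go sketch (conceptual only; the Store interface is hypothetical): because the caller picks the request ID, retrying after a timeout or a dead machine cannot create a duplicate write.

    // Sketch of an idempotent cross-machine write: the operation is keyed by a
    // caller-chosen request ID, so blind retries are safe. Store is hypothetical.
    package main

    import (
        "errors"
        "fmt"
    )

    type Store interface {
        // PutIfAbsent writes value under key only if the key is new; repeating
        // the same call is a no-op rather than a duplicate record.
        PutIfAbsent(key, value string) error
    }

    func writeWithRetry(s Store, requestID, value string, attempts int) error {
        var err error
        for i := 0; i < attempts; i++ {
            if err = s.PutIfAbsent(requestID, value); err == nil {
                return nil // success now, or an earlier attempt already landed
            }
        }
        return errors.New("write failed after retries: " + err.Error())
    }

    func main() {
        fmt.Println("retries are safe because the write is keyed by requestID")
    }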

So I think Google does have a "single system image" bias, and you are right that Kubernetes has these Google-isms in its architecture.


I have serious trouble with [2]. Disks not evolving as fast as the network? Under what rock have they been living?

The paper seems to peg local disk bandwidth at 150Mbps, and then compare it to remote network disk access at... 150Mbps. NVMe is going to grant us 2.2GBps bandwidth and 450K IOPS (from a single consumer-grade product), so that paper is off by more than an order of magnitude. Local disk is non-volatile storage sitting a PCIe lane away from your CPU. I just don't see how disk locality is not going to be crucial for many workloads, for decades to come.

In 2020 a flash-only SAN isn't going to deliver 20Gbit/sec to each of 100 blades in the rack. A 4TB NVMe card on each blade will though...

Look at Intel's latest Xeon-D SOC, yeah, it's got Dual 10GBE, but you're not going to get 7.7GB/s over that... [1]

[1] http://www.intel.com/content/www/us/en/benchmarks/server/xeo...


You should look at their measurements and assumptions in context. As you can see from the URL, it was written in 2011 when the NVMe working group was first formed. It was also written in the context of cluster-based applications in a data center and specifically mentions SSD and cost effectiveness. Storage cost effectiveness is critical at these scales because your data is growing by terabytes per day.

You also mention blades, which brings up the next point of context: operations like Google and Facebook don't use blades the way you might expect coming from your average enterprise, because they aren't leasing rack space or working with a limited amount of physical space. They don't need the same U-to-performance ratio, so they can save money by using commodity hardware. Their applications also scale readily, so the loss of entire boxes is meaningless within a certain threshold.


Why does k8s have this restriction that each pod/minion should be in its own subnet?


Clarification:

Each pod has its own IP address that is routable anywhere in the cluster. This makes life much easier because you don't have to do port-forwarding onto the host node.

In all current k8s set-ups, each Minion/Worker node has a subnet that it allocates these Pod IP addresses out of. This isn't a hard requirement necessarily, but it tends to be much easier to make work, since you only have O(Workers) routes to configure instead of O(Pods). Long term, though, I think we would rather do away with subnets per node and simply allocate IP addresses for each Pod individually.
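
As a rough illustration of why per-node subnets keep the routing table at O(Workers), here is a small Go sketch that carves a /24 pod range per worker out of a cluster CIDR (the 10.244.0.0/16 range is just an example, not something Kubernetes mandates):

    // Illustrative only: give each worker a /24 slice of a /16 cluster range,
    // so the route table needs one entry per node, not one per pod.
    package main

    import (
        "fmt"
        "net"
    )

    func nodePodCIDR(clusterCIDR string, nodeIndex int) (string, error) {
        _, ipnet, err := net.ParseCIDR(clusterCIDR)
        if err != nil {
            return "", err
        }
        ip := ipnet.IP.To4()
        // Use the third octet as the node index: 10.244.<node>.0/24.
        subnet := net.IPv4(ip[0], ip[1], byte(nodeIndex), 0)
        return fmt.Sprintf("%s/24", subnet), nil
    }

    func main() {
        for node := 0; node < 3; node++ {
            cidr, _ := nodePodCIDR("10.244.0.0/16", node)
            fmt.Printf("node %d routes its pods via %s\n", node, cidr)
        }
    }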


Re IPs, you could just go with private IPv6 addresses, could you not?


Yes, in theory, but we have not really tackled IPv6 yet. Soon :)


Congrats to the team! Interesting to see a big focus on Kubernetes going forward--doesn't CoreOS kind of compete with Kubernetes? (even though Kubernetes builds on CoreOS tools like etcd)


Kubernetes is the perfect complement to CoreOS, and it's the type of cluster manager we would have built over time -- thanks Google!

We started building fleet from the ground up with plans to build a higher level tool on top, which would provide the application centric workflows you see in Kubernetes. But like all decisions at CoreOS, we only build stuff we have to build. When Kubernetes came out we quickly started evaluating it for our needs. It turns out it's a great fit, which is why we built Tectonic around it.

Kubernetes adds a lot of value to the container ecosystem and is grounded in the same roots as other CoreOS projects: opensource and highly collaborative. Both CoreOS and Kubernetes want to see our components leveraged outside of our commercial products, but we also feel that we can offer the best experience by combining them into focused offerings for people looking for a complete package.


Do you see fleet existing in a Kubernetes centric universe? If so, in what role?


Kubernetes doesn't compete with CoreOS except in that both CoreOS and Kubernetes have their own schedulers (fleet and the k8s API server).

However, fleet is designed to be relatively simple, and can be used to deploy 'services' (as in systemd services), while kubernetes is more oriented around deploying 'apps', and provides a much higher abstraction that makes discovery, etc easier.


Also interested in how the Rocket/Docker debate will fit in with this?


This is a great question.

CoreOS is currently working on adding support for Rocket[0] to Kubernetes. There is still a lot of work to do here, but things are progressing nicely. Kubernetes is a true opensource project and accepts contributions that make sense. Supporting another container runtime engine makes sense in terms of choice for the community and forces clean separation between Kubernetes, the cluster manager, and the underlying container runtime engine.

When it comes to Rocket vs Docker I think the competition will be good for both projects and the community as a whole. At the end of the day, both projects and the communities behind them want people to be successful with containers.

[0] https://github.com/GoogleCloudPlatform/kubernetes/pull/5518


> both projects and the communities behind them want people to be successful with containers.

I think it's better said that Docker wants people to be successful with containers, so long as they're Docker containers.[1][2][3]

We've seen a lot of very open work coming from the CoreOS team with Rocket and the ACI... unfortunately we've only seen negativity and mud slinging coming from the other camp.[2][4][5][6][7]

Docker is more interested in having an "open" implementation, rather than an open specification. The thing is... an open specification is far more important for industry-changing technology like containers.

[1] https://news.ycombinator.com/item?id=8683705

[2] https://news.ycombinator.com/item?id=8938409

[3] https://news.ycombinator.com/item?id=8789181

[4] https://news.ycombinator.com/item?id=8938066

[5] https://news.ycombinator.com/item?id=8938176

[6] https://github.com/docker/docker/issues/10643

[7] https://github.com/docker/docker/issues/9538


Don't kid yourself...it's been negative on both sides. CoreOS called Docker fundamentally flawed, though they've backed away from that a bit.

From what I can tell, CoreOS was happy to let Docker, Inc control the container platform and build orchestration tools on top of Docker, since it's pretty clear that no one is willing to pay for the base containerization layer and the only money to be made is in orchestration. But then Docker released their own orchestration strategy and that started the current period we're in where both companies have come off as being very unprofessional and concerned only with their own future success.

Between Docker trying to build almost everything into the actual Docker executable (it's a container runner, API server, build tool, process monitor, cloud provisioner and orchestration tool...those last two are only not the case because they got called on the strategy) and CoreOS's pointless pushing of rkt (rkt itself isn't pointless, but the PR to add rkt support to Docker was, as was implementing Docker support in rkt). Both sides have foolishly put their own interests ahead of the container ecosystem.

To say that either of the two companies has been the bad actor in all the drama would be biased. They've both come across like spoiled little children trying to get their way. As someone who has been pushing my company (~8000 employees) to consider implementing a containerized deployment strategy, it's been hugely inconvenient to have to concede to people arguing against containerization all the points around how unprofessional the companies in the space have been, both in behavior and coding practices.


+1, This has been my interpretation too, I feel like a lot of the rkt PR was subtly aimed at Docker, it was very much "this is why it's better" rather than "this is why it's good".


Honestly, as a consumer of both projects, I'm happy that rkt exists because it provides a working demonstration of a solution where container images are orthogonal to other concepts such as orchestration. If it went the other way, our vendors would be swearing up and down that the only way for this container thing to work is if we install their complete solution.

I love my Apple MacBook Pro and its soldered components, but I need my data center to be meaningfully upgradable.


I love Docker, but it feels a lot like the days of incompatible VM images, where you got super locked in, and if you tried anything that wasn't what they had designed for, then you got screwed.


Can you point to specific examples of the CoreOS team dealing low blows?

Frankly, CoreOS started the ACI and Rocket because there was no open specification. Docker had started off with somewhat of a specification, then deleted it. A year later, CoreOS launches ACI as a public open spec, and suddenly Docker whips one of their own up over a weekend.

The entire point of ACI is nobody owns the specification. Docker wants to own the spec, govern the spec, and be the spec.

With something as critical as this technology -- we need an open specification -- far more than an open implementation.


The three examples in my first post, combined with their timing in releasing Rocket. Their initial press release was filled with vitriolic rhetoric...so much so that they had to recant most of it. And coming so soon after Docker started talking about their orchestration plans, it's pretty transparent that CoreOS doesn't feel that the community needs an open specification. They feel that CoreOS needs an open specification to prevent Docker, Inc from leveraging the common implementation to gain an unfair advantage in the Docker orchestration market.

After announcing Rocket, they embarked on a campaign of subtly trying to show people how uncooperative Docker, Inc was being, first by implementing Docker support in Rocket and then with that sham of a PR (the first ever PR PR) where they implemented Rocket support in Docker. It was so much code and, with no warning that it was coming, there was no chance that it would ever get merged. The only point of that was to try to argue that they'll support Docker and Docker won't support Rocket. It's so transparently passive-aggressive to anyone who understands the underlying dynamic expressed in my earlier post.

> With something as critical as this technology -- we need an open specification

That's fine. But there's a right way and a wrong way to get there. They should have started by documenting the existing Docker implementation. It's open source and implemented by many Docker partners (Amazon, Google, etc). Once the community has accepted the specification, as the authors, they would have license to go to the community and push for changes they feel are necessary. Docker isn't fundamentally flawed, but it is flawed and I think even Docker, Inc admits that. There are changes that are necessary and they need to happen out in the open rather than behind closed doors at Docker, Inc.

Starting their own format was an attempted power move designed to fracture the community. And I see it as largely having failed. There are a lot of things that I like better about Rocket, but the ecosystem around it is way too nascent to even consider using it. There's no collection of images that I, as a developer, can leverage to avoid building everything from scratch. Docker Hub is a big part of the reason to use Docker. And there are no third-party integrations for Rocket. I can use Docker through AWS ECS and have the peace of mind of knowing that I can migrate to GCE and my Docker images will basically just work. With Rocket, I basically have to use CoreOS (a design I'm not thrilled with) or roll my own.

If either company seemed like it was trying to put the community first, I'd be 100% behind that company. But CoreOS's rhetoric around trying to position themselves that way is simply indicative of them not having first-mover advantage. Both companies need to grow up and realize that this is a nascent market and there is room for both to succeed if they're only willing to cooperate. But if they continue to squabble like little children fighting over a ball, the parents are going to take that ball away. Google and Amazon will roll their own proprietary containerization or, worse yet (for Docker and CoreOS), they'll cooperate on an open standard. But the community is what's important. If they can continue to grow the community, profits will follow. If they continue to try to corner the market and push each other out, the community will go elsewhere and both companies will fail.


You seem to not understand what ACI is... or haven't bothered to do some research on it.

> With Rocket, I basically have to use CoreOS (a design I'm not thrilled with) or roll my own.

Rocket works on any Linux platform. Other implementations are working on more and more platforms, including BSD's.

They all adhere to the ACI open specification.

ACI is an open specification that anyone/everyone can contribute to (including Docker if they wanted). To date, there are more than 4 completely different implementations of ACI (one of which is Rocket), all written by different organizations. Apache Mesos is working on a 5th, and more are coming.

That makes the ACI have 400% more implementations than Docker's "specification", which was written after the fact and is more or less documentation of how Docker was implemented rather than an open specification which the community helped form.

ACI belongs to the community. It's being contributed to by a large userbase and is growing rapidly.

This was the entire driving force behind ACI from CoreOS -- to get a standardized specification that anyone could implement. CoreOS saw the need for there to be more than a single container technology -- but they must be inter-operable.

It's a lot like virtualization before the OVF format came along. i.e. If you made a VMware guest, you were stuck with VMware. After the OVF format, you could export your VM to an OVF image and import it into any other standardized virtualization platform -- you were not locked into a single for-profit company and subject to their whims.

You seem to think it's about a "money grab"... well, sure if users adopt Rocket maybe CoreOS gets support fees etc... but if users adopt one of the many other implementations of ACI, CoreOS doesn't get anything except a diverse ecosystem of inter-operable container platforms.

That's a win for everyone.

CoreOS still supports Docker -- and I've seen nothing but praise come from the CoreOS camp. That doesn't mean Docker is doing the right thing for the community at large -- instead, they are doing the right thing for Docker, the for-profit company.


> You seem to not understand what ACI is

I know exactly what the app container spec is...stop being patronizing. My point was that I have to use CoreOS FOR ORCHESTRATION because my normal options for orchestration don't (yet) work with Rocket or any of its implementations. I know CoreOS has publicly announced that they'll make Kubernetes work with Rocket, but that's not yet reality. And the orchestration option that we're currently pursuing, ECS, isn't likely to support Rocket any time soon.

And CoreOS doesn't exactly have a great history of being interoperable...the tools that make up CoreOS have hard dependencies on each other. For example, I find Consul to be better than etcd in almost every conceivable way and yet CoreOS forces me to run etcd. I'm not going to believe that CoreOS cares about interoperability until they're willing to take the same "batteries included but removable" stance that Docker has professed.

You seem to have a very pro-CoreOS bias. I'm not going to try to change that. Let me be clear on my biases. I like the design of Rocket far more than Docker. I don't want a single executable to build, run, monitor processes and be a REST API. But I also disagree with almost every decision that's gone into CoreOS. Fleet is cumbersome, systemd is a train wreck (the one thing an init system cannot be is non-deterministic and I've seen too many systemd boot sequences be that) and etcd is inferior to other options. So while I respect that Rocket can be used elsewhere, there's still little-to-no support outside of CoreOS when it comes to everything that goes into running containerization in a production environment.

My company has serious dev-test-prod workflow issues that can be solved by containerization. We're publicly tied to running in AWS. CoreOS is simply a non-starter. While I completely agree with all the negative things that have been said about Docker, I can still advocate Docker-based solutions in the meetings I attend on this subject because everything I need from the ecosystem is there, today, ready to use. Rocket simply isn't there yet and it could have been if they'd started with the existing Docker implementation as the basis for their specification. And you're not going to convince me that that NIH decision is in the best interests of the community.


> Rocket simply isn't there yet and it could have been if they'd started with the existing Docker implementation as the basis for their specification

It's not "their" specification. You really don't get what the ACI is. Nobody owns or controls the ACI -- it's a community specification being worked on and developed by a community, ie. no one entity dictates what it looks like.

> I find Consul to be better than etcd in almost every conceivable way and yet CoreOS forces me to run etcd

This is out of the scope of this discussion. Rocket runs on any Linux. ACI implementations run on any 'nix currently. There are no interoperability issues here. etcd is part of CoreOS the operating system. However, you can use etcd outside of CoreOS the operating system.

This complaint is like saying you don't like iptables in RHEL 6.x and, since it's not easy to change out, you refuse to use Wayland. It makes no sense.

> I can still advocate Docker-based solutions in the meetings I attend on this subject because everything I need from the ecosystem is there, today, ready to use.

Perhaps this is a sign that containerization is simply not ready for mainstream enterprise use yet. Sure, Docker stamped the big "1.0" to impress investors and Red Hat, but that doesn't mean it's actually ready for the big time. Especially since other offerings are rapidly brewing. You've gone this long without containers, you can definitely wait another 6 months (or switch to BSD and use jail containers, which have been mature for 15+ years).

> I'm not going to believe that CoreOS cares about interoperability until they're willing to take the same "batteries included but removable" stance that Docker has professed

Again -- you don't seem to understand what the ACI is. It's a community thing -- CoreOS doesn't own it. That guarantees interoperability. The ACI specifies what a container should look like to be "standards compliant". It's up to the specific implementation to make a system that works with that standard container format. The system can be implemented in any language, work any way they want, do anything they want, etc... so long as it can read and understand the standard container format. If that is not "interoperability", I don't know what is.

> You seem to have a very pro-CoreOS bias

No, I don't. I have a very pro-Open Standard bias.

The world does not need another vmdk.


Kubernetes is actually in the process of getting rocket support in addition to existing support for Docker. More details can be found here: https://github.com/GoogleCloudPlatform/kubernetes/pull/5518


Aren't they doing the same thing people have been complaining about Docker doing? Running with features but not stabilizing? Or is CoreOS not as unstable as the impression I've gotten from comments has led me to believe?


It's true, the CoreOS team has been moving really fast. It's what startups have to do in the beginning -- before the money runs out :). While our announcement today introduces Tectonic and some funding, it also communicates that we have found our market fit. This means we have the right set of features to go to market with, and add value by making people successful with containers.

Now it's all about execution.

CoreOS is now focused on balancing new features with maturity and stability. For many of our projects this focus started almost a year ago. Take etcd for example. CoreOS spent the last 9 months effectively rewriting etcd and its Raft implementation to meet the stability requirements we need going forward. Other projects have demonstrated great signs of stability, such as CoreOS Linux, our Linux OS focused on containers. It's not perfect, but there are many people that run CoreOS Linux in production today. We expect this trend to continue.

I should also note that we have many more team members employed by CoreOS than we did a year ago. It takes time to build the right team, but so far I think we have done a fantastic job. I'm still amazed at the number of projects we actively maintain and ship with a team our size. Now we have the people power to make our projects/products solid.

Thanks for raising the question and hopefully we can continue to answer it with actions and shipping products that you will fall in love with -- or at least get the job done without the pager going off.


> CoreOS spent the last 9 months effectively rewriting etcd and its Raft implementation to meet the stability requirements we need going forward.

Yeah, I noticed that. It almost seems like it would have been better to just glom on to Zookeeper or some other directory service that was a bit more established, just for the sake of being able to work on other things for the last 9 months. Do you ever look back in retrospect on that?


The reality is that etcd is still probably a year or 2 away from being production ready.

In the meantime, Zookeeper has shipped dynamic ensemble configuration, which brings parity on the only thing etcd had any advantage in. (Not that it -really- mattered; most people that were running dynamic ensembles were already using Netflix's Exhibitor.)

Zookeeper is also faster, has more features and you need it anyway if you are running SolrCloud, Mesos, Hadoop etc.


Zookeeper definitely has some advantages both in terms of maturity and features.

I'm not sure though that Zookeeper is universally faster than etcd (certainly as you add nodes it slows down... though most people don't need more than 3 or 5 nodes).

Dynamic ensemble configuration is definitely a place where Zookeeper was lacking before, but it isn't the only difference that matters. The big difference I remember with etcd was partition tolerance. Last I checked, Zookeeper was all on the consistency side and its partition tolerance basically just wrote off the smaller partition. Either choice can be a good thing or a bad thing for a directory/orchestration store, but particularly as you scale up to large numbers, partition tolerance semantics seem to better match the use cases.

So I don't think etcd is all inferior to Zookeeper, and I understand the desire to build a system where all components are fundamentally AP, but sometimes having a bit of impurity and some duct tape hackery in your model can help you get out the door faster and focus on more significant technical challenges that will help you build your community. Once you have a community, you have more resources to address that duct tape.


Kubernetes will become a dominant container orchestration platform and API. It's what Mesos should be, but with the backing of Google, Kubernetes is more likely to push into that space a lot quicker and with a lot more momentum. Notice, there are a lot more startups forming around Kubernetes than Mesos. Even still, competition is healthy; having multiple large scale orchestration and resource management systems is important for the tech ecosystem. I imagine we'll see a couple more over the next few years. Docker with their own. I'm sure we'll see one that solves these problems at a smaller scale and is easier to manage. And perhaps another focused on a niche area like AWS Lambda functions but as an open source project.


Overall, I do agree. Kubernetes represents 10+ years of compressed experience from the exact same folks who designed and implemented Borg and Omega at Google. It's also worth noting that Google is putting 30+ engineers to work full-time on Kubernetes. There are almost 300 contributors to Kubernetes today, just 9 months from inception. I believe we will begin to see some pretty exciting production references soon when v1 arrives over the next couple months.

Disclosure: I work for "the Kubernetes company", Kismatic. http://www.kismatic.com/


Kubernetes is not really a container orchestration platform. It helps organize containers into logical groups with dependencies, but orchestrating resources on your physical machines is well within Mesos' domain. Keep in mind that Kubernetes and Mesos aren't mutually exclusive. Kubernetes has been [implemented](https://github.com/mesosphere/kubernetes-mesos) on Mesos.

My opinion is that Kubernetes is closer in identity to Docker: tons of OSS excitement, many companies jumping on board, and really pluggable. As to whether it's going to become the critical infrastructure piece that gets enterprise to pay $$$... that's a stretch.


Abhay, perhaps you can disclose in your biased answer that you are employed by Mesosphere?


Perhaps you should have disclosed in your answer that you're a founder of a Kubernetes-focused startup.


Noted! Apologies.


Thanks, and cheers. :)


Cheers. :-)


> Kubernetes is not really a container orchestration platform.

Since Kubernetes schedules, I'm curious what makes you say this. Kubernetes without Mesos works perfectly well for the use cases that both CoreOS and Google envision.


To what scale? A lot of tools out there help you schedule containers, but to how many machines, containers, and level of resource utilization? Perhaps I meant that it's _technically_ an orchestration platform, but it's not at the level of quality that is required for large-scale orchestrations (i.e. Borg, Mesos).

And yes, as noted above I work for Mesosphere. YMMV.


I think everyone is clear on the fact that after 9 months Kubernetes is not yet as mature as the 4-5 year old Mesos. But people love the project and the direction that it's headed in, to such a degree that there are partnerships forming, startups emerging and many many people contributing to the project. Many of us who will adopt Kubernetes are not looking to scale workloads across a Twitter or Google scale environment. We might have a few hundred machines or less, some might go beyond that but in all honesty I'd say for the time being people can resort to 100 machine clusters and manage multiple clusters.

Don't get me wrong, I really like Mesos. I think it's a fantastic piece of technology and I really want to see it do extremely well. I just think Kubernetes is something a little more accessible to the wider developer community. Plus we get to see it grow from infancy and help contribute to that growth. AND it's written in Go, which is a major plus.


It's a plus until it isn't :P


Maybe it's just me, but perhaps I like the company who actually wrote the scheduler powering the service you mentioned, has run it in production for over a decade and gathered petabytes of knowledge about how to refine it, has been working for several years on its replacement, and has woken up in the last couple of years and realized that while their infrastructure has been their competitive advantage to this point, there is value in the whole industry having access to their technology. Maybe I like their expertise, and I'm not ready to condemn Kubernetes and its quality just yet because it's young and who knows where they'll go. Maybe they'll separate Omega from google3 and land it in Kubernetes. Then what?

Mesos and Mesosphere talk a lot about scale, with the implication that the technology you're modeling powers the entire Google fleet in one big harmonious system. It doesn't. There's a fairly hard upper limit on the system you mentioned in terms of scale, which is one of the motivations for Omega. The system you mentioned also fits into an ecosystem from a DLM[0] to an RPC system to a build system and on up. Pieces of that are gradually being open-sourced, too, like Bazel. Google is clearly becoming aware that their infrastructure being generally available will benefit the industry enormously, and regardless of my personal feelings on their business model and exploitation of user data, that is a good thing. If you're telling me that Mesos can do better than all of that expertise, better than John Wilkes, better than all of that, to the point where you don't even consider Kubernetes on the same path as Mesos, I don't know what to tell you other than come back to reality. This seems to be a rut with Apache projects that model Google systems in some way: Zookeeper, Mesos, Hadoop, Flume. All of them look like what you'd get if I described the systems in question over a very blurry fax and the open-source community did its best. That's not outright criticism; it's just having the unique experience of having seen their Google counterparts. It's a good thing they exist, but there are absolutely opportunities to do better, Mesos included.

I get what you're trying to do with Mesos, but I strongly disagree with your characterization of Mesosphere as a datacenter operating system. It's disingenuous, and positioning Mesosphere as the only company who can do hyperscale in containers is a tough sell. Last time I used Mesos, in 2012, it could barely handle 100 machines and I had to write shell scripts to even get a simple static application scheduled. I then threw 10,000 jobs at it and the scheduler hung and took down the entire cluster. Clearly you have come a long way since then, but it's apparently easy for you to forget that given the age of Kubernetes.

[0]: You'll pardon me if the thought of Zookeeper doesn't excite me as the backbone of the entire system.


Mesos has come a long way since the early research. But so has all the competition.

I'm biased, of course. I work for Pivotal, we're the lead builders of Cloud Foundry. Our current generation of software can spin up hundreds to thousands of machines without too much fuss. Our next generation[0] is ... better.

And again, this is my bias showing, but I think our design for container workload scheduling is better than Mesos and Kubernetes.

We don't get much buzz on HN because we sell a complete PaaS to F500s. It's not very approachable on a personal level.

[0] http://lattice.cf/


The two things that matter are performance and reliability. Are you able to go into how lattice improves over either Mesos or k8s in these regards?


I was interested in the design decisions, but sure.

Lattice is a selection of Cloud Foundry components, principally the Diego runtime system, Loggregator log streaming system and Gorouter HTTP routing system.

If how software is written matters to you, then you might like the way we work. Cloud Foundry is Pivotal Labs DNA, scaled up. 100% pairing, every new line of code is test-driven.

We dogfood everything by running Pivotal Web Services as a public PaaS, which is always within a week or two of HEAD across the board.

Where we follow is that Kubernetes is extracted from real experience and Mesos has a head start on implementation.

In terms of design, I like ours best. Tasks and long-running processes are distributed using an auction mechanism, which means that there's much less global resource-status chatter required to make the scheme work. Diego is "demand-driven", meaning activity occurs when new demands are made. Mesos is "supply-driven", meaning state-affecting activity occurs when resources become available. They both solve a critical problem by pushing intelligence out to the edges, but to different edges.
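
To make the demand-driven auction idea a bit more concrete, here is a toy Go sketch (purely illustrative; nothing here is Diego's actual code or API): a new task is announced, candidate cells bid with their current load, and the lowest bid wins, so no global resource-status table has to be kept in sync.

    // Toy illustration of demand-driven (auction) placement: when a new task
    // arrives, candidate cells bid with a score (here, current load) and the
    // task goes to the lowest bidder. A sketch of the idea, not Diego's code.
    package main

    import "fmt"

    type cell struct {
        name string
        load float64 // fraction of capacity already in use
    }

    // runAuction asks every cell for a bid and places the task on the cheapest one.
    func runAuction(task string, cells []cell) string {
        best := cells[0]
        for _, c := range cells[1:] {
            if c.load < best.load {
                best = c
            }
        }
        return best.name
    }

    func main() {
        cells := []cell{{"cell-a", 0.70}, {"cell-b", 0.20}, {"cell-c", 0.55}}
        fmt.Println("task-1 placed on", runAuction("task-1", cells))
    }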

I am much less familiar with Kubernetes, so I will avoid being any wronger than usual.


There seems to be some feature overlap between fleet and Kubernetes. Does anyone know if they are planning to reconcile the two?


What more do you want fleet to do? It's a simple scheduler and seems to be pretty much complete.

Expect CoreOS to shift resources towards proprietary revenue streams while allowing the community to pick up slack on open projects.


I hope not!

I can tell you first hand the majority of our team members work on our opensource projects. It's one of the main reasons I came to work for CoreOS over a year ago. But the concern is still valid -- will CoreOS really keep funding our opensource projects or will CoreOS pull a fast one and go open core?

So far our actions prove we are committed to opensource, and that continues with today's announcement. We wanted it to be really clear that CoreOS is all about opensource projects and collaboration, even if that means our competitors can compete with us, so we took major effort in keeping the two brands separate. There is coreos.com for "Open Source Projects for Linux Containers", and tectonic.com that combines those projects in a commercial offering. There are some non-open bits in the commercial offering, but they don't conflict with the opensource projects.

You are right that we will allocate more resources towards making money, which is required if we want to continue the good work we are doing on the opensource front. You are also right in that the community will pick up the slack, but please keep in mind that CoreOS is part of that community and we plan to continue leading the way.


Does k8s replace fleet in a tectonic system? It's been asked several times on this thread, but there hasn't been a straight-forward answer yet.


I don't think they want to say anything that would potentially alienate anyone, which is why you're unlikely to get a straight answer.

fleet will exist indefinitely, but kubernetes essentially replaces it with something (promised to be) better.


Nope, k8s and fleet are used together in Tectonic. K8s is what the user interacts with for any kind of deployment, however.


I can safely say that's not the case. Much of what we're doing with CoreOS or Tectonic depends on open source, particularly our open source projects. That's not going away anytime soon.

(etcd dev here. We've got a long roadmap ahead.)


Where can we find the roadmap and is Tectonic itself open source?


To your first question: https://github.com/coreos/etcd/milestones (and the mailing list is pretty useful as well, to keep up on point releases and announcements and such).

To your second question, Kelsey said it best: https://news.ycombinator.com/item?id=9330398


I appreciate the roadmaps on individual projects. I am more interested in the CoreOS roadmap.

So far the approach seems to be, work on a special project in secret for a couple months that aims to compete with existing solutions and then hype a partial release of the incomplete project on a Monday.


Does not look like it's open source.


Not saying fleet should do more. Just that it has some overlap with Kubernetes.


How does this compare to Redhat's Atomic Host?


It's better to compare RedHat's Atomic Host with CoreOS Linux. Both projects are focused on providing a slimmed down Linux OS focused on running containers in a server environment.

Tectonic would be closer to RedHat's Openshift project/product, but that's not quite right either since Tectonic is not a full blown PaaS solution.


> Tectonic is not a full blown PaaS solution.

Granted, but let's be serious for a moment.

Nobody with widely-recognised components is going to just stop moving up the value chain to selling turnkey systems. That's where the serious bucks are waiting.

I expect that before too long, my employers' sales team will be seeing CoreOS and Docker sales folk as well as the existing competition from Apprenda, Salesforce/Heroku, IBM and Red Hat.

(And, again, I don't speak for my employers).



