I’m one of the long term PMC / committers on mesos.
In retrospect I feel this was inevitable due to a few key reasons:
* k8s was a second system with all the learnings and experience of building such a system at Google for over a decade. Mesos was birthed by grad students and subsequently evolved into its position at Twitter but the engineers driving the project (myself included) did not have experience building cluster management systems. We learned many things along the way that we would do differently a second time around.
* Mesos was too “batteries not included”: this fragmented the community, made it unapproachable to new users, and led to a lot of redundant effort. Most users just want to run services, jobs, and cron jobs, but this was not part of Mesos and you had to choose from the array of ecosystem schedulers (e.g. Aurora, Marathon, Chronos, Singularity, etc.) or build something yourself.
* Mesosphere was a VC backed startup and drove the project after Twitter. Being a VC backed startup you need to have a business model that can generate revenue. This led to a lot of tensions and mistrust with users and other vendors. Compare this with Google / k8s, where Google does not need to generate revenue from k8s directly; it can instead invest massive amounts of money and strong engineers on the notion that it will improve Google’s cloud computing business.
Even if k8s hadn’t come along, Mesos was ripe to be disrupted by something that was easier to use out of the box, and that had a benevolent home that could unite vendors and other contributors. Mesos could have perhaps evolved in this direction over time but we'll never know now.
Well said Ben. I am also one of the long-term PMC members/Committers for the project.
One of the lessons I learned was that Mesos's two-level resource allocation was originally designed for running batch workloads (e.g., Spark, MPI, etc.) if you look at the original paper. Using it to run long-running services was actually an afterthought. We ended up finding that we had to do lots of tuning on the first-level scheduling algorithm to ensure fairness, given that the second-level scheduler does not have a full view of the cluster and the first-level scheduler does not have enough information to make good decisions. The solution to the problem is actually optimistic offers, which is essentially the k8s model.
Another reason k8s was successful is probably the golang ecosystem. In Mesos, we spent a lot of energy building a basic HTTP layer in C++ due to Mesos's unique threading model. I wish we could have spent that time working on actually useful features.
Thanks for the historical perspective. Might you or anyone else be able to recommend any resources that discuss the efforts to tune the two level scheduler for long-running workloads?
You mentioned:
>"The solution to the problem is actually optimistic offer, which is essentially the k8s model."
Isn't the K8s model more "choose your QoS model" - BestError, Burstable or Guaranteed? Or am I misunderstanding your comment completely?
I was curious about this:
>"Another reason k8s was successful is probably the golang ecosystem. In Mesos, we spent a lot of energy building a basic HTTP layer in C++ due to Mesos's unique threading model."
Could you say what was unique about the Mesos threading model?
> Isn't the K8s model more "choose your QoS model" - BestEffort, Burstable or Guaranteed? Or am I misunderstanding your comment completely?
k8s's scheduling model is that the scheduler is able to see the entire state of the cluster, and thus can optimistically make optimal scheduling decisions, especially for those long-running jobs that are very picky in practice. Although k8s by default only runs the default scheduler, you could in theory run multiple schedulers in parallel (the Omega model).
Mesos's pessimistic two-level offer model makes it hard for the second-level scheduler to make optimal decisions because it might not get the offer it needs. At the same time, the first-level scheduler lacks the application-specific information to send the right offer to the second-level scheduler, hence the problem. We evaluated many first-level scheduling algorithms, and ironically found that a "random" first-level scheduler sometimes works better than DRF for long-running service scheduling.
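To make the information gap concrete, here's a toy sketch in Go (made-up types, not the real Mesos API): the first-level allocator offers whatever happens to be free on some node, and the framework can only accept or decline that offer without ever seeing the rest of the cluster.

```go
package main

import "fmt"

// Toy types, not the real Mesos API. The first-level allocator hands out
// whatever is free on some node without knowing what the framework needs;
// the framework can only accept or decline what it was given and never
// sees the rest of the cluster.
type Offer struct {
	Node  string
	CPUs  float64
	MemGB float64
}

// First level: pick an arbitrary node's free resources to offer next.
func nextOffer(freeByNode map[string]Offer) Offer {
	for _, o := range freeByNode { // map iteration order is effectively random
		return o
	}
	return Offer{}
}

// Second level: the framework can only check the offer against its needs;
// "is there a better node for this task?" is unanswerable from here.
func frameworkAccepts(o Offer, needCPUs, needMemGB float64) bool {
	return o.CPUs >= needCPUs && o.MemGB >= needMemGB
}

func main() {
	free := map[string]Offer{
		"node-a": {Node: "node-a", CPUs: 2, MemGB: 4},
		"node-b": {Node: "node-b", CPUs: 16, MemGB: 64},
	}
	o := nextOffer(free)
	fmt.Printf("offered %s, accepted=%v\n", o.Node, frameworkAccepts(o, 1, 2))
	// A full-state scheduler (the k8s/Omega model) instead looks at all nodes
	// at once and picks the best fit, resolving conflicts optimistically.
}
```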
> Could you say what was unique about the Mesos threading model?
Mesos uses a component called libprocess (think of it as a C++ version of Erlang). Each actor in the system (mesos master, mesos agent) is single-threaded. Thus, all I/O operations need to be non-blocking so as not to block the actor. This makes it hard to integrate third-party C++ libraries, especially those that involve I/O, as they might have a different threading model.
Golang solved this problem with goroutines and baked them into the language. So golang libraries, especially those involving I/O, are much more composable than C++ ones IMO.
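As an illustration, here's the kind of ordinary blocking code a Go library can expose (a minimal sketch, nothing Mesos- or k8s-specific); inside a single-threaded libprocess actor the equivalent call would have to be rewritten around non-blocking futures so it never stalls the actor's one thread:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

// Ordinary blocking code: each goroutine blocks, the Go runtime multiplexes.
// In a single-threaded libprocess actor, a call like http.Get would stall
// every other message handler in that actor, so third-party I/O libraries
// have to be wrapped or rewritten around futures instead.
func fetch(url string, wg *sync.WaitGroup) {
	defer wg.Done()
	resp, err := http.Get(url) // blocking from the caller's point of view
	if err != nil {
		fmt.Println(url, "error:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(url, len(body), "bytes")
}

func main() {
	var wg sync.WaitGroup
	for _, u := range []string{"https://example.com", "https://example.org"} {
		wg.Add(1)
		go fetch(u, &wg) // concurrency for free, no continuation plumbing
	}
	wg.Wait()
}
```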
Refreshing in the sense that it's a look back at what worked and what maybe didn't work so other projects don't make the same mistake. They could have just flipped the code repo to "archived" and moved on without a word.
I think the way you explained your last point really hits the nail on the head in terms of FOSS. I did actually enjoy large parts of Revolution OS, the movie about the creation of GNU and Linux, but the part that stood out to me the most was cmdrtaco explaining open source (paraphrasing here): at the end of the day you end up working on something you need, and then you think “if I need this maybe someone else does too” so you publish the source code and let others use it. This stuck with me because, well, if I publish something I found useful and nobody else finds it useful, oh well. But if they do, that’s really great! I am not saying Google open sourced k8s out of the sheer goodness of their hearts, but I think it’s a lot harder to maintain that sensibility when the project is VC backed.
>"k8s was a second system with all the learnings and experience of building such a system at Google for over a decade. Mesos was birthed by grad students and subsequently evolved into its position at Twitter ..."
This is not about Apache, but about a failed open-governance model for commercial open source from Mesosphere. It's not the case with Apache Spark, Apache Beam, HBase, etc.
Mesos (and many other Berkeley AMPLab efforts) had brilliant ideas behind it and an elegant implementation that allowed for much more than what Kubernetes was designed for.
Kubernetes was supposed to be a scheduler for Mesos and Google invested in it. Let's not forget John Wilkes (Google Borg, Omega, Kubernetes) on the stage at MesosCon in 2014 https://www.youtube.com/watch?v=VQAAkO5B5Hg
While I don't know the exact reasons behind Kubernetes becoming the scheduler and resource manager, I think that has very much to do with the stewardship of the Mesos project as well.
By 2015/16 most Mesos committers were at Mesosphere. Suffice to say, the open governance was more of a pain than a benefit to them. If you were at Apple, Adobe, or the others that relied on Mesos, you didn't have much of a say. Everything shifted towards DCOS and everyone was pushed towards commercial licenses.
This is sad, because Kubernetes didn't fix the whole problem and left us with a distributed system with a text-based "API" that requires text patching to manage and is borderline clueless about the lifecycle of the services running on top. Yet, it's the best we got :)
I'm an engineer at one of those companies you mentioned who worked directly on the Mesos+DC/OS platform.
Mesosphere's support was pretty much nonexistent as far as we were concerned. We had major issues open for years without any significant action. They were never resolved. We had to solve most problems ourselves.
We got so fed up that a few of us worked late nights (often up to 2AM) on a bottom-up skunkworks project to replace Mesos with Kubernetes. That project was exceptionally successful. (I'd like to point out that we are not in the Bay area and are adults with families and small children - we hated Mesos so much we were willing to stay up anyway.)
In short, Mesos has a lot of great theory but we hated it in practice. Kubernetes has some theoretical flaws but it is a better experience for practitioners.
100% Agreed. Ex-Mesosphere employee (I joined in 2015 and left/was fired a year later)
The dominant ethos at Mesosphere was that they already won, and were poised to become the next 'cloud orchestration' above the cloud services. But the managers also had no empathy for developer experience -- the majority opinion was "distributed systems are hard, developers don't deserve to have a good experience", despite the new cool easy-to-use distributed systems cropping up every week. The company even developed a sham 'community edition' that was designed to fail. One person complained on our community slack channel that he left his cluster running overnight and was charged $600 for the 12 hours of use.
Within a few months of joining, I started to point out their odd technical decisions (e.g. why they decided to build their enterprise edition on CoreOS rather than a trustworthy distro), and was eventually chopped for speaking up. I was fired right before a massive company offsite. When the rest of my team came back, they, along with my manager, were fired too. The message was: we didn't need your whole team, but because you spoke up we had to double screw you.
That’s cool! The issue is that when you’re selling to a massive legacy institution, the infosec on a new Linux distro that self updates to new patches is really no bueno. We lost many months on this piece and on key deals. I heard from the grapevine that they ended up having to port DCOS to Ubuntu after all, soon after I left.
Yeah we disabled the auto updates and orchestrated the updates ourselves. Saved our bacon multiple times when kernel/systemd/docker bugs got pushed out.
> Mesos (and many other Berkeley AMPLab efforts) had brilliant ideas behind it
Brilliant on paper.
The idea was influenced by MR workloads on Borg. MR is such a large workload pattern on Borg that it benefits from having its own scheduler, and Borg in turn benefits from avoiding unnecessary meddling.
The idea was only brilliant for a very small number of human users, both inside Google and in the broader industry. In terms of CPU & memory usage, Mesos' approach should always account for a great share, because the workloads that fall into the category of needing their own scheduler are always the big ones.
But as an open source product, its adopters are far removed from that group. So you can see that all the early adopters of Mesos were large corps. I seldom hear any startup applaud Mesos.
Precisely. I repeated many times, when Mesos was still relevant, that its model makes it costly to get started. It would never succeed unless it started to package a product that provides the scheduler part.
We all know what happened after K8s. K8s works because Borg already worked for 12+ years.
> an elegant implementation that allowed for much more than what Kubernetes was designed for.
By design K8s is more capable than Mesos, because K8s includes a scheduler, and numerous other capabilities. Heck, excluding performance and scalability, K8s is a level above Borg in terms of features and capabilities. Mesos is far behind even Borg, not to mention K8s.
Mesos does offer better scalability, on paper. I have no experience with that, though.
> Kubernetes was supposed to be a scheduler for Mesos and Google invested in it.
As an early participant, while someone might have had this plan, I certainly did not hear this ever stated by anyone. Having the ability to run on Mesos was discussed as interesting and in the spirit of community but the work we did treated Mesos as a source of learning but not as a serious thing to integrate. I’m sure others considered it, but as justicezyx says it was an inclusive idea but at best something a few people cared about.
This is much more the real story but I doubt most folks will ever hear it.
Totally agree. The Mesos project always had trouble with governance and didn't build out the larger community the way other projects did. If they had built that coalition, made the project more accessible to others, who knows, maybe it would have gone differently.
Then again, k8s succeeded in part due to the Google reputation (even if undeserved) and its use of Go. I always found Mesos' use of C++ meant many of the users (often developing in Java, Go, or Node) just wouldn't contribute.
I think the problem went deeper than that actually. Mesos required that applications use a distributed system API to interface with the executors. This meant it was very challenging to write a distributed system upon Mesos, even though that was ostensibly the very thing that Mesos was meant to solve. When K8s came along and made everything manifest and YAML-based, we sort of knew that we were screwed because of how much simpler that was.
The thing is, Mesos on its own isn't k8s. Mesos is a scheduler. k8s is a full stack. So it's always been a bit off to fully compare the two.
If your system fit clearly within what k8s did well, then you were fine and the stack worked.
If you had a more complicated architecture with parts that didn't play well with containers or k8s' scheduling model, life became pretty difficult.
That's one reason I liked Mesos: you could build the stack you needed for _your_ infrastructure.
Granted, I don't think most shops had the kind of problems that warranted Mesos, let alone k8s (that's still true today). But k8s is "good enough" for lots of problems and understandably that's where the community went.
> Everything shifted towards DCOS and everyone was pushed towards commercial licenses.
This was then partially diluted when Microsoft decided to deploy DC/OS and, from what I've heard, pushed Mesosphere to open source large chunks of DC/OS.
I went to see Mesos early in their life in their San Francisco office after a joint customer put us in touch.
Never in my life did I meet such an arrogant group of people.
First, they left us waiting in reception for an hour.
They eventually took the meeting over lunch, where we had to watch them inhaling their free food.
Some guy in plastic-leather trousers spent most of the hour lecturing us about all of the multi million deals they were doing, and how they weren't interested in speaking with anyone who couldn't write 7 figure checks.
Not once did they ask about our business or even our names if I remember correctly.
I had a similar experience with other west coast tech companies. Too arrogant for their own good and not able to put in the hard yards with stodgy old enterprise companies even when they have good technology and are early to market.
I haven't seen much overt rudeness like what you're describing, but it does mystify me how often I'll meet with another company to discuss a potential integration story that benefits us both, and instead of any engineers who can discuss technical considerations, they send multiple product managers who all want to present their own slide deck about why they're so great.
I know you're great - that's why we requested this meeting. I think we're great too. There's a reason I think our integration would be a win-win and a win for users. Can we actually talk about it now? Oh let's schedule another meeting so the fourth product manager who didn't show up for this one can share their slide deck too.
I couldn’t stop crying-laughing at what you described.
Unfortunately at many firms, Product is supposed to be the interface to customers and engineers are supposed to interface with Product. This model only works if the product person is smart and humble enough to understand when to involve engineers. Often their incentives are against it, as they want the credit for “landing” a big account without the help of others, especially engineers who often report to a different org.
True. As the major corporate sponsor though they didn’t give it the best chance.
I ran one of the first Docker partners and saw first hand how both Mesos and Docker Inc had considerable mindshare and interest from large enterprises for a year or so before Kubernetes matured.
They both spectacularly failed to deliver through arrogance and bad execution.
I remember visiting SFBA in 2014 or so; a friend who worked at PayPal said they used Mesos, and Kubernetes was a baby project with a lot of hype. It sure seemed Mesos had won back then, and the fact that Kubernetes caught up, surpassed it and then succeeded so wildly would have never crossed my mind.
I worked with tech companies and some banks in the early days of Kubernetes who were deep in Mesos and many of them felt the same way. It’s really hard to see this as it’s happening.
Kubernetes is absolutely “worse is better” when it comes to scheduling. But scheduling isn’t the problem most people needed solved. Kubernetes standardized deployment patterns of multiple apps well for most people. Everything else was a nice to have (scheduling, scale, ease of initial setup, etc).
> I had a similar experience with other west coast tech companies.
I am the infra lead for a "west coast tech company", and they gave us the same guff. I knew we would not be using them after the first few minutes of listening to them.
But please do continue to hate on this coast, tell your friends. I don't want to compete with more folks for housing.
I understand how Parent’s “west coast company” tag can come off as snotty, but I’ve begun using “west coast” and “east coast” as non-pejorative shorthand when discussing with a mixed audience the two major distinct and incompatible corporate operating systems in the tech industry.
The coastal definition is obviously imprecise, e.g. HP, based in Palo Alto, is as much an “east coast company” as IBM and JPMorgan Chase in New York, however it gets the point across more effectively and non-pejoratively IMO than enterprise vs startup, old school vs new school, companies who run on VMware vs k8s companies, oracle-y or Googley, boomer company vs hipster company, etc. I’ve found every person you’re talking to, whether they’re a developer, salesperson, HR recruiter, investor, etc. immediately knows what you’re talking about in context.
Long response but I had previously been thinking about this and appreciate to hear others’ thoughts.
I interviewed 3 different engineers from Mesosphere in 2015 or so at a time when it seemed like the company was tanking and everyone was looking for a new job. At least for the people I interviewed, there was nothing to be arrogant about. We were specifically looking for engineers with this sort of experience and none of them were close to getting an offer.
I used Mesos for a few years before experiencing Kubernetes. As neat as Mesos was, it was doomed from the start.
For one, Kubernetes was, at least to some extent, a rewrite and extraction of functionality built at Google, from their production orchestration ecosystem that is Borg. The fact that Kubernetes was heavily influenced by a successful, large solution in this space allowed it to leapfrog, at least a bit, the competition.
Mesos was trying to become a solution seemingly from scratch. I worked with and interacted with a number of Mesos project members and committers, and while they were generally bright folks, their industry experience was surprisingly shallow. In my experience, Mesos changed frequently and there were enough unknown TBD components at the edges of the ecosystem to make it somewhat volatile.
Within a year Kubernetes was waaaaay ahead and Mesos started considering pivoting to support k8s.
I no longer see a purpose in Mesos, and frankly, that's okay. Too many Apache projects are on eternal life support, and they lower the bar for the rest of the Apache ecosystem, which has sort of begun to earn a poor reputation for quality (à la some of the fringe Boost libraries). Apache Foundation is no longer a label for solid, reliable software.
>As neat as Mesos was, it was doomed from the start.
I don't think it was doomed from the start; Mesos was deployed at Twitter 4 years before k8s was even released. The "problem" with Mesos was that it was focused on a future that never panned out. The "killer app" for Mesos was Spark, and the problem they were focused on solving was resource allocation for batch jobs. Mesos was supposed to be a replacement for YARN. Even Marathon, which was the de facto application launcher, was pitched as a "meta-framework". Marathon wasn't for launching services, it was for launching custom frameworks. They never really pivoted from this, and the writing was on the wall when Mesosphere decided to make Marathon as proprietary as possible and fully integrate it within their commercial solution. Once Marathon was gone, Mesos didn't really compete with k8s anymore unless you wanted to write your own framework.
Mesos started as a research project at Berkeley in 2009 and was originally focused on cluster computing frameworks like Hadoop. From the paper: "We present Mesos, a platform for sharing commodity clusters between multiple diverse cluster computing frameworks, such as Hadoop and MPI."
It actually predates YARN by a few years.
But, it very quickly (in 2010) saw production use at Twitter as the foundation for Twitter's custom PaaS which was later open sourced as Apache Aurora.
Marathon's main use case was actually for running microservice applications in containers, which is why it has some advanced features around managing groups of containerized apps and their dependencies. The "meta-framework" use case for launching custom frameworks was also important but basically just needs Marathon to keep a container alive. Mesosphere never made Marathon proprietary. The full code is still OSS here: https://github.com/mesosphere/marathon/
Our commercial product DC/OS just added advanced workflows through a UI on top, and better integration with the rest of the components around Mesos.
Mesos was originally an academic project out of UC Berkeley. I'm not aware of what industry connections there were, but the story is not at all similar to Kubernetes and Borg.
Aurora was Twitter's contribution -- a big missing piece of Mesos at the time. It definitely steered Mesos towards solving for Spark, but I'm really not sure what Mesos was actively solving before then.
We used Mesos in production until 2020 (started transition to Kubernetes in 2018), and this comment is incredibly accurate. Mesos was an interesting project but the defaults were incredibly naive about production environments.
Two concrete examples: Mesos maintenance mode vs Kubernetes' cordoning and eviction APIs, and Mesos's default behavior when a Node is suddenly powered off vs Kubernetes'.
>"Two concrete examples: Mesos maintenance mode vs Kubernetes' cordoning and eviction APIs, and Mesos's default behavior when a Node is suddenly powered off vs Kubernetes'."
What was Mesos's default behavior when a node was powered off?
It assumed that the Node had a temporary network fault and did not automatically remove the Node. Marathon also did not automatically schedule replacement containers on other Nodes.
Kubernetes will, after a period of time, garbage collect the Node and schedule replacements as dictated by the controller.
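Roughly the difference, as a toy Go sketch (not the actual kube-controller-manager logic, just the shape of the behavior described above): pods are evicted from any node that hasn't heartbeated within a grace period, and the owning controllers recreate replacements elsewhere, whereas stock Mesos/Marathon kept waiting for the node to come back.

```go
package main

import (
	"fmt"
	"time"
)

// Toy reconciliation loop, not the actual kube-controller-manager code.
// Nodes that have missed their heartbeat for longer than the grace period
// get their pods evicted; the owning controllers then recreate replacements
// elsewhere. Stock Mesos/Marathon, by default, just kept waiting.
type node struct {
	name          string
	lastHeartbeat time.Time
}

func reconcile(nodes []node, podsByNode map[string][]string, grace time.Duration, now time.Time) {
	for _, n := range nodes {
		if now.Sub(n.lastHeartbeat) > grace {
			for _, pod := range podsByNode[n.name] {
				fmt.Printf("evicting %s from unreachable %s; controller will reschedule it\n", pod, n.name)
			}
		}
	}
}

func main() {
	now := time.Now()
	nodes := []node{
		{name: "node-a", lastHeartbeat: now.Add(-10 * time.Second)},
		{name: "node-b", lastHeartbeat: now.Add(-10 * time.Minute)}, // suddenly powered off
	}
	pods := map[string][]string{"node-a": {"web-1"}, "node-b": {"web-2", "worker-3"}}
	reconcile(nodes, pods, 5*time.Minute, now)
}
```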
A challenge with Mesos is that Mesos was a piece of technology, a framework at most, instead of a product. When I was using Mesos, the selling point was flexible and efficient resource scheduling. Unfortunately, resource/machine efficiency alone does not sell well, as most companies and individuals have better things to worry about, say, productivity.
> Unfortunately, resource/machine efficiency alone does not sell well...
Surprising, because one of the driving forces behind the accelerating adoption of on-demand IaaS and various PaaS like Serverless is that too many expensive server resources lie idle. According to James Hamilton, chief Data Center architect at AWS, server utilisation remains very low (10%–15%) despite servers being the most dominant cost of building and running a data center (which is to say, folks pay through the nose for servers yet those are under-utilized by a huge margin) [0]
I don't deny that. It's just that so many companies have so much inefficiency elsewhere that addressing resource inefficiency has too low a marginal return.
One really does not want to saturate a fleet of servers to a point where random faults and broken SLOs are popping up all over the place. That’s first and foremost the main reason datacenter utilization is low.
>the selling point was flexible and efficient resource scheduling
There was a period when it wasn't clear that you didn't need both resource management and container orchestration. One of my colleagues was quite convinced at the time that we needed both Mesos and Kubernetes. Of course, the market coalesced around Kubernetes, which largely backfilled the missing capabilities.
> Apache Foundation is no longer a label for solid, reliable software
ASF is just a shelter for providing governance, infra, a legal framework for donations, and credibility to projects that went unprofitable, or weren't intended to be profitable to begin with, such as complements to commercial software. It's a responsible way to dump software you can't realistically maintain going forward, to give it a chance to build up a community by those depending on it, and often accompanied by an initial contribution to ASF. While not every software project is doing well for sure, whether ASF or not, I think the above sentence really misrepresents what ASF is doing (and arguably is doing with a lot of success).
I preferred Mesos to k8s. I think its core architecture (a 2-level scheduler) is a better foundation. For the longest time, I felt k8s was effectively an overgrown hobby project that had no place being deployed the way it was. That had me realize something in the shower this morning:
k8s is the Rails of the cloud.
Back when Rails came out, it too was a bit of a hobby project. When coming from more established enterprise web frameworks, Rails felt like a toy. It didn't have the features, robustness, safety, and scalability of "proper" frameworks.
What did Rails do? It was easy to get started and it hid a lot of the boring and painful work of web frameworks at the time.
Through the sheer force of will of a massive community, Rails grew up and became something more than the toy it started as. I was pretty arrogant in my opinions about the Rails community at first, but then I ended up working on several Rails projects over the years.
It still hides a lot under the hood; there are still arguably better technical frameworks out there, and plenty of folks use it improperly, when they don't need to, and without really understanding the fundamentals, meaning that they tend to get in trouble when pushing the limits or moving outside the golden path of development.
And I feel the same way about k8s. I think it started out without anywhere near the features of similar frameworks. It didn't scale well, was simplistic in its model, and overly complicated in its implementation. But it was much more approachable than something like Mesos and answered the question of "why am I containerizing everything?", giving everything a purpose for those who started down the path of Docker. And now it has a huge following and industry behind it.
At this point, I've learned that what becomes popular isn't necessarily the "right" or "correct" architecture. There's a lot more to programming trends (and fads) than that. The whole industry seems to want to reinvent the wheel every decade, almost like some sort of planned obsolescence to justify our work. Nevertheless, it's rarely wise to fight against the tide, and when enough of the industry moves in a direction, we can make even toys into real tools.
I actually think Kubernetes is a better foundation for a 2-level scheduler system than Mesos is. (In k8s land, they call this the operator pattern[1]). Each operator creates Pod objects in k8s with constraints/affinity/anti-affinity, and the K8s scheduler decides on your operator's behalf where each pod will go.
Pending pods (pods that aren't assigned to boxes yet) are also a really useful signal for cluster autoscaling that is annoying to calculate in Mesos. (The Mesos master has no idea how many pods each framework _wants_ to launch, so we ended up writing code that knows how to deduce the resource demand from each framework's API or UI.)
On the other side of the same coin, the framework in Mesos has no idea the total resources available to the cluster - it has to wait to be offered each box's resources in turn. This usually means frameworks either:
a) accept the first offer which matches the basic requirements for a task, even if there are better places to run that task, or
b) accept every single offer, under the assumption that they're the only framework on the cluster, and then implement their own scheduler on top of this pool of resources. (I call this the "monoframework" approach.)
The former approach is how Mesos is supposed to work, but can lead to all sorts of bad outcomes, e.g. all the copies of a Marathon service running on the same box, or Spark having to use timeouts to know when it should give up waiting for offers if it doesn't get enough right away.
The latter approach can lead to better placements, but defeats the purpose of the two-level scheduler, as no other frameworks can use the resources that aren't being used by the monoframework.
Under Kubernetes, any operator can query the state of the cluster and make informed decisions about what to request.
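A minimal sketch of that last point (assuming the client-go library and a local kubeconfig; the selectors are illustrative): an operator or autoscaler can ask the API server for every Pending pod and every Node before deciding what to create next, something a Mesos framework can't do from inside the offer cycle.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// An operator (or autoscaler) can simply query the API server for the whole
// picture -- pending pods, node count -- before deciding what to request.
// A Mesos framework has to infer all of this from the offers it receives.
func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	pending, err := client.CoreV1().Pods("").List(context.TODO(),
		metav1.ListOptions{FieldSelector: "status.phase=Pending"})
	if err != nil {
		panic(err)
	}
	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d pending pods across %d nodes\n", len(pending.Items), len(nodes.Items))
}
```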
My frame of reference for the argument about Mesos vs k8s is from earlier in the timeline, when k8s was up and coming and just starting to compete with Mesos. Just as in my analogy with Rails: at its inception, it lacked a bunch of features, but it was accessible. Years later, Rails had many of those features.
k8s is the same. Operators didn't come around until late 2016, from CoreOS, and even then they weren't widely adopted. It wasn't until after RedHat bought CoreOS and pushed operators as a pattern into k8s that the pattern took off. As it is, the k8s version of the operator framework only went 1.0 last year.
Finally, I should have pointed it out in my original comment: the Mesos vs k8s comparison isn't a perfect apples-to-apples one. Mesos is just a component in a stack, whereas k8s is effectively a full stack. Again, Rails was pitched as an opinionated, batteries-included framework compared to many of the more focused frameworks that, for their niche, may have been better, but the convenience of having everything in one package won out.
> To validate our hypothesis that specialized frameworks provide value over general ones, we have also built a new framework on top of Mesos called Spark ...
Yes, this is not a surprise. The fun part is that Spark began merely as a way to validate the Mesos hypothesis, and look how the hypothesis-validation project has taken off. Neat!
On the topic of arrogance, I wasn't in that meeting referenced, but having worked with most people there who could have been, and being a Brit who's worked for west coast startups like that for nearly 10 years, it's that American confidence, outlook and drive (along with Sand Hill Road) that contributes to so much success out there. I'm a fan, but have to check myself every time I use 'super' as an adjective.
Tech vs Product is a good comparison to make. I banged my head many times on DC/OS (the commercial version of Mesos) and underlying Mesos itself. Solid tech, but I'd have liked to have seen the product develop faster, to realise its potential in the enterprise market, which was huge.
Mesos made a ton of new contributions to distributed resource management. Resource offers and application integration can allow high cluster utilization and efficiency. Giving applications the opportunity to take part in resource allocation and scheduling was similar to the Exokernel design, and led to many interesting middleware architectures.
Mesos also introduced Dominant Resource Fairness for allocating resources to competing applications, which has become an important concept in CS and specifically computational economics in its own right.
Mesos achieved the rare combination of using CS research towards a widely deployed operational system. Its C++ code is simple enough to understand and hack, and seemed like one of those projects that you can/want to reimplement on your own "in a few days".
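For those unfamiliar with DRF, a toy Go sketch of the core idea (the numbers are made up): each framework's dominant share is its largest fractional use of any single resource, and the allocator offers resources next to whichever framework currently has the smallest dominant share.

```go
package main

import "fmt"

// Toy DRF illustration (numbers made up): a framework's dominant share is its
// largest fractional use of any single resource; the allocator sends the next
// offer to whichever framework has the smallest dominant share.
type usage struct{ cpus, memGB float64 }

func dominantShare(u, total usage) float64 {
	cpuShare := u.cpus / total.cpus
	memShare := u.memGB / total.memGB
	if cpuShare > memShare {
		return cpuShare
	}
	return memShare
}

func main() {
	total := usage{cpus: 90, memGB: 180}
	frameworks := map[string]usage{
		"spark":    {cpus: 30, memGB: 30}, // dominant: CPU, 30/90 = 0.33
		"marathon": {cpus: 10, memGB: 80}, // dominant: memory, 80/180 = 0.44
	}
	next, best := "", 2.0
	for name, u := range frameworks {
		s := dominantShare(u, total)
		fmt.Printf("%-8s dominant share: %.2f\n", name, s)
		if s < best {
			next, best = name, s
		}
	}
	fmt.Println("next offer goes to:", next) // spark, the furthest below its fair share
}
```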
Are there examples of high-utilization, large-scale Mesos deployments? Mesos didn't even gain over-commit until 2015, so it seems like it was generally behind the state of the art.
OK but what was the utilization? I'm not really sure K8s is state-of-the-art either. There are published research papers about very-large-scale clusters with 80%+ resource utilization.
In our production experience, utilization had far more to do with the service owners (or autoscalers/auto-tuners) correctly choosing the cgroups and CPU scheduler allocations, as well as the kernel settings for cgroup slicing and CPU scheduler. We had Mesos clusters with 3% utilization and have Kubernetes clusters with 95%+ utilization. But we also have Kubernetes clusters with <10% utilization.
To be fair, Kubernetes right now only schedules relatively small clusters. But it turns out that the majority of the world is not Facebook or Google and only needs relatively small clusters.
There must be some folks from Criteo lurking here. I'm an ex-Criteo'er and if memory serves we had something on the order of 10K nodes running mesos/marathon. We did all kinds of silly things to it, like running very CPU intensive .NET/Core apps.
We also ran HiveServer2 and the Hive Metastore in Mesos, though that wasn't super CPU intensive (that was a pain, but mostly due to our Kerberos deployment).
The general use case of Mesos/Marathon always worked for us just fine (self-executable JVM apps), though there was plenty of Mesos hate at Criteo (and eventually Kubernetes spun up, though I left about a year ago and don't know its footprint).
PS, Hi Greg S! Hi Maxime B! <-- if you're reading :).
IIRC you could always overcommit in Mesos using DRF weights and accepting resource offers in your application. I could be wrong.
The larger point is that Mesos introduced a new, exciting way to do truly distributed allocation (where the cluster manager (i.e., Mesos) and various applications coordinated and cooperated in how they use computing resources). In contrast, Kubernetes is centralized, pretty vanilla, and I would love to know what new ideas it has introduced (from an algorithmic and architecture perspective).
Twitter. From generic caches to ad serving; from stream processing to video encoding, all high utilization applications of either one or multiple schedulable resources.
These jobs had their allotted quotas, per team, giving them above 70% utilization in their logical slice of the cluster. E.g. the video processing team gets 20,000 nodes globally. They stack (co-locate) their tasks (interpret: set of processes) however they want.
Granted, Twitter operated one big shared Mesos+Aurora offering for everything*, and whole-cluster high utilization wouldn't give much flexibility to absorb load, or allow reasonable capacity planning (which was an entire org in itself), when you own and operate those machines and data centers. I can't comment much on the 20-30% figure given at MesosCon; it's been more than 5 years since I was last privy to these figures.
I worked for Twitter up until 2017 and when I was there it was much higher than 20-30%, definitely >50%. It's very possibly changed since then, but at least at that point in time Twitter was running Mesos on many thousands of machines.
I've run this at scale, in production since 2015 and it has been absolutely rock solid and does most of the production things you'd want. Unlike commercial products, it was written to run HubSpot's own infrastructure so it does what a production system needs. Really bummed to have to downgrade to something like K8s in the near future.
Thanks for the kind words! Singularity is still a core piece of infrastructure for running stateless apps at HubSpot (our status page currently reads ~13k Requests, ~22k Active Tasks, ~700 Agents). We're also heavily investing in Kubernetes, but Singularity is solid enough that our focus has been more towards reliably running and scaling stateful things like MySQL, ZK, Kafka, HBase, and ES.
Nomad is pretty darn easy to run and scales far. I help run about 20 clusters in my day job, from 100 to 1800 nodes, and we're going to push one of them to over 8500 nodes soon too.
We don’t use the Singularity driver but we wrote two of our own and they work well and are easy to maintain.
We used Mesos as our first container orchestration stack. It worked OK but Kubernetes came along and offered a one stop solution for everything. In Mesos you had to use separate projects for container management (Marathon), discoverability (Consul/HAPROXY). It seemed more geared to people that wanted to run their own stacks for such tasks. For a small to medium sized operation it was difficult to solve these issues where really they are issues everyone running a stack of containers has. This was in 2014 so it was early but k8s came along with everything we needed.
I used Mesos a while ago across a couple of cheap VPSes for personal projects and services and it was great. I especially liked that I could use a shell executor to run services with filesystem access. For many environments this probably didn't make sense, but for me it meant I could put some secrets on the filesystem and services could make unix sockets available to nginx for SSL and hostname-based forwarding.
Also with Nix I could easily install software the specific services needed and it was trivial to integrate into the garbage collector making it much faster than launching a docker container.
That being said, the project wasn't moving that fast and it was a bit buggy for me; nodes would regularly get into loops where they couldn't authenticate with the master for various reasons (I think there were timing issues and timeouts that caused issues over the internet). Now I'm using Kubernetes but I have all of this complicated virtual networking that I don't need, I'm locked into docker-like containers, and the thing is so complicated that I need to pay someone to run it for me.
Personally think AWS ECS is a 3rd place contender. Would add sprinkles to it if only they'd allow yaml files vs json configs in the aws-cli. ecs-cli and copilot are alright. Generally prefer to stay as close as possible to aws-cli
Having worked at a company running Docker swarm at... medium(?) scale... I have witnessed a truly shocking variety of bugs in its network stack. I always wondered if it was something specific to the company setup, but the result is that we just couldn't stay on a platform that would randomly fail to give containers a working network connection to each other.
For over 90% of workloads kubernetes is overkill. Only when a company is reaching Google scale does kubernetes make sense.
A good alternative to kubernetes is LXD [1], or just stick with docker compose. Kubernetes, except for the managed services from cloud providers, is more difficult to manage than an average application and a huge cost in itself to run and maintain.
> For over 90% of workloads kubernetes is overkill. Only when a company is reaching Google scale does kubernetes make sense.
I hear people repeating this truism all day, but from practical experience, it doesn't seem to be the case - Kubernetes is paying dividends even in small single-node setups.
K8s single noder and GKE user here, can confirm I wouldn't remotely even consider going back. Deploying an app takes 5-10 minutes at most first time around, new pushes <1 minute, and when there is ops bullshit involved, it is never wasted work, and never needs to be repeated twice.
I hated Kubernetes ops complexity at first, but there really isn't that much to it, and if it's too much, a service like GKE takes 70%+ of it away from you
For a different perspective, learning Kubernetes and using it widely gives you a universal set of tools for managing all sorts of applications at all scales. https://news.ycombinator.com/item?id=26502900
Sure, this is kind of true, but only in the most depressing way possible. Kubernetes is overengineered and terrible but it's also just about the only game in town if you want a vendor-agnostic container orchestration system.
This is the same situation as Javascript circa 2008: "Learning this absolute dogshit language and using it widely will give you the ability to write universal applications that will run in browsers everywhere and make you very employable."
You're not wrong about k8s today, and wouldn't have been wrong about JS in the past, but boy is it a sad indictment of our industry that these things are true.
Kubernetes is engineered to solve the five hundred different problems that different people have in a way that works for them. If you don't do that then everyone will complain about their feature or some edge case being missing and not use the system (see comments on mesos in this thread). That's not over engineering, that's the required amount of engineering for this sort of system.
But also, k8s "secrets" are not, in fact, secret, you can't actually accept traffic from the web without some extra magic cloud load balancer (cf https://v0-3-0--metallb.netlify.app/ maybe eventually) or properly run anything stateful like a database (maybe soon).
Forget covering "everyone's" use cases: From where I'm sitting, k8s is an insanely complicated system that does a miserable job of covering the 95% use cases that Heroku solved in 2008.
It's great that k8s (maybe) solves hard problems that nobody except Google has, but it doesn't solve the easy problems that most people have.
Yes, yes, there's a million zillion Kubernetes ingress things, none of which are really good enough that anyone uses them without a cloud-provider LB in front of it. Also they only deal with HTTP/S traffic. Got other types of traffic you want to run? Too bad, tunnel it over HTTPS.
If you want a picture of the future of computing, imagine everything-over-HTTPS-on-k8s-on-AWS stomping on a human face forever.
> Got other types of traffic you want to run? Too bad, tunnel it over HTTPS.
Then you’d expose a service, not an ingress. You can do this in a variety of ways depending on your environment.
I’m going to go out on a limb here and say you’ve never really used k8s and haven’t really grokked even the documentation.
It’s complicated, some parts more than others, but if you’re still at the “wow guys secrets are not really secret!1!” level I’m not sure how much you can really bring to the table in a discussion about k8s.
That only works to access the service from other pods inside k8s, it doesn't help you make that service accessible to the outside world. Tell me how you'd run Dovecot on k8s?
> if you’re still at the “wow guys secrets are not really secret!1!” level
I'm just pointing out one (of many) extremely obvious warts on k8s. You act like this is some misconception on my part, but it's not that silly to assume that a secret would be.
But to answer your smarm, yes, I've used Kubernetes in anger: my last company was all-in on k8s because it's so trendy, and it was (IMHO) an absolute nightmare. All kinds of Stockholm-syndrome engineers claiming that writing thousands of lines of YAML is a great experience, unable to use current features because even mighty Amazon can't safely upgrade a k8s cluster....
> That only works to access the service from other pods inside k8s, it doesn't help you make that service accessible to the outside world. Tell me how you'd run Dovecot on k8s?
Quite the opposite. I’d re-read the docs[1]. Specifically this page[2]. If you’re on AWS you’d probably wire this up with a NLB and a bit of terraform if you’re allergic to YAML. Seems like a 5 minute job assuming you have Dovecot correctly containerized.
> I'm just pointing out one (of many) extremely obvious warts on k8s. You act like this is some misconception on my part
It’s hard not to point out misconceptions like the one above.
> ClusterIP: Exposes the Service on a cluster-internal IP. Choosing this value makes the Service only reachable from within the cluster. This is the default ServiceType.
Internal only.
> NodePort: Exposes the Service on each Node's IP at a static port (the NodePort). A ClusterIP Service, to which the NodePort Service routes, is automatically created. You'll be able to contact the NodePort Service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
Nobody uses NodePort to expose external services directly, and I think you know that.
> LoadBalancer: Exposes the Service externally using a cloud provider's load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
As I mentioned above in this thread, requires cloud provider Load Balancer.
> ExternalName: Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up.
> Note: You need either kube-dns version 1.7 or CoreDNS version 0.0.8 or higher to use the ExternalName type.
This one's a new one to me, and apparently relies on some special new widgets.
Anyway, if you love k8s, I'm sure you'll have a profitable few years writing YAML. Enjoy.
> This one's a new one to me, and apparently relies on some special new widgets.
ExternalName has been around since 2016.
> Nobody uses NodePort to expose external services directly, and I think you know that.
Sure they do. Anyone using a LoadBalancer does this implicitly. If you don’t want k8s to manage the allocated port or want to use something bespoke that k8s doesn’t offer out of the box then using a NodePort is perfectly fine. You can also create a custom resource type if you’ve got some bespoke setup that can be driven by an internal API.
The happy path is using a cloud load balancer, because that’s what you’d use if you are using a cloud provider and you’re comfortable with k8s wiring it all up for you.
Has your criticism of k8s evolved from “I’m unclear about services” to “well yes it supports everything I want out of the box but uhh nobody does it that way and therefore it can’t do it”?
My criticism of k8s is it's an absolutely batshit level of complexity[0] that somehow still fails to provide extremely basic functionality out-of-the-box (unless paired with a whole cloud provider ecosystem, but then why not just skip k8s and use ECS???). I don't think k8s solves real problems most developers face, but it does keep all these folks[1] getting paid, so I can see why they'd advocate for it.
Nomad is vastly superior in every way except for mindshare; Elastic Beanstalk or Dokku is superior for most normal-people use cases.
> Nobody uses NodePort to expose external services directly, and I think you know that
I do. It provides a convenient way to integrate our non-k8s load balancers (TCP haproxy tier with a lot of customization) with services on kube. This is good for reusability and predictability while we slowly migrate services from our prior deployment targets to k8s.
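Roughly what that looks like (a minimal sketch rendering a NodePort Service with the upstream Go API types; the name, app=dovecot selector, and port numbers are made up for illustration, borrowing Dovecot from upthread): the external TCP haproxy tier then points at <NodeIP>:30993 on every node, no cloud load balancer required.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"sigs.k8s.io/yaml"
)

// Renders a NodePort Service using the upstream Go API types. The name,
// selector, and ports are illustrative; the point is that an external TCP
// load balancer can target <NodeIP>:30993 on every node.
func main() {
	svc := corev1.Service{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Service"},
		ObjectMeta: metav1.ObjectMeta{Name: "dovecot-imaps"},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeNodePort,
			Selector: map[string]string{"app": "dovecot"}, // assumed pod label
			Ports: []corev1.ServicePort{{
				Name:       "imaps",
				Protocol:   corev1.ProtocolTCP,
				Port:       993,
				TargetPort: intstr.FromInt(993),
				NodePort:   30993, // fixed so the external haproxy config stays predictable
			}},
		},
	}
	out, err := yaml.Marshal(svc)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```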
Your points mostly only matter if you're running on bare metal. If you're in the cloud then you've got load balancers and databases covered by your cloud provider. I don't need Kubernetes to handle problems that I already have great solutions for. I want it to handle the problems that my cloud provider provides poor or very specialized (ie: lock-in) solutions for. Which for me it does very well and a lot more easily than doing so without Kubernetes.
edit: Kubernetes secrets are also either good enough (ie: on par with Heroku) or your cloud provider has an actual proper KMS.
> For over 90% of workloads kubernetes is overkill.
It's not. Take any simple web app and deploy it into managed GKE with Anthos and you automatically get SLI's and the ability to define SLO's for availability and latency with a nice wizard. Takes a few minutes to setup.
The amount of engineering needed to achieve good SLO monitoring dwarfs the engineering needed to run a simple app so it just never happened. That's no longer the case.
> Only when a company is reaching Google scale does kubernetes make sense.
Also obviously not true given the number of companies deriving value from kubernetes.
Your statement already supports the point that, without the blessing of the engineering teams behind Amazon, Google, Microsoft, Digital Ocean and the various managed kubernetes services, it's impossible for a reasonably small team to manage and monitor k8s, and all of these services come with lock-in and additional capital outlay.
Obviously, for a Google Cloud Partner, the more people are tied to gcp and kubernetes, the higher the revenue. Whether an application really requires k8s is secondary.
I'd imagine most small to medium companies would be running things on a cloud service using managed kubernetes. It seems mostly larger companies that are sticking with non-cloud hardware and services.
The advantage of kubernetes in that case is that there is a large ecosystem of helm charts, guides, documentation, etc. Deploying something new from scratch is fairly easy since someone else has done all the leg work and packaged it all up.
Just because something is packaged does not mean it's usable. YMMV, but this is how security horror stories start. Someone ran a container with no idea where it came from, or happily used a helm chart. Most of the time it's not even malicious - it's outdated software because "it just works".
In my experience all the common helm charts and docker images are regularly updated. If you don't update your installation of them then you also wouldn't update a docker compose or LXD.
I don't know about this. I caution people against microservices architecture all the time, but at my company having a scheduler just made sense. We spin up and down queue workers by massive amounts every hour, and doing this with anything besides a scheduler would be really tricky.
Granted, we use Nomad, not k8s, but we definitely need a scheduler and definitely are not reaching Google-scale.
I wonder what recent large-scale adopters of Mesos are going to do. I know Dropbox deployed Mesos in the last year or so[1], even though it was pretty obvious that the project community was dead. Will the existing users found a new community?
The framework-based scheduler architecture looked like an interesting concept at first sight. But its advantage (write your custom scheduling policy, along with gaining scalability) evidently wasn't important enough to make up for the effort of actually having to write your own framework to run anything. (In this regard it's a bit like the object-storage-as-a-file-system-replacement narrative, which promises scalability, if you adapt your application.)
This seems to be true both for the Omega experiments at Google (which I assume still uses Borg), and for Mesos. In the end, Mesos always lacked even a minimal application deployment story. Marathon unfortunately never got beyond being a shiny UI with a primitive application model and a hacky API.
I think if Mesos(-sphere) had recognized that gap in time, and come up with a decent framework for application deployments, instead of telling everyone to write their own framework (in C++, against a hacky code base), Kubernetes might not have had a chance, or at least we would have two alternatives to choose from. Too bad.
They have apparently hired a ton of great k8s folks who have been tasked with building an internal layer of sorts for themselves, so I guess they are moving off it (though not anytime soon).
Nothing is decided yet. This is a vote thread which _might_ end with someone interested in Mesos stepping in to pick the project up.
This hasn't happened in a while with other projects (a whole bunch of Apache projects have just retired in the last few months) but with Mesos there might be a chance.
Somewhat of an end of an era. Worked at a shop that had one of the largest Mesos fleets in the world. Thousands of physical nodes, petabytes of data. True Mesos in production was something to witness.
Most of the comments here about Mesos are exactly my experience with Kubernetes. Replacing the words and I'm just nodding my head:
"Kubernetes changed frequently and there were enough unknown TBD components at the edges of the ecosystem to make it somewhat volatile."
"Kubernetes was an interesting project but the defaults were incredibly naive about production environments."
"Kubernetes unfortunateley never got beyond being a shiny UI with a primitive application model and a hacky API."
"A challenge with Kubernetes is that Kubernetes was a piece of technology, a framework at most, instead of a product."
"In Kubernetes you had to use separate projects for container management (kubelet, kube-scheduler, kube-controller-manager, kube-proxy, cri-o), discoverability (etcd). It seemed more geared to people that wanted to run their own stacks for such tasks. For a small to medium sized operation it was difficult to solve these issues where really they are issues everyone running a stack of containers has."