At a broader level, projects like Marathon tie into the greater move toward software-defined networks, storage, and even data centers. Companies are trying to replace expensive gear with commodity hardware powered by smart software, and being able to automate cluster management and failover is certainly part of that equation.
At least this part is spot on.
They seem to be focused on ZooKeeper (both are Apache projects). I'd very much like to see a decent comparison of Pacemaker/Corosync @ http://clusterlabs.org/wiki/Main_Page, ZooKeeper @ https://zookeeper.apache.org/, and OpenReplica @ http://openreplica.org/, particularly given that the former is what I'm most familiar with and what the larger Linux businesses (Red Hat, etc.) seem to be focused on.
No. Docker just handles provisioning containers. Take a high-level view of a data centre with 10,000 machines: you need orchestration software that knows about and automates those 10k machines (which ones are online or offline, their utilization, storage, networking, current state, power distribution, what happens if a rack's power drops out?) and then provisions containers onto that hardware. Think of this orchestration software as a brain: it knows the current state of things, keeps watch on what is happening in the data centre, fixes things when they go bad, and knows where to put things when you want to run something.
An additional example is dotCloud (the makers of Docker): they have orchestration software sitting atop Docker which knows about users, machines, etc., and then provisions Docker instances on AWS hardware.
There is a great Wired article (linked below [1] and in the OP's story) which outlines how Google uses this kind of orchestration software in its day-to-day operations. There is this great diagram [2] showing Omega (Google's orchestration software) and how it deploys containers for Images, Search, Gmail, etc. onto the same physical hardware. There is also an amazing talk by John Wilkes (Google Cluster Management, Mountain View) about Omega at Google Faculty Summit 2011 [3]; I would highly recommend watching it!
PS: Orchestration software has a global view of resources across all machines, so it knows how to get the best utilization out of them (think of a jigsaw puzzle). You submit a container profile to the orchestration software, specifying things like instance lifetime, CPU, storage, memory, and redundancy, and the software figures out where to place your instances.
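To make the "jigsaw puzzle" idea concrete, here is a deliberately tiny sketch of global-view placement. None of this is a real orchestrator's API; the machine names, resource fields, and first-fit policy are all invented for illustration:

```python
# Toy illustration of global-view placement -- not any real orchestrator API.
# Each machine advertises its free resources; the scheduler, seeing all of
# them at once, picks the first machine that can fit the requested profile.

machines = {
    "rack1-host1": {"cpus": 8, "mem_gb": 32},
    "rack1-host2": {"cpus": 2, "mem_gb": 4},
    "rack2-host1": {"cpus": 16, "mem_gb": 64},
}

def place(profile, machines):
    """Return a machine that fits the profile, deducting its resources,
    or None if the cluster is too full."""
    for name, free in machines.items():
        if free["cpus"] >= profile["cpus"] and free["mem_gb"] >= profile["mem_gb"]:
            free["cpus"] -= profile["cpus"]
            free["mem_gb"] -= profile["mem_gb"]
            return name
    return None

print(place({"cpus": 4, "mem_gb": 16}, machines))    # rack1-host1
print(place({"cpus": 12, "mem_gb": 48}, machines))   # rack2-host1
print(place({"cpus": 32, "mem_gb": 128}, machines))  # None
```

Real systems use far smarter policies (bin packing, spreading for redundancy, preemption), but the core idea is the same: placement decisions come from a single component that can see every machine at once.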
"Marathon launches two instances of the Chronos scheduler as a Marathon task. If either of the two Chronos tasks dies -- due to underlying slave crashes, power loss in the cluster, etc. -- Marathon will re-start an Chronos instance on another slave. This approach ensures that two Chronos processes are always running."
So basically, it's a distributed crond. I could be wrong, but I don't think this will make your datacenter run like Google's.
Really, it's Mesos that "makes your datacenter run like Google's," while Marathon is a framework built on top of it that provides distributed service management (think distributed upstart). Mesos is partly based on ideas from Google's Borg cluster management system.
The basic idea is to give developers the abstraction of a single machine: you can say things like "I want to run this program on 40 cores with 100GB of memory," and the system will find those resources across the cluster, create a cgroup with your resource limits, and start your services. It also provides various tools for writing distributed services, like an implementation of Paxos.
But Mesos already provides distributed service management. Why is Marathon doing it too? And why do you need Chronos if Marathon performs the same functions?
What they're saying is, Marathon will move your jobs to a new server when one dies. Okay, cool. But Chronos can do that too. And Mesos can do that too!
--
I'm pretty sure all these tools are a giant troll by Google to get its competitors to burn R&D time on reinventing tools that already exist and aren't necessary.
If you think these tools aren't necessary, you probably haven't managed a cluster with thousands of machines and hundreds of users. I imagine that every company in that situation has an ad-hoc implementation of distributed cron and distributed upstart (I know we do).
Without something like Mesos, you generally will run different things on your cluster by statically partitioning it (these ten racks run Hadoop, this rack runs our website, this rack runs Spark because some engineers wanted to try that out, etc.) or by running everything together (typically done with your distributed file system, but can be problematic with more compute-oriented services).
The Mesos approach is to stop thinking of your cluster on a machine-by-machine or rack-by-rack basis, and instead treat it as one giant pool of resources. It's a very powerful abstraction that greatly increases the number of machines and developers that are manageable.
I'm familiar with the concept behind it. My problem was with how they all seem to do the same things, and nobody yet has pointed this out; everyone just accepts the fact that they're mostly redundant and moves on.
I've managed SSI clusters, MPI clusters, and clusters of dumb app servers of varying sizes (10 nodes to 10,000). If you really want just a giant pool of resources, you can do much worse than an SSI cluster, but nobody wants to spend time working on a hard problem, so instead we dick around with task-shuffling job-runners inside the components that were written by the hardcore programmers that work in the kernel. But I guess we do what we can with what we have... (I blame Linus's team for not merging openMosix when they had the chance!)
I'll outline the specific goals of both Marathon and Chronos to clear up any confusion:
Marathon: Execute a long-running job on the cluster and make sure it keeps running. You can specify resource requirements as well as how many instances of the job you want to run.
Examples of jobs: Rails App, Jetty Service, JBoss Service.
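For concreteness, a Marathon app is described as JSON posted to its REST API (roughly, a POST to /v2/apps). The field names below follow Marathon's documented app format, but the values, paths, and the app itself are invented for this example:

```json
{
  "id": "rails-app",
  "cmd": "cd /opt/app && bundle exec rails server -p $PORT",
  "cpus": 1.0,
  "mem": 512,
  "instances": 3
}
```

Marathon then keeps three instances of this command running somewhere in the cluster, restarting them elsewhere if a slave dies.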
Chronos: Specify a repeating or finite job, triggered on a schedule or on the completion of another job. You can specify resource requirements here as well.
Examples of jobs: mysqldump (e.g. a daily dump of the prod DB), a Hadoop job (Cascading, Pig, Cascalog, Scalding...), a bash script using ImageMagick to create thumbnails.
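A Chronos job is likewise described as JSON posted to its REST API, with the schedule given as an ISO 8601 repeating interval. The field names follow Chronos's documented job format, but the hostnames, paths, and command here are made up:

```json
{
  "name": "nightly-prod-dump",
  "command": "mysqldump -h prod-db mydb | gzip > /backups/mydb.sql.gz",
  "schedule": "R/2014-01-01T03:00:00Z/PT24H",
  "cpus": 0.5,
  "mem": 256
}
```

The "R/.../PT24H" schedule means "repeat indefinitely, every 24 hours, starting from the given instant"; dependent jobs are submitted without a schedule and instead name their parent jobs.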
Both of these systems are Mesos frameworks - Mesos does the heavy lifting and offers resources to these frameworks which the frameworks can accept or reject.
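That offer/accept cycle can be caricatured in a few lines. This is a toy model, not the real Mesos API: the function names, offer fields, and accept/decline strings are all invented to show the control flow:

```python
# Toy model of Mesos-style resource offers -- not the real Mesos API.
# The master hands a framework offers of free resources on slaves; the
# framework accepts an offer only if it covers a task it wants to run,
# and declines the rest so other frameworks can use them.

def marathon_like_framework(pending_tasks):
    """Return an offer callback plus a log of (task, slave) launches."""
    launched = []
    def on_offer(offer):
        for task in list(pending_tasks):
            if offer["cpus"] >= task["cpus"] and offer["mem"] >= task["mem"]:
                pending_tasks.remove(task)
                launched.append((task["name"], offer["slave"]))
                return "accept"
        return "decline"
    return on_offer, launched

offers = [
    {"slave": "slave-1", "cpus": 2, "mem": 1024},
    {"slave": "slave-2", "cpus": 8, "mem": 4096},
]
on_offer, launched = marathon_like_framework(
    [{"name": "jetty", "cpus": 4, "mem": 2048}]
)
for offer in offers:
    on_offer(offer)
print(launched)  # [('jetty', 'slave-2')]
```

The point is the division of labor: Mesos (the master) only tracks and offers resources, while the framework holds all the scheduling policy, which is why "cron-like" or "upstart-like" behavior lives in Chronos and Marathon rather than in Mesos itself.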
Thank you for the reply, I appreciate it. But from everything I've read (the documentation for all these frameworks sucks, though Mesos's is the most readable), it still seems like Mesos supports the features of the other two tools, for the most part. Is there some reason its feature set wasn't just tweaked a bit to handle those two cases? Like, why is there a distinction between a long-running and a short-running job? Repeating a job is also not an especially complicated task that should require a whole new framework to accomplish.
The reason I'm asking these questions is that I want to know if I need to use these tools. It seems like kids these days immediately grab every new tool they can and try to shove it into their environment, versus trying to find the right fit or configure things properly. It feels like Marathon and Chronos are mostly unnecessary; this becomes evident once you realize Mesos is a framework you can build on to do the same things Marathon and Chronos do, with a lot less complexity.
If you have specific requests on how to make the docs for these frameworks better, it'd be great if you filed a ticket on their respective GitHub pages.
Mesos cannot do "cron-like" things. Mesos is a resource manager for your cluster. You need a framework to actually schedule tasks. Marathon / Chronos are both frameworks.
I'm not familiar with those tools, but from the Clusterlabs website it looks like they're solving a different problem. Mesos is primarily intended for use on clusters with thousands of machines and heterogeneous application needs. It provides resource isolation and scheduling, as well as some components that make writing distributed systems easier.
Skimmed through the article and I didn't find any evidence of "making your data center run like Google's". More specifically: I didn't see any live data connections to the NSA.