But Mesos already provides distributed service management. Why is Marathon doing it too? And why do you need Chronos if Marathon performs the same functions?
What they're saying is, Marathon will move your jobs to a new server when one dies. Okay, cool. But Chronos can do that too. And Mesos can do that too!
--
I'm pretty sure all these tools are a giant troll by Google to get its competitors to burn R&D time on reinventing tools that already exist and aren't necessary.
If you think these tools aren't necessary, you probably haven't managed a cluster with thousands of machines and hundreds of users. I imagine that every company in that situation has an ad-hoc implementation of distributed cron and distributed upstart (I know we do).
Without something like Mesos, you generally will run different things on your cluster by statically partitioning it (these ten racks run Hadoop, this rack runs our website, this rack runs Spark because some engineers wanted to try that out, etc.) or by running everything together (typically done with your distributed file system, but can be problematic with more compute-oriented services).
The Mesos approach is to stop thinking of your cluster on a machine-by-machine or rack-by-rack basis and instead treat it as one giant pool of resources. It's a very powerful abstraction that greatly increases the number of machines and developers that are manageable.
I'm familiar with the concept behind it. My problem was with how they all seem to do the same things, and nobody yet has pointed this out; everyone just accepts the fact that they're mostly redundant and moves on.
I've managed SSI clusters, MPI clusters, and clusters of dumb app servers of varying sizes (10 nodes to 10,000). If you really want just a giant pool of resources, you can do much worse than an SSI cluster, but nobody wants to spend time working on a hard problem, so instead we dick around with task-shuffling job-runners inside the components that were written by the hardcore programmers that work in the kernel. But I guess we do what we can with what we have... (I blame Linus's team for not merging openMosix when they had the chance!)
I'll outline the specific goals of Marathon and Chronos to clear up any confusion:
Marathon: Execute a long-running job in the cluster and make sure it keeps running. You can specify resource requirements as well as how many instances of the job you want to run.
Examples of jobs: Rails App, Jetty Service, JBoss Service.
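To make the Marathon case concrete, here is a rough sketch of what such a job description might look like: a command, its resource requirements, and an instance count. The field names (`id`, `cmd`, `cpus`, `mem`, `instances`) follow Marathon's REST app definition as far as I know, but treat them as assumptions and check the docs for your version. The helper `marathon_app` and the example command are made up for illustration.

```python
import json


def marathon_app(app_id, cmd, cpus, mem_mb, instances):
    """Build a dict shaped like a Marathon app definition (hypothetical helper)."""
    if instances < 1:
        raise ValueError("need at least one instance")
    return {
        "id": app_id,            # name of the long-running job
        "cmd": cmd,              # command to keep running
        "cpus": cpus,            # CPU share requested per instance
        "mem": mem_mb,           # memory in MB requested per instance
        "instances": instances,  # how many copies should stay up
    }


# Illustrative values only: a Rails app kept running as three instances.
app = marathon_app("/rails-app", "bundle exec rails server", 0.5, 512, 3)
payload = json.dumps(app, indent=2)
print(payload)
```

You would then submit something like this payload to Marathon, which keeps asking Mesos for resources until three instances are running again whenever one dies.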
Chronos: You can specify a repeating & finite job based on a schedule or another job completing. You specify resource requirements.
Examples of jobs: Mysqldump (e.g. daily dump prod DB), Hadoop Job (cascading, pig, cascalog, scalding...), bash script using ImageMagick to create thumbnails.
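A similar sketch for the Chronos case: one job triggered by a schedule and one triggered by another job completing. The schedule string uses the ISO 8601 repeating-interval form (`R[n]/start/period`), and the field names (`name`, `command`, `schedule`, `parents`, `cpus`, `mem`) are how I understand Chronos job definitions to look, so treat them as assumptions. The helper functions, job names, and commands are hypothetical.

```python
def scheduled_job(name, command, start, period, repeats=None, cpus=0.1, mem_mb=128):
    """Job that runs on a schedule (hypothetical helper).

    repeats=None leaves the count empty, i.e. repeat indefinitely;
    an integer gives a finite number of runs.
    """
    count = "" if repeats is None else str(repeats)
    return {
        "name": name,
        "command": command,
        "schedule": f"R{count}/{start}/{period}",
        "cpus": cpus,
        "mem": mem_mb,
    }


def dependent_job(name, command, parents, cpus=0.1, mem_mb=128):
    """Job that runs after its parent jobs complete (hypothetical helper)."""
    return {
        "name": name,
        "command": command,
        "parents": list(parents),
        "cpus": cpus,
        "mem": mem_mb,
    }


# Illustrative values only: a daily DB dump, then a job that depends on it.
dump = scheduled_job(
    "nightly-dump",
    "mysqldump prod_db > /backups/prod.sql",
    start="2014-01-01T02:00:00Z",
    period="PT24H",
)
report = dependent_job("post-dump-report", "./report.sh", [dump["name"]])
five_runs = scheduled_job("limited", "true", "2014-01-01T00:00:00Z", "PT1H", repeats=5)
```

The point is that the job carries its own trigger (a schedule or a parent) plus resource requirements, and the framework decides when to ask the cluster to run it.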
Both of these systems are Mesos frameworks - Mesos does the heavy lifting and offers resources to these frameworks which the frameworks can accept or reject.
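If the offer model is the confusing part, here is a toy simulation of it in plain Python. This is not the Mesos API: every class and function name below is made up, and the real system is considerably more involved (allocation policy, leftover resources being re-offered, and so on). It only shows the division of labour: the resource manager hands out offers, and each framework decides whether to use or decline them.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


class ToyFramework:
    """Stand-in for a framework: holds pending tasks and decides per offer."""

    def __init__(self, name, pending):
        self.name = name
        self.pending = list(pending)

    def on_offer(self, offer):
        """Return the tasks to launch on this offer; an empty list means decline."""
        launched = []
        cpus, mem = offer.cpus, offer.mem_mb
        for task in list(self.pending):
            if task.cpus <= cpus and task.mem_mb <= mem:
                launched.append(task)
                self.pending.remove(task)
                cpus -= task.cpus
                mem -= task.mem_mb
        return launched


def run_offer_round(offers, frameworks):
    """Toy resource manager: pass each offer along until a framework accepts it."""
    placements = []
    for offer in offers:
        for fw in frameworks:
            tasks = fw.on_offer(offer)
            if tasks:
                placements.extend((fw.name, t.name, offer.agent) for t in tasks)
                break  # simplification: an accepted offer is not re-offered
    return placements


offers = [Offer("node-1", 4.0, 8192), Offer("node-2", 1.0, 1024)]
long_running = ToyFramework("long-running", [Task("web", 2.0, 2048), Task("web-2", 2.0, 2048)])
scheduled = ToyFramework("scheduled", [Task("db-dump", 0.5, 512)])
placements = run_offer_round(offers, [long_running, scheduled])
print(placements)
```

Note that the resource manager in this sketch knows nothing about schedules, restarts, or instance counts. All of that lives in the frameworks.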
Thank you for the reply, I appreciate it. But from everything I've read (all the documentation for these frameworks sucks, though Mason's is the most readable), it still seems like Mesos supports the features of the previous two tools, for the most part. Is there some reason its feature set wasn't just tweaked a bit to handle those two cases? Like, why is there a distinction between a long-running and a short-running job? Repeating a job is also not an especially complicated task that should require a whole new framework to accomplish.
The reason I'm asking these questions is I want to know if I need to use these tools. It seems like kids these days just immediately grab up every new tool they can and try to shove them into their environment, vs trying to find the right fit or configure them properly. It feels like Marathon and Chronos are mostly unnecessary - this is evident once you realize Mesos is a framework you can build on to do the same things Marathon and Chronos do with a lot less complexity.
If you have specific requests on how to make the docs for these frameworks better, it'd be great if you filed a ticket on their respective GitHub pages.
Mesos cannot do "cron-like" things. Mesos is a resource manager for your cluster. You need a framework to actually schedule tasks. Marathon / Chronos are both frameworks.