ArangoDB on Mesosphere Using Marathon and Docker

pjgomez · on Jan 26, 2015

Clear evidence that I am getting old and being left behind: of the four technologies named in that title, I have only heard of one, and I have never actually used it.

vezzy-fnord · on Jan 26, 2015

You're still better than I am. I've heard of the latter three, I've never used a single one, and I still find it very difficult to understand what the core of the second one (Mesos) actually is because it's so shrouded in buzzwords and vague descriptions.

All I got was that it's some sort of userspace microkernel geared around abstracting the computing resources of machines (nodes) into categories like CPU/RAM/disk and exposing them programmatically through a cluster-wide API. The master daemon collects the resources reported by slave daemons and makes resource offers to Mesos frameworks: applications (be it Hadoop, Jenkins, ElasticSearch or whatever) that have to be ported to this Scheduler/Executor paradigm (sort of like a cluster resource-level MapReduce, I presume?), where the scheduler accepts offers and executors on the slaves perform the Tasks.
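The offer cycle is easier to see in a toy simulation. This is a minimal sketch under stated assumptions: `Offer`, `Task`, `ToyScheduler` and `resource_offers` are hypothetical names, not the Mesos API, though real frameworks do implement a scheduler callback that receives a batch of offers and decides which tasks to launch on each one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Offer:
    """Hypothetical: free resources on one slave, offered by the master."""
    agent: str
    cpus: float
    mem: int  # megabytes


@dataclass
class Task:
    name: str
    cpus: float
    mem: int  # megabytes


class ToyScheduler:
    """Hypothetical framework scheduler that wants `count` identical tasks."""

    def __init__(self, name: str, count: int, cpus: float, mem: int):
        self.pending = [Task(f"{name}-{i}", cpus, mem) for i in range(count)]

    def resource_offers(self, offers: List[Offer]) -> Dict[str, List[Task]]:
        """Pack pending tasks into the offered resources, agent by agent."""
        launches: Dict[str, List[Task]] = {}
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem
            chosen = []
            # Keep launching while the next task still fits in this offer.
            while self.pending and self.pending[0].cpus <= cpus and self.pending[0].mem <= mem:
                task = self.pending.pop(0)
                cpus -= task.cpus
                mem -= task.mem
                chosen.append(task)
            if chosen:
                launches[offer.agent] = chosen
        return launches


if __name__ == "__main__":
    sched = ToyScheduler("web", count=3, cpus=2, mem=512)
    plan = sched.resource_offers([Offer("agent1", 4, 2048), Offer("agent2", 2, 1024)])
    for agent, tasks in plan.items():
        print(agent, [t.name for t in tasks])
```

The split is the point: the master only decides *what to offer*, while each framework decides *what to run where*, which is why Hadoop, Jenkins and the rest need a Mesos-specific scheduler.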

Mesosphere is the whole stack as an integrated OS. Marathon is pretty easy to understand - it's an init system/framework supervisor that operates on a higher level of abstraction than simple OS processes.

But yeah, it's a pretty tangled mess.

Wilya · on Jan 26, 2015

In a bit more practical terms:

Mesos is a generic cluster manager. Its job is to keep track of available resources (RAM/CPU/Disk space) on a group of machines, and to share these resources between "Frameworks", which are more or less specialized distributed applications.

Available frameworks include existing applications that have been ported to Mesos (Jenkins, Hadoop, Elasticsearch, ...), along with some that have been developed specifically for Mesos (Chronos for running cron-like tasks, Marathon for running long-running tasks).

For example, you can tell Marathon "I want to run 10 instances of ./my_code, with 512MB of RAM and 2 cores each", and the framework and Mesos will collaborate to figure out where to run them. As a bonus, Mesos has some Docker integration, so you can natively do things like "Start 20 instances of container my_organization/my_app".
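Concretely, "telling Marathon" means submitting a JSON app definition to its REST API (`POST /v2/apps`). A minimal sketch follows; the Marathon URL and app ids are hypothetical, the Docker app's CPU/RAM sizing is an assumption since the comment gives none, and it assumes a Marathon version that accepts a Docker app without a `cmd` (the image's own entrypoint runs). The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical Marathon endpoint; replace with your own Marathon master.
MARATHON_URL = "http://marathon.example.com:8080"

# "Run 10 instances of ./my_code, with 512MB of RAM and 2 cores each."
plain_app = {
    "id": "/my-code",
    "cmd": "./my_code",
    "cpus": 2,
    "mem": 512,  # megabytes
    "instances": 10,
}

# "Start 20 instances of container my_organization/my_app."
docker_app = {
    "id": "/my-app",
    "cpus": 1,   # assumption: sizing not given in the comment
    "mem": 512,  # assumption: sizing not given in the comment
    "instances": 20,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "my_organization/my_app"},
    },
}


def build_create_request(app: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/apps request for Marathon."""
    return urllib.request.Request(
        MARATHON_URL + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_request(plain_app)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually submit the app.
```

Note that nothing in the definition says *which* machines to use; placement is left to Marathon and the Mesos offers it receives.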

The datacenter OS thing is a bit of a shaky metaphor, but the idea is that, like the OS does for a single computer, the role of Mesos is to view your cluster resources as a single pool, and to try to fairly allocate these resources between multiple (distributed) applications.
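On the "fairly allocate" part: Mesos's default allocator is based on Dominant Resource Fairness (DRF), which repeatedly hands the next task to the framework whose largest share of any single resource (its "dominant share") is currently smallest. The function below is a simplified sketch of DRF in isolation (one pooled cluster, no offers or agents), not Mesos's allocator code; `drf_allocate` is a hypothetical name, and the example numbers follow the illustration in the original DRF paper.

```python
from typing import Dict


def drf_allocate(total: Dict[str, float], demands: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Return how many tasks each framework gets under progressive DRF filling."""
    used = {r: 0.0 for r in total}
    alloc = {u: {r: 0.0 for r in total} for u in demands}
    tasks = {u: 0 for u in demands}
    active = list(demands)  # frameworks that may still receive tasks

    def dominant_share(u: str) -> float:
        # Largest fraction of any single resource held by framework u.
        return max(alloc[u][r] / total[r] for r in total)

    while active:
        u = min(active, key=dominant_share)  # ties go to the earliest framework
        d = demands[u]
        if all(used[r] + d.get(r, 0.0) <= total[r] for r in total):
            for r in total:
                used[r] += d.get(r, 0.0)
                alloc[u][r] += d.get(r, 0.0)
            tasks[u] += 1
        else:
            active.remove(u)  # its next task no longer fits anywhere
    return tasks


if __name__ == "__main__":
    # 9 CPUs and 18 GB; A's tasks are memory-heavy, B's are CPU-heavy.
    print(drf_allocate(
        {"cpus": 9, "mem": 18},
        {"A": {"cpus": 1, "mem": 4}, "B": {"cpus": 3, "mem": 1}},
    ))
```

Here A ends with 3 tasks and B with 2, so each holds two thirds of its dominant resource (A of memory, B of CPU), which is the sense in which the pool is shared "fairly".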

atonse · on Jan 26, 2015

Maybe, but this article hits the super deluxe jackpot for most buzzwords in a headline.

Argorak · on Jan 26, 2015

Considering that it is a hyperspecific instruction manual and not a blog article, that's probably a good thing. It mentions all the topics.

googamooga · on Jan 26, 2015

This is rather cool, but data in ArangoDB will not survive a Docker container restart, so, sadly, this setup is still just a toy.

mateuszf · on Jan 26, 2015

It's possible to use volumes functionality to keep db data persistent.
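In Marathon terms, that means adding a `volumes` entry to the Docker container section of the app definition, mapping a directory on the host into the container. A minimal sketch: the image name `arangodb/arangodb`, the in-container data directory `/var/lib/arangodb` and the host path `/data/arangodb` are all assumptions for illustration; check the image you actually use.

```python
import json

# Hypothetical Marathon app for a single ArangoDB container with a host volume.
arangodb_app = {
    "id": "/arangodb",
    "cpus": 1,
    "mem": 1024,  # megabytes
    "instances": 1,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "arangodb/arangodb"},  # assumed image name
        "volumes": [
            {
                "containerPath": "/var/lib/arangodb",  # assumed data directory
                "hostPath": "/data/arangodb",          # directory on the slave
                "mode": "RW",
            }
        ],
    },
}

print(json.dumps(arangodb_app, indent=2))
```

The `hostPath` lives on whichever slave happens to run the task, which is exactly the limitation raised in the replies.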

sargun · on Jan 26, 2015

But it won't be persisted across restarts in Mesos-land, because Mesos doesn't have a notion of data volumes. The way Google (Omega) gets around this is by having GFS handle replication and persisting all of their databases to the shared file system.

don71 · on Jan 26, 2015

There is an interesting discussion on the mailing list http://mail-archives.apache.org/mod_mbox/mesos-user/201410.m...

googamooga · on Jan 26, 2015

In order to use a Docker volume, you have to pin your ArangoDB Docker instance to the exact same Mesos worker. Otherwise Marathon may start the database instance on any randomly chosen worker, where the saved volume does not exist. IMHO, pinning a database instance to one specific cluster node makes the whole idea of using Mesos and Marathon completely useless.
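The pinning itself is a Marathon placement constraint on the `hostname` field with the `CLUSTER` operator. A minimal sketch, with a hypothetical hostname and the same assumed image and paths as a plain host-volume setup; `pinned_app` is a hypothetical helper.

```python
import json


def pinned_app(hostname: str) -> dict:
    """Hypothetical Marathon app definition that may only run on `hostname`."""
    return {
        "id": "/arangodb",
        "cpus": 1,
        "mem": 1024,  # megabytes
        "instances": 1,
        # CLUSTER with a value: every instance must land on the agent
        # whose hostname equals that value.
        "constraints": [["hostname", "CLUSTER", hostname]],
        "container": {
            "type": "DOCKER",
            "docker": {"image": "arangodb/arangodb"},  # assumed image name
            "volumes": [
                {
                    "containerPath": "/var/lib/arangodb",  # assumed data dir
                    "hostPath": "/data/arangodb",
                    "mode": "RW",
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(pinned_app("agent-1.example.com"), indent=2))
```

The trade-off is visible in the definition: if `agent-1.example.com` goes down, no other machine satisfies the constraint, so Marathon has nowhere to restart the database.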

lhc- · on Jan 26, 2015

You could mount a network volume on all your mesos hosts and make that available to the DB container, so that no matter which host handles that job, your data will persist easily.

tinco · on Jan 26, 2015

What would the performance characteristics of that be? What configures the network volume? Isn't configuring and maintaining a network volume comparable to configuring and maintaining a database?

Mesos is missing something to deal with persistent storage. From what I've heard, none of the teams that employ Mesos actually run their persistent storage clusters in Mesos. Perhaps that's handy for them, but for me it takes away most of the usefulness of Mesos.

aabhay · on Jan 26, 2015

Mesosphere's DNS implementation (https://github.com/mesosphere/mesos-dns) helps address that problem. You can use DNS routing to create internal storage, or implement a cache that routes cache misses to an external store.

googamooga · on Jan 28, 2015

Mesos-DNS has nothing to do with data persistence on Mesos. It is just another implementation of a cluster service locator.

m_mueller · on Jan 26, 2015

That's a rather large oversight in a tutorial for a DBMS-based app, though.

chazu · on Jan 26, 2015

This DBMS sounds interesting, but after a quick search of the organization's repos on GitHub I couldn't figure out which one actually contains the core DBMS itself.

sander71 · on Jan 26, 2015

The repo for the DBMS itself is here: https://github.com/triAGENS/ArangoDB

preillyme · on Jan 29, 2015

Shouldn't that be on Apache Mesos NOT Mesosphere?