ArangoDB on Mesosphere Using Marathon and Docker (github.com/arangodb)
45 points by Harrisburg on Jan 26, 2015 | 18 comments



Clear evidence that I am getting old and being left behind: of the four technologies named in that title, I have only heard of one and have never actually used it.


You're still better than I am. I've heard of the latter three, I've never used a single one, and I still find it very difficult to understand what the core of the second one (Mesos) actually is because it's so shrouded in buzzwords and vague descriptions.

All I got was that it's some sort of userspace microkernel geared around abstracting the computing resources of machines (nodes) into categories like CPU/RAM/disk and exposing them programmatically through a cluster-wide API. The master daemon makes resource offers to a slave daemon, which in turn runs applications (be it Hadoop, Jenkins, ElasticSearch or whatever) called Mesos frameworks, which have to be ported to this Scheduler/Executor paradigm (sort of like a cluster resource-level MapReduce, I presume?) in order to perform Tasks.

Mesosphere is the whole stack as an integrated OS. Marathon is pretty easy to understand - it's an init system/framework supervisor that operates on a higher level of abstraction than simple OS processes.

But yeah, it's a pretty tangled mess.


In a bit more practical terms:

Mesos is a generic cluster manager. Its job is to keep track of available resources (RAM/CPU/Disk space) on a group of machines, and to share these resources between "Frameworks", which are more or less specialized distributed applications.

Available frameworks include existing applications that have been ported to Mesos (Jenkins, Hadoop, Elasticsearch, ...), along with some that have been developed specifically for Mesos (Chronos for running cron-like tasks, Marathon for running long-running tasks).

For example, you can tell Marathon "I want to run 10 instances of ./my_code, with 512MB of RAM and 2 cores each", and the framework and Mesos will collaborate to figure out where to run them. As a bonus, Mesos has some Docker integration, so you can natively do things like "Start 20 instances of container my_organization/my_app".
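As a rough sketch of what such a request could look like, the snippet below builds Marathon app definitions in Python. The field names (`id`, `cmd`, `cpus`, `mem`, `instances`, `container`) follow Marathon's `/v2/apps` JSON format as I understand it; the helper `marathon_app`, the app IDs, and the resource figures for the Docker case are assumptions for illustration, and nothing is sent to a real cluster.

```python
import json


def marathon_app(app_id, instances, cpus, mem_mb, cmd=None, docker_image=None):
    """Build a Marathon-style app definition as a plain dict (hypothetical helper)."""
    if (cmd is None) == (docker_image is None):
        raise ValueError("pass exactly one of cmd or docker_image")
    app = {"id": app_id, "instances": instances, "cpus": cpus, "mem": mem_mb}
    if cmd is not None:
        # Plain command, run directly on whichever worker gets the task.
        app["cmd"] = cmd
    else:
        # Docker container, started through Mesos' Docker integration.
        app["container"] = {"type": "DOCKER", "docker": {"image": docker_image}}
    return app


# "10 instances of ./my_code, with 512MB of RAM and 2 cores each"
code_app = marathon_app("/my-code", instances=10, cpus=2, mem_mb=512, cmd="./my_code")

# "Start 20 instances of container my_organization/my_app"
# (cpus and mem are assumed values; the comment above does not give any)
docker_app = marathon_app("/my-app", instances=20, cpus=1, mem_mb=256,
                          docker_image="my_organization/my_app")

print(json.dumps(code_app, indent=2))
```

The resulting JSON would typically be POSTed to Marathon's REST API; the host and port depend on your setup.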

The datacenter OS thing is a bit of a shaky metaphor, but the idea is that, like the OS does for a single computer, the role of Mesos is to view your cluster resources as a single pool, and to try to fairly allocate these resources between multiple (distributed) applications.
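To make the "single pool" idea concrete, here is a toy first-fit placement in Python. It is only an illustration: real Mesos works through resource offers that frameworks accept or decline, and its allocation policy is more involved. The `Node` class, the `place` function, and the node sizes are all made up.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpus: float
    mem_mb: int


def place(nodes, instances, cpus, mem_mb):
    """First-fit: put each instance on the first node with enough free resources."""
    placement = []
    for _ in range(instances):
        for node in nodes:
            if node.cpus >= cpus and node.mem_mb >= mem_mb:
                # Reserve the resources on this node and record the decision.
                node.cpus -= cpus
                node.mem_mb -= mem_mb
                placement.append(node.name)
                break
        else:
            raise RuntimeError("not enough free resources in the pool")
    return placement


# Two hypothetical machines viewed as one pool of CPU and RAM.
cluster = [Node("node-a", 4, 2048), Node("node-b", 8, 4096)]
result = place(cluster, instances=5, cpus=2, mem_mb=512)
print(result)
```

The caller never names a machine; it only states how much CPU and RAM each instance needs, which is the part of the OS metaphor that holds up.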


Maybe, but this article hits the super deluxe jackpot for most buzzwords in a headline.


Considering that it is a hyperspecific instruction manual and not a blog article, that's probably a good thing. It mentions all the topics.


This is rather cool, but data in ArangoDB will not survive a Docker container restart, so, sadly, this setup is still just a toy.


It's possible to use Docker's volumes functionality to keep the DB data persistent.


But it won't be persisted across restarts in Mesos-land, because Mesos doesn't have the idea of data volumes. The way that Google (Omega) gets around this is by having GFS deal with replication, and persisting all of their databases into the shared file system.


There is an interesting discussion on the mailing list http://mail-archives.apache.org/mod_mbox/mesos-user/201410.m...


In order to use a Docker volume you have to pin your ArangoDB Docker instance to the exact same Mesos worker. Otherwise the database instance may be started by Marathon on any randomly chosen worker, without the saved volume. IMHO, pinning a database instance to an exact cluster node renders the whole idea of using Mesos and Marathon completely useless.
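For reference, pinning plus a host volume might be expressed roughly like this in a Marathon app definition, built here as a Python dict. The `constraints` entry uses Marathon's `hostname` / `CLUSTER` constraint and the `volumes` entry its host-path volume format, as I understand them; the image name, container data path, hostname, and resource figures are assumptions, not taken from the linked tutorial.

```python
import json


def pinned_arangodb_app(hostname, host_path):
    """Marathon-style app definition pinned to one worker, with a host directory mounted."""
    return {
        "id": "/arangodb",
        "instances": 1,
        "cpus": 1,       # assumed value
        "mem": 1024,     # assumed value, in MB
        # Only run on the worker whose hostname matches.
        "constraints": [["hostname", "CLUSTER", hostname]],
        "container": {
            "type": "DOCKER",
            "docker": {"image": "arangodb/arangodb"},  # assumed image name
            "volumes": [
                {
                    "containerPath": "/var/lib/arangodb",  # assumed data directory
                    "hostPath": host_path,
                    "mode": "RW",
                }
            ],
        },
    }


app = pinned_arangodb_app("worker-1.example.com", "/data/arangodb")
print(json.dumps(app, indent=2))
```

The constraint is what ties the task to one machine: if that worker goes down, Marathon has nowhere else to reschedule the database.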


You could mount a network volume on all your Mesos hosts and make that available to the DB container, so that no matter which host handles that job, your data will persist.


What would the performance characteristics of that be? What configures the network volume? Isn't configuring and maintaining a network volume comparable to configuring and maintaining a database?

Mesos is missing something to deal with persistent storage. From what I've heard, all teams that employ Mesos don't actually run their persistent storage clusters in Mesos. Perhaps handy for them, but for me that takes most of the use out of Mesos.


Mesosphere's DNS implementation (https://github.com/mesosphere/mesos-dns) helps address that problem. You can use DNS routing to create an internal storage or implement a cache that routes cache misses to an external store.


Mesos-DNS has nothing to do with data persistence on Mesos. It is just another implementation of a cluster service locator.


That's a rather large oversight in a tutorial for a DBMS-based app though.


This DBMS sounds interesting, but after a quick search of the organization's repos on GitHub I couldn't figure out which one actually represents the core DBMS itself.


The repo for the DBMS itself is here: https://github.com/triAGENS/ArangoDB


Shouldn't that be "on Apache Mesos", NOT "on Mesosphere"?



