Tango: Distributed Data Structures Over a Shared Log (2013) (microsoft.com)
93 points by another on Nov 13, 2016 | 14 comments



Here is a nice summary of the Tango paper. http://muratbuffalo.blogspot.com/2014/09/paper-summary-tango...


Sounds like a low-level implementation of event sourcing/CQRS. Is there anything similar that is usable right now? Perhaps built on Kafka?


We're using Eventuate[0], an event-sourcing framework with deep support for cooperation via shared logs. It's based on the actor framework Akka; Akka itself has akka-persistence[1], which is similar but different[2]. All of these are usable right now.

Though it doesn't feature either implementation (the author does something similar on top of Samza), I like this article[3] on the topic: turning the database inside out really is what we're doing.

[0] http://rbmhtechnology.github.io/eventuate/

[1] http://doc.akka.io/docs/akka/snapshot/scala/persistence.html

[2] http://krasserm.github.io/2015/05/25/akka-persistence-eventu...

[3] https://www.confluent.io/blog/turning-the-database-inside-ou...


We have built an implementation of CORFU [1] (the protocol Tango is based on) that runs on Ceph/RADOS, called ZLog [0]. We have a very simple prototype of Tango running on ZLog. ZLog could run on other storage systems like Kafka, but we have only focused on Ceph/RADOS as the underlying storage.

[0]: https://github.com/noahdesu/zlog

[1]: https://www.usenix.org/conference/nsdi12/technical-sessions/...


Apache Samza, also from LinkedIn, is built on Kafka, and I think it could be used to do something like this.


"the abstraction of a replicated, in-memory data structure (such as a map or a tree) backed by a shared log"

If I read just this piece of text anywhere, the word popping up in my mind would be ZooKeeper.
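To make the quoted abstraction concrete, here is a minimal single-process sketch in Python. It is not Tango's API: `SharedLog` and `LogBackedMap` are hypothetical names, and the "shared log" is just a Python list standing in for a replicated, durable log. The idea it illustrates is that writes only append update records, and each replica builds its in-memory view by replaying the log in order.

```python
class SharedLog:
    """In-process stand-in for a shared log (assumption: a plain list)."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        self._entries.append(entry)
        return len(self._entries) - 1  # position of the new entry

    def read_from(self, start):
        return self._entries[start:]


class LogBackedMap:
    """A local in-memory view of a map whose source of truth is the log."""

    def __init__(self, log):
        self._log = log
        self._view = {}
        self._next = 0  # first log position this replica has not applied yet

    def put(self, key, value):
        # A write only appends an update record; the view changes in sync().
        self._log.append(("put", key, value))

    def delete(self, key):
        self._log.append(("delete", key, None))

    def sync(self):
        # Replay every entry appended since the last sync, in log order.
        for op, key, value in self._log.read_from(self._next):
            if op == "put":
                self._view[key] = value
            else:
                self._view.pop(key, None)
            self._next += 1

    def get(self, key):
        self.sync()  # catch up with the log before answering
        return self._view.get(key)


log = SharedLog()
a, b = LogBackedMap(log), LogBackedMap(log)
a.put("leader", "node-1")
b.put("leader", "node-2")
# Both replicas replay the same sequence, so they agree on the final value.
assert a.get("leader") == b.get("leader") == "node-2"
```

Because every replica applies the same entries in the same order, the views converge without the replicas talking to each other directly, which is roughly why a ZooKeeper-like service comes to mind.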


Indeed, one of the prototype services built on Tango and evaluated in the paper was a ZooKeeper clone.


One of the more eye-opening aspects of the paper is just how little code it took them to duplicate the ZooKeeper API atop Tango. Granted, there are some caveats about a research project vs. an industry-ready codebase, but I still interpret it as strong evidence that their approach is a good foundational abstraction.


A couple of my friends have been looking at this paper and created their own visualization implementation: https://github.com/derekelkins/tangohs


Maybe add '2013' to the title?


Why do you need a shared log? Remember the CAP theorem. No need for these bottlenecks. If you want to store that A happened after B, just have A store a (hash of) B.


That's a type of logical clock you're describing, though one that relates just 2 events rather than giving a partial order over all events. Obviously, if you do that with all events, you will have a logical clock. But the hash of the previous event is not a good logical clock, because you cannot define higher-level operations over the values, such as asking whether one event is 'newer' than another.
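A small Python sketch of that point, under stated assumptions: `Event`, `follow`, and `directly_after` are hypothetical names for illustration, and the counter is a Lamport-style timestamp added purely for contrast. A hash link can confirm a direct predecessor, but two hashes cannot be compared with each other, whereas counters along a chain can.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    payload: str
    prev_hash: Optional[str]  # hash link to the single predecessor, if any
    lamport: int              # Lamport-style counter, for comparison

    @property
    def event_hash(self) -> str:
        data = f"{self.prev_hash}|{self.payload}".encode()
        return hashlib.sha256(data).hexdigest()


def follow(prev: Optional[Event], payload: str) -> Event:
    """Create an event that happened after `prev` (or a first event)."""
    if prev is None:
        return Event(payload, None, 1)
    return Event(payload, prev.event_hash, prev.lamport + 1)


def directly_after(a: Event, b: Event) -> bool:
    """All a hash link can tell us: is b the immediate predecessor of a?"""
    return a.prev_hash == b.event_hash


e1 = follow(None, "create")
e2 = follow(e1, "update")
e3 = follow(e2, "delete")

assert directly_after(e2, e1)
# e3 did happen after e1, but the hash link alone cannot show it;
# you would have to fetch and walk every intermediate event.
assert not directly_after(e3, e1)
# Along a causal chain the counters can be compared directly.
assert e1.lamport < e3.lamport
```

Note that a Lamport counter only guarantees the smaller value for the earlier of two causally related events; a smaller value on an unrelated chain does not by itself prove causality.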


I'm pretty sure literally nothing in distributed software is as simple as you're trying to make this sound.


Read the linked paper; it explains comparisons and its use cases.



