Facebook Announces Apollo, a New NoSQL Database for On-line Low Latency Storage

xpe · on June 14, 2014

"Currently, Apollo is developed internally at Facebook. No firm claims were made during the talk that it will be opensourced. It was mentioned as a possibility after internal development settles down." from http://java.dzone.com/articles/facebook-announces-apollo-qco...

xpe · on June 14, 2014

HN readers, what do you think are Facebook's motivations for announcing Apollo at this point?

necubi · on June 14, 2014

I'd guess that the engineers who are building it think it's cool and want to talk about it. Facebook seems to be generally open about their internal systems, presumably because they don't see it as their competitive advantage (unlike, say, Google).

alec · on June 14, 2014

Google talks a fair bit about their internal systems at this level of "descriptions but not code" - Bigtable, MapReduce, Spanner, Flume, Chubby, and more have been influential.

timothya · on June 14, 2014

In fact, they do more than just talk: they often publish papers describing how they work. The open source community has since recreated a lot of them, which has proven useful to a lot of people (e.g. HBase, Hadoop, Apache Crunch, etc.)

gaius · on June 15, 2014

Recruiting tactic.

spiralganglion · on June 14, 2014

One of their supported storage primitives is CRDT-based, according to [1]. I, for one, am really interested to see how this works in practice. I've been quite excited about CRDTs, but haven't seen enough examples of them in the wild to get a sense of their drawbacks — for instance, how difficult it is to use them to model various processes or data structures.

[1] https://twitter.com/adrianco/status/476843040330743809

trhway · on June 14, 2014

It seems that CRDTs are just another face, or an implementation, of CALM:

http://www.bloom-lang.net/calm/

"Informally, a block of code is logically monotonic if it satisfies a simple property: adding things to the input can only increase the output. "

http://db.cs.berkeley.edu/papers/cidr11-bloom.pdf

" A sufficient condition for eventual consistency is order independence ..."

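A small Python sketch of the two quoted ideas, for anyone who wants to poke at them. The function names (evens, absent, replay) are made up for illustration and are not from Bloom or the paper; the only assumption is that state is accumulated with set union.

```python
from itertools import permutations

def evens(items: frozenset) -> frozenset:
    """Monotonic: a larger input can only yield a larger (or equal) output."""
    return frozenset(x for x in items if x % 2 == 0)

def absent(items: frozenset, universe: frozenset) -> frozenset:
    """Non-monotonic: adding input can shrink the output."""
    return universe - items

def replay(batches) -> frozenset:
    """Fold batches into state with set union, in the order they arrive."""
    state = frozenset()
    for batch in batches:
        state |= evens(batch)
    return state

small = frozenset({1, 2})
large = small | {3, 4}
universe = frozenset(range(1, 6))

# Growing the input grows the output of evens, but shrinks the output of absent.
print(evens(small) <= evens(large))
print(absent(small, universe) <= absent(large, universe))

# Every arrival order of the same batches ends in the same state.
batches = [frozenset({1, 2}), frozenset({4}), frozenset({2, 6})]
print(len({replay(order) for order in permutations(batches)}))
```

The monotonic function composed with union is order independent, which is the property the second quote says is sufficient for eventual consistency.
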
platz · on June 14, 2014

Do you know of a resource for learning the basics of CRDTs that doesn't require a PhD?

seiji · on June 14, 2014

The name is intimidating, but the operations are simple.

Basically, your storage has container types ("T"). A list, a set, a dictionary, etc. Container types can be split and added together in a distributed fashion ("R" and "D").

The "C" in CRDT originally stood for "Convergent" or "Commutative" (these days it is usually read as "Conflict-free"), implying that your distributed operations reach the same value when merged.

Quick example: If you have a node with a key pointing to value (set) [a, b, c] and another node with the same key but different value [c, e, f], then when the nodes communicate, they can do a set union for the actual result of [a, b, c, e, f]. Keys can keep a running log of recent operations to clean up the global result too (like: [c, e, (recently deleted f)], so on merge, if the other list has f, it would be deleted instead of re-added).
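A rough Python sketch of that merge, assuming a two-phase set (an add set plus a tombstone set) as the container. TwoPhaseSet and its methods are hypothetical names, not statebox's or Apollo's API, and in this simple variant a deleted item can never be re-added.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoPhaseSet:
    added: frozenset = frozenset()
    removed: frozenset = frozenset()  # tombstones for deleted items

    def add(self, item):
        return TwoPhaseSet(self.added | {item}, self.removed)

    def remove(self, item):
        # Only items this replica has seen can be tombstoned.
        if item not in self.added:
            return self
        return TwoPhaseSet(self.added, self.removed | {item})

    def merge(self, other):
        # Union both halves; union is commutative, associative and idempotent.
        return TwoPhaseSet(self.added | other.added,
                           self.removed | other.removed)

    def value(self):
        return self.added - self.removed

# Two nodes hold different values for the same key.
node_a = TwoPhaseSet(frozenset("abc"))
node_b = TwoPhaseSet(frozenset("cef"))
print(sorted(node_a.merge(node_b).value()))  # ['a', 'b', 'c', 'e', 'f']

# A node that recently deleted f wins over a node that still has f.
deleter = TwoPhaseSet(frozenset("cef")).remove("f")
stale = TwoPhaseSet(frozenset("f"))
print(sorted(deleter.merge(stale).value()))  # ['c', 'e']
```

The tombstone set plays the role of the "recently deleted" log; real systems usually garbage-collect it at some point, which this sketch does not attempt.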

Before CRDTs were a thing, Bob made statebox, and it's very easy to understand. Give the README a read to understand more of the basics: https://github.com/mochi/statebox

platz · on June 14, 2014

That's helpful, thanks (I've downloaded some crdt videos to watch in the meantime).

On the surface they sound like something vaguely resembling an abelian group (+/- inverses), but the conflict resolution stuff is the heart of it, I'd guess.

spiralganglion · on June 14, 2014

Yes, from my (limited but growing) understanding of it, they are indeed similar to abelian groups.

noelwelsh · on June 14, 2014

CRDTs are, in the basic case, an idempotent commutative monoid, aka an idempotent abelian monoid.

If this floats your boat, here's me on CRDTs: https://skillsmatter.com/skillscasts/5301-convergent-replica...
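The monoid laws mentioned above are easy to check on a toy example. Here is a minimal Python sketch using a grow-only counter merged by per-replica max; the names (merge, increment, value, EMPTY) are illustrative and not taken from any particular library.

```python
EMPTY = {}  # identity element: merging with it changes nothing

def increment(counter: dict, replica: str, n: int = 1) -> dict:
    """Each replica only ever bumps its own slot."""
    return {**counter, replica: counter.get(replica, 0) + n}

def merge(a: dict, b: dict) -> dict:
    """Pointwise max of the per-replica counts."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def value(counter: dict) -> int:
    return sum(counter.values())

a = {"x": 2, "y": 1}
b = {"x": 1, "z": 4}
c = {"y": 5}

print(merge(a, b) == merge(b, a))                      # commutative
print(merge(merge(a, b), c) == merge(a, merge(b, c)))  # associative
print(merge(a, a) == a)                                # idempotent
print(merge(a, EMPTY) == a)                            # identity
print(value(merge(a, b)))                              # 7
```

Idempotence is what lets replicas resend state without double counting, and commutativity plus associativity is what makes the delivery order irrelevant.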

ryanobjc · on June 14, 2014

I don't think you understand the word basic...

CRDTs won't have mainstream success until people stop using the words 'monoid' and 'abelian' etc.

Most programmers aren't required to learn this kind of math in a CS degree, and furthermore, many programmers don't have a CS degree or have forgotten it.

So the question is, are CRDTs a useful technique for all developers, or just a way for a minute few to demonstrate their ability to sling around math words?

noelwelsh · on June 15, 2014

Did you read the thread? The parents were using terms from abstract algebra, so I replied using the same language.

In another context I would have avoided those terms and perhaps used an explanation like this: http://noelwelsh.com/programming/2013/12/20/crdts-for-fun-an...

dahjelle · on June 14, 2014

Aral Balkan gave a talk[1] describing WOOT, a CRDT for collaborative editing. Best intro to the idea of how a more complex CRDT works that I've seen. (Really only the last 8 minutes or so are about WOOT. The rest is why he chose it.)

[1] https://www.youtube.com/watch?v=NSTZ4mIv_wk

msrivas · on June 14, 2014

I found this to be really useful -

http://research.microsoft.com/apps/video/default.aspx?id=153...

eslaught · on June 14, 2014

> Apollo, Facebook’s Paxos-like NoSQL database ...

> supports anything from a minimum of three servers to thousands

Sorry, you don't run Paxos on thousands of servers. Typical Paxos cluster sizes are 5-7. The algorithm would never converge if you did run it on thousands of servers.

teraflop · on June 14, 2014

Well, I wouldn't judge the software on the basis of the article. The words "Paxos-like database" are enough of a tip-off that it's not exactly going for rigorous technical accuracy.

mantrax5 · on June 14, 2014

It's sharded, dude. You may want (typically) 3, 5 or 7 machines per shard for redundancy and failover, but there's no limit on the number of shards you may want to have.
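To put numbers on it, here is a back-of-the-envelope Python sketch. It assumes keys are hashed to shards and that each shard is its own small consensus group; the article doesn't say how Apollo actually assigns keys, so shard_for, replicas_for and quorum are hypothetical helpers.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def replicas_for(shard: int, replicas_per_shard: int = 3) -> list[int]:
    """Server ids backing one shard, assuming a fixed contiguous layout."""
    start = shard * replicas_per_shard
    return list(range(start, start + replicas_per_shard))

def quorum(replicas_per_shard: int) -> int:
    """Majority needed for one shard's group to make progress."""
    return replicas_per_shard // 2 + 1

num_shards = 1000
replicas_per_shard = 3

shard = shard_for("user:42", num_shards)
print(replicas_for(shard, replicas_per_shard))  # the 3 servers for this key
print(num_shards * replicas_per_shard)          # 3000 servers in total
print(quorum(replicas_per_shard))               # 2 per write, not 1501
```

Each write only has to reach a majority of its own shard's replicas, so the consensus group stays small no matter how many shards there are.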

tluyben2 · on June 14, 2014

"is on-line low latency storage - in particular Flash and in-memory." "As distinct from a document oriented, or key value store, Apollo is about modifications to data structures, allowing you to represent maps, queues, trees and so on, as well as key values. "

Sounds like Redis?

justincormack · on June 14, 2014

With flash support though, presumably for datasets larger than fit in memory rather than just as a persistent store, given that it uses LevelDB.