Facebook Announces Apollo, a New NoSQL Database for On-line Low Latency Storage (infoq.com)
135 points by sumitkumar on June 13, 2014 | 22 comments



"Currently, Apollo is developed internally at Facebook. No firm claims were made during the talk that it will be opensourced. It was mentioned as a possibility after internal development settles down." from http://java.dzone.com/articles/facebook-announces-apollo-qco...


HN readers, what do you think are Facebook's motivations for announcing Apollo at this point?


I'd guess that the engineers who are building it think it's cool and want to talk about it. Facebook seems to be generally open about their internal systems, presumably because they don't see it as their competitive advantage (unlike, say, Google).


Google talks a fair bit about their internal systems at this level of "descriptions but not code" - Bigtable, MapReduce, Spanner, Flume, Chubby, and more have been influential.


In fact, they do more than just talk: they often publish papers describing how they work. The open source community has since recreated a lot of them, which has proven useful to a lot of people (e.g. HBase, Hadoop, Apache Crunch).


Recruiting tactic.


One of their supported storage primitives is CRDT-based, according to [1]. I, for one, am really interested to see how this works in practice. I've been quite excited about CRDTs, but haven't seen enough examples of them in the wild to get a sense of their drawbacks — for instance, how difficult it is to use them to model various processes or data structures.

[1] https://twitter.com/adrianco/status/476843040330743809


It seems that CRDTs are just another face, or an implementation, of CALM:

http://www.bloom-lang.net/calm/

"Informally, a block of code is logically monotonic if it satisfies a simple property: adding things to the input can only increase the output."

http://db.cs.berkeley.edu/papers/cidr11-bloom.pdf

"A sufficient condition for eventual consistency is order independence ..."
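
To make the quoted monotonicity property concrete, here is a small Python sketch. The event format and both function names are made up for illustration and are not taken from Bloom or the CALM papers. The first query can only grow as more events arrive; the second can have an earlier answer retracted by later input, which is the kind of logic that needs coordination.

```python
def distinct_users(events):
    """Monotonic: more input events can only add users to the output."""
    return {user for user, _ in events}


def users_without_logout(events):
    """Non-monotonic: a later 'logout' event can retract an earlier answer."""
    logged_out = {user for user, action in events if action == "logout"}
    logged_in = {user for user, action in events if action == "login"}
    return logged_in - logged_out


# Hypothetical event stream: (user, action) pairs.
partial = [("ann", "login"), ("bob", "login")]
full = partial + [("ann", "logout")]

# The monotonic output on partial input is a subset of the output on full input.
mono_grows = distinct_users(partial) <= distinct_users(full)

# The non-monotonic output shrinks once the extra event is seen.
nonmono_shrinks = not (users_without_logout(partial) <= users_without_logout(full))
```

Because `distinct_users` never takes anything back, replicas can evaluate it on whatever subset of events they have seen and still agree once they have all seen everything.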


Do you know of a resource for learning the basics of CRDTs that doesn't require a PhD?


The name is intimidating, but the operations are simple.

Basically, your storage has container types ("T"). A list, a set, a dictionary, etc. Container types can be split and added together in a distributed fashion ("R" and "D").

The "C" in CRDT stands for "Convergent" or "Commutative" (more often expanded as "Conflict-free" these days), implying that your distributed operations arrive at the same value when merged.

Quick example: If you have a node with a key pointing to value (set) [a, b, c] and another node with the same key but different value [c, e, f], then when the nodes communicate, they can do a set union for the actual result of [a, b, c, e, f]. Keys can keep a running log of recent operations to clean up the global result too (like: [c, e, (recently deleted f)], so on merge, if the other list has f, it would be deleted instead of re-added).
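
A rough Python sketch of that example, assuming a two-phase set (one set of adds, one set of tombstones). The class and method names are hypothetical and not taken from Apollo or statebox.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPhaseSet:
    """Set CRDT sketch: an element is present if it was added and never removed."""
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)  # tombstones for deleted items

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Record a tombstone so a later merge cannot re-add the item.
        if item in self.added:
            self.removed.add(item)

    def merge(self, other):
        # Merging is a plain union of both halves, so the order in which
        # replicas exchange state does not matter.
        return TwoPhaseSet(self.added | other.added,
                           self.removed | other.removed)

    def value(self):
        return self.added - self.removed


node1 = TwoPhaseSet({"a", "b", "c"})
node2 = TwoPhaseSet({"c", "e", "f"})
merged = node1.merge(node2)          # set union of both nodes

node3 = TwoPhaseSet({"c", "e", "f"})
node3.remove("f")                    # node3 remembers that f was deleted
after_delete = merged.merge(node3)   # f stays deleted instead of being re-added
```

One known cost of this particular design is that a removed element can never be added back; richer set CRDTs exist to lift that restriction.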

Before CRDTs were a thing, Bob made statebox, and it's very easy to understand. Give the README a read to understand more basics: https://github.com/mochi/statebox


That's helpful, thanks (I've downloaded some CRDT videos to watch in the meantime).

At the surface they sound like something vaguely resembling an abelian group (+/- inverses), but the conflict resolution stuff is the heart of it I'd guess.


Yes, from my (limited but growing) understanding of it, they are indeed similar to abelian groups.


CRDTs are, in the basic case, an idempotent commutative monoid, aka an idempotent abelian monoid.

If this floats your boat, here's me on CRDTs: https://skillsmatter.com/skillscasts/5301-convergent-replica...
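
For anyone who prefers code to algebra, here is a small Python sketch of what "idempotent commutative monoid" buys you. It uses a grow-only counter merged by per-node maximum as the example; the `merge` and `fold` names and the sample states are made up for illustration.

```python
from itertools import permutations


def merge(a: dict, b: dict) -> dict:
    """Merge two grow-only counters by taking the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0))
            for node in a.keys() | b.keys()}


EMPTY = {}  # identity element: merging with it changes nothing

x = {"n1": 3, "n2": 1}
y = {"n2": 4}
z = {"n1": 2, "n3": 7}

commutative = merge(x, y) == merge(y, x)
associative = merge(merge(x, y), z) == merge(x, merge(y, z))
idempotent = merge(x, x) == x
has_identity = merge(x, EMPTY) == x


def fold(states):
    """Apply merge to a sequence of received states, starting from EMPTY."""
    result = EMPTY
    for state in states:
        result = merge(result, state)
    return result


# Every delivery order, even with x delivered twice, ends in the same state.
results = [fold(order + (x,)) for order in permutations((x, y, z))]
converged = all(r == results[0] for r in results)
total = sum(results[0].values())  # the counter's value after convergence
```

Those three laws are what let replicas receive updates out of order, or more than once, without disagreeing about the final state.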


I don't think you understand the word basic...

CRDTs won't have mainstream success until people stop using words like 'monoid' and 'abelian'.

Most programmers aren't required to learn this kind of math in a CS degree, and furthermore, many programmers don't have a CS degree or have forgotten it.

So the question is, are CRDTs a useful technique for all developers, or just a way for a minute few to demonstrate their ability to sling around math words?


Did you read the thread? The parents were using terms from abstract algebra, so I replied using the same language.

In another context I would have avoided those terms and perhaps used an explanation like this: http://noelwelsh.com/programming/2013/12/20/crdts-for-fun-an...


Aral Balkan gave a talk[1] describing WOOT, a CRDT for collaborative editing. Best intro to the idea of how a more complex CRDT works that I've seen. (Really only the last 8 minutes or so are about WOOT. The rest is why he chose it.)

[1] https://www.youtube.com/watch?v=NSTZ4mIv_wk



> Apollo, Facebook’s Paxos-like NoSQL database ...

> supports anything from a minimum of three servers to thousands

Sorry, you don't run Paxos on thousands of servers. Typical Paxos cluster sizes are 5-7. The algorithm would never converge if you did run it on thousands of servers.


Well, I wouldn't judge the software on the basis of the article. The words "Paxos-like database" are enough of a tip-off that it's not exactly going for rigorous technical accuracy.


It's sharded, dude. You may want (typically) 3, 5 or 7 machines per shard for redundancy and failover, but there's no limit on the number of shards you may want to have.
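
A toy Python sketch of the idea, purely as an illustration and not a description of how Apollo actually assigns shards. The server names, the group size of 3, and the hash-mod placement are all assumptions.

```python
import hashlib

REPLICAS_PER_SHARD = 3  # assumption: small odd-sized groups such as 3, 5, or 7


def build_shards(servers, replicas=REPLICAS_PER_SHARD):
    """Group servers into fixed-size replica sets, one consensus group per shard."""
    usable = len(servers) - len(servers) % replicas
    return [servers[i:i + replicas] for i in range(0, usable, replicas)]


def shard_for(key, shards):
    """Pick a shard by hashing the key; only that shard's replicas vote on it."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % len(shards)


servers = [f"server-{i}" for i in range(3000)]  # hypothetical fleet
shards = build_shards(servers)
group = shards[shard_for("user:42", shards)]    # the 3 machines that own this key
```

The total server count grows with the number of shards, while each write only ever involves the handful of machines in one group.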


"is on-line low latency storage - in particular Flash and in-memory." "As distinct from a document oriented, or key value store, Apollo is about modifications to data structures, allowing you to represent maps, queues, trees and so on, as well as key values."

Sounds like Redis?


With flash support, though, presumably for datasets larger than memory rather than just as a persistent store, given that it uses LevelDB.



