How do you handle distribution of documents among nodes? Say document X has 3 ta...

coffeemug · on Oct 1, 2013

Documents are distributed across nodes based on their primary key. Currently we use range-based sharding, but we will be moving to hash-based sharding soon. So, for any given document, we look at its primary key to determine which node it belongs on.
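
To make the difference concrete, here is a rough sketch of the two placement schemes. It is only an illustration, not the actual implementation: the node names, the split point "N", and the choice of SHA-1 are all assumptions made for the example.

```python
import bisect
import hashlib

# Hypothetical cluster: two nodes and one split point.
# Keys that sort before "N" go to node_a, everything else to node_b.
NODES = ["node_a", "node_b"]
RANGE_SPLITS = ["N"]


def range_shard(primary_key: str) -> str:
    """Range-based placement: find the key range the primary key falls into."""
    idx = bisect.bisect_right(RANGE_SPLITS, primary_key)
    return NODES[idx]


def hash_shard(primary_key: str) -> str:
    """Hash-based placement: hash the primary key, then take it modulo the node count."""
    digest = hashlib.sha1(primary_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]
```

With range-based placement, neighbouring keys such as "Adams" and "Miller" land on the same node, which keeps range scans local but can create hot spots. Hash-based placement spreads keys more evenly at the cost of scattering adjacent keys across nodes.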

We store the secondary indexes for a shard on the same node where the master for that shard resides. So if you're storing users whose last names are between A and M on node A, all secondary index entries for those users will also be on node A. That means that for any secondary index query we have to contact all nodes that have shards for the table, but we do a number of systems tricks to make this really efficient.
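
A toy model of that layout may help. The sketch below assumes a hypothetical users table keyed by last name with a secondary index on city; the `Shard` and `Table` classes and the A-M / N-Z split are invented for illustration, and none of the efficiency tricks mentioned above are shown.

```python
from collections import defaultdict


class Shard:
    """One shard: its documents plus a secondary index kept on the same node."""

    def __init__(self):
        self.docs = {}                    # primary key (last name) -> document
        self.by_city = defaultdict(set)   # local secondary index: city -> primary keys

    def insert(self, doc):
        self.docs[doc["last_name"]] = doc
        self.by_city[doc["city"]].add(doc["last_name"])

    def query_city(self, city):
        # .get avoids creating empty index entries for cities with no matches.
        return [self.docs[pk] for pk in sorted(self.by_city.get(city, ()))]


class Table:
    """Hypothetical two-node table: last names A-M on node_a, N-Z on node_b."""

    def __init__(self):
        self.shards = {"node_a": Shard(), "node_b": Shard()}

    def _shard_for(self, last_name):
        node = "node_a" if last_name[:1].upper() <= "M" else "node_b"
        return self.shards[node]

    def insert(self, doc):
        self._shard_for(doc["last_name"]).insert(doc)

    def get(self, last_name):
        # Primary key lookup: exactly one shard is contacted.
        return self._shard_for(last_name).docs.get(last_name)

    def query_city(self, city):
        # Secondary index lookup: every shard is asked and the results are merged.
        results = []
        for shard in self.shards.values():
            results.extend(shard.query_city(city))
        return results


table = Table()
table.insert({"last_name": "Adams", "city": "Paris"})
table.insert({"last_name": "Miller", "city": "Oslo"})
table.insert({"last_name": "Nash", "city": "Paris"})
paris_users = table.query_city("Paris")
```

Here `table.get("Nash")` touches only node_b, while `table.query_city("Paris")` has to ask both shards because matching users can live in either key range.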

continuations · on Oct 1, 2013

> but we do a number of systems tricks to make this really efficient.

Can you talk about these tricks? I'd love to learn more about it.