Hacker News new | past | comments | ask | show | jobs | submit login

If you do that then you need to keep a mapping of object IDs to shards. You can't store it on the application servers because of race conditions and memory constraints, so each object lookup will require two network trips.

They've defined 2^13 virtual shards, which is fine-grained enough that they can move entire shards between physical servers to eliminate hot spots.




I hear what you are saying, but what they have done is derived the their IDs based partially on their current choice of architecture. If, for any reason, they would like to change their architecture later, suddenly they have a real problem.

That feels flaky to me. Its a nice techy solution, and the write up was interesting, but there are definitely a few weak points in their implementation. Their computers have to be absolutely in sync for time, their ID space is entirely guessable once you know the algorithm and they have effectively hard coded their current architecture into their software.


Shard id is usually algorithmically derived rather than looking up from mapping table, which introduces unnecessary complexity. Look up Consistent Hashing for details, which deals with virtual shards, shard addition/removal, object migration in a consistent and simple manner.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: