
What I'd be more curious to hear about is how they deal with super nodes, and whether they're storing a map (or any kind of routing table) or simply using generic modulo hashing on a key.



Right now, it's a lookup dict in our Django app--which involves brief downtime just to update the shard map when moving the data (more on this in another post). We hash on user ID (in most cases), look up the shard # in the dict, and then look up that shard # in a logical-to-physical dict.

By the way we use & love Sentry!


Ah so you don't actually have 1000 (or whatever) schemas set up right off the bat?

Or I might be misunderstanding, and you're saying that you're just mapping which physical server has the schemas.


We set up the schemas ahead of time, and then each lookup is (with, say, 1000 schemas):

  user_id % 1000 -> schema ID
  schema ID -> database ID
then select FROM schemaID.tablename etc on that particular database.
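
A rough Python sketch of that two-step lookup, for illustration only. The names `NUM_SCHEMAS`, `SCHEMA_TO_DATABASE`, `locate`, the `shardN` schema naming, and the `dbN` aliases are all assumptions made up for this example, not the actual code or layout described above.

```python
from typing import Dict, Tuple

NUM_SCHEMAS = 1000  # the "say, 1000 schemas" figure from the comment

# Hypothetical logical-to-physical map: schema ID -> database alias.
# Here 250 schemas are placed on each of four made-up databases.
SCHEMA_TO_DATABASE: Dict[int, str] = {
    schema_id: f"db{schema_id // 250}" for schema_id in range(NUM_SCHEMAS)
}


def locate(user_id: int, table: str) -> Tuple[str, str]:
    """Return (database alias, schema-qualified table name) for a user."""
    schema_id = user_id % NUM_SCHEMAS          # user_id % 1000 -> schema ID
    database = SCHEMA_TO_DATABASE[schema_id]   # schema ID -> database ID
    return database, f"shard{schema_id}.{table}"
```

For example, `locate(1234567, "users")` maps user 1234567 to schema 567, which this made-up layout places on `db2`, so the query would run against `shard567.users` there.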


Ah ok that's what I figured. We do a lot of the same things right now with various DBs (we just describe it as pre-sharding so we can avoid the actual re-sharding dilemma). We've had to do this with both PostgreSQL and Redis now.
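
To make the pre-sharding point concrete, here is a minimal Python sketch under assumed names (`schema_to_database`, `database_for`, `move_schema`, and the `db0`/`db1` aliases are hypothetical). Because the number of logical shards is fixed up front, relocating one only changes an entry in the logical-to-physical map; the modulo step, and therefore every key's logical shard, stays the same. Copying the data itself is out of scope here.

```python
NUM_SCHEMAS = 1000  # fixed number of logical shards, chosen up front

# Hypothetical starting point: every logical shard lives on one database.
schema_to_database = {schema_id: "db0" for schema_id in range(NUM_SCHEMAS)}


def database_for(user_id):
    """Resolve a user to a physical database via its logical shard."""
    return schema_to_database[user_id % NUM_SCHEMAS]


def move_schema(schema_id, new_database):
    """Point one logical shard at a different physical database.

    Only the map entry changes; no key is rehashed to another shard.
    """
    schema_to_database[schema_id] = new_database
```

After `move_schema(42, "db1")`, users 42 and 1042 both resolve to `db1`, while user 43 still resolves to `db0`.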

The schemas are definitely a neat way to handle this though, so you don't have to worry about table names.



