What I'd be more curious to hear about is how they deal with super nodes and if...

mikeyk · on Sept 30, 2011

Right now, it's a lookup dict in our Django app--which involves brief downtime just to update the shard map when moving the data (more on this in another post). We hash on user ID (in most cases), look up the shard # in the dict, and then look up that shard # in a logical-to-physical dict.

By the way we use & love Sentry!

zeeg · on Sept 30, 2011

Ah so you don't actually have 1000 (or whatever) schemas set up right off the bat?

Or I might be misunderstanding, and you're saying that you're just mapping which physical server has the schemas.

mikeyk · on Sept 30, 2011

We set up the schemas ahead of time, and then each lookup is (with, say, 1000 schemas):

  user_id % 1000 -> schema ID
  schema ID -> database ID

then SELECT ... FROM schemaID.tablename etc. on that particular database.
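
For anyone following along, here is a rough Python sketch of that two-step lookup. It is not the actual code from the Django app. The number of databases, the way schemas are assigned to databases, the `shard<N>` schema naming, and all function names are assumptions made up for illustration; only the `user_id % 1000 -> schema ID -> database ID` chain comes from the comment above.

```python
NUM_SCHEMAS = 1000   # "say, 1000 schemas", as in the comment above
NUM_DATABASES = 4    # assumed; the thread gives no number

# Logical-to-physical dict: schema ID -> database ID.
# Assumption: schemas are assigned to databases in contiguous ranges.
SCHEMAS_PER_DATABASE = NUM_SCHEMAS // NUM_DATABASES
SCHEMA_TO_DATABASE = {
    schema_id: schema_id // SCHEMAS_PER_DATABASE
    for schema_id in range(NUM_SCHEMAS)
}


def schema_for_user(user_id: int) -> int:
    """Step 1: user_id % 1000 -> schema ID."""
    return user_id % NUM_SCHEMAS


def database_for_schema(schema_id: int) -> int:
    """Step 2: schema ID -> database ID."""
    return SCHEMA_TO_DATABASE[schema_id]


def locate(user_id: int, table_name: str) -> tuple[int, str]:
    """Return (database ID, schema-qualified table name) for a user."""
    schema_id = schema_for_user(user_id)
    # "shard<N>" is a hypothetical schema naming convention.
    return database_for_schema(schema_id), f"shard{schema_id}.{table_name}"


database_id, table = locate(31341, "photos")
print(database_id, table)
```

The query would then be sent to the connection for `database_id` and run against `table`.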

zeeg · on Sept 30, 2011

Ah ok that's what I figured. We do a lot of the same things right now with various DBs (we just describe it as pre-sharding so we can avoid the actual re-sharding dilemma). We've had to do this with both PostgreSQL and Redis now.

The schemas are definitely a neat way to handle this though, so you don't have to worry about table names.
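
To illustrate why pre-sharding sidesteps re-sharding, here is a minimal Python sketch. The shard count, server names, and function names are all hypothetical. The idea is that the key-to-logical-shard mapping never changes; moving data to a new machine only changes which server a logical shard points at. Copying the data itself (and the brief downtime mentioned earlier in the thread) is not shown.

```python
NUM_LOGICAL_SHARDS = 8  # assumed; fixed up front and never changed

# Every logical shard starts on one hypothetical server, "db-a".
shard_to_server = {shard: "db-a" for shard in range(NUM_LOGICAL_SHARDS)}


def server_for_key(key: int) -> str:
    """Key -> logical shard (fixed) -> physical server (changeable)."""
    return shard_to_server[key % NUM_LOGICAL_SHARDS]


def move_shards(shards, new_server: str) -> None:
    """Repoint logical shards at a new server after their data is copied."""
    for shard in shards:
        shard_to_server[shard] = new_server


before = server_for_key(13)        # key 13 lives in logical shard 5
move_shards(range(4, 8), "db-b")   # move the upper half to a new server
after = server_for_key(13)         # same logical shard, new server
print(before, after)
```

No key ever changes its logical shard, so nothing has to be re-hashed when capacity is added.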