Can you tell me what the performance impact is of doing this in application code with extra DB calls versus built-in constraint logic? I'm fairly convinced that most modern DBs have almost zero overhead from relationship constraints; those are heavily optimised. I'm curious about your findings and want to know more.
The problem with sharding is that you can't join across shards: most databases don't support distributed joins, so relationship constraints that span shards don't even make sense. The checks have to be done in application code, and it often takes multiple queries to different servers to get the full picture. Most of this can be mitigated by denormalizing as much as possible for the common queries and leaving normalized records for the less common ones.
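A minimal sketch of what that denormalization can look like (all table and column names here are made up for illustration): the common "order with customer name" read stays on one shard because the name is copied onto the order row, while the rare full-profile lookup makes a separate trip to the normalized table.

```sql
-- Hypothetical schema, illustrative names. Customers are sharded by
-- customer_id, orders by order_id, so the two can live on different shards.

-- Normalized source of truth, on the customer's home shard.
CREATE TABLE customers (
    customer_id BIGINT PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL
    -- rarely-read profile columns stay here, normalized
);

-- Orders, on a different shard. Note: no REFERENCES clause, because the
-- constraint can't span shards -- it has to be enforced in application code.
CREATE TABLE orders (
    order_id      BIGINT PRIMARY KEY,
    customer_id   BIGINT NOT NULL,
    customer_name TEXT   NOT NULL,  -- denormalized copy for the common query
    total_cents   BIGINT NOT NULL
);

-- Common query: served entirely from the orders shard, no cross-shard join.
SELECT order_id, customer_name, total_cents
FROM orders
WHERE order_id = 42;

-- Rare query: the application makes a second round trip to the customers
-- shard instead of relying on a distributed join.
SELECT name, email FROM customers WHERE customer_id = 7;
```

The cost is that the application now has to keep `customer_name` in sync itself, which is exactly the kind of constraint logic moving out of the database and into code.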
There's a performance impact, for sure, but there's really not much you can do about it without switching to a database technology that does sharding for you and can handle distributed joins (RethinkDB comes to mind, though I'm not sure it's maintained any more). But then you have to rewrite all your queries and logic for the new technology, and that probably isn't worth it. Even a Postgres-compatible option like CockroachDB doesn't support some Postgres features (certain triggers, among other things), so you'd probably still have to rewrite some things.
> I am fairly convinced that most modern DBs have almost zero impact from relationship constraints
It fundamentally cannot be zero impact: the database needs to guarantee that the referenced record actually exists at the time the transaction commits. Postgres takes a row-level lock on every referenced row when inserting or updating, and that is definitely not completely free.
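A minimal sketch of what that looks like, assuming a toy parent/child schema (my reading of current Postgres behavior is that the FK check takes a `FOR KEY SHARE` row lock on the referenced row):

```sql
-- Toy tables to make the lock visible.
CREATE TABLE parent (id BIGINT PRIMARY KEY);
CREATE TABLE child (
    id        BIGINT PRIMARY KEY,
    parent_id BIGINT NOT NULL REFERENCES parent (id)
);

INSERT INTO parent VALUES (1);

BEGIN;
-- The FK check behind this insert takes a FOR KEY SHARE row lock on
-- parent row 1, roughly equivalent to running:
--   SELECT 1 FROM parent WHERE id = 1 FOR KEY SHARE;
INSERT INTO child VALUES (100, 1);

-- Until this transaction ends, that lock blocks anyone trying to
-- delete parent row 1 or change its key.
COMMIT;
```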
Found this out when a plain insert statement caused a transaction deadlock under production load. Fun times.
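For anyone curious how a plain insert can deadlock, here's one way to reproduce the shape of it with the toy tables above (the production case was presumably messier, but the mechanics are the same):

```sql
-- Assumes the parent/child tables from the previous snippet,
-- plus a second parent row:
INSERT INTO parent VALUES (2);

-- Session A:
BEGIN;
SELECT * FROM parent WHERE id = 1 FOR UPDATE;

-- Session B:
BEGIN;
SELECT * FROM parent WHERE id = 2 FOR UPDATE;

-- Session A: the FK check needs FOR KEY SHARE on parent row 2, which
-- conflicts with B's FOR UPDATE lock, so this insert blocks:
INSERT INTO child VALUES (101, 2);

-- Session B: this needs FOR KEY SHARE on parent row 1, held FOR UPDATE
-- by A. Postgres detects the cycle and aborts one of the transactions:
--   ERROR: deadlock detected
INSERT INTO child VALUES (102, 1);
```

Neither insert looks dangerous in isolation; the deadlock only appears when two transactions hold locks on each other's referenced rows in opposite order.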