
I run a postgresql db with a few billion rows at about 2TB right now. We don't need sharding yet but when we do I was considering Citus. Does anyone have experience implementing Citus that could comment?



It can be great, depending on your schema and planned growth. Questions I'd be asking in your shoes:

1. Does the schema have an obvious column to use for distribution? You'll probably want to fit one of the 2 following cases, but these aren't exclusive:

    1a. A use case where most traffic is scoped to a subset of data (e.g. a multitenant system). This is the easiest use case: just make sure most of your queries contain the column (most likely tenant ID or equivalent), and partially denormalize so the column is also present in tables where it would otherwise only be implicit. Do not use a timestamp.

    1b. A rollup/analytics based use case that needs heavy parallelism (e.g. a large IoT system where you want to do analytics across a fleet). For this, you're looking for a column that has high cardinality without too many major hot spots. In the IoT use case mentioned, this would probably be a device ID or similar.

2. Are you sure you're going to grow to the scale where you need Citus? Depending on workload, it's not too hard to have a 20TB single-server PG database, and that's more than enough for a lot of companies these days.

3. When do you want to migrate? Logical replication into Citus should work these days (I haven't tested it myself), but the higher the update rate and the larger the database, the more painful this gets. There aren't many tools that help with the more difficult scenarios here, but the landscape has changed since I last had to do this.

4. Do you want to run this yourself? Azure does offer a managed service, and Crunchy offers Citus on any cloud, so you have options.

5. If you're running this yourself, how are you managing HA? pg_auto_failover has some Citus support, but can be a bit tricky to get started with.
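
For question 1, a rough way to sanity-check a candidate distribution column before committing to it is to hash a sample of its values into buckets and see how lopsided the buckets come out. The sketch below is only an illustration: it uses SHA-256 rather than whatever hash function Citus uses internally, the shard count of 32 is an assumption, and `shard_for`, `shard_skew`, and the sample data are all hypothetical names made up for this example.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 32  # assumed shard count, adjust to match your cluster


def shard_for(value, shard_count=SHARD_COUNT):
    """Map a value to a bucket with a stable hash (NOT Citus's own hash)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_skew(values, shard_count=SHARD_COUNT):
    """Ratio of the busiest bucket's row count to the mean rows per bucket.

    A value near 1.0 means rows spread evenly; a large value means a hot spot.
    """
    counts = Counter(shard_for(v, shard_count) for v in values)
    mean = len(values) / shard_count
    return max(counts.values()) / mean


# Hypothetical samples: one row per list entry, keyed by the candidate column.
# Many device IDs, each with a similar number of rows (the 1b shape).
even_devices = [f"device-{i % 10_000}" for i in range(100_000)]
# One tenant owning 60% of all rows (a major hot spot).
hot_tenant = ["tenant-1"] * 60_000 + [f"tenant-{i % 500}" for i in range(40_000)]

if __name__ == "__main__":
    print(f"device_id skew: {shard_skew(even_devices):.2f}")
    print(f"tenant_id skew: {shard_skew(hot_tenant):.2f}")
```

In practice you'd feed it a sample of real values pulled from the table instead of synthetic lists. A skew close to 1 suggests the column spreads well; a skew many times higher means one bucket would carry most of the data, which is the hot-spot problem described in 1b.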

I did get my Citus cluster over 1 PB at my previous job, and that's not the biggest one out there, so there's definitely room to scale, but the migration can be tricky.

Disclaimer: former Citus employee



