It can be great, depending on your schema and planned growth. Questions I'd be asking in your shoes:
1. Does the schema have an obvious column to use for distribution? You'll probably want to fit one of the following two cases, though they aren't mutually exclusive:
1a. A use case where most traffic is scoped to a subset of data (e.g. a multitenant system). This is the easiest case: just make sure most of your queries contain the column (most likely a tenant ID or equivalent), and partially denormalize so the column is present even in tables where it's only implicit, to make your life easier (see the sketch after this list). Do not use a timestamp.
1b. A rollup/analytics use case that needs heavy parallelism (e.g. a large IoT system where you want to do analytics across a fleet). For this, you're looking for a column that has high cardinality without too many major hot spots; in the IoT case mentioned, this would probably be a device ID or similar.
2. Are you sure you're going to grow to the scale where you need Citus? Depending on workload, it's not too hard to have a 20TB single-server PG database, and that's more than enough for a lot of companies these days.
3. When do you want to migrate? Logical replication into the cluster should work these days (I haven't tested it myself), but the higher the update rate and the larger the database, the more painful this gets. There aren't many tools that help with the more difficult scenarios here, but the landscape has changed since I last had to do this.
4. Do you want to run this yourself? Azure does offer a managed service, and Crunchy offers Citus on any cloud, so you have options.
5. If you're running this yourself, how are you managing HA? pg_auto_failover has some Citus support, but can be a bit tricky to get started with.
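To make case 1a concrete, here's a minimal sketch of distributing on a tenant ID in Citus. create_distributed_table and colocate_with are the real Citus API; the table and column names are made up for illustration.

```sql
-- Hypothetical multitenant schema, distributed on tenant_id (case 1a).
-- Co-locating related tables keeps tenant-scoped joins on a single shard.
SELECT create_distributed_table('orders', 'tenant_id');
SELECT create_distributed_table('order_items', 'tenant_id',
                                colocate_with => 'orders');

-- Queries that include the distribution column route to a single shard:
SELECT count(*) FROM orders WHERE tenant_id = 42;

-- order_items carries a denormalized tenant_id, so this join stays local:
SELECT o.id, i.sku
FROM orders o
JOIN order_items i ON i.order_id = o.id AND i.tenant_id = o.tenant_id
WHERE o.tenant_id = 42;
```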
I did get my Citus cluster over 1 PB at my previous job, and that's not the biggest one out there, so there's definitely room to scale, but the migration can be tricky.
Disclaimer: former Citus employee
Depends on your schema, really. The hard part is choosing a distribution key to use for sharding: if you've got something like a tenant ID that's in most of your queries and big tables, it's pretty easy, but it can be a pain otherwise.
For a multi-tenant use case, yeah, pretty close to thinking about partitioning.
For other use cases, there can be big gains from cross-shard queries that you can't really match with partitioning, but that's super use case dependent and not a guaranteed result.
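As a rough illustration of that second point: with a table distributed on something like a device ID, Citus can fan a rollup out to every shard in parallel and merge the partial results on the coordinator, which a single-node partitioned table can't parallelize across machines. The schema here is hypothetical.

```sql
-- Assuming a hypothetical 'events' table distributed on device_id:
-- Citus pushes the per-shard GROUP BY down to the workers in parallel,
-- then combines the partial aggregates on the coordinator.
SELECT date_trunc('day', created_at) AS day,
       count(*)                      AS events,
       avg(payload_size)             AS avg_payload
FROM events
GROUP BY 1
ORDER BY 1;
```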
Seems like this is a similar philosophy, but is missing a bunch of things the Citus coordinator provides. From the article, I'm guessing Citus is better at cross-shard queries, SQL support, central management of workers, keeping schemas in sync, and keeping small join tables in sync across the fleet, and provides a single point of ingestion.
That being said, this does seem to handle replicas better than Citus ever really did, and most of the features it's lacking aren't relevant for the sort of multitenant use case this blog is describing, so it's not a bad tradeoff. It also avoids the coordinator as a central point of failure, both for outages and for connection-count limits, though we rarely saw those be a problem in practice.
We certainly have a way to go to support all cross-shard use cases, especially complex aggregates (like percentiles). In OLTP, which is where PgDog will focus first, it's good to have a sharding key and a single shard in mind 99% of the time. The 1% will be divided between easy things we already support, like sorting, and slightly more complex things like aggregates (avg, count, max, min, etc.), which are on the roadmap.
For everything else, and until we cover what's left, postgres_fdw can be a fallback. It actually works pretty well.
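For reference, a minimal postgres_fdw setup looks something like the sketch below; the server name, connection details, and table layout are placeholders.

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Point a foreign server at one shard (connection details are placeholders).
CREATE SERVER shard_1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.internal', port '5432', dbname 'app');

CREATE USER MAPPING FOR CURRENT_USER SERVER shard_1
    OPTIONS (user 'app', password 'secret');

-- Expose the remote table locally; a view that UNION ALLs the per-shard
-- foreign tables then gives you a crude cross-shard query path.
CREATE FOREIGN TABLE events_shard_1 (
    id         bigint,
    tenant_id  bigint,
    created_at timestamptz
) SERVER shard_1 OPTIONS (schema_name 'public', table_name 'events');
```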
Any recommendations on telehealth suppliers to contact for that compounded formulation? They're easy to find, but I'm not sure who is trustworthy on this topic.
I just went through the quiz at Mochi and it said I was eligible for their nutrition program but not medication. The FAQ says your BMI has to be over 30, or over 27 if you have some other health condition.
I've generally found something similar: lots of gotchas, but also some very useful products.
The best way I've found to approach it is to treat GCP as something that has to be evaluated at an individual service level. It's great if you're on one of their expected workflows/golden paths, and you can get lucky with a good fit if you aren't, but they seem to have a lot of unspoken assumptions and limits baked in that might or might not align with your use case.
Disclaimer: judging from conversations with our account rep, my use cases are pretty unusual, so this might be over-fitting to weird data.
The article mentions a need for primary keys for data sync. Does anyone know if compound primary keys are currently supported? That's been a huge pain for me with the existing stack, so it would be nice to have an alternative.
Thanks for posting this question. Composite primary key support is actively being worked on and should be available in 1-2 weeks :) - https://github.com/PeerDB-io/peerdb/pull/499
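For anyone unsure what's meant here: the pain point is tables like the illustrative one below, where the primary key spans multiple columns rather than being a single id.

```sql
-- A composite (compound) primary key: no single column identifies a row,
-- which is what replication/CDC tooling has historically struggled with.
CREATE TABLE order_items (
    order_id bigint NOT NULL,
    line_no  int    NOT NULL,
    sku      text,
    PRIMARY KEY (order_id, line_no)
);
```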
Was that break on Market Street, out of curiosity? After a related incident, I discovered that I was the 5th of a friend's co-workers to have broken a wrist or leg there.
And this isn't some untouchable problem either: better signage and markings, better separation (I still wince driving past the spot where I fell, because people are riding just a couple of feet from tracks that will injure them), and rubber strips that trains can run over but that keep the tracks from swallowing the wheels of bikes and wheelchairs.
This honestly speaks to a problem with the attitude of most of the people who have been here for a while: they just seem to accept that this awful problem exists; it's almost treated like a rite of passage. In other cities I feel like there'd be real, widespread outrage at the whole situation and something more would happen, yet it's absolutely effortless to find hundreds of cases per year where people are seriously injured, and in some cases even killed.
I've seen ZFS used with Postgres in a few different environments. It seems to work fine for the most part: surprisingly good compression (~8x in one case, usually lower), with the major downside being increased CPU usage when taking advantage of said compression.
I think that only one or two of those environments were heavily used production instances, so if there is a serious gotcha here it might not have been apparent to me.