This is a good question. We did, and they are very different databases.

CRDB: Pros: natively distributed so HA and scalability are built-in, simple deployment and configuration, can run on Kubernetes for automated HA operations. Scaling tables is automatic, and single-key and small-range OLTP performance is very good. Supports JSON and most data types for compatibility with most things that use Postgres.

Cons: Still maturing and has bugs like `select unnest(some_array_col)` not working. Obviously can't run any Postgres extensions so SQL w/JSON is all you get. Performance on large scans is slow; they're working on this, but the distributed consensus required for queries means they will never match the latency of a single-node Postgres. Advanced queries are either very slow or unsupported.

CITUS: Pros: pure Postgres including extensions so you have access to advanced functionality. If you use the shard key for queries, the lack of distributed consensus gives low-latency performance just like single-node, but distributed transactions are still possible. Citus scales queries across all CPUs (on nodes holding the accessed data) so it greatly improves query performance.

Cons: only distributes data in "distributed" tables (sharded) or "reference" tables (full replicas on all nodes). All other data just sits on a single master node. HA uses Postgres streaming replication, requiring an inefficient 2x increase in costs, and is not seamless with failovers. Generally requires much more maintenance because it is still Postgres. Sharding does not accept multiple columns. No columnstores so large scans can still be slow, but they have ZFS in beta.
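
To make the shard-key point concrete, here is a rough sketch of why filtering on the shard key stays low-latency while everything else fans out. This is not Citus's actual hashing or planner; `NUM_SHARDS`, `shard_for`, and `shards_to_query` are made-up names for illustration only.

```python
import hashlib
from typing import List, Optional

NUM_SHARDS = 8  # hypothetical shard count, not a Citus default


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard-key value to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_to_query(shard_key_value: Optional[str] = None,
                    num_shards: int = NUM_SHARDS) -> List[int]:
    """Return the shards a query has to touch.

    With a shard-key filter the query is routed to exactly one shard.
    Without one, every shard must be scanned and the results merged.
    """
    if shard_key_value is None:
        return list(range(num_shards))
    return [shard_for(shard_key_value, num_shards)]


# Routed query: behaves like a single-node lookup.
single = shards_to_query("tenant-42")
# Fan-out query: parallel across shards, but touches all of them.
everything = shards_to_query()
```

A query like the first one only ever talks to one node, which is where the single-node-like latency comes from; the second spreads work across all nodes, which helps throughput on big scans but not per-query latency.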

--

Summary:

CRDB for simpler OLTP with very low ops overhead and great availability, scaling, and durability.

Citus for advanced OLTP or OLAP, low-latency sharded access, and full access to all Postgres features.




Thanks! Did you consider any other options?


What scenario are you looking for?

TiDB is another competitor, but it uses the MySQL dialect and is still early, missing lots of features.

For pure data-warehousing, we used MemSQL which is incredibly fast but can be expensive.

SQL Server is a great all around database if you want in-memory tables, columnstores, native graph queries, full-text search, and very high performance and can live with a single-node design (with optional HA cluster).


Mostly an easy and performant HA story that can scale to a huge number of queries per second for an app that is largely CRUD and vanilla webapp saas queries. Cockroach seems perfect but the performance seems pretty scary in certain scenarios.

Being able to just drop the DB into a k8s cluster and not worry too much about failover gotchas and leader election has a lot of value. As does being able to throw more nodes on for more performance. Complicated OLAP queries aren't in scope.


Either is a good fit for you, but since you have simpler queries, CRDB will be easier to run.

Performance is fine, why do you say it's scary? OLAP will just be slow, but it's also distributed and unoptimized. Highly concurrent OLTP can't really get slow unless you're trying to stretch the cluster over multiple geo regions.



