Hello HN, I am Sai Srirampur, one of the co-founders of PeerDB (https://github.com/PeerDB-io/peerdb). We spent the past 7 months building a solid experience to replicate data from Postgres to data warehouses. Now we're expanding to queues.
PeerDB Streams provides a simple and native way to replicate changes as they happen in Postgres to Queues (Kafka, Redpanda, Google PubSub, etc). We use Postgres logical decoding to enable Change Data Capture (CDC).
Blog post here: https://blog.peerdb.io/peerdb-streams-simple-native-postgres.... 10-min quickstart here: https://docs.peerdb.io/quickstart/streams-quickstart.
We chose queues because many users found existing tools complex. Debezium is the most widely used tool for this use case and has large production usage. However, a common pain point among our users is that it has a steep learning curve and can take months to productionize.
A few issues are: a) Interacting through a command line interface, understanding the various settings, and learning best practices for running it in production is not trivial. Debezium UI, released to address usability concerns [1], is still in an incubating state [2]. Additionally, reading Debezium resources to get started can be overwhelming [3].
b) Supporting data formats and transformations isn’t easy. It requires a Java project, building JAR packages, and setting up a runtime path on the Kafka Connect plugin. c) Debezium is not as native for other queues as it is for Kafka and doesn’t offer the same level of configurability. For example, with Event Hubs, it is difficult to stream to topics spread across namespaces and subscriptions.
TL;DR Debezium aims to provide a comprehensive experience for engineers to implement CDC rather than making it dead simple for them. So you can do a lot with Debezium but need to know a lot about it.
At PeerDB, we are building a simple yet comprehensive experience for Postgres CDC. The goal is to enable engineers to build prod-grade Postgres CDC with a minimal learning curve, within a few days.
PeerDB’s feature-set isn't at Debezium's level yet, and as we evolve, we might face similar challenges. However, we're putting usability at the forefront and we believe that we can achieve the above goal.
First, PeerDB offers a simple UI to set up Postgres and Kafka by creating PEERs and to initiate CDC by creating a MIRROR. Through the UI, users can monitor the progress of CDC, including throughput and latency; set up alerts to Slack/Email based on replication slot growth; and investigate Postgres-native metrics such as slot size. Here is a demo showing the PeerDB UI in action:
https://www.loom.com/share/ebcfb7646a1e48738835853b760e5d04
Second, for users who prefer a CLI, we provide a Postgres-compatible SQL layer to manage CDC. This offers the same level of features as the UI and is more intuitive compared to bash scripts.
Third, users can perform row-level transformations using Lua scripts executed at runtime. This enables features such as encrypting/masking PII data, supporting various data formats (JSON, MsgPack, Protobuf, etc.), and more. We offer a script editor along with a bunch of useful templates [5].
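To make the idea of a row-level transformation concrete, here is a minimal sketch in Python. PeerDB's scripts are written in Lua, so this is not PeerDB's API. It only illustrates the shape of the technique: take one changed row, mask the PII columns, and serialize the result as JSON for the queue. The column names and function names below are hypothetical.

```python
import hashlib
import json

# Hypothetical PII column names, chosen only for this illustration.
PII_COLUMNS = {"email", "phone"}


def mask(value: str) -> str:
    """Replace a value with a short, stable SHA-256 digest.

    The same input always yields the same output, so masked values
    can still be compared for equality downstream.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def transform_row(row: dict) -> bytes:
    """Mask PII columns and serialize the row as a JSON message payload."""
    out = {}
    for column, value in row.items():
        if column in PII_COLUMNS and value is not None:
            out[column] = mask(str(value))
        else:
            # Non-PII columns and NULLs pass through unchanged.
            out[column] = value
    return json.dumps(out, sort_keys=True).encode("utf-8")


if __name__ == "__main__":
    print(transform_row({"id": 1, "email": "a@example.com", "phone": None}))
```

Note that hashing is masking, not encryption: it cannot be reversed to recover the original value. Swapping `json.dumps` for another serializer would be the place to change the output format.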
Fourth, we provide native connectors to non-Kafka targets. We also provide native configurability options tailored to these platforms. For example, with Event Hubs, users can perform CDC to topics distributed across different namespaces and subscriptions [4].
Finally, we are laser-focused on Postgres, which enables specific optimizations like native metrics for replication, wait events, and number of connections. Features like faster initial loads through parallel snapshotting and decoding transactions in-flight are in private beta.
Our hope is to provide the best data-movement experience for Postgres. PeerDB Streams is another step in that direction. We would love to get your feedback on product experience, our thesis and anything else that comes to your mind. It would be super useful for us. Thank you!
References:
[1] https://debezium.io/blog/2020/10/22/towards-debezium-ui/
[2] https://debezium.io/documentation/reference/stable/operation...
[3] https://medium.com/@cooper.wolfe/i-hated-debezium-so-much-i-...
[4] https://blog.peerdb.io/enterprise-grade-replication-from-pos...
[5] https://github.com/PeerDB-io/examples
[6] https://app.peerdb.cloud
[7] https://github.com/PeerDB-io/PeerDB
[Full disclosure, I work for Prisma and we have a similar product called Pulse (https://prisma.io/pulse)]
Another use case for CDC is compliance. I reckon that in the near future, to ensure compliance with data regulations, CDC will become the better option for devs vs traditional seek/update/delete functions.