I did due diligence on Builder.ai for a venture firm I was interning at (circa 2019). It was extremely apparent (Glassdoor, talking to any employee) that it was complete BS.
When I say apparent: it took less than 15 minutes and a couple of Google searches to get a sniff of it.
You have to elaborate! What were the signs? When you did due diligence, what were you told about the company? Was the marketing or premise itself fishy, or did you only realize it was fraudulent after starting the due diligence?
The small-writes problem that Iceberg has is totally silly. They spend so much effort requiring a tree of metadata files, but you still need an ACID DB to manage the pointer to the latest tree. At that point, why not just move all that metadata into the DB itself? It's not that massive in scale.
The current Iceberg architecture forces every table read to do many small reads of the files in the metadata tree.
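To make the "why not put it in the DB" point concrete: the one atomic thing Iceberg needs from its catalog is a compare-and-swap on the pointer to the latest metadata tree. Here's a minimal sketch of that commit step, using SQLite as a stand-in for the ACID DB (table and column names are my own, not Iceberg's):

```python
import sqlite3

# Stand-in for the ACID catalog DB: one row per table, holding the
# pointer to the current root of the metadata tree.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE catalog (tbl TEXT PRIMARY KEY, metadata_loc TEXT)")
db.execute("INSERT INTO catalog VALUES ('events', 's3://bkt/meta/v1.json')")
db.commit()

def commit_snapshot(tbl, expected_loc, new_loc):
    """Atomically swap the pointer; fail if another writer committed first."""
    cur = db.execute(
        "UPDATE catalog SET metadata_loc = ? WHERE tbl = ? AND metadata_loc = ?",
        (new_loc, tbl, expected_loc),
    )
    db.commit()
    return cur.rowcount == 1  # True iff the compare-and-swap succeeded

# First writer wins; a writer holding a stale pointer is rejected.
print(commit_snapshot("events", "s3://bkt/meta/v1.json", "s3://bkt/meta/v2.json"))  # True
print(commit_snapshot("events", "s3://bkt/meta/v1.json", "s3://bkt/meta/v3.json"))  # False
```

Once the DB is already doing this CAS on every commit, the argument above is that the rest of the per-snapshot metadata could live in the same DB, instead of as small files every read has to fetch.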
Great blog. It seems like you might benefit from columnar storage in Postgres for that slow query that took ~20 seconds.
It's interesting that people typically think of columnstores strictly for BI/analytics. But there are so many app and user-facing workloads that actually need them.
PS: we're working on pg_mooncake v0.2: create a columnstore in Postgres that's always consistent with your OLTP tables.
That sounds awesome. Are you saying you still use your normal OLTP table for writing data, and the columnstore table is always in sync with that OLTP table (that's fantastic)? I read it works with DuckDB; how does it work? I guess there's no chance this is going to be available on Azure Flexible Server anytime soon.
Exactly. We take the CDC output / logical decoding from your OLTP tables and write it into a columnar format with sub-second (<1s) freshness.
We had to design this columnstore to be 'operational' so it can keep up with changing OLTP tables (updates/deletes).
You'll be able to deploy Mooncake as a read-replica regardless of where your Postgres is. Keep the write path unchanged, and query columnar tables from us.
v0.2 will be released in preview in a couple of weeks. Stay tuned!
Ah, I see. So there's a replication process similar to ClickHouse's MaterializedPostgres. Ideally, there would be functionality allowing a columnstore query to wait until all writes to the OLTP tables — up to the query's execution time — are available. This would make the system truly Postgres-native and address issues that no other system currently solves.
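The "wait until all writes up to query time are visible" idea is essentially read-your-writes across replication lag: capture the primary's WAL position (LSN) when the query is submitted, then have the replica wait until it has replayed past that point. A sketch of the LSN bookkeeping (the polling loop against a live replica is omitted; the function names are mine, not any product's API):

```python
def parse_lsn(lsn: str) -> int:
    """Postgres prints LSNs as two hex words, e.g. '0/16B3748'."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def caught_up(replica_replay_lsn: str, primary_lsn_at_query: str) -> bool:
    """True once the replica has replayed everything the primary
    had committed at the moment the query was issued."""
    return parse_lsn(replica_replay_lsn) >= parse_lsn(primary_lsn_at_query)

# On the primary:  SELECT pg_current_wal_lsn();
# On the replica:  poll SELECT pg_last_wal_replay_lsn(); until caught_up(...)
print(caught_up("0/16B3748", "0/16B3748"))  # True
print(caught_up("0/16B3000", "0/16B3748"))  # False
```

`pg_current_wal_lsn()` and `pg_last_wal_replay_lsn()` are standard Postgres functions; a columnstore replica would need an analogous "replayed up to" marker to offer the same guarantee.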
What are your thoughts on Fujitsu's VCI? I typically work on ERPs, but I'm always advocating to offload the right queries to columnar DBs (not for DB performance, but for end-user experience).
It was a couple of things.
1. It's really really hard to replace anyone's OLTP.
2. OLTP and OLAP are owned by such different teams. Who do you make your champion?
3. The modern HTAP dream is possible without something like SingleStore. You need a columnstore that can keep up with your OLTP tables and provide transactional correctness. Who cares if it's all within one system?
This is really, really exciting. I see it as the “right” way OLTP and OLAP will converge.
The OP and I built an HTAP system at SingleStore. A single database with one copy of data for both OLTP and OLAP workloads. HTAP never took off [0].
What we learned was that OLTP (Postgres) should handle OLTP, while OLAP (data warehouses/lakes) should handle OLAP, with replication between them.
Designing the 'up-to-date' replication between these systems is hard: columnar stores just aren't built for OLTP-style writes and can't keep up with your OLTP tables.
Let's see if Databricks and Neon can pull this off.
"Give me up-to-date Postgres tables in Unity Catalog", with no Debezium -> Kafka -> Flink -> Iceberg pipeline, and no Spark jobs in the back keeping Iceberg in an optimal state.
With the push towards open table formats (Iceberg) from both Snowflake and Databricks, it's even harder to get your Postgres OLTP tables ready for OLAP.
The problem isn't in the CDC / replication tools in the market.
The problem is that columnar stores (and especially Iceberg) are not designed for the write/upsert patterns of OLTP systems.
They just can't keep up...
This is a big problem we're hoping to solve at Mooncake [0]: turn Iceberg into an operational columnstore, so it can keep up (<1s freshness) with your Postgres.
Data files (Parquet) alone aren't enough for tables with updates/deletes; delete files are part of Iceberg's "metadata" too.
For CDC-from-OLTP use cases, the pattern involves rapidly marking rows as deleted, inserting new rows, and compacting the resulting small files. That's what's required just for minutes-latency replication.
And for second-level latency it's more involved: you actually need to build a layer on top of Iceberg to track primary keys and apply deletions.
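A sketch of what that layer has to track, under my own simplified model (names are hypothetical): a primary-key index mapping each pk to its current (file, row) position, so an OLTP update becomes a position delete plus an append instead of a file rewrite:

```python
# pk -> (data_file, row_position): where the current version of each row lives.
pk_index = {}
position_deletes = []  # (file, row) pairs masked out at read time
appends = []           # (pk, payload) rows buffered for the next data file

def apply_cdc(op, pk, payload=None):
    """Apply one logical-decoding event as delete-then-append."""
    if pk in pk_index:                  # UPDATE and DELETE both mask the old row
        position_deletes.append(pk_index.pop(pk))
    if op in ("insert", "update"):      # the new version lands in a fresh file
        row = len(appends)
        appends.append((pk, payload))
        pk_index[pk] = ("pending.parquet", row)

apply_cdc("insert", "42", "v1")
apply_cdc("update", "42", "v2")   # one position delete + one append, no rewrite
apply_cdc("delete", "42")
print(len(position_deletes))  # 2
print(pk_index)               # {}
```

Iceberg's format can express the position deletes; what it doesn't give you out of the box is the pk index and the compaction policy that keep this sustainable at OLTP rates, which is the layer being described above.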
Neon actually does have a columnstore extension with pg_mooncake today.
The key difference between pg_mooncake and Hydra is the bet on open storage formats (Iceberg).
> When I say apparent, it took less than 15 minutes and a couple of google searches to get a sniff of it.
Somehow, you can still raise $500MM+.
I think about that a lot