I did due diligence on Builder.ai for a venture firm I was interning at (circa 2019). It was extremely apparent (Glassdoor, talking to any employee) that it was complete BS.
When I say apparent: it took less than 15 minutes and a couple of Google searches to get a sniff of it.
You have to elaborate! What were the signs? When you did due diligence, what were you told about the company? Was the marketing or premise itself fishy, or did you only realize it was fraudulent after starting the due diligence?
The small-writes problem that Iceberg has is totally silly. They spend so much effort requiring a tree of metadata files, but you still need an ACID DB to manage the pointer to the latest tree. At that point, why not just move all that metadata into the DB itself? It's not that massive in scale.
The current Iceberg architecture forces every table read to do many small reads of the files in the metadata tree.
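To make the "why not put it in the DB" point concrete: the one atomic thing Iceberg needs from its catalog is a compare-and-swap on the pointer to the latest metadata tree. Here's a minimal sketch of that commit step, using SQLite as a stand-in for the ACID DB (table and column names are my own, not Iceberg's):

```python
import sqlite3

# Stand-in for the ACID catalog DB: one row per table, holding the
# pointer to the current root of the metadata tree.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE catalog (tbl TEXT PRIMARY KEY, metadata_loc TEXT)")
db.execute("INSERT INTO catalog VALUES ('events', 's3://bkt/meta/v1.json')")
db.commit()

def commit_snapshot(tbl, expected_loc, new_loc):
    """Atomically swap the pointer; fail if another writer committed first."""
    cur = db.execute(
        "UPDATE catalog SET metadata_loc = ? WHERE tbl = ? AND metadata_loc = ?",
        (new_loc, tbl, expected_loc),
    )
    db.commit()
    return cur.rowcount == 1  # True iff the compare-and-swap succeeded

# First writer wins; a writer holding a stale pointer is rejected.
print(commit_snapshot("events", "s3://bkt/meta/v1.json", "s3://bkt/meta/v2.json"))  # True
print(commit_snapshot("events", "s3://bkt/meta/v1.json", "s3://bkt/meta/v3.json"))  # False
```

Once the DB is already doing this CAS on every commit, the argument above is that the rest of the per-snapshot metadata could live in the same DB, instead of as small files every read has to fetch.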
Great blog. It seems like you might benefit from columnar storage in Postgres for that slow query that took ~20 seconds.
It's interesting that people typically think of columnstores strictly for BI/analytics. But there are so many app and user-facing workloads that actually need them.
PS: we're working on pg_mooncake v0.2: create a columnstore in Postgres that's always consistent with your OLTP tables.
That sounds awesome. Are you saying you still use your normal OLTP table for writing data, and the columnstore table is always in sync with that OLTP table (that's fantastic)? I read it works with DuckDB; how does it work? I guess there's no chance this is going to be available on Azure Flexible Server anytime soon.
Exactly. We take the CDC output / logical decoding from your OLTP tables and write it into a columnar format with sub-second (<1s) freshness.
We had to design this columnstore to be 'operational' so it can keep up with changing OLTP tables (updates/deletes).
You'll be able to deploy Mooncake as a read-replica regardless of where your Postgres is. Keep the write path unchanged, and query columnar tables from us.
v0.2 will be released in preview in a couple of weeks. Stay tuned!
Ah, I see. So there's a replication process similar to ClickHouse's MaterializedPostgres. Ideally, there would be functionality allowing a columnstore query to wait until all writes to the OLTP tables — up to the query's execution time — are available. This would make the system truly Postgres-native and address issues that no other system currently solves.
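The "wait until all writes up to query time are visible" idea is essentially read-your-writes across replication lag: capture the primary's WAL position (LSN) when the query is submitted, then have the replica wait until it has replayed past that point. A sketch of the LSN bookkeeping (the polling loop against a live replica is omitted; the function names are mine, not any product's API):

```python
def parse_lsn(lsn: str) -> int:
    """Postgres prints LSNs as two hex words, e.g. '0/16B3748'."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def caught_up(replica_replay_lsn: str, primary_lsn_at_query: str) -> bool:
    """True once the replica has replayed everything the primary
    had committed at the moment the query was issued."""
    return parse_lsn(replica_replay_lsn) >= parse_lsn(primary_lsn_at_query)

# On the primary:  SELECT pg_current_wal_lsn();
# On the replica:  poll SELECT pg_last_wal_replay_lsn(); until caught_up(...)
print(caught_up("0/16B3748", "0/16B3748"))  # True
print(caught_up("0/16B3000", "0/16B3748"))  # False
```

`pg_current_wal_lsn()` and `pg_last_wal_replay_lsn()` are standard Postgres functions; a columnstore replica would need an analogous "replayed up to" marker to offer the same guarantee.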
What are your thoughts on Fujitsu's VCI? I typically work on ERPs, but I'm always advocating to offload the right queries to columnar DBs (not for DB performance, but for end-user experience).
It was a couple of things.
1. It's really really hard to replace anyone's OLTP.
2. OLTP and OLAP are owned by such different teams. Who do you make your champion?
3. The modern HTAP dream is possible without something like SingleStore. You need a columnstore that can keep up with your OLTP tables and provide transactional correctness. Who cares if it's all within one system?
This is really, really exciting. I see it as the “right” way OLTP and OLAP will converge.
The OP and I built an HTAP system at SingleStore. A single database with one copy of data for both OLTP and OLAP workloads. HTAP never took off [0].
What we learned was that OLTP (Postgres) should handle OLTP, while OLAP (data warehouses/lakes) should handle OLAP, with replication between them.
Designing the 'up-to-date' replication between these systems is hard: columnar stores just aren't built for OLTP-style writes and can't keep up with your OLTP tables.
Let's see if Databricks and Neon can pull this off.
"Give me up-to-date Postgres tables in Unity Catalog", with no Debezium -> Kafka -> Flink -> Iceberg pipeline, and no Spark jobs in the back keeping Iceberg in an optimal state.
With the push towards open table formats (Iceberg) from both Snowflake and Databricks, it's even harder to get your Postgres OLTP tables ready for OLAP.
The problem isn't in the CDC / replication tools in the market.
The problem is that columnar stores (and especially Iceberg) are not designed for the write/upsert patterns of OLTP systems.
They just can't keep up...
This is a big problem we're hoping to solve at Mooncake [0]: turn Iceberg into an operational columnstore, so it can keep up (<1s freshness) with your Postgres.
Data files (Parquet) alone aren't enough for tables with updates/deletes; delete files are part of Iceberg's "metadata" too.
For CDC-from-OLTP use cases, the pattern involves rapidly marking rows as deleted, inserting new rows, and compacting the resulting small files. That's what's required just for minutes-latency replication.
And for second-level latency it's more involved: you actually need to build a layer on top of Iceberg to track primary keys and apply deletions.
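A sketch of what that layer has to track, under my own simplified model (names are hypothetical): a primary-key index mapping each pk to its current (file, row) position, so an OLTP update becomes a position delete plus an append instead of a file rewrite:

```python
# pk -> (data_file, row_position): where the current version of each row lives.
pk_index = {}
position_deletes = []  # (file, row) pairs masked out at read time
appends = []           # (pk, payload) rows buffered for the next data file

def apply_cdc(op, pk, payload=None):
    """Apply one logical-decoding event as delete-then-append."""
    if pk in pk_index:                  # UPDATE and DELETE both mask the old row
        position_deletes.append(pk_index.pop(pk))
    if op in ("insert", "update"):      # the new version lands in a fresh file
        row = len(appends)
        appends.append((pk, payload))
        pk_index[pk] = ("pending.parquet", row)

apply_cdc("insert", "42", "v1")
apply_cdc("update", "42", "v2")   # one position delete + one append, no rewrite
apply_cdc("delete", "42")
print(len(position_deletes))  # 2
print(pk_index)               # {}
```

Iceberg's format can express the position deletes; what it doesn't give you out of the box is the pk index and the compaction policy that keep this sustainable at OLTP rates, which is the layer being described above.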
Neon actually does have a columnstore extension with pg_mooncake today.
The key difference between pg_mooncake and Hydra is the bet on open storage formats (Iceberg).
> When I say apparent, it took less than 15 minutes and a couple of google searches to get a sniff of it.
Somehow, you can still raise $500MM+.
I think about that a lot