ClickHouse Denormalization is not the answer to slow JOINs (glassflow.dev)
18 points by super_ar 8 days ago | 5 comments

YES!! Normalization has been the BEST PRACTICE since DBMSs were invented back in the '70s, and somehow we just forgot about it for OLAP over the past 10 years. Denormalization means 10x more expensive storage, near-impossible schema evolution because of data backfilling, and extra pipelines you have to build and maintain, with costs that scale with your data size and your business growth. I was just talking with a buddy of mine about this, and about how to run JOINs on the fly without denormalization pipelines (using StarRocks, in that case).
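To make that concrete, here's a minimal sketch of what "JOINs on the fly" looks like in plain SQL; the table and column names (orders, customers) are hypothetical, and the same query shape works in StarRocks or ClickHouse:

    -- Normalized layout: keep the dimension in its own table and join at
    -- query time, instead of baking customer attributes into every order row.
    SELECT c.region, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.id
    GROUP BY c.region;

Compare that with the denormalized approach, where a pipeline copies region onto every order row and has to backfill all of them whenever a customer's region changes.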

Many developers are seriously lacking in database fundamentals, especially over the past 10 years. I regularly see tables that don't have primary keys or indexes of any sort. It boggles the mind.
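For anyone wondering what the baseline even looks like, a minimal sketch in standard SQL (the users table and its columns are made up for illustration):

    -- Give the table a primary key, and index the columns you filter on.
    CREATE TABLE users (
        id         BIGINT PRIMARY KEY,
        email      TEXT NOT NULL,
        created_at TIMESTAMP NOT NULL
    );
    CREATE INDEX idx_users_created_at ON users (created_at);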

Quite a long time ago I worked on a system that used Cassandra for storing some data. Cassandra's storage footprint across all the nodes in the cluster was about 100 gigabytes. At some point we needed to upgrade from version 2 to 3, so I decided to take a backup with Cassandra's included tooling. After taking a snapshot and compressing it, the result was 14 megabytes. I'm still in awe that this system existed.

As someone who doesn't use ClickHouse or anything similar, I found it interesting to see how closely the solutions mirror what we do in a traditional RDBMS.

One thing we do use to good effect, and which I didn't see mentioned, is computed columns (materialized columns, as I see they're called in ClickHouse) for optimized indexing. Though I guess that's already a staple.
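For reference, a minimal ClickHouse sketch of a materialized column feeding a skipping index; the events table and its columns are hypothetical:

    -- domain is computed once at insert time and stored, so queries can
    -- filter on it cheaply via the index instead of re-parsing url each time.
    CREATE TABLE events
    (
        url    String,
        domain String MATERIALIZED domain(url),
        ts     DateTime,
        INDEX idx_domain domain TYPE bloom_filter GRANULARITY 4
    )
    ENGINE = MergeTree
    ORDER BY ts;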

Anyway, seems like a nice and thorough article.

EXACTLY!!
