
In my experience, my pain hasn't been from managing the migration files themselves.

My pain comes from needing to modify the schema in such a way that it requires a data migration: reprocessing data to some degree, and managing the migration of that data in a sane way.

You can't just run a simple `INSERT INTO table SELECT * FROM old_table`, or anything like that, because if the data is large it takes forever, and a failure in the middle can be difficult to recover from.

So what I do is split the migration into time-based chunks, since nearly every table has a time component that is immutable after writing. What I really want is a migration tool that can figure out what that column is, work out what those chunks are, and incrementally apply a data migration in batches, so that if one batch fails I can go in to investigate and know exactly how much progress the data migration has made.
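A minimal sketch of that chunking idea, assuming a table with an immutable `created_at` timestamp column (the column name, table names, and SQL shape here are placeholders, not any particular tool's API):

```python
from datetime import datetime, timedelta

def migration_chunks(start, end, step=timedelta(days=1)):
    """Yield (chunk_start, chunk_end) windows covering [start, end)."""
    cur = start
    while cur < end:
        nxt = min(cur + step, end)
        yield cur, nxt
        cur = nxt

def chunk_statements(start, end, step=timedelta(days=1)):
    """Build one INSERT ... SELECT per time window, so a failure only
    affects a single chunk and progress is easy to see."""
    for lo, hi in migration_chunks(start, end, step):
        yield (
            "INSERT INTO new_table SELECT * FROM old_table "
            f"WHERE created_at >= '{lo:%Y-%m-%d %H:%M:%S}' "
            f"AND created_at < '{hi:%Y-%m-%d %H:%M:%S}'"
        )
```

Running the statements one at a time and recording the last completed window gives you a natural resume point: on failure, you know exactly which chunk to investigate and where to pick the migration back up.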



This makes a ton of sense! We've faced the exact same problem: whenever we need to add a new materialized table or column, we need to backfill data into it.

For materialized columns, it's "easy" (not really, because you still need to monitor the backfill): we can run something like `ALTER TABLE events UPDATE materialized_column = materialized_column WHERE 1`. Depending on the materialization, that can drive up the load on ClickHouse, since it creates a lot of background mutations that we need to monitor; they can fail for all sorts of reasons (memory-limit or disk-space errors), in which case we need to jump in and fix things by hand.
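One way to watch those background mutations is to poll ClickHouse's `system.mutations` table and pick out the ones that have recorded a failure. A small sketch of that check (the `events` table name is a placeholder, and the polling/alerting around it is left out):

```python
# Query for mutations on the table that haven't finished yet;
# latest_fail_reason is populated when a mutation has failed at least once.
PENDING_MUTATIONS_SQL = """
    SELECT mutation_id, command, parts_to_do, latest_fail_reason
    FROM system.mutations
    WHERE table = 'events' AND NOT is_done
"""

def failing_mutations(rows):
    """Given rows from system.mutations (as dicts), return the ones
    that have recorded a failure, e.g. a memory-limit or disk error."""
    return [r for r in rows if r.get("latest_fail_reason")]
```

A failed mutation stays stuck until you fix the underlying cause (or `KILL MUTATION` it), so surfacing `latest_fail_reason` early is what lets you jump in before the backfill silently stalls.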

For materialized tables, it's a bit harder: we need to write custom scripts that load data in day-wise (or month-wise, depending on the data size) chunks, like you mentioned, because a plain `INSERT INTO table SELECT * FROM another_table` will almost certainly run into memory-limit errors on a large table.
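The day-wise variant of those scripts boils down to generating one bounded `INSERT ... SELECT` per day. A sketch, assuming the source table has a `timestamp` column (the column and table names are placeholders):

```python
from datetime import date, timedelta

def daywise_backfill(start: date, end: date,
                     src="another_table", dst="table"):
    """Yield one ClickHouse INSERT ... SELECT per day in [start, end),
    so each statement stays small enough to avoid memory limits and
    can be retried on its own if it fails."""
    day = start
    while day < end:
        yield (
            f"INSERT INTO {dst} SELECT * FROM {src} "
            f"WHERE toDate(timestamp) = '{day.isoformat()}'"
        )
        day += timedelta(days=1)
```

Switching the step to a month (or narrowing to hours) is just a matter of how the window is advanced; the important part is that each statement is independently retryable.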

I would love to think more about this to see if it's something that would make sense to handle within Houseplant.


I've opened an issue to track this: https://github.com/juneHQ/houseplant/issues/30

Would love to chat more here or there if you're keen!



