It's worth mentioning that there is a next level of challenge for backfilling and correcting historical data, and that's when a bitemporal data model is a good idea. In other words, you keep track of the "valid time" (or "application time") at which a fact became true separately from the "transaction time" (or "system time") at which the fact was ingested into the database.
> you don’t need any special tools
Absolutely agreed. I happen to work on https://opencrux.com, which is designed from the ground up for handling bitemporal data, but I was chatting to someone a couple of days ago who built a really neat approximation of Crux purely on Postgres, using JSON columns and three simple tables: a Transaction_Log table, an All_Documents table, and a Current_Documents table. A small handful of triggers automated the timestamping and the population of the Current_Documents table, but it was super simple stuff.
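Roughly, the shape of it was something like the sketch below. To be clear, this is my own guess at the schema, not their exact design; the table and column names (transaction_log, all_documents, current_documents, doc_id, valid_time, etc.) are illustrative:

```sql
-- A rough sketch of the three-table approach (names are illustrative):
-- every write appends to transaction_log and all_documents, and a trigger
-- keeps current_documents pointing at the latest version of each document.

CREATE TABLE transaction_log (
    tx_id   bigserial   PRIMARY KEY,
    tx_time timestamptz NOT NULL DEFAULT now()   -- transaction/system time
);

CREATE TABLE all_documents (
    doc_id     text        NOT NULL,
    tx_id      bigint      NOT NULL REFERENCES transaction_log (tx_id),
    valid_time timestamptz NOT NULL DEFAULT now(), -- valid/application time
    doc        jsonb       NOT NULL,
    PRIMARY KEY (doc_id, tx_id)
);

CREATE TABLE current_documents (
    doc_id text  PRIMARY KEY,
    doc    jsonb NOT NULL
);

-- Trigger: whenever a new version lands in all_documents, upsert it into
-- current_documents so "latest state" queries stay cheap.
CREATE FUNCTION refresh_current_document() RETURNS trigger AS $$
BEGIN
    INSERT INTO current_documents (doc_id, doc)
    VALUES (NEW.doc_id, NEW.doc)
    ON CONFLICT (doc_id) DO UPDATE SET doc = EXCLUDED.doc;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER all_documents_refresh_current
AFTER INSERT ON all_documents
FOR EACH ROW EXECUTE FUNCTION refresh_current_document();
```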
Naturally there is a big difference in scale and performance compared to something like Crux (where you can join within historical timeslices _efficiently_), but I was impressed by how Postgres's flexible JSON indexing made the approach viable with so little upfront effort.
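To give a sense of the indexing side, again sketched against the hypothetical schema above (the index name and the `customer-id` key are made up for the example):

```sql
-- A GIN index on the jsonb column makes containment queries over the full
-- history cheap, without designing per-attribute columns up front.
CREATE INDEX all_documents_doc_idx
    ON all_documents USING gin (doc jsonb_path_ops);

-- e.g. "what did each document for this customer look like as of a given valid time?"
SELECT DISTINCT ON (doc_id) doc_id, doc
FROM all_documents
WHERE doc @> '{"customer-id": "c-42"}'
  AND valid_time <= timestamptz '2020-01-01'
ORDER BY doc_id, valid_time DESC;
```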
It's probably fair to say that the majority of queries in typical applications won't ever care about prior history, but I think always having history available for native querying without having to import anything from archives or logs is fundamentally liberating.