Hacker News
Model facts, not your problem domain (ruudvanasseldonk.com)
5 points by ruuda on June 11, 2020 | 3 comments



> When requirements change, an append-only data model of immutable facts is more useful than a mutable data model that models the problem domain.

Interesting viewpoint and it sounds like a Prolog/Datalog database.

Similarly, many document databases like CouchDB keep a revision number, so the database itself tracks the last N revisions and garbage-collects stale data.


The Prolog/Datalog comparison is fair: Datomic (the subject of the “Deconstructing the Database” talk linked at the end) is built around this idea of accumulating facts, and it uses a Datalog-like query language.

But you don’t need any special tools. A good old relational database will do; just don’t do updates or deletes. In my case I was using a simple SQLite database.
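
A minimal sketch of that with Python's built-in sqlite3 (the table and column names are invented for illustration, not taken from the article):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Append-only: rows are only ever inserted, never updated or deleted.
        CREATE TABLE facts (
            id          INTEGER PRIMARY KEY,
            entity      TEXT NOT NULL,      -- what the fact is about
            attribute   TEXT NOT NULL,      -- which property it describes
            value       TEXT NOT NULL,
            recorded_at TEXT NOT NULL DEFAULT (datetime('now'))
        );
    """)

    # Record facts as they happen; a later fact supersedes an earlier one.
    conn.execute("INSERT INTO facts (entity, attribute, value) VALUES (?, ?, ?)",
                 ("order-42", "status", "placed"))
    conn.execute("INSERT INTO facts (entity, attribute, value) VALUES (?, ?, ?)",
                 ("order-42", "status", "shipped"))

    # The "current state" is just a query over the facts: the latest value wins.
    row = conn.execute("""
        SELECT value FROM facts
        WHERE entity = ? AND attribute = ?
        ORDER BY id DESC LIMIT 1
    """, ("order-42", "status")).fetchone()
    print(row[0])  # shipped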

Event sourcing is also kind of the same idea.


It's worth mentioning that there is a next level of challenge in backfilling and correcting historical data; that's when a bitemporal data model is a good idea. In other words, keeping track of the "valid time" (or "application time") of when a fact became true, separately from the "transaction time" (or "system time") of when the fact was ingested into the database.
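
A rough sketch of the two time axes in a plain table, again with sqlite3 (names and dates invented for illustration; a real bitemporal store also tracks validity end times and indexes both axes properly):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Bitemporal facts: two independent time axes per row.
        CREATE TABLE facts (
            entity     TEXT NOT NULL,
            attribute  TEXT NOT NULL,
            value      TEXT NOT NULL,
            valid_from TEXT NOT NULL,   -- valid time: when the fact became true in the world
            tx_time    TEXT NOT NULL    -- transaction time: when it was recorded in the database
        );
    """)

    # A backfill: on June 11th we learn that the price already changed on May 1st.
    conn.execute("INSERT INTO facts VALUES (?, ?, ?, ?, ?)",
                 ("sku-1", "price", "9.99", "2020-05-01", "2020-06-11"))

    # "What did we believe on June 1st about the price that was valid on May 15th?"
    row = conn.execute("""
        SELECT value FROM facts
        WHERE entity = 'sku-1' AND attribute = 'price'
          AND valid_from <= '2020-05-15'
          AND tx_time    <= '2020-06-01'
        ORDER BY valid_from DESC, tx_time DESC
        LIMIT 1
    """).fetchone()
    print(row)  # None: on June 1st the backfill had not been recorded yet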

> you don’t need any special tools

Absolutely agreed. I happen to work on https://opencrux.com which is designed from the ground up for handling bitemporal data, but I was chatting to someone a couple of days ago who built a really neat approximation of Crux purely on Postgres, using JSON columns and 3 simple tables: Transaction_Log table + All_Documents table + Current_Documents table. A small handful of triggers were used to automate the timestamping and the population of the Current Documents table, but it was super simple stuff.
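
I haven't seen the actual schema, so the shape below is only my guess at what was described, assuming psycopg2 and a local Postgres (all names are invented):

    import psycopg2  # assumes a reachable local Postgres database

    conn = psycopg2.connect("dbname=facts_demo")
    with conn, conn.cursor() as cur:
        cur.execute("""
            -- A guess at the three tables; triggers (not shown) would stamp tx_time
            -- and keep current_documents in sync on every insert into all_documents.
            CREATE TABLE transaction_log (
                tx_id   BIGSERIAL PRIMARY KEY,
                tx_time TIMESTAMPTZ NOT NULL DEFAULT now()
            );
            CREATE TABLE all_documents (
                doc_id TEXT   NOT NULL,
                tx_id  BIGINT NOT NULL REFERENCES transaction_log (tx_id),
                body   JSONB  NOT NULL,
                PRIMARY KEY (doc_id, tx_id)
            );
            CREATE TABLE current_documents (
                doc_id TEXT   PRIMARY KEY,
                tx_id  BIGINT NOT NULL REFERENCES transaction_log (tx_id),
                body   JSONB  NOT NULL
            );
        """)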

Naturally there is a big difference in scale and performance compared to something like Crux (where you can join within historical timeslices _efficiently_), but I was impressed by how Postgres's flexible JSON indexing made the approach viable with so little upfront effort.

It's probably fair to say that the majority of queries in typical applications won't ever care about prior history, but I think always having history available for native querying without having to import anything from archives or logs is fundamentally liberating.



