Straight-up copy-paste from my development notes...
* what pivots in current db vs temporal db?
What is the difference between these pictures?
#+BEGIN_SRC text
("Current" DB)
CRUD records
^
|
v
[ current view ] -- update --> [ facts log ]
---------------------------------------------------------
[ current view ] <-- update -- [ facts log ]
| ^
v |
READ records WRITE records (as facts)
("Temporal" DB)
#+END_SRC
- Hint: It is /not/ the schema. It is /what/ "pivots".
- In both cases the current view can be understood to be a pivot table
of the facts log.
- BUT in the current DB, we must "pivot" the /process/, i.e. take a
CRUD op and transform it into an audit log. This /must/ be done
synchronously in in real-time. Whereas in the Temporal DB's case, we
must "pivot" the stored data, which we can do at any point in query
time, as of any point in time of the log.
- The complexity of Current DBs derives from /live process management/
challenges. Whereas the complexity of Temporal DBs derives from
/retroactive process management/ challenges.
/now/ is /never/ cheap. It is the most expensive non-fungible
thing. Once it's gone, it's gone. Fail to transact an audit trail for
the CRUD operation? Too bad. Better luck next time. Whereas disk space
is cheap, and practically infinite which affords Temporal DBs greater
opportunity to find a better trade-off between essential complexity
and DB capabilities. At least as long as disk space remains plentiful
and cheap.
This is why if we are modeling a Temporal DB over a Current DB, it is
preferable to write all tables as /fact/ tables and query their
auto-generated 'current view' versions, for normal query needs. For
audit / analysis needs, we can snapshot the facts tables and operate
on those out-of-band (detached from the live app). Impedance mismatch
occurs when trying to run /both/ current CRUD tables (writing to audit
logs) for some parts of the schema, in parallel with a "main" facts
table for all fact-records. In a given data system, it is better to do
either one or the other, not both at the same time.
- For small-middle-heavy usage (anything that (reasonably) fits on one machine) how to use existing database technology, so that one can store and query this stuff sufficiently fast. This is what I'm trying to do with SQLite... partly because the "V" of E/A/V benefits from SQLite's "Flexible Typing" system. In SQLite parlance, E, A are TEXT, and V is NUMERIC.
- For at-scale usage (anything that needs many machines), how to make a data system from scratch to store log-structured information efficiently, and how to query it efficiently. See what the Datomic, XTDB, and RedPlanetLabs/Rama people are doing. Essentially: separate storage and compute, event-source everything, and build a system from scratch to use it efficiently and fast at scale.
Straight-up copy-paste from my development notes...
* what pivots in current db vs temporal db?
What is the difference between these pictures?
#+BEGIN_SRC text
#+END_SRC- Hint: It is /not/ the schema. It is /what/ "pivots".
- In both cases the current view can be understood to be a pivot table of the facts log.
- BUT in the current DB, we must "pivot" the /process/, i.e. take a CRUD op and transform it into an audit log. This /must/ be done synchronously in in real-time. Whereas in the Temporal DB's case, we must "pivot" the stored data, which we can do at any point in query time, as of any point in time of the log.
- The complexity of Current DBs derives from /live process management/ challenges. Whereas the complexity of Temporal DBs derives from /retroactive process management/ challenges.
/now/ is /never/ cheap. It is the most expensive non-fungible thing. Once it's gone, it's gone. Fail to transact an audit trail for the CRUD operation? Too bad. Better luck next time. Whereas disk space is cheap, and practically infinite which affords Temporal DBs greater opportunity to find a better trade-off between essential complexity and DB capabilities. At least as long as disk space remains plentiful and cheap.
This is why if we are modeling a Temporal DB over a Current DB, it is preferable to write all tables as /fact/ tables and query their auto-generated 'current view' versions, for normal query needs. For audit / analysis needs, we can snapshot the facts tables and operate on those out-of-band (detached from the live app). Impedance mismatch occurs when trying to run /both/ current CRUD tables (writing to audit logs) for some parts of the schema, in parallel with a "main" facts table for all fact-records. In a given data system, it is better to do either one or the other, not both at the same time.