Hacker News

So much this!

Straight-up copy-paste from my development notes...

* what pivots in current db vs temporal db?

What is the difference between these pictures?

#+BEGIN_SRC text

  ("Current" DB)

   CRUD records
         ^
         |
         v
  [ current view ] -- update --> [ facts log ]

  ---------------------------------------------------------

  [ current view ] <-- update -- [ facts log ]
         |                             ^
         v                             |
    READ records             WRITE records (as facts)

  ("Temporal" DB)
#+END_SRC

- Hint: It is /not/ the schema. It is /what/ "pivots".

- In both cases the current view can be understood to be a pivot table of the facts log.

- BUT in the current DB, we must "pivot" the /process/, i.e. take a CRUD op and transform it into an audit log. This /must/ be done synchronously, in real-time. Whereas in the Temporal DB's case, we must "pivot" the stored data, which we can do at any point in query time, as of any point in time of the log.

- The complexity of Current DBs derives from /live process management/ challenges. Whereas the complexity of Temporal DBs derives from /retroactive process management/ challenges.

/now/ is /never/ cheap. It is the most expensive non-fungible thing. Once it's gone, it's gone. Fail to transact an audit trail for the CRUD operation? Too bad. Better luck next time. Whereas disk space is cheap and practically infinite, which affords Temporal DBs greater opportunity to find a better trade-off between essential complexity and DB capabilities. At least as long as disk space remains plentiful and cheap.

This is why if we are modeling a Temporal DB over a Current DB, it is preferable to write all tables as /fact/ tables and query their auto-generated 'current view' versions, for normal query needs. For audit / analysis needs, we can snapshot the facts tables and operate on those out-of-band (detached from the live app). Impedance mismatch occurs when trying to run /both/ current CRUD tables (writing to audit logs) for some parts of the schema, in parallel with a "main" facts table for all fact-records. In a given data system, it is better to do either one or the other, not both at the same time.
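As a sketch of the "fact tables plus auto-generated current view" pattern in SQLite (table, view, and column names here are illustrative, not from the notes above):

```python
import sqlite3

# Write only to a facts table; read current state from a derived view.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE user_facts (
    system_time INTEGER,   -- when the fact was recorded
    entity      TEXT,
    attribute   TEXT,
    value       NUMERIC,   -- SQLite flexible typing: any value fits
    asserted    INTEGER    -- 1 = assert, 0 = redact
);

-- "Current view": for each entity/attribute, the latest asserted value.
-- If the latest fact is a redaction, the attribute simply disappears.
CREATE VIEW user_current AS
SELECT entity, attribute, value
FROM user_facts f
WHERE asserted = 1
  AND system_time = (SELECT MAX(system_time)
                     FROM user_facts
                     WHERE entity = f.entity AND attribute = f.attribute);
""")

db.executemany(
    "INSERT INTO user_facts VALUES (?, ?, ?, ?, ?)",
    [
        (1, "u1", "email", "a@example.com", 1),
        (2, "u1", "email", "b@example.com", 1),  # later fact supersedes
        (3, "u2", "email", "c@example.com", 1),
    ],
)

print(sorted(db.execute("SELECT entity, value FROM user_current")))
# → [('u1', 'b@example.com'), ('u2', 'c@example.com')]
```

The app only ever INSERTs facts; the "current" shape is a query, so auditing needs no extra write path.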



The problem with `everything is a log` is that it is very undisciplined, whereas triggers from the actual main table have these very serious advantages:

* Your main table has the correct shape, and so does the derived log!
* MOST of the queries are to that table, and for now
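For contrast, a minimal sketch of this trigger approach, assuming a hypothetical `users` table (table and trigger names are illustrative):

```python
import sqlite3

# The main table keeps its normal CRUD shape; triggers derive the
# audit log synchronously on every write.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT);

CREATE TABLE users_log (
    logged_at INTEGER DEFAULT (strftime('%s','now')),
    op        TEXT,   -- 'I'nsert or 'U'pdate
    id        TEXT,
    email     TEXT
);

CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
    INSERT INTO users_log (op, id, email) VALUES ('I', NEW.id, NEW.email);
END;

CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
    INSERT INTO users_log (op, id, email) VALUES ('U', NEW.id, NEW.email);
END;
""")

db.execute("INSERT INTO users VALUES ('u1', 'a@example.com')")
db.execute("UPDATE users SET email = 'b@example.com' WHERE id = 'u1'")

print(db.execute("SELECT op, email FROM users_log ORDER BY rowid").fetchall())
# → [('I', 'a@example.com'), ('U', 'b@example.com')]
```

Note that the log here is written in the same transaction as the CRUD op, which is exactly the synchronous "pivot the process" cost described earlier in the thread.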


Yes, if the log has no regular schema, and if the overall system has no regular design pattern.

This is the minimal "standard record" that one can use, for all facts pertaining to all domain entities in one's system:

  system_time, valid_time, Entity, Attribute, Value, assert/redact
Now the problem is:

- For small-to-medium-heavy usage (anything that (reasonably) fits on one machine), how to use existing database technology so that one can store and query this stuff sufficiently fast. This is what I'm trying to do with SQLite... partly because the "V" of E/A/V benefits from SQLite's "Flexible Typing" system. In SQLite parlance, E and A are TEXT, and V is NUMERIC.

- For at-scale usage (anything that needs many machines), how to make a data system from scratch to store log-structured information efficiently, and how to query it efficiently. See what the Datomic, XTDB, and RedPlanetLabs/Rama people are doing. Essentially: separate storage and compute, event-source everything, and build a system from scratch to use it efficiently and fast at scale.
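A minimal sketch of the single-machine SQLite case, using the standard record above with integer timestamps for brevity (entity names and data are illustrative):

```python
import sqlite3

# E and A are TEXT; V is NUMERIC, so SQLite's flexible typing stores
# ints, floats, and text alike in the same column.
db = sqlite3.connect(":memory:")
db.execute("""
CREATE TABLE facts (
    system_time INTEGER,
    valid_time  INTEGER,
    e TEXT, a TEXT, v NUMERIC,
    asserted INTEGER  -- 1 = assert, 0 = redact
)""")

db.executemany("INSERT INTO facts VALUES (?,?,?,?,?,?)", [
    (1, 1, "sensor-7", "temp", 21.5, 1),
    (2, 2, "sensor-7", "temp", 23.0, 1),
    (3, 3, "sensor-7", "temp", 23.0, 0),  # redact the last reading
])

def as_of(t):
    """Pivot the log into the current view as of system_time t."""
    return db.execute("""
        SELECT e, a, v FROM facts f
        WHERE asserted = 1 AND system_time = (
            SELECT MAX(system_time) FROM facts
            WHERE e = f.e AND a = f.a AND system_time <= ?)
    """, (t,)).fetchall()

print(as_of(2))  # → [('sensor-7', 'temp', 23.0)]
print(as_of(3))  # → []  (the latest fact redacts the value)
```

Because the pivot happens at query time, "as of any point in time" is just a parameter, not a separate audit pipeline.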



