Datomic: this is not the history you're looking for

evanweaver · on July 10, 2017

This is a standard bi-temporal model; valid time and transaction time are the terms the author is looking for.

We plan to add more dynamic query control over temporality to FaunaDB to solve this problem; you will be able to do a half-temporal join and get your past blog post versions by their current tags and other tricks.

pjungwir · on July 10, 2017

Temporal tables are a pet interest of mine, so here are some links that people might find useful:

Richard Snodgrass, Developing Time-Oriented Database Applications in SQL.

Hugh Darwen & C.J. Date, "An Overview and Analysis of Proposals Based on the TSQL2 Approach".

Krishna Kulkarni & Jan-Eike Michels, "Temporal features in SQL:2011".

Tom Johnston, Managing Time in Relational Databases. (I haven't read this one yet.)

Tom Johnston, Bitemporal Data.

Magnus Hagander, "A TARDIS for Your ORM": https://www.youtube.com/watch?v=TRgni5q0YM8

Also I've yet to see anyone even talk about managing DDL changes in temporal databases (well Magnus mentions that he won't cover it), but it's the same problem as discussed here. It's very practical, but you could write a dissertation or a book about it.

jeffdavis · on July 10, 2017

Also Temporal Data and the Relational Model https://www.amazon.com/gp/aw/d/B005UY0W0E/ref=mp_s_a_1_1?ie=...

A great book which inspired Range Types in PostgreSQL: https://www.postgresql.org/docs/current/static/rangetypes.ht...

pjungwir · on July 10, 2017

Thank you for range types! They are awesome and I have lost count of how often I've used them. I even have a blog post you might enjoy about defining an inetrange: http://illuminatedcomputing.com/posts/2016/06/inet-range/ (Corrections/suggestions welcome. :-)

I know about that Date book but it is pretty low on my priority list. I've read a few other things by him and they seem like great ideas but hopelessly impractical for a working programmer like me (e.g. Tutorial D). And I am a little sore that rejecting Snodgrass's TSQL2 seems to have set back temporal databases by about 20 years. The Date paper I mentioned above has a lot of criticism but gives no real alternative proposals. So is that book better? Does it have something to add?

jeffdavis · on July 10, 2017

Glad to see all the ways people use them!

Date and Darwen are clear-thinking and they write well. But you are right that they are quick to criticize and some of the alternative solutions they offer are unconvincing.

The most obvious example is their handling of NULL, which had a bunch of great criticisms followed by a totally impractical altenative ("special values"). If they had been a bit more humble, maybe they would have just borrowed Maybe/Option types from ML.

On the other hand, they really helped me make the connection between logic and databases. And that helped me a lot in practical ways (e.g. exploring dirty data precisely).

Regarding temporal data, they divorced from SQL and started from first priciples. That clarified a lot of things for me and then I adapted those ideas to SQL.

So: their books are great, but I don't recommend joining their cult ;-)

EDIT: I think a lot of SQL implementations are going about temporal the wrong way. They should focus on the fundamental building blocks first (like ranges of time) and build up from there.

pjungwir · on July 11, 2017

Okay, I appreciate the reply! I admit they had some good points about problems they found in some TSQL2 edge cases (mentioned in that paper of theirs). I will reconsider how quickly I get to their temporal book. I agree knowing a more pure way to do relational is helpful even if your day-to-day is still SQL.

evanweaver · on July 10, 2017

Temporal DDL is the reason why FaunaDB late binds validations and indexes.

It decouples the history of the schema from the history of the data, and migrating the past is not very practical anyway.

eriknstr · on July 10, 2017

If you could add links for the papers you mentioned that'd be nice. Assuming they are available to the public.

valw · on July 10, 2017

Thanks, it's good to know there's an established terminology. I'll do my research and update the article.

pjungwir · on July 10, 2017

I saw your note in the post apologizing for making up your own terminology. I wouldn't feel too bad. In the SQL world no two authors I've read use the same terminology for these things! :-) Perhaps some terms are more "standard" than others, but they are all far from universal.

pgt · on July 10, 2017

The least ambiguous terminology for me has been Martin Fowler's: `occurred_on`, `noticed_on` and additionally `recorded_on`. That way you keep track of when an event occured, when the accountants realized it and when the system recorded it. Sometimes these matter, esp. when dealing with distributed state.

gphilippart · on July 10, 2017

This issue was discussed in the Datomic google group back in 2012. I think a satisfying solution was found in the end. https://groups.google.com/forum/#!topic/datomic/zDoFsxKgARQ. Also, there's this blog post: http://blog.podsnap.com/bitemp.html

valw · on July 10, 2017

Thanks!

moomin · on July 10, 2017

It's a very important distinction though. Arguably Datomic is missing the point here, which is that t_truth and t_recorded are often different. My classic example of this is an accounting system. Sometimes you want to know "what was the P&L on Dec 2016?" sometimes you want to know "what was the P&L on Dec 2016 believed to be on Dec 2016"? And sometimes you want to do differences between the two: e.g. Dec2016 using Dec2016 facts vs Dec2015 using Dec2016 facts.

Both are equally important, but Datomic blesses one and not the other.

GorgeRonde · on July 11, 2017

Where I work at, we have a special database used to log any state of certain noteworthy user personal info. We also store declarations about future states. We hesitated between a store-facts-then-aggregate model à la Datomic and a bitemporal model. A bitemporal model stores states with a validity period and instead of relying on a single punctual timestamp as with facts, makes use of two date fields to model an interval. Current states are encoded with a +Infinity until_timestamp. Updating a state, means closing the previous state's interval (setting its until_timestamp to Time.now) and opening a new interval (from: Time.now, until: +Infinity or the next future state's from_timestamp). This is stil monotemporal. So far we can log/store data, pretty much the same way Datomic does.' and set states in the future. But in the event we store data about future states, that is also store dates in our temporal system we are tempted to model it with the same interval-based heuristics. Why would we do such a thing if we already can set events in the future thanks to the intervals ? Here is the thing : you can only keep one of these two info: - the interval for which the state was the freshest state in the database - the interval declared by the user about the future state

These two correspond to the temporal two axis of a bitemporal database. They are are most often named system and validity temporal axis. Axis is a bit vain I think as it's an expression that conveys the idea they all entertain the same homogenous relationships as in a space whereas it's in practice more like a subtle dependency graph. In the context of tax-related declarations ahead of time we store both the moment the declaration has been stated (and subsequent amendments that may be made to it) as well as the year for which it applies. For this to work the validity axis heuristics must be built on top of that of the system axis. You have to put one close-previous-interval-open-next-one semantic over another one. You could even build a n-tower of such edit mechanisms in a n-temporal database. But in practice you most likely won't need to go linear like that. Suppose that you also want to confirm all those special facts your customer tell about themselves and you want to log it the same way. Unless you do not need to validate present declarations about future states you won't need to make this confirmation axis stand on top the validity axis that himself stands on top of the system axis (a 3 temporal-system).

What's fun is that with the benefit of a system-axis you can schedule data-maintenance in the future. It's also very fast on a relational database engine.

But it messes up with your relationships cardinalities. You will need two ids on your table. One for the state, and one for the item it models. 1_to_1 becomes 1_to_n when the right-end is temporalized. This gets really hairy if you want to temporalize relationships.

Edit: The author writes: >event time: the time at which stuff happened. >recording time: the time at which you're system learns that stuff happened. >(Disclaimer: this terminology is totally made up by me as I'm writing this.)

In short I do not think such a terminology exists, there is no definite terminology but only metatimes over metatimes and as many field-driven reasons to bring them up, and why not store them all in a badass npm-style time-axis dependency oriented byzantine database.

owebmaster · on July 10, 2017

The problem 1 he describes would be solved with software engineering. Schema migration is a complex issue. It is funny that he asked the intern to do this part. Bad article.

valw · on July 10, 2017

Right, I forgot, curating editorial content is best done via software engineering :) what was I thinking?

owebmaster · on July 10, 2017

If you need to do a HUGE schema migration in a running software, for sure. If it wasn't a running software, put it as a Datomic problem is even more juvenile.