
I have a strong suspicion that bitemporalism makes a lot of these problems less severe. The actual volumes of data are the same, but the all-or-nothing requirement of windowing over very large data sets, imposed to avoid missing anything that arrived late, goes away.

I wrote shambolic stream-of-consciousness notes on it several years ago: https://docs.google.com/document/d/1ZlPp099_fV1lyYWACSyuWY_j...

The gist being that the mechanisms of windowing, triggering and retraction a la Beam are actually workarounds for a lack of bitemporalism.
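As a toy illustration of that workaround (nothing to do with Beam's actual API, just the shape of the mechanism), here is what retraction looks like for a windowed sum when data arrives late:

```python
# Toy illustration (not Beam's API): a per-window sum that handles
# late-arriving data by retracting the previously emitted result and
# then emitting a corrected one.
class WindowedSum:
    def __init__(self, window_size):
        self.window_size = window_size
        self.sums = {}  # window index -> running sum

    def add(self, event_time, value):
        """Process one event; return a list of (kind, window, sum) emissions."""
        window = event_time // self.window_size
        out = []
        if window in self.sums:
            # A result for this window was already emitted: retract it first.
            out.append(("retract", window, self.sums[window]))
            self.sums[window] += value
        else:
            self.sums[window] = value
        out.append(("emit", window, self.sums[window]))
        return out

ws = WindowedSum(window_size=10)
print(ws.add(3, 5))   # [('emit', 0, 5)]
print(ws.add(12, 7))  # [('emit', 1, 7)]
print(ws.add(4, 2))   # late for window 0: [('retract', 0, 5), ('emit', 0, 7)]
```

Every late arrival forces a retract/re-emit pair downstream, which is exactly the bookkeeping a bitemporal model would make explicit instead.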




We’re thinking along very similar lines! We’ve got some of our thoughts around evolving how timestamps work in Materialize written down here: https://github.com/MaterializeInc/materialize/issues/1309


Sort of -- the problem I see in the event time / processing time distinction is that it's about instants rather than intervals. There are a number of models and queries that are not reliably expressible with instants alone, unless you reinvent intervals with them.

For example, if I rely on "updated-at" and infer that whatever record has the latest updated-at is the "current" record, then I may create the illusion that there are no gaps in my facts. That may not be so.
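A contrived sketch of that trap, with made-up rows: the row with the latest updated-at looks "current", but explicit validity intervals reveal a gap in the facts.

```python
from datetime import date

# Hypothetical rows: each fact carries an explicit validity interval.
# Relying on "latest updated_at wins" hides the gap between Feb and Apr.
rows = [
    {"price": 10, "valid_from": date(2020, 1, 1), "valid_to": date(2020, 2, 1),
     "updated_at": date(2020, 3, 15)},
    {"price": 12, "valid_from": date(2020, 4, 1), "valid_to": date(2020, 5, 1),
     "updated_at": date(2020, 3, 1)},
]

def current_by_updated_at(rows):
    # The instant-based heuristic: assume the most recently written row is current.
    return max(rows, key=lambda r: r["updated_at"])

def current_by_interval(rows, today):
    # The interval-aware query: a row is current only if today falls inside it.
    return [r for r in rows if r["valid_from"] <= today < r["valid_to"]]

today = date(2020, 3, 20)
print(current_by_updated_at(rows)["price"])   # 10 -- looks "current", but...
print(current_by_interval(rows, today))       # [] -- in fact there is a gap
```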

A reference system to look at is Crux: https://opencrux.com/


> For example, if I rely on "updated-at" and infer that whatever record has the latest updated-at is the "current" record, then I may create the illusion that there are no gaps in my facts. That may not be so.

I believe that notion is captured by timely's capabilities [0]. Your capability has a current time, and you can only produce records at or greater than that time. So you could produce a record at, say, t + 3, then t + 5, and then produce a record at t + 1. But not until you downgrade your capability to t + 6 will the record at t + 5 be considered final; downgrading your capability to a new time is how you indicate that you have the correct and final set of facts for all times less than that new time.
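A toy model of that contract (not timely's actual Rust API, just the invariants described above): you may only produce at or after the capability's time, and downgrading declares everything below the new time final.

```python
class Capability:
    """Toy model of a timely-style capability (not the real Rust API)."""
    def __init__(self, time):
        self.time = time
        self.pending = []   # (time, record) pairs not yet final
        self.final = []

    def produce(self, time, record):
        # Records may only be produced at or after the capability's time...
        assert time >= self.time, "cannot produce in the finalized past"
        self.pending.append((time, record))

    def downgrade(self, new_time):
        # ...and downgrading declares every time below new_time final.
        assert new_time >= self.time
        self.time = new_time
        still_pending = []
        for t, r in self.pending:
            (self.final if t < new_time else still_pending).append((t, r))
        self.pending = still_pending

cap = Capability(0)
cap.produce(3, "a")
cap.produce(5, "b")
cap.produce(1, "c")        # out of order, but allowed: capability is still at 0
cap.downgrade(6)           # times 0..5 are now final; produce(1, ...) would fail
print(sorted(cap.final))   # [(1, 'c'), (3, 'a'), (5, 'b')]
```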

If your events can arrive out of order forever, then you have a problem, as you'll never be able to downgrade your capability because you'll never be willing to mark a time as "final." That's where bitemporalism (as mentioned in that issue I linked previously) comes into play. You can mark a result as final as of some processing time, and then issue corrections as of some processing time in the future if some out-of-order data arrives. Materialize will (likely) gain support for bitemporalism eventually, and the underlying dataflow engine supports arbitrary-dimension timestamps already.
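A minimal sketch of that idea, with invented names (this is not Materialize's implementation): every result is versioned by processing time, so "what was the answer as of p?" stays answerable even after corrections land.

```python
class Bitemporal:
    """Toy bitemporal sketch: results are final as of a processing time,
    and corrections arrive at later processing times."""
    def __init__(self):
        self.versions = {}   # key -> [(processing_time, value)], append-only

    def write(self, key, value, processing_time):
        self.versions.setdefault(key, []).append((processing_time, value))

    def as_of(self, key, processing_time):
        # Latest value whose processing time is <= the queried one.
        candidates = [(p, v) for p, v in self.versions.get(key, [])
                      if p <= processing_time]
        return max(candidates)[1] if candidates else None

db = Bitemporal()
db.write("count@hour1", 41, processing_time=100)   # result published at p=100
db.write("count@hour1", 42, processing_time=250)   # late data: correction at p=250
print(db.as_of("count@hour1", 150))   # 41 -- what we believed at p=150
print(db.as_of("count@hour1", 300))   # 42 -- the corrected result
```

Nothing is ever overwritten, so past states of belief remain queryable, which is what lets you close out a processing time without forbidding out-of-order arrivals forever.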

Would be happy to chat about this more, if you're curious, but I feel like this discussion is getting a bit unwieldy for an HN thread! (At the very least I might need to put you in touch with Frank.) Feel free to reach out on GitHub [1] or our Gitter [2], or shoot me an email at benesch@materialize.io.

[0]: https://docs.rs/timely/0.11.1/timely/dataflow/operators/stru...

[1]: https://github.com/MaterializeInc/materialize/issues

[2]: https://gitter.im/MaterializeInc/community


The underlying compute framework, differential dataflow, supports multi-temporal timestamps. The Crux folks at JUXT were at one point looking at it, though I'm not sure what they concluded.


This feels like the philosophical conclusion that Kafka Streams has reached, i.e. there's no strict watermark, and if you really want, you can theoretically keep updating and retracting data forever and build a pipeline that magically stays in sync.


Partially, in my understanding, but not fully. An advantage of bitemporalism that is hard to recreate is queries about past and future states of belief. "What do I believe is true today?" works well with accumulation and reaction and with standard normalised schemata.

"What do I believe I believed yesterday?" is slightly harder and needs additional information to be stored. You can rewind a stream and replay it up to the point of interest, but that can be quite slow.
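The rewind-and-replay approach, as a toy sketch over made-up events: it answers "what did I believe at that point?" correctly, but only by scanning the whole log per question.

```python
# Toy sketch of rewind-and-replay: reconstruct "what I believed at a cutoff"
# by replaying only the events that had arrived by then. Correct, but it
# costs a full scan of the log for every question asked.
log = [
    # (arrival_time, event_time, value)
    (1, "2020-01-01", "draft"),
    (2, "2020-01-02", "published"),
    (9, "2020-01-01", "retracted"),   # arrived late, revising day 1
]

def belief_at(log, arrival_cutoff):
    state = {}
    for arrived, event_time, value in log:
        if arrived <= arrival_cutoff:
            state[event_time] = value
    return state

print(belief_at(log, arrival_cutoff=5))
# {'2020-01-01': 'draft', '2020-01-02': 'published'}
print(belief_at(log, arrival_cutoff=10))
# {'2020-01-01': 'retracted', '2020-01-02': 'published'}
```

A bitemporal index answers the same questions without the replay, which is what makes the harder queries below tractable.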

"What did I believe today would be, last week?", "What is the history of my belief about X?", "Have I ever believed Y about X?" etc. are much harder to answer quickly without full bitemporalism. So too is the problem of implicit intervals that are untrue, which is where "updated at" can be so misleading.





