Hacker News new | past | comments | ask | show | jobs | submit login

The continuous aggregates portion of this blog (along with the breakdown of Transition, Combine, and Final Functions) reminded me of "A Theory of Changes for Higher-Order Languages: Incrementalizing Lambda-Calculi by Static Differentiation" (Giarrusso et. al.) [0].

Particularly the part of getting logically-consistent results via different routes of computation. David Kohn says this in the blog post: "But, you have to have in-depth (and sometimes rather arcane) knowledge about each aggregate’s internals to know which ones meet the above criteria – and therefore, which ones you can re-aggregate."

I think Giarrusso addresses the internal requirements in a precise mathematical manner (though I may be wrong). Giarrusso Section 2.1 is the specific area I'm thinking, particularly Definition 2.1 and then later the discussion of the abelian group where certain binary operations must be commutative and associative. Giarrusso Equation 1 in the introducton defines externally what it means to be logically consistent.

Also, Giarrusso talks about "Self-Maintainability" where updating the continuous result only needs to look at the changes to the input. What was obvious to me from reading Giarrusso was that something simple like "AVG" is not self-maintainable unless you keep the intermediate state (sum and count) seperately. Kohn's distinction of Transition Function, Combine Function, and Final Function - together with Giarrusso's abelian group and change structure - is kind of a big deal in making arbitrary functions "continuous"-capable while being "self-maintainable".

I can see designing a [data] structure that adheres to the specific algabraic laws defined by Giarrusso's abelian group along with the 3 functions from Kohn (missing from Giarrusso). You can then feed this structure to a function that spits out two new functions for you: a normal function that can be used to calculate a result, and it's "continuous" version which only needs change to the inputs. For example, "avg()" and "avg_continuous()" can be automatically derived from a data structure of algebraically well-behaved operations.

Plus this would have a really cool Category Theory diagram for it!

David: absolutely love the animated gifs and pictures. Very well written indeed.

[0] Giarrusso et. al. https://www.researchgate.net/publication/269126515_A_theory_...




Thanks for this comment and link, I think this area is one of the most interesting frontiers in computer science today. We're starting to see products like here in TimescaleDB and elsewhere in Materialize that build on these ideas, but in five or ten year's time the developer story here is going to be absolutely incredible. There is so much potential in being able to write declarative specifications for very complex stateful dataflows, that are then just maintained quickly and automagically, especially if we can do it without arbitrary windows and watermarks. Very exciting!


I'll have to read this paper, it seems interesting.

This problem also reminds me of this: https://jon.thesquareplanet.com/papers/phd-thesis.pdf


(NB: Post author here)

Glad you liked the GIFs! Will have to take a look at this. Thanks for sharing!


I also like the GIFs, which tools did you use to make these?


The GIFs were made with Adobe XD


This is the most interesting paper I’ve been pointed to in a while.


This sounds a lot like how Prometheus works.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: