I see. To my small brain it sounds like TD can intelligently memoize or cache the outputs of each "step" so that it only recalculates when it needs to as the inputs change.

I think Spark does that sometimes these days, but I don't know much about the specifics of how and when Spark does it.

Does TD have to keep _everything_ in memory, or can it be strategic in what it keeps and what it evicts?



TD lets you write whatever logic you want (it is fairly unopinionated on your logic and state).

Differential dataflow plugs in certain logic there, and it does indeed maintain a synopsis of the data that has gone past: enough to respond to future updates, but not necessarily the entirety of the data it has seen.
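As a toy illustration (this is not DD's actual API, just a sketch of the idea): a counting operator only needs a per-key running total as its synopsis. It can absorb insertions and retractions indefinitely without retaining the raw records, and keys whose counts net out to zero can be evicted.

```python
from collections import defaultdict

class CountSynopsis:
    """Hypothetical sketch: a 'count per key' operator whose only state
    is one integer per key, not the full history of records seen."""

    def __init__(self):
        self.counts = defaultdict(int)  # the synopsis

    def update(self, key, delta):
        # delta = +1 for an insertion, -1 for a retraction
        self.counts[key] += delta
        if self.counts[key] == 0:
            del self.counts[key]  # evict keys that net out to zero

    def output(self):
        return dict(self.counts)

s = CountSynopsis()
for key, delta in [("a", +1), ("a", +1), ("b", +1), ("a", -1)]:
    s.update(key, delta)
print(s.output())  # {'a': 1, 'b': 1}
```

The point is that answering "what are the current counts, given this new update?" needs only the synopsis, which is why recomputation stays cheap as inputs change.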

It would be tricky to implement DD over classic Spark, as DD relies on these synopses for its performance. There are some changes to Spark proposed in recent papers where it can pull in immutable LSM layers without reading them (e.g., just mmapping them), which might improve things, but until that happens there will be a gap.


Gotcha. Thanks for answering all my q's!



