Hacker News new | past | comments | ask | show | jobs | submit login
Differential Dataflow but at what cost? (2017) (github.com/frankmcsherry)
61 points by mrry on Jan 19, 2020 | hide | past | favorite | 11 comments



Is differential a synonym for "dynamic" (or "incremental") here?


Incremental. I think the problem is this: you have an input that you want to change a bunch of times

    input = ...;
    input += d1;
    input += d2;
    input += d3;
and you have some output that depends on the input. Whenever the input changes, the output needs to be recomputed, like this

    input = ...;     output = f(input);
    input += d1;     output = f(input);
    input += d2;     output = f(input);
    input += d3;     output = f(input);
The idea here is to, instead of recomputing the output every time from scratch, find a function g that will compute the output-difference from the input-difference

    input = ...;     output = f(input);
    input += d1;     output += g(d1);
    input += d2;     output += g(d2);
    input += d3;     output += g(d3);
And I think if you write f using the combinators from the differential dataflow framework it will be able to find g automatically for you.


Yeah, makes sense. Dynamic breadth first search has been a thing for a while; it seems like they're just introducing it with a new name. The automatic part seems cool though!


The idea is that it isn't limited to breadth first searches. It provides a way to write algorithms such that they are automatically differential in nature.


Where did you learn how it worked? Was it just through his other blog posts?


Frank’s written a tutorial style mdbook on Differential [0] that I highly recommend in addition to his blog posts. There’s also the research paper on the spawned the Timely and Differential projects, Naiad [1].

If you get into it, the library itself is well documented on docs.rs and there’s an active Gitter channel.

(Disclaimer: I work with Frank at his company, Materialize.)

[0]: https://timelydataflow.github.io/differential-dataflow/

[1]: http://sigops.org/s/conferences/sosp/2013/papers/p439-murray...


Obviously computing output-diffs from input-diffs looks a lot like automatic differentiation. Is there a precise relationship?


How would writing f using combinators yield the ability to derive g?


Each combinator knows how to produce the output diff from an input diff; the computation is then just a matter of stringing these combinators together.

For example, consider a map combinator. The output diff is just the input diff with a function applied to the data. More formally, for every incoming (data, time, diff) triple, an output triple (f(data), time, diff) is produced.

For a filter combinator, the output is (data, time, diff) if f(data) is true, or (data, time, 0) if f(data) is false.

A computation that wants to filter and map some data then just requires piping together these operators.

Things get interesting when you want to aggregate and join data, but Frank’s blog posts and documentation explain how you build that in a dataflow system far better than I can here.


The main distinction that led to a different name is that (unlike most instances of dynamic or incremental algorithms) differential dataflow allows you to move through a partial order of updates, rather than a sequence of updates. This is important for automatically updating iterative computations, if you want to do multi-temporal streaming computation, and a few other reasons.

It appeared at the time analogous to multivariate calculus, though the better analogy appears to be to Moebius inversion (roughly: integration and differentiation on partial orders). Read more here: https://en.wikipedia.org/wiki/Möbius_inversion_formula

"Moebius dataflow" would have been a pretty bad-ass name, I think we can all agree. And it might have done a better job of preparing the reader for what was about to happen to their brain.


Wow thanks a ton, looks like I have some more reading to do! :)




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: