Do note that the existing name in the CS literature for algorithms that can cheaply provide an "updated" result given a change in the input is dynamic, not differential. "Differential" actually refers to something else altogether, namely being able to automatically compute the gradient of some general calculation on continuous-valued inputs! (The intersection of differential and dynamic problems, where the point is to recompute answers quickly when some continuous-valued input changes, is called "kinetic".)
"Incremental" is actually a special case of dynamic, where all updates are restricted to 'adding' input data only. (Thus "incremental" algorithms, properly understood, are closely related to "online" algorithms, e.g. those which process input data in a streaming fashion as opposed to assuming a "complete" input to begin with.)
I figured the name was a pun on differential dataflow, and the readme seems to agree. Differential dataflow is a framework for doing "dynamic" computations by modelling them as dataflow, with diffs of values rather than values flowing between nodes. Its main user, Materialize, has bags of tuples on most nodes, and the diffs are then tuples plus the changes to their counts. Differential dataflow has a system called timely dataflow at its core; timely dataflow sends timestamp-like metadata along with values to allow loops and to keep the output of the dynamic computation internally consistent.
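A minimal sketch of the "bags of tuples with count diffs" idea, in Python (my own toy code, not Materialize's actual representation): a relation is a multiset of tuples, and an update is a map from tuples to signed count changes.

```python
from collections import Counter

def apply_diff(bag, diff):
    """Apply a diff (tuple -> signed count change) to a bag of tuples,
    returning the updated bag with zero-count entries dropped."""
    out = Counter(bag)
    for tup, delta in diff.items():
        out[tup] += delta
        if out[tup] == 0:
            del out[tup]
    return out

bag = Counter({("alice", 30): 1, ("bob", 25): 2})
# A diff: retract one copy of bob's row, add a new row for carol.
diff = {("bob", 25): -1, ("carol", 41): +1}
updated = apply_diff(bag, diff)
```

The point of the diff representation is that downstream operators can consume just the diff rather than rescanning the whole bag.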
But I love these. From functional reactive programming through propagator networks to embedding functional languages in Datalog, Datafun-style, there seems to be this new-ish approach to computation that somebody might finally carry out of academia! The Eve-lang folks were close but too ambitious. Maybe VMware has a less ambitious, more focused tool in mind.
Because Datalogs are so cool! But the only time I see them in practice is in bespoke implementations for, like... compiler optimization tuning?
What is the difference from https://github.com/rust-lang/datafrog? It’s a Datalog engine written by Frank McSherry on top of differential dataflow, which is also used here.
Datafrog doesn't use differential dataflow. Datafrog also requires you to do index management and joins manually, but it's very performant. Some of the join strategies are super optimized, but they don't allow for certain things that may be desirable, like negation/anti-join with a dynamic relation.
This seems like it would rock for building Language Server Protocol implementations or other similar things.
There's a tool in Rust called Salsa that's tailored for this, but it's galaxy-brained (I watched an hourlong introduction video and could still barely grasp it).
Funny you say that: the Rust borrow checker team tried to adopt differential-dataflow directly (the library behind this language), but Frank McSherry told them it was too complicated and offered to build them an easier-to-use Datalog instead [blog]. So he built [datafrog], which is at the heart of the borrow checker today.
I'm not sure what the delta between Salsa and differential-dataflow is. My experience with differential-dataflow is that it's also galaxy brain.
> I watched an hourlong introduction video and could still barely grasp it
I can relate to that. What helped me to understand it though is looking through some projects using Salsa [1] and reading the document about how the Rust compiler uses queries [2]. You might also want to watch the discussion video with Anders Hejlsberg [3] that gives an overview of the problems that Adapton/Salsa try to solve.
If you like this kind of thing, google Self-Adjusting Computation.
One advantage of self-adjusting computation vs. the differential dataflow approach is you can convert existing imperative code to it very easily. For example, a ray tracer with self-adjusting computation is written very similarly to a ray tracer without it.
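To give a feel for the idea, here is a toy Python sketch of self-adjusting computation (my own illustration, much simpler than Acar's actual machinery or Adapton): computed values cache their results and are recomputed only when an input they depend on changes, yet the code that uses them reads like ordinary code.

```python
class Cell:
    """A toy self-adjusting value: either an input or a computed node.
    Computed nodes cache their result and recompute only when dirty."""
    def __init__(self, value=None, fn=None, deps=()):
        self.fn = fn
        self.deps = list(deps)
        self.value = value
        self.dirty = fn is not None
        self.dependents = []
        for d in self.deps:
            d.dependents.append(self)

    def set(self, value):
        """Update an input cell and mark everything downstream dirty."""
        self.value = value
        self._invalidate()

    def _invalidate(self):
        for node in self.dependents:
            if not node.dirty:
                node.dirty = True
                node._invalidate()

    def get(self):
        if self.dirty:
            self.value = self.fn(*[d.get() for d in self.deps])
            self.dirty = False
        return self.value

a = Cell(2)
b = Cell(3)
total = Cell(fn=lambda x, y: x + y, deps=(a, b))
```

After `a.set(10)`, only `total` is recomputed on the next `get()`; unrelated cells are untouched. That is the sense in which imperative-looking code picks up incrementality "for free".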
Learning DDLog was much easier for me than learning Rust and differential-dataflow. I tried writing some differential-dataflow a few years ago and it was beyond me. DDLog was very quick to pick up.
Don’t know by looking at it whether it is related to or the same as my link below, but I had a class where the professor liked Datalog for some reason. The entire class found it hard to learn and not really useful; it wasn't much more work to do similar things in other languages. Very niche, but I know HN loves niche-y languages.
Compared to SQL, Datalog lets you break down queries into smaller queries that are reusable.
I think Datalog got some attention in the 1980s as a logically pure dialect of Prolog, then it practically disappeared from the literature for a while. I remember it being really hard to find anything about it around 2008, then when people got really tired of OWL and other semantic web tools Datalog came roaring back in the 2010s.
As with the semantic web tools, people have a hard time recognizing bog-standard database technology when it has been slightly reskinned, and Datalog has long had that problem. One issue is that SQL has this crazy idiosyncratic syntax that hides the simple set of operators behind it; in Datalog's case the operators are quite directly exposed, which people seem to find too simple to understand.
In some ways, but it is more dynamic; that is, you can write recursive queries in a natural syntax, the way you write recursive functions in a normal programming language. You don't have awful hacks like SQL's recursive CTEs.
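The classic example is reachability, which in Datalog is just two rules: `path(X, Y) :- edge(X, Y).` and `path(X, Z) :- path(X, Y), edge(Y, Z).` Here is a toy Python version of how a Datalog engine evaluates that, using the standard semi-naive strategy (my own sketch, not any particular engine's implementation):

```python
def transitive_closure(edges):
    """Semi-naive evaluation of:
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    Each round joins only the newly derived facts (`delta`)
    against `edge`, rather than rejoining the whole relation."""
    path = set(edges)
    delta = set(edges)
    while delta:
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - path          # keep only genuinely new facts
        path |= delta
    return path
```

Semi-naive evaluation is itself a small step toward incrementality: each iteration already works on a diff of the relation rather than the whole thing.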
One computational tool I want to see is arbitrary materialized views over computation
In databases, views are really powerful: we can build virtual tables based on joins and other operations over the data.
The value comes when we can update the underlying data from the view. This is what I want for computation. I'm not sure if differential dataflow can provide this.
> I'm not sure if differential dataflow can provide this.
Yes, it can, but you will have to write the views yourself, in Rust.
Materialize (https://materialize.com) exists, though, and compiles SQL to differential-dataflow programs, in order to provide exactly what you're asking for.
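To illustrate what "maintaining a view from diffs" buys you, here is a toy Python sketch of an incrementally maintained join view (my own illustration, not Materialize's or differential-dataflow's actual API): each inserted row only probes the other input's index, instead of recomputing the whole join.

```python
from collections import defaultdict

class JoinView:
    """Toy incrementally maintained join view of R(k, a) and S(k, b).
    On each insert, only the matching rows of the *other* input are
    scanned; the materialized result is updated in place."""
    def __init__(self):
        self.r = defaultdict(list)   # k -> list of a values
        self.s = defaultdict(list)   # k -> list of b values
        self.view = []               # materialized (k, a, b) rows

    def insert_r(self, k, a):
        self.r[k].append(a)
        self.view.extend((k, a, b) for b in self.s[k])

    def insert_s(self, k, b):
        self.s[k].append(b)
        self.view.extend((k, a, b) for a in self.r[k])

j = JoinView()
j.insert_r(1, "a")
j.insert_s(1, "x")
j.insert_r(1, "b")
```

Handling retractions and keeping the result consistent across concurrent updates is where the real systems earn their complexity; this sketch only covers inserts.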
As a general answer to that question, syntax is often under-valued.
We didn't get whole fields of mathematics until a clear syntax was developed; and likewise, the patterns of thinking a syntax affords make a big difference to the designs one chooses.
The effect is bigger when you're not writing "glue code". Threading data through libraries and applying simple business rules to it is the kind of problem best served by existing mainstream syntaxes.
That doesn't mean, as is often implicitly assumed, that therefore all syntax is basically equivalent.
A DSL for differential-dataflow would be great. A big niche/yawning chasm for such a DSL to fill would be to allow for runtime dataflow programs (i.e., to have a parser/interpreter that then relies on differential-dataflow).
It has been a while since a new version has been released. Has anything been published regarding future plans? Or is the current version "feature complete" for the foreseeable future?
Update - I emailed one of the contributors, and it is in fact in maintenance mode:
> DDlog is now in maintenance mode, because we're working on a new streaming and [incremental computation framework][1] that will be more powerful in multiple ways.
> We still have customers using DDlog, so we fix issues as they come up, but we're not working on any new features.
I talked to some of the devs during HYTRADBOI (https://www.hytradboi.com/) in April and they seemed to be going strong - but in these trying times it’s hard to know.
Lets you define queries over some data set declaratively, and instead of recomputing the query over the entire data set every time you want an updated answer, it uses Differential Dataflow <https://github.com/frankmcsherry/differential-dataflow> to efficiently(^1) calculate the new results by updating the results of the previous query execution in response to new updates to the data set.
^1: I'm not an expert on Differential Dataflow, so I don't know what "efficiently" means in this context, other than "should be faster than running the query from scratch."
"Incremental" is actually a special case of dynamic, where all updates are restricted to 'adding' input data only. (Thus "incremental" algorithms, properly understood, are closely related to "online" algorithms, e.g. those which process input data in a streaming fashion as opposed to assuming a "complete" input to begin with.)