As another commenter said, viewing a neural network as a computation graph is how all automatic differentiation engines work (particularly reverse-mode where one needs to traverse through all the previous computations to correctly apply the gradient), and there were several libraries predating Tensorflow following this idea. The initial contribution of Tensorflow and PyTorch was more about making the developer interface much cleaner and enabling training on a wider range of hardware by developing a bunch of useful kernels as part of the library.
I've always thought of most function evaluation in terms of computational graphs. When I used MathLink to connect Mathematica to Python, it basically gave you a protocol to break any Mathematica function into its recursively expanded definition. Konrad Hinsen suggested using Python's built-in operator overloading, so if you wrote "1 + Symbol('x')" it would get converted to "Plus(1, Symbol('x'))" and then sent over MathLink to Mathematica, which would evaluate Plus(1, x) and return an expression that I'd then convert back into a Python object representation.
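The overloading trick can be sketched in a few lines of plain Python. The class names (`Expr`, `Symbol`, `Plus`) are illustrative, not the actual MathLink API: the point is just that `__radd__` intercepts `1 + Symbol('x')` and builds an expression tree instead of a number.

```python
# Minimal sketch of building an expression tree via operator overloading.
# These classes are made up for illustration; a real bridge would
# serialize the tree and ship it to an evaluator such as Mathematica.

class Expr:
    def __add__(self, other):
        return Plus(self, other)

    def __radd__(self, other):
        # called for "1 + Symbol('x')", where the left operand is an int
        return Plus(other, self)

class Symbol(Expr):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Symbol('{self.name}')"

class Plus(Expr):
    def __init__(self, *args):
        self.args = args

    def __repr__(self):
        return "Plus(" + ", ".join(map(repr, self.args)) + ")"

expr = 1 + Symbol('x')
print(expr)  # Plus(1, Symbol('x'))
```

The same pattern (overload `__mul__`, `__sub__`, etc.) is how symbolic libraries and tracing autodiff frontends capture a computation graph from ordinary-looking arithmetic.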
I don't think we talked about doing any sort of automatic differentiation (in my day we figured out our own derivatives!), but after I made a simple eigendecomposition of a matrix of floats, the Mathematica folks contributed an example that did an eigendecomposition of a matrix with symbols (i.e., some of the terms weren't 5.7 but "1-x"). It still kind of blows my mind today how much Mathematica can do with computation graphs.
The one distinction I would add with neural networks is that it's not just a recursive tree traversal that one would get when evaluating an arithmetic statement, but an actual graph: a computation node can have gradients from multiple sources (e.g. if a skip connection is added), so each node needs to keep accumulated state around that can be updated by arbitrary callers.
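The accumulation point can be made concrete with a toy reverse-mode sketch (an assumed API, not any real library): each node's `.grad` is updated with `+=` by every consumer, which is exactly what makes a skip connection work. Below, `x` feeds both the multiply and the final add, so its gradient is the sum of two incoming flows.

```python
# Toy reverse-mode autodiff illustrating gradient accumulation.
# All names here are illustrative, not a real framework's API.

class Node:
    def __init__(self, value, parents=(), backward_fn=None):
        self.value = value
        self.parents = parents
        self.backward_fn = backward_fn  # pushes grads to parents
        self.grad = 0.0                 # accumulated from ALL consumers

def mul(a, b):
    out = Node(a.value * b.value, (a, b))
    def backward():
        a.grad += b.value * out.grad   # += accumulates, never overwrites
        b.grad += a.value * out.grad
    out.backward_fn = backward
    return out

def add(a, b):
    out = Node(a.value + b.value, (a, b))
    def backward():
        a.grad += out.grad
        b.grad += out.grad
    out.backward_fn = backward
    return out

def backprop(root):
    # Topologically order the graph, then walk it in reverse.
    order, seen = [], set()
    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p in n.parents:
                visit(p)
            order.append(n)
    visit(root)
    root.grad = 1.0
    for n in reversed(order):
        if n.backward_fn:
            n.backward_fn()

# y = w*x + x  (the trailing "+ x" plays the role of a skip connection)
x, w = Node(3.0), Node(2.0)
y = add(mul(w, x), x)
backprop(y)
print(x.grad)  # 3.0 = w (through the multiply) + 1 (through the skip)
```

If `a.grad += ...` were replaced with `a.grad = ...`, the skip path's contribution would silently overwrite the multiply path's, which is the bug the accumulated state guards against.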
Of course, optimized autograd / autodiff is more parallelized than node-based message passing, but it's a useful model to start with.
I'd have to think about this for a while, but I'm not sure I see that as a distinction. If you have a skip connection, that's just another node in the graph you have to execute topologically before your dependent node, and then you pass the data over the edge when the child node is ready to consume it.
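That view can be sketched with the standard library: a skip connection is just one more edge in the DAG, and an ordinary topological sort schedules it like any other dependency. The node names below are made up for illustration.

```python
# Scheduling a forward graph with a skip connection via a plain
# topological sort (graphlib is in the Python stdlib since 3.9).
from graphlib import TopologicalSorter

# Each node maps to the nodes it depends on; the "input" -> "add"
# entry is the skip edge, and nothing special is needed to handle it.
deps = {
    "conv1": {"input"},
    "conv2": {"conv1"},
    "add":   {"conv2", "input"},  # skip connection lands here
    "out":   {"add"},
}
order = list(TopologicalSorter(deps).static_order())
print(order)  # e.g. ['input', 'conv1', 'conv2', 'add', 'out']
```

Any valid ordering executes "input" before "add", so by the time "add" runs, both of its inputs (the main path and the skip path) are available.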
What you're describing with node-based message passing sounds much more like a Petri net, or some other agent-based discrete event modelling system. That's another powerful mental paradigm, but it's challenging to reason about.
In terms of abstract algebra, there is no distinction. What you call gradients are really data flows. "From multiple sources" just means that a function can take multiple parameters (= inbound gradients, inflows).