I cannot stress enough how foundational an idea it is to store mathematical expressions as graphs!
Understanding a function's computational graph is certainly useful, but storing a function as a computation graph is, in fact, quite expensive. Deep learning systems don't store their computational graphs; the graphs are implicit in their computation process. Deep learning systems instead store their functions as tensors: generalized arrays/matrices. This allows both efficient storage and efficient computation. It's quite possible to do automatic differentiation on these structures as well - that is the basis of "backpropagation".
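To illustrate the point, here is a minimal sketch (plain Python lists standing in for tensors) of a two-layer network stored purely as weight data; the graph of operations exists only in the order the code executes, not in any stored structure:

```python
# A "function" stored as tensors (nested lists here), not as an expression
# graph. Weights are data; the computation graph is implicit in execution.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

# A two-layer network is nothing more than two weight tensors:
W1 = [[1.0, -1.0],
      [0.5,  2.0]]
W2 = [[1.0, 1.0]]

def f(x):
    return matvec(W2, relu(matvec(W1, x)))

print(f([1.0, 2.0]))  # [4.5]
```

Nothing here records which operations ran or in what order; only the weights persist.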
It's important to distinguish useful conceptual structures like computation graphs (or symbolic representations) from the structures that are necessary for efficient computation. Automatic differentiation itself is important because the traditional symbolic differentiation one learns in calculus "blows up" (its cost can grow exponentially in the expression length) when attempted on a large expression, whereas automatic differentiation costs only a small constant factor more than evaluating the function itself (though if that base cost is high, the advantage shrinks accordingly).
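A minimal reverse-mode automatic differentiation sketch makes the cost claim concrete: each operation records its inputs and a local derivative, so one backward sweep computes the gradient at roughly the cost of one extra evaluation, no matter how large the expression grows. (The `Value` class and its method names are illustrative, not any particular library's API.)

```python
# Toy reverse-mode autodiff on scalars: record (input, local partial
# derivative) pairs during the forward pass, then apply the chain rule
# once, in reverse topological order.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents  # tuples of (input Value, local partial)

    def __mul__(self, other):
        return Value(self.data * other.data,
                     parents=((self, other.data), (other, self.data)))

    def __add__(self, other):
        return Value(self.data + other.data,
                     parents=((self, 1.0), (other, 1.0)))

    def backward(self):
        # Topologically order the graph, then sweep output -> inputs.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for parent, _ in v._parents:
                    visit(parent)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in v._parents:
                parent.grad += v.grad * local

x = Value(3.0)
y = x * x + x      # d/dx (x^2 + x) = 2x + 1 = 7 at x = 3
y.backward()
print(x.grad)      # 7.0
```

The backward pass visits each recorded operation exactly once, which is why the gradient's cost stays proportional to the forward evaluation rather than exploding the way a fully expanded symbolic derivative can.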
> Deep learning systems don't store their computational graphs, the graphs are implicit in their computation process
I'm not sure about other frameworks, but in PyTorch, I think it's fair to say that while graphs are implicit in the sense that they are dynamically constructed at runtime (as opposed to statically defined at compilation time as with TF), they are not implicit in the sense of being absent from PyTorch's data structures: PyTorch does store the computational graph. After a forward pass, you can ask each Tensor in the graph for its associated Node, for example: https://pytorch.org/docs/stable/autograd.html#autograd-graph
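A toy sketch of what "storing the graph" means: every result keeps a reference to a record of the operation that produced it, much as a PyTorch Tensor exposes `grad_fn` after operations run. The `Node`/`Tensor` classes and names like `next_functions` below are illustrative stand-ins echoing PyTorch's terminology, not its actual implementation:

```python
# Dynamically built, explicitly stored graph: each Tensor holds a
# reference to the Node that produced it, and each Node links back
# to the Nodes that produced its inputs.

class Node:
    def __init__(self, op_name, inputs):
        self.op_name = op_name        # which operation produced the value
        self.next_functions = inputs  # Nodes that produced the inputs

class Tensor:
    def __init__(self, data, grad_fn=None):
        self.data = data
        self.grad_fn = grad_fn        # None for leaf tensors

    def __mul__(self, other):
        node = Node("MulBackward", (self.grad_fn, other.grad_fn))
        return Tensor(self.data * other.data, grad_fn=node)

    def __add__(self, other):
        node = Node("AddBackward", (self.grad_fn, other.grad_fn))
        return Tensor(self.data + other.data, grad_fn=node)

x = Tensor(2.0)
y = Tensor(3.0)
z = x * y + x   # the graph is built as the expression executes

print(z.grad_fn.op_name)                    # AddBackward
print(z.grad_fn.next_functions[0].op_name)  # MulBackward
```

So the graph is "dynamic" in when it is built, but it is a real, traversable data structure once built.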
Just sayin'