Differentiable Programming

subnaught · on Jan 3, 2016

This seems to be heavily influenced by (if not outright plagiarized from) this[0], which, needless to say, does a much more thorough job of explaining things.

[0]: http://colah.github.io/posts/2015-09-NN-Types-FP/

throwaway2322 · on Jan 3, 2016

It's a shame it comes across that way, since the article and colah's blog past are talking about completely different concepts.

colah's blog post can be roughly summarized as taking the (well-known) representation of NNs as DFGs of modules (since the very early 90s, if not earlier), and then showing how common combinators in functional programming languages correspond to subgraphs in common modules. Since both FPs and DNNs can be represented as DFGs, this isn't a particularly surprising (or novel) correspondence. Several libraries used exactly these functional combinators for symbolically expressing NNs well before his blog post.

The trend of differentiable programming (as in, using gradient-based methods to train a network to learn an algorithm) is what I think the author is trying to highlight. This has been around since e.g. Das et. al, "Learning context-free grammars: Capabilities and limitations of a recurrent neural network with an external stack memory" (from 1992). There's a lot of history (and duplication) in this area, some keywords to search for being NTMs, RL-NTMs, Pointer Networks, Neural GPU, MemNNs, MemN2N, Stack RNNs, .... - the RAM (reasoning, attention, memory) workshop at NIPS 2015 had a bunch of recent work in this area.

The high-level idea is by augmenting a (trained) controller (e.g. an LSTM) with some external datastructure(s) (stacks, queues, memory, hierarchical memory) with the ability to read from and write to the external datastructure(s), and to propagating errors back to the controller (hence differentiable programming moniker), via hard (e.g. RL) or soft attention mechanisms.

asntbqjkwbaoeuc · on Jan 3, 2016

Depending on how you look at it, this idea has been around for a long time. At least since the 1980's it's been known that functional programming can be considered the logical basis of all of math [0]. A hardliner might even say that the notion that all math is functional programming goes back to the 1920s with Brouwer.

In particular, optimization, being a branch of math, is itself a type of functional programming. Similarly, neural networks are functional programs. Programming with differentiable functions as a field on its own goes back at least as far as the 1960s. Closely related ideas like algebraic topology go back to the 1890s (this would be programming with continuous functions rather than differentiable).

The language of type theory is typically the province of computer science. The corresponding language in math is category theory. There are a few mathematicians who have been working on a categorical theory of neural networks. But this is pretty far from a mainstream research area.

Perhaps with people like Colah and Dalrymple pointing out these connections from a more applied point of view, these ideas will pick up steam.

[0] There is are technical caveats here that aren't interesting when talking about math that can be done on computers.

lebek · on Jan 3, 2016

The idea's been around for a while, this 2014 paper describes a differentiable renderer: http://files.is.tue.mpg.de/black/papers/OpenDR.pdf

sawwit · on Jan 3, 2016

It is much older than this. People have started to come up with all kinds of differentiable structures ever since they've found out that the leaning rule back-propagation of error can be derived using the chain rule in the early 1980s or so.

wrong_variable · on Jan 3, 2016

yeah, I got angry halfway reading OP since I remember colah's article.

upvote for visibility and making sure colah gets credit for coming up with the idea first.

colah goes deeper and talks about how it relates to algebraic types from functional programming.

davidad_ · on Jan 3, 2016

I submitted the OP essay with an attribution ("see Chris Olah's blog for a fuller treatment") but it was removed by the editors of Edge before publication (apparently their policy is never to send readers away from Edge). I have asked them to at least re-insert Chris' name and I hope they do so soon.

cwyers · on Jan 3, 2016

> apparently their policy is never to send readers away from Edge

Holy fucking wow. Do they know what an Internet is?

_prometheus · on Jan 3, 2016

Science converges on truths, so similar (and even the same) ideas form in many minds, often at the same time. And fundamental ideas like these are often -- in hindsight -- revealed to be pretty old; you can trace their lineage and formation across papers, talks, and books. Many people in the field, with a good measure of "conceptual agility", will get this quickly and be able to discuss at length. Not to detract any credit from individual discoverers: they are absolutely critical for the quantized improvement of ideas; often a single person will move science forward in ways the rest of the world could not. And sometimes it is many people that do. A constructive memetic environment is good for both. (it is unfortunate that Credit is THE scientific currency today; it interferes negatively...)

No doubt colah's excellent blog post dives much deeper than this essay, but please note that this was wrtten for Edge.org's http://edge.org/annual-questions book, for a general audience, with very high level perspective, with very little space, and without the ability to get technical. Don't pit them against each other! They complement!

By the way, for one interesting example of the "conceptual agility" of scientists at the very edge, check out this interesting story about Feynman -- from a David Deutsch interview (with Sam Harris) -- in which Feynman derives months of Deutsch's work in a few minutes (listen for ~5 min):

https://www.youtube.com/watch?v=J21QuHrIqXg&t=6524 (1:48:44 - 1:52:40)

Scientists' minds are well primed to understand, derive, and formulate ideas -- they are super fertile memetic environments. And what's more-- they have the epistemic filters to annihilate untruths and converge on scientific knowledge (the best explanation not yet falsified). In math (inc. computer science), it's even better because we study and manipulate the objects themselves, not data from measurements of the things. The formulation, derivation, convergence, remixing, (and so on) of ideas is much, much faster.

Knowing both colah and davidad, I confidently assert they are both among the most brilliant young scientists alive. Humanity stands to gain much from the constructive interference of their minds. As readers and commenters, let's foster that.

throwaway_bob · on Jan 3, 2016

You do realize that these things were already being studied in the 90's, right? The Grefenstette paper mentioned in this article makes use of an architecture from Das et al. in 1992. More generally, people have been talking about functional programming and relationships with AI since there was functional programming. However nice Colah's blog post is, he most certainly did not come up with any of these ideas first.

im2w1l · on Jan 3, 2016

I previously made a toy VM where a program consisted of a matrix, rows corresponded to instruction slot, and then the values in that row were probabilities for different instructions. This means probability of emitting a specific output is differentiable in the probabilities, which can then be learned.

Didn't get very good results though.

_pmf_ · on Jan 3, 2016

There's also [0], which I found to be almost eerie, where someone has tried to use evolutionary programming for an FPGA, with the result that the winning generation exploited the analogue characteristics of the concrete FPGA chip being used (i.e. it did not work on another FPGA chip of identical type).

Maybe I'm a bit peculiar, but this article sent chills down my spine.

[0] http://www.damninteresting.com/on-the-origin-of-circuits/

thomasahle · on Jan 3, 2016

So what's up with that differentiable stack? Is that similar to what the Facebook people were doing on learning addition?

davidad_ · on Jan 3, 2016

Here's the differentiable stack/queue/deque paper: http://arxiv.org/abs/1506.02516

It is indeed similar to some work from FAIR, where one of the tasks was binary addition: http://arxiv.org/abs/1503.01007

nl · on Jan 3, 2016

Yes, similar.

There is a lot of work around this going on at the moment. Google's neural Turing machine is a similar idea.

tluyben2 · on Jan 3, 2016

Some code on the subject:

https://github.com/zenna/Arrows.jl https://github.com/wojzaremba/algorithm-learning

dnautics · on Jan 3, 2016

> This newly dominant approach, originally known as "neural networks," is now branded "deep learning,"

Is this true? My impression is that "deep learning" is a series of mathematical tricks and design and implementation concepts that get you to solve neural networks of depth greater than 3-4, which becomes mathematically challenging.

visarga · on Jan 3, 2016

One other reason machine learning has acquired the "Deep" label was the rise of the computing power. It's not just a bunch of tricks, it's also plain old hardware speed that is driving the revolution.

Also, we now have access to huge datasets to try our algorithms on. Progress in ML depends a lot on the training data that is available.