A Neural Turing Machine is a type of neural network with addressable memory - it learns rather than being explicitly programmed.
The net also learns to boot, i.e. how to read/write its own memory from scratch.
Stochastic gradient descent relies on backpropagating the error gradient to the individual weights to make a slight improvement at each step - this requires the network to be differentiable end to end.
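To make that concrete, here's a toy SGD step on a single linear neuron (my own illustration, not something from the paper): the loss has to be differentiable with respect to each weight so the chain rule can assign blame to it.

```python
# Toy sketch: one SGD step on a single linear neuron with squared error.
# Backprop needs every operation to be differentiable so the gradient
# can flow back to each weight. (Illustration only, not from the paper.)
import numpy as np

w = np.array([0.5, -0.3])   # weights
x = np.array([1.0, 2.0])    # input
target = 1.0
lr = 0.1                    # learning rate

y = w @ x                   # forward pass
error = y - target          # loss = 0.5 * error**2
grad_w = error * x          # d(loss)/dw via the chain rule
w = w - lr * grad_w         # nudge the weights toward lower loss
```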
This version innovates with content-addressable memory. The authors demonstrate learning from large data sets and with reinforcement learning (i.e. trial and error as an embodied agent).
I'll attempt to explain what I understood from the paper. This will be high level and abstracted, because that's the best I can do as an undergraduate right now.
The DNC essentially gives the neural network external memory. This memory is controlled by read heads and write heads, which are really just parts of the vector output of the neural network. Each head has a key vector, which is used to find where in memory to read or write: the most similar vector in memory is located using cosine similarity. The neural network can then use that memory. It also has mechanisms to remember the order in which memory vectors were written, to allow for sequential recall.
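Here's a rough numpy sketch of the content-based lookup as I understand it (the names and the key-strength parameter are my own simplification, not the paper's exact equations): the head emits a key, the key is compared to every memory row with cosine similarity, a softmax turns the similarities into read weights, and the read result is the weighted sum of rows.

```python
# Sketch of content-based addressing (my simplification, not the paper's exact math).
import numpy as np

def cosine_similarity(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def content_read(memory, key, beta=1.0):
    """memory: (N, W) matrix of N slots; key: (W,) vector; beta: key strength."""
    sims = np.array([cosine_similarity(row, key) for row in memory])
    weights = np.exp(beta * sims) / np.sum(np.exp(beta * sims))  # softmax over slots
    return weights @ memory   # blend of the most similar slots

memory = np.random.randn(16, 8)                    # 16 slots, 8 values each
key = memory[3] + 0.01 * np.random.randn(8)        # noisy copy of slot 3
print(content_read(memory, key, beta=10.0))        # reads back roughly memory[3]
```

Because the read is a soft, differentiable blend rather than a hard index lookup, gradients can flow through the addressing step, which is what lets the whole thing be trained with backprop.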
The network itself is trained end to end by gradient descent; the paper also shows a task learned by reinforcement learning. I'm not entirely sure what is done with the parts of the output vector that are not used at a given timestep, when only input and not output is needed.
Again, this is what I believe I was able to understand from the paper. Please let me know if I got anything wrong.
Can someone explain what a DNC is?
Assume I know most CS undergrad topics -- or can at least google them; e.g., I know what a Turing machine is but not what a neural Turing machine is.