I wonder what happens if you simply backprop using experience replay in either a CNN or a fully connected net. Just run a randomly initialized neural net and take "samples" (inputs + outputs) every 1 s or so. After 30 s, get an error, optionally "discount" it over time, and run backprop on the stored samples.
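Here is a minimal sketch of that loop, under several assumptions. A single scalar error only yields a nonzero gradient if each stored output differs from what the net currently predicts. So this sketch assumes Gaussian noise is added to each output when it is sampled. It then uses the REINFORCE (score-function) estimator to turn the delayed, discounted error into per-sample gradients. That estimator is a swapped-in choice, not something stated above. The net size, `sigma`, `gamma`, `lr`, the baseline, and every function name are hypothetical. Only a small fully connected net is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny fully connected net: 4 inputs -> 8 tanh hidden units -> 2 outputs.
def init_params(n_in=4, n_hidden=8, n_out=2):
    return {
        "W1": rng.normal(0.0, 0.5, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]

def discount_weights(T, gamma):
    # The error arrives after the last sample, so sample t is T-1-t steps away from it.
    return gamma ** np.arange(T - 1, -1, -1)

def collect(p, n_samples=30, sigma=0.1):
    """One sample per step, standing in for 'one sample every ~1 s for 30 s'."""
    buffer = []
    for _ in range(n_samples):
        x = rng.normal(size=p["W1"].shape[0])
        _, mu = forward(p, x)
        y = mu + sigma * rng.normal(size=mu.shape)  # assumed Gaussian exploration noise
        buffer.append((x, y))
    return buffer

def replay_update(p, buffer, error, baseline=0.0, gamma=0.9, sigma=0.1, lr=0.01):
    """Backprop one scalar, end-of-episode error through every stored sample.

    REINFORCE-style: step along -(error - baseline) * grad log N(y; mu, sigma^2),
    so a high error pushes outputs away from what was emitted and a low error
    pulls them toward it. Earlier samples are discounted more heavily.
    """
    grads = {k: np.zeros_like(v) for k, v in p.items()}
    for w, (x, y) in zip(discount_weights(len(buffer), gamma), buffer):
        h, mu = forward(p, x)
        # Discounted error times d(log N(y; mu, sigma^2))/d mu
        g_mu = w * (error - baseline) * (y - mu) / sigma**2
        grads["W2"] += np.outer(h, g_mu)
        grads["b2"] += g_mu
        g_pre = (p["W2"] @ g_mu) * (1.0 - h**2)  # back through tanh
        grads["W1"] += np.outer(x, g_pre)
        grads["b1"] += g_pre
    for k in p:
        p[k] -= lr * grads[k]  # gradient descent on the expected error
    return p

if __name__ == "__main__":
    params = init_params()
    buffer = collect(params)  # 30 samples ~ "every 1 s for 30 s"
    error = 1.7               # placeholder for whatever error the task reports at the end
    params = replay_update(params, buffer, error, baseline=1.0)
```

For a CNN, only `forward` and the hand-written backward pass would change. The per-sample weighting by the discounted error stays the same. The baseline (for example, a running mean of past errors) does not bias the update, but it does reduce its variance.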