
Does this method make it easier to spread a neural network over multiple GPUs/machines? I mean, does it reduce the amount of data being communicated between compute nodes, or does it just decouple the updates from the need to wait for the rest of the net to finish?



> Does this method make it easier to spread a neural network over multiple GPUs/machines?

Yes, but this isn't the primary focus of this work.

This is about a method of approximating the error gradients passed back up the neural network.

This is important because using approximate gradients means that earlier layers can be trained without waiting for error back-propagation from the later layers.

This asynchronous feature helps on a (computer) network too - there is no need to wait for back-propagation across the network.

As they point out, the true error does get back-propagated eventually. The analogy with an eventually consistent database system (and the effect that has on scalability) is pretty clear.
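
Roughly, you can think of it as training a small side model to predict what the gradient at a layer boundary *would have been*, and updating the earlier layers with that prediction while the real signal is still in flight. Here is a minimal sketch of that idea, assuming PyTorch; the layer sizes and names are made up for illustration and this is not the paper's actual code:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Two halves of a network we want to decouple.
    layer_a = nn.Linear(10, 20)   # "earlier" layers
    layer_b = nn.Linear(20, 1)    # "later" layers
    # Small side model that predicts dL/dh (the gradient at the boundary).
    grad_model = nn.Linear(20, 20)

    opt_a = torch.optim.SGD(layer_a.parameters(), lr=0.01)
    opt_b = torch.optim.SGD(layer_b.parameters(), lr=0.01)
    opt_g = torch.optim.SGD(grad_model.parameters(), lr=0.001)

    x = torch.randn(4, 10)
    y = torch.randn(4, 1)

    # Update layer_a immediately using the *predicted* gradient --
    # no waiting for layer_b to finish its forward/backward pass.
    h = layer_a(x)
    synthetic_grad = grad_model(h.detach())
    h.backward(synthetic_grad.detach())
    opt_a.step(); opt_a.zero_grad()

    # Later (possibly on another machine), the true gradient arrives.
    h2 = layer_a(x).detach().requires_grad_(True)
    loss = nn.functional.mse_loss(layer_b(h2), y)
    loss.backward()
    opt_b.step(); opt_b.zero_grad()

    # Nudge the predictor toward the true gradient -- this is the
    # "eventually consistent" part.
    true_grad = h2.grad.detach()
    g_loss = nn.functional.mse_loss(grad_model(h2.detach()), true_grad)
    g_loss.backward()
    opt_g.step(); opt_g.zero_grad()

The gradient predictor itself only gets corrected when the true gradient shows up, so the early layers train on slightly stale/approximate information in between, which is what buys you the asynchrony.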



