
SIMT is inherently more power-efficient than MIMD: less control-flow logic per FLOP. Even so, it makes sense to devote dedicated logic to specific algorithms. Even NVIDIA GPUs (Volta) are going to have special matrix-multiply hardware (tensor cores) to increase power efficiency and performance.
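For concreteness, programming Volta's tensor cores through CUDA's WMMA intrinsics looks roughly like the sketch below: a whole warp cooperates on a single 16x16x16 half-precision multiply-accumulate, so the control-flow overhead is amortized over a large block of FLOPs. (Minimal illustration only; the kernel name and the single fixed tile are my assumptions, not something from this thread.)

    // One warp computes D = A*B + C on a 16x16x16 tile using Volta tensor cores.
    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    __global__ void wmma_tile_16x16x16(const half *a, const half *b, float *c) {
        // Fragments: A and B in half precision, accumulator in float.
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

        wmma::fill_fragment(c_frag, 0.0f);               // zero the accumulator
        wmma::load_matrix_sync(a_frag, a, 16);           // load A tile (leading dimension 16)
        wmma::load_matrix_sync(b_frag, b, 16);           // load B tile (leading dimension 16)
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // matrix multiply-accumulate on tensor cores
        wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);  // write the 16x16 result
    }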

The future lies not in flexible programming models but in dedicated hardware/IP. Look at the crypto block, the ISP, H.264/H.265 encode/decode, and now tensor cores. It's mentioned in what seems like every architecture paper of the last ten years, but dark silicon is driving the need to split compute into smaller specialized blocks. We can pack more and more transistors onto a chip, but we can only power a smaller and smaller fraction of them at any given time. It only makes sense to make whatever can be powered on as efficient as possible.
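A back-of-envelope version of that dark-silicon argument, with assumed (illustrative) post-Dennard scaling factors of roughly 2x transistor density but only ~1.4x lower switching energy per process generation:

    u = \frac{P_{\text{budget}}}{N \cdot p_{\text{switch}}}   % fraction of transistors that can run at full speed
    N \to 2N, \quad p_{\text{switch}} \to p_{\text{switch}} / 1.4   % one generation (assumed factors)
    u \to \frac{P_{\text{budget}}}{2N \cdot (p_{\text{switch}}/1.4)} = 0.7\,u   % active fraction shrinks ~30% per node
    0.7^4 \approx 0.24   % after four generations, only about a quarter of the die can be lit up at once

At a fixed power budget the usable fraction of the die keeps shrinking, which is exactly the pressure toward specialized blocks that sit dark until they're needed.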




In theory yes, but clean-slate designs like the Adapteva Epiphany and the Rex Neo can get better efficiency than GPUs because they don't carry legacy baggage, while still running legacy OpenCL code.

As for the matrix-multiplier ASICs like the TPU and Volta's tensor cores, I consider them incredibly uncreative, and it's an insult of sorts to computer architecture to call one a "deep learning processor". What happens when SPNs or graph ConvNets dominate tomorrow? A proper application-specific processor should be able to adapt and still maintain its efficiency.
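To make the graph-ConvNet point concrete: the core of a graph convolution is a neighbor aggregation over an irregular, data-dependent sparsity pattern, which a fixed dense-matmul datapath doesn't directly help with. A rough sketch (the CSR arrays row_ptr/col_idx, the feature width feat_dim, and mean aggregation are all illustrative assumptions, not anyone's actual design):

    // Mean-aggregate neighbor features for each node of a graph stored in CSR form.
    // The gather through col_idx is data-dependent and irregular, unlike the
    // fixed-shape tiles a dense matrix-multiply unit consumes.
    __global__ void gcn_aggregate(const int *row_ptr, const int *col_idx,
                                  const float *x, float *out,
                                  int num_nodes, int feat_dim) {
        int node = blockIdx.x;                        // one thread block per node
        if (node >= num_nodes) return;
        int start = row_ptr[node], end = row_ptr[node + 1];
        int degree = end - start;
        for (int f = threadIdx.x; f < feat_dim; f += blockDim.x) {  // threads split the feature dimension
            float acc = 0.0f;
            for (int e = start; e < end; ++e)         // walk this node's neighbor list
                acc += x[col_idx[e] * feat_dim + f];  // irregular gather of neighbor features
            out[node * feat_dim + f] = (degree > 0) ? acc / degree : 0.0f;
        }
    }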

Obviously I have some bias and hubris here, but our simulations consistently show better efficiency than the TPU on the same workloads, while retaining the ability to adapt to whatever other computational graphs TensorFlow may choose to run.



