
SIMT is inherently more power-efficient than MIMD: less control-flow logic per FLOP. Even so, it makes sense to devote dedicated logic to specific algorithms. Even NVIDIA GPUs (Volta) are going to have special matrix-multiply hardware (tensor cores) to increase power efficiency and performance.
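For concreteness, programming Volta's tensor cores through CUDA's WMMA intrinsics looks roughly like the sketch below: a whole warp cooperates on a single 16x16x16 half-precision multiply-accumulate, so the control-flow overhead is amortized over a large block of FLOPs. (Minimal illustration only; the kernel name and the single fixed tile are my assumptions, not something from this thread.)

    // One warp computes D = A*B + C on a 16x16x16 tile using Volta tensor cores.
    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    __global__ void wmma_tile_16x16x16(const half *a, const half *b, float *c) {
        // Fragments: A and B in half precision, accumulator in float.
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

        wmma::fill_fragment(c_frag, 0.0f);               // zero the accumulator
        wmma::load_matrix_sync(a_frag, a, 16);           // load A tile (leading dimension 16)
        wmma::load_matrix_sync(b_frag, b, 16);           // load B tile (leading dimension 16)
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // matrix multiply-accumulate on tensor cores
        wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);  // write the 16x16 result
    }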

The future lies not in flexible programming models but in dedicated hardware/IP. Look at the crypto block, the ISP, H.264/H.265 encode/decode, and now tensor cores. It's mentioned in what seems like every architecture paper of the last ten years, but dark silicon is driving the need to split compute into smaller specialized blocks. We can pack more and more transistors onto a chip, but we can only power a smaller and smaller fraction of them at any given time. It only makes sense to make whatever can be powered on as efficient as possible.
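A back-of-envelope version of that dark-silicon argument, with assumed (illustrative) post-Dennard scaling factors of roughly 2x transistor density but only ~1.4x lower switching energy per process generation:

    u = \frac{P_{\text{budget}}}{N \cdot p_{\text{switch}}}   % fraction of transistors that can run at full speed
    N \to 2N, \quad p_{\text{switch}} \to p_{\text{switch}} / 1.4   % one generation (assumed factors)
    u \to \frac{P_{\text{budget}}}{2N \cdot (p_{\text{switch}}/1.4)} = 0.7\,u   % active fraction shrinks ~30% per node
    0.7^4 \approx 0.24   % after four generations, only about a quarter of the die can be lit up at once

At a fixed power budget the usable fraction of the die keeps shrinking, which is exactly the pressure toward specialized blocks that sit dark until they're needed.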




In theory yes, but clean-slate designs like the Adapteva Epiphany and the Rex Neo can get better efficiency than GPUs because they don't carry legacy baggage, while still running legacy OpenCL code.

As for the matrix-multiplier ASICs like the TPU and Volta's tensor cores, I consider them incredibly uncreative, and it's an insult of sorts to computer architecture to call one a "deep learning processor". What happens when SPNs or graph ConvNets dominate tomorrow? A proper application-specific processor should be able to adapt and still maintain its efficiency.
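To make the graph-ConvNet point concrete: the core of a graph convolution is a neighbor aggregation over an irregular, data-dependent sparsity pattern, which a fixed dense-matmul datapath doesn't directly help with. A rough sketch (the CSR arrays row_ptr/col_idx, the feature width feat_dim, and mean aggregation are all illustrative assumptions, not anyone's actual design):

    // Mean-aggregate neighbor features for each node of a graph stored in CSR form.
    // The gather through col_idx is data-dependent and irregular, unlike the
    // fixed-shape tiles a dense matrix-multiply unit consumes.
    __global__ void gcn_aggregate(const int *row_ptr, const int *col_idx,
                                  const float *x, float *out,
                                  int num_nodes, int feat_dim) {
        int node = blockIdx.x;                        // one thread block per node
        if (node >= num_nodes) return;
        int start = row_ptr[node], end = row_ptr[node + 1];
        int degree = end - start;
        for (int f = threadIdx.x; f < feat_dim; f += blockDim.x) {  // threads split the feature dimension
            float acc = 0.0f;
            for (int e = start; e < end; ++e)         // walk this node's neighbor list
                acc += x[col_idx[e] * feat_dim + f];  // irregular gather of neighbor features
            out[node * feat_dim + f] = (degree > 0) ? acc / degree : 0.0f;
        }
    }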

Obviously I have some bias and hubris here, but our simulations consistently show better efficiency than the TPU on the same workloads, while retaining the ability to adapt to whatever other computational graphs TensorFlow may choose to run.



