Multi-layer caches are also complex, but the benefits are worth it.

We already see NPUs spanning three orders of magnitude in TOPS in consumer devices. In the next few years that will probably be eight or ten orders of magnitude. I don’t think a lowest-common-denominator (LCD) approach is going to work.

I would say that everything gets pushed to the cloud, but the economics are brutal. All of those companies spending billions a year on GPUs have very strong incentives to push as much inference to the edge as possible.