I believe what general_ai means is that running models isn't the issue here; it's the FPS. NVIDIA's Jetson TX1 and TK1 have GPUs built for exactly this. Being able to run a model at all is a question of having enough memory for it; being able to apply a model to a real-time task is a question of having enough compute, which for most tasks the Pi doesn't have. IIRC Pete Warden ported some low-level ops to the Pi's GPU a few years ago, and it was a difficult task.
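To make the memory-vs-compute distinction concrete, here's a rough back-of-envelope sketch in Python. The FLOP costs, throughput figures, and efficiency fraction are illustrative assumptions I've picked for the example, not measured numbers:

```python
# Back-of-envelope: can a device run a model in real time?
# All numbers below are rough, illustrative assumptions.

def max_fps(model_gflops_per_frame, device_gflops, efficiency=0.3):
    """Estimate achievable frames per second.

    model_gflops_per_frame: compute cost of one forward pass (GFLOPs)
    device_gflops: peak throughput of the device (GFLOP/s)
    efficiency: fraction of peak realistically sustained
    """
    return device_gflops * efficiency / model_gflops_per_frame

# A mid-size convnet forward pass (~4 GFLOPs, AlexNet-scale assumption)
model_cost = 4.0

# Raspberry Pi CPU: a few GFLOP/s (assumed)
print("Pi CPU:  %.2f fps" % max_fps(model_cost, 6.0))
# Jetson TX1 GPU: hundreds of GFLOP/s (assumed)
print("TX1 GPU: %.2f fps" % max_fps(model_cost, 500.0))
```

With those assumptions the Pi lands well under 1 fps for that class of model, while the TX1's GPU clears real-time framerates. The memory to hold the weights is the easy part; sustaining the per-frame FLOPs is what kills the Pi.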
This is why it's likely that what Google has in store is some form of inference-oriented co-processor resembling their TPU.
Plenty of people on this thread know what they're talking about; you just need to pay attention. There's high demand for embedded deep learning at the moment, and I've already shipped several such systems for a variety of tasks. None of them could have run at the required speed on the Pi.