:-) I know what inference is. It's just that the speed of inference very much depends on the model you're doing the forward pass on, and the phrase "inference can be run at 10fps" is non-sensical without also specifying the model.
I understand by your name that you may be a neural net enthusiast, but I would question how much practical experience you have.
It's a perfectly valid statement to say that a pi will run inference on 90% of the models out there, and I have experience with the same. It would be similar to claiming that (if it could), a pi could run 90% of the games out there.
Once again, I am speaking from practical experience from having implemented tensorflow neural nets on low power devices. And I sincerely get the feeling that although you're enthusiastic, you have no clue what you're talking about.
Rather than making offhand comments like facepalm, I would challenge you to either offer up some evidence to the contrary (you could start by trying to find a tensorflow model that a pi wont run), or spend more of your time doing something more practical than acting like a clueless rabid fanboy.
I believe what general_ai means is that running models is not the issue here, it's about the FPS. NVidia has special GPUs on the TX1 and TK1 for this. Ability to run a model is about having enough memory for it. Ability to apply a model to a real time task is about having the compute, which for most tasks the Pi doesn't have. IIRC Pete Warden had ported some low level ops to the Pi GPU a few years ago, a difficult task.
This is why it is likely that what Google has in store is a form of inferenve-bound co-processor resembling their TPU.
Many people know what they are talking about on this thread, you just need to pay attention I believe. There's high demand for embedded deep learning at the moment, and I've already shipped several systems for a variety of tasks. At the moment none could live at required speed on the Pi.
feelix - what kind of models do you typically run? I've spent a fair amount of time getting Neural Nets to run on Raspberry Pis and other platforms. In my experience it's possible to do inference with most models but often it's intolerably slow. For example the stock inception model that comes as a demo in the tensorflow code base takes about 10 seconds per image to do inference on my Pi 3. What domains are you typically working in? Do you have some tricks to make things run faster?
It is indeed slow (of course, pretty much everything is slow on something like a Pi). But It's still fast enough for some uses. If you can even get one inference every 5 seconds that still has a lot of applications. And that was what I was saying when I said that I don't agree with the assumption that google are working on giving TensorFlow cloud support for all of their support of the Pi. Running locally could have a lot of uses too.
Besides which, they have been implementing things like 8 bit graphs for processing on low power devices. That should result in a large performance increase for these devices. I tried it on mobile and I got decent FPS (I can't remember the exact figure) by using it. https://www.tensorflow.org/versions/r1.0/how_tos/quantizatio...
You obviously wont do any training on the pi, but low power devices have been used for inference for years now. For example here it is made to run on a phone: https://github.com/tensorflow/tensorflow/tree/master/tensorf...