Machine learning on mobile: on the device or in the cloud? (machinethink.net)
81 points by putnam on April 27, 2017 | 19 comments



If you're interested in fast inference models, be sure to check out QuickNet: https://arxiv.org/abs/1701.02291

Disclaimer: I'm the author of this paper.

It shares many similarities with Google's production MobileNets, although MobileNets is a family of models, and QuickNet uses PReLU, which has helped a lot in terms of parameter efficiency. There's also a slight difference in the implementation of the separable convolution: QuickNet has no activation function between the depthwise and the pointwise convolution.
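Roughly, the block looks like this (a sketch in current Keras, not the exact code from the paper):

    from keras.layers import DepthwiseConv2D, Conv2D, BatchNormalization, PReLU

    def sep_block(x, filters):
        # Depthwise conv feeding straight into the pointwise conv:
        # no activation in between, unlike the usual Xception-style block.
        x = DepthwiseConv2D((3, 3), padding='same', use_bias=False)(x)
        x = Conv2D(filters, (1, 1), padding='same', use_bias=False)(x)
        x = BatchNormalization()(x)
        # PReLU with slopes shared across the spatial axes keeps the
        # extra parameter cost to one learned slope per channel.
        x = PReLU(shared_axes=[1, 2])(x)
        return x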


Have you published this paper anywhere, e.g. at a conference? I guess not, because it's not really in the common format. But from a quick look it actually looks interesting, so why not try to submit it somewhere, e.g. some vision conference? You really should work on the format, though. I assume this was not done with LaTeX?

Then, in the table in your experimental results, you should also compare the number of parameters, and maybe training time, inference time, and memory consumption during inference; all of that belongs in the table too. There should also be one or two figures giving small sketches of your model's blocks, which would help readers understand the difference from e.g. Xception. As I understand it, QuickNet is mostly based on Xception? But Xception is not in your experimental comparison table. You also mention MobileNets, which isn't there either; you should add both.

Then, if you additionally apply compression methods like 8-bit quantization, you should have a separate table showing how much less memory the model needs and how much the performance degrades.

Finally, to really boost interest in your paper, it would be really nice if you published some code. It sounds like you already implemented this in Lasagne or Keras? So just publish that code.
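By 8-bit quantization I mean an experiment along these lines (a toy NumPy sketch of one common affine scheme; real toolchains differ in the details):

    import numpy as np

    def quantize_uint8(w):
        # Map a float tensor onto 0..255 with a per-tensor scale and offset.
        lo, hi = float(w.min()), float(w.max())
        scale = (hi - lo) / 255.0 or 1.0
        q = np.round((w - lo) / scale).astype(np.uint8)
        return q, scale, lo

    def dequantize(q, scale, lo):
        # Recover approximate float weights for inference.
        return q.astype(np.float32) * scale + lo

    w = np.random.randn(3, 3, 32, 64).astype(np.float32)   # e.g. a conv kernel
    q, scale, lo = quantize_uint8(w)
    err = np.abs(w - dequantize(q, scale, lo)).max()
    print("4x smaller, max abs error:", err)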


The issue with LaTeX is that diagrams and tables are a nightmare to deal with, but I've been meaning to get to that once I have some time (swamped atm).

Xception is not compared because Xception doesn't have any published results on CIFAR, and this was meant to be a fast inference model, which Xception is not.

I do provide links to full resolution images of the network topology.

A comparison of parameter count, FLOPs and other performance figures would probably be useful, you're right; I'll add that as soon as I have time.
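Parameter count is a one-liner in Keras, and a rough conv FLOPs estimate only takes a small loop; something like this, assuming a built model (attribute names are Keras 2.x Conv2D, channels_last):

    # Trainable + non-trainable parameters, straight from Keras.
    print(model.count_params())

    # Rough FLOPs estimate for the Conv2D layers only
    # (one multiply-add counted as two ops).
    flops = 0
    for layer in model.layers:
        if layer.__class__.__name__ == 'Conv2D':
            _, out_h, out_w, out_c = layer.output_shape
            k_h, k_w = layer.kernel_size
            in_c = layer.input_shape[-1]
            flops += 2 * out_h * out_w * out_c * k_h * k_w * in_c
    print(flops)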

MobileNets came out after this, and I haven't updated it in the meantime.

I used Keras, but there are some internal tools (namely a data loader, a visualization tool, a replication environment and an optimizer) that I'm not allowed to share externally.

Hope you found it interesting, and thanks for the feedback!!


Tables are really easy in LaTeX, and as for diagrams, just export them as PDFs or something and then include them in LaTeX.


Do you have a GitHub repo or anything else where we can try your model?


This is a really good breakdown of data engineering for mobile. Usually you read something very theoretical or a complex ML solution, but here is a great rundown of the pros and cons of where to do training and inference.


When using the Google or Amazon assistant, I wish they would do the machine learning on the device. I understand they want to protect their models, but just this morning:

Ok Google, what's the time?

* 2 seconds or so, nothing *

Hey Google, what's the time?

* Another 2 seconds or so, nothing *

Alexa, what's the time?

* 2 seconds or so, again *

"Here it currently are the is 9:42 search results for"

What happened? The network had a hiccup (okay, that sucks), with the result that Google's connection was blocked for a while. This caused me to repeat the question, which caused the Google assistant to resend the query (in other words, to send a cancel down a TCP connection that hadn't gotten through its buffer yet, resulting in even more delay). The extra delay led me to ask Alexa instead. Alexa got lucky on the network.

This resulted in Alexa and the Google assistant answering at the same time: Alexa with the time, the Google assistant with the search results for "what's the time hey google what's the time alexa what's the time". I was ready to throw things out of the window.

But the root cause of this problem was the delay due to the network.

So PLEASE get local voice transcription going, please! Save Alexa and my phone from getting thrown out the window.


I think that custom deep learning chips will be the best enabler of edge-device deep learning. It's just too difficult to deploy anything useful onto most smartphones or other computationally constrained devices, and to compound this you often have to use the CPU because the GPU is unavailable, either due to the framework or due to missing drivers.


I think that custom chips won't be made, just better graphics processors that can do inference even faster. Gamers who want more performance in their mobile games will push mobile graphics to the point that you will be able to do GPGPU.


The problem is that even the Titan X drowns when doing real-time inference, so a mobile GPU beating it without low precision is very unlikely.


That's news to me! I'm pretty darn sure the Titan X has no problem with inference on the amount of data that a single user would want it for.


Well, I was talking about real-time object detection on 640x480 video. Perhaps most users would be okay with a delay of 5 seconds or so when processing an image, and perhaps you could use Facebook's trick of fast, low-quality style transfer on the device and better-quality style transfer once the image reaches the servers. But the point is that the current paradigm is very restrictive in terms of deep learning applications.


I think computing optical flow remains a major time bottleneck. Any attempt at temporal coherence would be great, and I'm sure there is some attempt at it in Messenger, but it really only works well for the last style transfer filter, all the way to the right in the app, and that one only really looks great in well-lit scenes. Also, the phone seems to heat up a lot.
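For a feel of the cost: even classical dense flow is slow at video rates. A quick OpenCV timing sketch, assuming a webcam at index 0:

    import time
    import cv2

    cap = cv2.VideoCapture(0)
    _, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        t0 = time.time()
        # Dense Farneback flow between consecutive frames.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        print("flow: %.1f ms" % ((time.time() - t0) * 1000))
        prev_gray = gray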


I would downsize the frames first and work on multiple frames in parallel (e.g. with however many nets I can fit in VRAM). I find it hard to believe that it wouldn't work after those changes.
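Batching is the simplest way to get that parallelism. A sketch in Keras, where net is any trained model and the sizes are made up:

    import cv2
    import numpy as np

    def infer_batched(net, frames, size=(224, 224), batch=16):
        # Downsize every frame first, then let the GPU process
        # whole batches in parallel instead of one frame at a time.
        x = np.stack([cv2.resize(f, size) for f in frames])
        x = x.astype(np.float32) / 255.0
        return net.predict(x, batch_size=batch)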


What's the killer for doing real-time inference?


Both - just like the brain works


This is part of what we're working on at Asteria. We're figuring out what can and can't, and what should and shouldn't be run locally on the device.


Your brain comes with a cloud connection?


If you believe in God, then... yes :)



