TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded devices! TensorFlow has always run on many platforms, from racks of servers to tiny devices, but as the adoption of machine learning models has grown over the last few years, so has the need to deploy them on mobile and embedded devices. TensorFlow Lite enables low-latency inference of on-device machine learning models.
Looking forward to your feedback as you try it out.
> Looking forward to your feedback as you try it out.
Thanks Rajat. We use typical Cortex-A9/A7 SoCs running plain Linux rather than Android. We would use it for inference.
1. Platform choice
Why make TFL Android/iOS only? TF works on plain Linux, TFL even uses the NDK, and it would appear the inference part could work on plain Linux as well.
2. Performance
I did not find any info on the performance of TensorFlow Lite; I'm mainly interested in inference performance. The tag "low-latency inference" catches my eye: just how low is low latency here? Milliseconds?
1. The code is standard C/C++ with minimal dependencies, so it should be buildable even on non-standard platforms. Linux is easy.
2. The interpreter is optimized for low overhead, and the kernels are currently better optimized, especially for ARM CPUs. Performance varies by model, but we have seen significant improvements on most models going from TensorFlow to TensorFlow Lite. We'll share benchmarks soon.
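In the meantime, here is a rough sketch of how you could measure per-inference latency yourself with the Python interpreter binding (tf.lite.Interpreter; availability depends on your TF version, and the model path is a placeholder):

    import time
    import numpy as np
    import tensorflow as tf

    # Placeholder path; substitute your own converted .tflite model.
    interpreter = tf.lite.Interpreter(model_path="mobilenet.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()

    # Random input just to exercise the graph.
    dummy = np.random.random_sample(
        tuple(input_details[0]["shape"])).astype(input_details[0]["dtype"])

    # Warm up once, then time repeated invocations.
    interpreter.set_tensor(input_details[0]["index"], dummy)
    interpreter.invoke()

    runs = 100
    start = time.time()
    for _ in range(runs):
        interpreter.set_tensor(input_details[0]["index"], dummy)
        interpreter.invoke()
    print("avg latency: %.2f ms" % ((time.time() - start) / runs * 1000))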
We want to provide a great experience across all our supported platforms, and are exploring ways to provide a simpler experience with good acceleration on iOS as well.
Woah, this is cool. I've been waiting for this since you announced it. I was thinking about benchmarking it against other solutions. What do you think about similar frameworks like CoreML?
- As mentioned below, FlatBuffers makes startup faster while trading off some flexibility
- Smaller code size means trading off dependencies on some libraries and broader support, versus writing more things from scratch, focused on the use cases people care about
Do you have performance/memory comparisons from using FlatBuffers vs. protobuf in TF? A quick writeup on how the switch affected performance would be really interesting :)
We have had huge issues trying to figure out how to save models (freeze graph, etc.) and load them on Android. If you look at my previous thread, it also mentions bugs, threads, and support requests where people are consistently confused.
petewarden (https://news.ycombinator.com/item?id=15596990) from Google is also working on this, so I'm really hopeful you guys will have something soon. This is a serious blocker for doing anything reasonable in TF.
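For anyone else stuck here, a minimal freeze-graph sketch using the TF 1.x graph_util API (the checkpoint paths and output node name below are placeholders for your own graph):

    import tensorflow as tf

    # Restore your graph and variables, then convert the variables to
    # constants so the GraphDef is self-contained and loadable on device.
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph("model.ckpt.meta")
        saver.restore(sess, "model.ckpt")

        frozen = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph_def, output_node_names=["output"])

        with tf.gfile.GFile("frozen_model.pb", "wb") as f:
            f.write(frozen.SerializeToString())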
One nice thing about Lite is that it's a lot easier to include only the operations you need (compared to TensorFlow 'classic'), there's fusion for common patterns, and the base interpreter is only 70KB. That covers a lot of the advantages of using XLA for mobile apps. On top of that, you keep the ability to load models separately from the code, and the ops are hand-optimized for ARM.
I'm still a fan of XLA, and I expect the two will grow closer over time, but I think Lite is better for a lot of scenarios on mobile.
How about quantization? Does TensorFlow Lite perform the quantization, or is TensorFlow supposed to do it? Is it an iterative process or a straightforward one? Or are you training quantized models, as the NN API docs say?
The quantization is done with a special training script that is quantization aware. We will soon be open sourcing a quantized MobileNet training script to show how to do this.
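To give a rough idea of what "quantization aware" means (just an illustrative sketch, not the script itself): fake-quantization ops simulate 8-bit rounding during training so the weights adapt to the error they will see at inference time.

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 224, 224, 3])
    w = tf.get_variable("w", [3, 3, 3, 32])

    # Simulate 8-bit quantization of the weights during training; the
    # min/max range here is an arbitrary example.
    w_q = tf.fake_quant_with_min_max_args(w, min=-1.0, max=1.0, num_bits=8)

    y = tf.nn.conv2d(x, w_q, strides=[1, 1, 1, 1], padding="SAME")
    # ... build the rest of the model and train as usual; the recorded
    # ranges can then be mapped onto real uint8 kernels by the converter.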
TensorFlow Lite is an interpreter, in contrast with XLA, which is a compiler. The advantage of TensorFlow Lite is that a single interpreter can handle several models rather than needing specialized code for each model and each target platform. TensorFlow Lite's core kernels have also been hand-optimized for common machine learning patterns. The advantage of compiler approaches is fusing many operations to reduce memory bandwidth (and thus improve speed). TensorFlow Lite fuses many common patterns in the TensorFlow converter. We are of course excited about the possibility of using JIT techniques and XLA technology within the TensorFlow Lite interpreter, or as part of the TensorFlow Lite converter, as a possible future direction.
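One concrete example of the kind of pattern fusion a converter can do is folding a batch norm into the preceding convolution's constants, so no separate BN op runs at inference. In NumPy terms (an illustration of the math, not the converter's actual code):

    import numpy as np

    def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-3):
        # conv -> batch-norm collapses to a single conv with rescaled
        # per-output-channel weights and a new bias.
        scale = gamma / np.sqrt(var + eps)   # shape [out_channels]
        w_folded = w * scale                 # broadcasts over last axis
        b_folded = (b - mean) * scale + beta
        return w_folded, b_folded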
This should be possible, but we haven't tried it. We're likely going to add a simplified target that has minimal dependencies (like no Eigen) that allows building on simple platforms.
It might be a tad annoying, but the rest of TensorFlow uses Bazel, so it makes sense that TensorFlow Lite also uses it. It also probably matches the internal Google workflow better, since Google uses Blaze internally.
CoreML doesn't actually support TensorFlow. Its support for TensorFlow is only through Keras, which is fine if you just want to build stock-standard models, but if you're doing crazy research implementations then that's not going to work.
It's all in the converter tool: if the converter tool can turn the TF file into a .mlmodel properly, then it will be supported. Inside is just a bunch of weights, layers, and parameters. We just need a proper script to translate it.
"just a bunch of weights and layers and parameters" -- I think you and the GP are agreeing. That's the definition of standard: If the model can be expressed using the currently-blessed set of layer definitions in CoreML, then yes. But if you're doing nonstandard stuff with weird control flow behavior, or RNNs that don't map into some of the common flavors, then all bets are off.
An example: some of my colleagues put a QP solver in tandem with a DNN, so that the neural network could 'shell out' to the solver as part of its learning, and it learned to solve small sudoku problems from examples alone: https://arxiv.org/abs/1703.00443 The PyTorch code for it is one of the examples I like to use as a stress test for doing funky things in the machine learning context.
TensorFlow is a very generic dataflow library at its heart, which happens to have a lot of DNN-specific functionality as ops. It's possible to express arbitrary computations in it, whereas CoreML and similar frameworks make more assumptions that the computation will fit a particular mould, and optimize accordingly.
Looks like you are right; CoreML only supports these three DNN types: feedforward, convolutional, and recurrent. I suppose capsule nets are not any one of those, if they were implemented in TF.
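For what it's worth, the Keras-only path mentioned above looks roughly like this with coremltools (a sketch; the model path and tensor names are placeholders, and the exact arguments vary by coremltools version):

    import coremltools

    # Only "stock standard" layers convert cleanly; anything exotic
    # falls outside the supported layer set.
    mlmodel = coremltools.converters.keras.convert(
        "my_keras_model.h5",
        input_names=["image"],
        output_names=["probs"])
    mlmodel.save("MyModel.mlmodel")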
We developed TensorFlow Lite to be small enough to target really small devices that lack MMUs, like the ARM Cortex-M MCU series, but we haven't done the actual work to target those devices. That being said, we are excited when the ecosystem and community around machine learning expands.
You can deploy TensorFlow model binaries as serverless APIs on Google Cloud ML Engine [1]. But I would also be interested in seeing a TensorFlow Lite implementation.
Thanks, @rasmi. I have some feedback for you guys. The pricing for prediction inference on GCP is not very fair. If I deploy a small model (like a SqueezeNet or MobileNet), I pay almost the same price as someone deploying large models (like ResNet or VGG). That's why I'm deploying my models on serverless environments and paying about 5 dollars for 1 million inferences.
The pricing of GCP is: $0.10 per thousand predictions, plus $0.40 per hour. That’s more than 100 dollars for 1 million inferences.
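Rough math behind that figure, using the quoted rates (the node-hours value below is just an example assumption, not part of the quoted pricing):

    predictions = 1000000
    per_thousand = 0.10       # $ per 1,000 online predictions (quoted rate)
    node_hours = 24           # assumption: one node kept warm for a day
    per_node_hour = 0.40      # $ per hour (quoted rate)

    cost = predictions / 1000 * per_thousand + node_hours * per_node_hour
    print(cost)  # 100.0 + 9.6 = 109.6 dollars for a million predictions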
I see what you mean. To some companies, ML Engine's cost as a managed service may be worth it. To others, spinning up a VM with TensorFlow Serving on it is worth the cost savings. If you've taken other approaches to serving TensorFlow models to get around ML Engine's per-prediction cost, I'm curious to hear about them.
The main TensorFlow interpreter provides a lot of functionality for larger machines like servers (e.g. desktop GPU support and distributed support). Of course, TensorFlow Lite does run on standard PCs and servers, so using it on non-mobile/small devices is possible. If you wanted to create a very small microservice, TensorFlow Lite would likely work, and we'd love to hear about your experiences if you try this.
Thanks for the answer. Currently I'm using AWS Lambda to deploy my TensorFlow models, but it's pretty hard and hacky. I need to remove a considerable portion of the codebase that is not needed for inference-only routines. I do that so the code loads faster and to fit within the deployment package size limit.
If TensorFlow Lite already has a much smaller footprint, then it may be much easier to deploy to a serverless environment.
I’ll be trying it in my next deployments.
With TensorFlow and TF Lite we are looking to provide a great experience across all platforms, and are exploring ways to provide a simpler experience with good acceleration on iOS as well.
Yes to your first question, from the article: "As you may know, TensorFlow already supports mobile and embedded deployment of models through the TensorFlow Mobile API. Going forward, TensorFlow Lite should be seen as the evolution of TensorFlow Mobile, and as it matures it will become the recommended solution for deploying models on mobile and embedded devices."
This is pretty Android/iPhone-only; I wish it could be more flexible so it can be used on other edge devices, such as home routers or other embedded products where resources are constrained.
The current examples talk about Android/iPhone; however, the core runtime is quite lightweight, with the goal of supporting all kinds of embedded products.
Do let us know if you build/run on other platforms.
Quantization comes in many different forms. TensorFlow Lite provides optimized kernels for 8-bit uint quantization. This specific form of evaluation is not directly supported in TensorFlow right now (though it can train such a model). We will be releasing training scripts that show how to set up such models for evaluation.
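For context, this kind of 8-bit scheme represents a real value as scale * (q - zero_point). A quick NumPy illustration of the idea (not the actual optimized kernels):

    import numpy as np

    def quantize(x, scale, zero_point):
        # real value r is stored as uint8 q with r ~= scale * (q - zero_point)
        q = np.round(x / scale) + zero_point
        return np.clip(q, 0, 255).astype(np.uint8)

    def dequantize(q, scale, zero_point):
        return scale * (q.astype(np.float32) - zero_point)

    x = np.array([-0.5, 0.0, 0.7], dtype=np.float32)
    q = quantize(x, scale=0.01, zero_point=128)
    print(q, dequantize(q, 0.01, 128))  # values recovered to ~0.01 precision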
> “Google won't say much more about this new project. But it has revealed that TensorFlow Lite will be part of the primary TensorFlow open source project later this year”
How does this relate to other hardware beyond iOS/JavaScript, i.e. Raspberry Pi, NVIDIA Jetson, etc.? And what's the likelihood of libraries that sit on top of TF, like Keras and PyTorch, supporting this? Just some questions that spring to mind.
And we'll start to see more battery-efficient hardware to run those apps without consuming all of your battery. :)
(I'm saying that glibly, but I'm dead serious -- look at what we've seen emerge just this year in Apple's Neural Engine, the Pixel Visual Core, rumored chips from Qualcomm, and the Movidius Myriad 2. The datacenter was the first place to get dedicated DNN accelerators in the form of Google's TPU, but the phones -- and even smaller devices, like the "Clips" camera -- are the clear next spot. And this is why, for example, TensorFlow Lite can call into the Android NN API to take advantage of local accelerators as they evolve.)
Being able to run locally, if battery life is preserved, is a huge win in latency, privacy, and potentially bandwidth. It'll be good, though it does need advances in both the HW and the DNN techniques (things like MobileNet, but we need far more).