Shrinking Machine Learning Models for Offline Use (amazon.com)
123 points by georgecarlyle76 on Aug 13, 2018 | 10 comments



Can anyone with more knowledge in the area point me to some resources/surveys on state-of-the-art techniques for compressing machine learning models? I'd be particularly interested to see experiments exploring what a plot of model size reduction vs. accuracy cost looks like for different techniques. For example, is there usually a graceful degradation in accuracy as you compress more, or is there often some kind of tipping point where accuracy plummets?


There are low-bit networks as well, e.g. XNOR-Net: https://arxiv.org/abs/1603.05279

That technology is now a spinoff of AI2 and UW: https://www.xnor.ai/
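For context, XNOR-Net approximates each weight filter as a single scaling factor times a sign matrix. A minimal NumPy sketch of the weight-binarization step (per-tensor scale here for brevity; the paper uses a per-filter scale and binarizes activations too):

    import numpy as np

    def binarize_weights(w):
        # XNOR-Net-style approximation: W ~= alpha * sign(W), where
        # alpha = mean(|W|) minimizes the L2 reconstruction error.
        alpha = np.abs(w).mean()
        b = np.sign(w)  # {-1, +1}, storable as 1 bit per weight
        return alpha, b

    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 3, 3, 3)).astype(np.float32)  # a conv filter bank
    alpha, b = binarize_weights(w)
    w_hat = alpha * b
    err = np.linalg.norm((w - w_hat).ravel()) / np.linalg.norm(w.ravel())
    print(f"~32x smaller weights, relative reconstruction error: {err:.2f}")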


State of the art is basically 8-bit weights. Anything below that doesn't really work. You will see lots of benchmarks and figures saying that it does, but nearly all of those neglect absolute accuracy: they compare deeply quantized models against weak full-precision or half-precision baselines that are not useful in practice. Another trick is to deeply quantize an overly redundant model that's no longer state of the art, and show a few percent degradation in accuracy on top of an already barely acceptable number.

IMO, we need to pay attention to absolute accuracy if any of this is to become actually practical. That is, I don't care how fast or small your compressed network is if its top-5 accuracy on ImageNet is below 80%, or some similar criterion. Granted, this is not perfect, because such models might still be useful for a smaller number of classes, but then come up with a separate metric for that, too. A pedestrian detector is not very useful if it misses or misplaces 30% of pedestrians.
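For reference, the simplest form of 8-bit weight quantization is a symmetric per-tensor scheme like the sketch below; real toolchains typically use per-channel scales and also calibrate activation ranges, but the storage saving is the same 4x:

    import numpy as np

    def quantize_int8(w):
        # Map float32 weights onto int8 with a single per-tensor scale.
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.05, size=(1024, 1024)).astype(np.float32)
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    print(f"{w.nbytes} -> {q.nbytes} bytes")               # 4x smaller
    print(f"max abs error: {np.abs(w - w_hat).max():.5f}")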


This WWDC video shows the effects of quantization at various levels: https://developer.apple.com/videos/play/wwdc2018/708/

A bit Apple-specific, but the main ideas carry over to any ML model. There's also a part 2, which I haven't watched.


The most effective "shrinking" of ML models that I've seen (very limited experience, YMMV) is through "pruning". Searching for "arxiv pruning" is an excellent starting point, and a couple of those papers include metrics for accuracy vs size and the tradeoffs therein.
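The simplest variant from that literature is unstructured magnitude pruning: zero out the smallest-magnitude weights, then fine-tune. A NumPy sketch (in practice pruning is done gradually, with retraining between steps):

    import numpy as np

    def magnitude_prune(w, sparsity=0.9):
        # Zero the smallest |w| entries. The zeros only save space/time
        # with sparse storage and sparse-aware kernels.
        k = int(w.size * sparsity)
        threshold = np.partition(np.abs(w).ravel(), k)[k]
        mask = np.abs(w) >= threshold
        return w * mask, mask

    rng = np.random.default_rng(0)
    w = rng.normal(size=(512, 512))
    w_pruned, mask = magnitude_prune(w, sparsity=0.9)
    print(f"fraction of weights kept: {mask.mean():.2f}")  # ~0.10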


I came to the comments to say the same thing. Quantization and hashing tricks for embeddings are cool and all, but not really important for model compression.

Rather, training companion models to prune away whole subnetworks of weight and layer combinations can allow you to remove tens of thousands of parameters from the model entirely, so you don't waste space even on quantized weights for pathways that end up not contributing to predictions.
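The companion-model approach described above is learned, but even a static heuristic illustrates structured pruning: drop whole convolution filters by L1 norm (as in Li et al., arXiv:1608.08710). Unlike zeroing individual weights, this shrinks the dense tensors themselves, so no sparse storage is needed:

    import numpy as np

    def prune_filters(conv_w, keep_ratio=0.5):
        # conv_w shape: (out_channels, in_channels, kh, kw).
        # Keep the filters with the largest L1 norm; the next layer's
        # matching input channels must be removed too, then fine-tune.
        n_filters = conv_w.shape[0]
        n_keep = max(1, int(n_filters * keep_ratio))
        l1 = np.abs(conv_w).reshape(n_filters, -1).sum(axis=1)
        keep = np.sort(np.argsort(l1)[-n_keep:])
        return conv_w[keep], keep

    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 32, 3, 3)).astype(np.float32)
    w_small, kept = prune_filters(w)
    print(w.shape, "->", w_small.shape)  # (64, 32, 3, 3) -> (32, 32, 3, 3)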


In my limited experience with CNNs/MLPs, it is more of a tipping point. There is a fairly sharp knee in the tradeoff curve: compress past it and accuracy collapses, around it you can trade size for accuracy, and short of it you gain very little accuracy by compressing less.


Is there really a need to shrink models?

As far as I know, most machine learning models can be very compact, often well under 1 GB. Even high-res vision CNNs aren't anywhere close to being fully connected. They might have millions of weights, but that's still only in the megabyte range.
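Spelling out the arithmetic for a hypothetical 10M-parameter network:

    params = 10_000_000                       # hypothetical 10M-parameter CNN
    print(params * 4 / 1e6, "MB at float32")  # 40.0 MB
    print(params * 1 / 1e6, "MB at int8")     # 10.0 MB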

My understanding is that the real problem is obfuscating the machine learning model. If they shipped the model to local storage, they'd be giving away a well-guarded trade secret. They'd also be giving away their justification for collecting all that user data.

Is anyone around here working on production ML software? Am I really wrong?


For a security product, deploying the model over the internet is a significant cost. Hundreds of thousands of machines in an enterprise may need to be updated at once. These machines may not have an easy way to accelerate evaluation of the model either, which makes feature reduction important for plain CPU inference as well.

