Compression is indeed a hot topic, but I do feel this article (the one on arXiv, https://arxiv.org/abs/1611.05162) has some major shortcomings. First, the datasets used (spiral and MNIST) are small and simple; they work as illustrations but should be avoided for benchmarking. Second, despite it being a hot topic, the authors don't compare against any other algorithms. Third, their MNIST network is a two-hidden-layer dense network with over a million parameters; of course you can prune 95% of those. You could probably have achieved the same result by simply training with 5% of the weights from the start. Finally, there seems to be no approach for convolutional layers?
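To make the third point concrete, here is a rough sketch of what I mean by "train with 5% of the weights": a two-hidden-layer MLP around the million-parameter mark next to one built directly at roughly 5% of that size. The layer widths are my own illustrative guesses, not the paper's exact architecture.

```python
# Sketch (PyTorch): a ~1.3M-parameter two-hidden-layer MLP for MNIST
# vs. a directly trained model with roughly 4-5% of the parameters.
# Hidden sizes are illustrative assumptions, not taken from the paper.
import torch.nn as nn

def mlp(hidden):
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 10),
    )

def n_params(model):
    return sum(p.numel() for p in model.parameters())

big = mlp(800)    # 784*800 + 800*800 + 800*10 weights ~= 1.28M
small = mlp(60)   # 784*60 + 60*60 + 60*10 weights ~= 51k (about 4%)

print(n_params(big), n_params(small))
```

If the small model trained from scratch reaches comparable accuracy, the 95% pruning figure says more about the over-parameterized baseline than about the pruning method.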
In network pruning, my experience is that simple heuristics sometimes outperform mathematically heavy approaches. Different problems can also call for wildly different methods: an approach that works well on one problem and one network can be very bad on a slightly different network. In that sense, it's a pity LeNet is so often used for benchmarking, since the results typically don't generalize well.
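For reference, the kind of simple heuristic I have in mind is one-shot global magnitude pruning followed by fine-tuning. A minimal sketch below; the 95% sparsity level and the restriction to Linear layers are my assumptions for illustration, not anything from the paper.

```python
# Sketch (PyTorch): global magnitude pruning as a simple baseline heuristic.
import torch

@torch.no_grad()
def magnitude_prune(model, sparsity=0.95):
    """Zero out the smallest-magnitude weights across all Linear layers."""
    weights = [m.weight for m in model.modules()
               if isinstance(m, torch.nn.Linear)]
    all_w = torch.cat([w.abs().flatten() for w in weights])
    threshold = torch.quantile(all_w, sparsity)  # single global cutoff
    masks = []
    for w in weights:
        mask = (w.abs() > threshold).to(w.dtype)
        w.mul_(mask)        # prune in place
        masks.append(mask)  # keep masks to re-apply during fine-tuning
    return masks
```

In practice you would fine-tune afterwards and re-apply the masks after each optimizer step so the pruned weights stay at zero. Comparing against a baseline like this would have made the paper's numbers much easier to judge.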