Scaling Down Deep Learning (greydanus.github.io)
122 points by caprock on Dec 22, 2022 | 16 comments



Well, MNIST is a trivial dataset; many simple methods work astonishingly well on it. For example, it takes some tuning for a neural network to beat k-Nearest Neighbors. Or t-SNE alone is enough to cluster the digits in an unsupervised way, without any preprocessing.
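For a sense of the bar kNN sets, here is a minimal sketch. It uses sklearn's small 8x8 digits set as a stand-in for full MNIST so it runs without a download; the exact accuracy on real MNIST will differ:

```python
# Rough kNN baseline sketch on an MNIST-like dataset.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Plain 3-nearest-neighbors, no preprocessing at all.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
acc = knn.score(X_test, y_test)
print(f"kNN test accuracy: {acc:.3f}")
```

Even this untuned baseline lands in the high nineties on digit data, which is the point: a neural network has to work to justify itself here.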

Fashion-MNIST is no better: it shares the same issues as MNIST. At the very least, use notMNIST (letters A-J from various fonts). Instead, I wholeheartedly recommend Google Quickdraw: hand-drawn doodles, with more samples, more engagement, and more diversity, and images of the same size as MNIST's.

See an example of usage for someone's first neural network: https://github.com/stared/thinking-in-tensors-writing-in-pyt...


This criticism is addressed right at the beginning:

The foundational discoveries in genetics centered on far simpler organisms such as peas, molds, fruit flies, and mice. To this day, biologists use these simpler organisms as genetic “minimal working examples” in order to save time, energy, and money. A well-designed experiment with Drosophila, such as Feany and Bender (2000), can teach us an astonishing amount about humans.

The deep learning analogue of Drosophila is the MNIST dataset. A large number of deep learning innovations including dropout, Adam, convolutional networks, generative adversarial networks, and variational autoencoders began life as MNIST experiments. Once these innovations proved themselves on small-scale experiments, scientists found ways to scale them to larger and more impactful applications.


Again, MNIST is not the Drosophila of deep learning. For deep learning, it is trivial, and often misleading, as noted by François Chollet:

> Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision].

It might be considered the Drosophila of machine learning, though. Take a look at this beautiful table of results from various classifiers: http://yann.lecun.com/exdb/mnist/.


> Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision].

Unless your dataset is more similar to MNIST than to “real computer vision” (e.g. galaxies or piano rolls), and you still want to use deep learning to classify it.


Then, could you give an example of a concrete MNIST-like dataset you worked with?

Galaxy images are not like that: more diverse, more noise, not centered to begin with, and usually more data (in size and channels).


All similarly sparse data samples would suffer from the batchnorm issue. I don’t remember if I tried a convnet with batchnorm on galaxy classification, but I did try it on piano rolls. It was bad, precisely because of batchnorm, and had I first tried the same model on MNIST I would have caught the issue much faster (I tested it on CIFAR).

I suspect a chess position evaluation would suffer from batchnorm just as much, if the intermediate feature maps remain sparse.
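A minimal NumPy sketch of the mechanism, with synthetic sparse data standing in for piano rolls and plain batch-norm math (no learned scale/shift): normalization shifts the zero background to a nonzero value, so the sparsity is destroyed for every downstream layer.

```python
import numpy as np

rng = np.random.default_rng(0)
# A batch of sparse inputs: ~5% "note on" spikes, the rest zeros,
# loosely imitating piano rolls.
x = np.zeros((64, 128))
x[rng.random(x.shape) < 0.05] = 1.0

# Per-feature batch normalization.
mean, var = x.mean(axis=0), x.var(axis=0)
x_bn = (x - mean) / np.sqrt(var + 1e-5)

print("sparsity before:", (x == 0).mean())    # ~0.95
print("sparsity after: ", (x_bn == 0).mean())  # far lower: zeros get shifted
```

On dense natural images the same shift is harmless, which is exactly why the failure does not show up on CIFAR-style data.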


It might “address” the criticism but it doesn’t rebut it. It’s absolutely true that MNIST can be misleading and the results on it might not be representative.


Okay, what would you rather start your experiments on to do a quick sanity check?


kNN with k=1 is a more complex model than a neural network, not a simpler one. With more data it would keep scaling, and if we reached the point of having a labeled image for every possible pixel combination in MNIST, it would be unbeatable.
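To illustrate the capacity argument (a sketch on sklearn's small digits set, not MNIST itself): a 1-NN classifier memorizes its training data outright, so its effective complexity grows with every sample added.

```python
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
nn1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Each training point is its own nearest neighbor, so training
# accuracy is perfect (barring exact duplicate images).
print(nn1.score(X, y))
```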


Sure, kNN scales very poorly. Yet MNIST is simple enough that it works. Try using kNN on ImageNet.


This is a beautiful blog; this post, for example, is one of my favorites: https://greydanus.github.io/2020/10/14/optimizing-a-wing/


Man, ever find someone who's interested in all the same things as you but has had time to explore them, correctly, and even publishes the results?

What an amazing find, this blog, especially wing optimization (as you pointed out). I hope this guy gets the resources to run free with his work and just create incredible things.


I think this is a wonderful dataset for teaching and will certainly try to include it in the assignments I write.

Other datasets I tried:

Fashion-MNIST has many mislabeled examples and is also pretty trivial to separate with UMAP alone.

Google's QuickDraw is better than MNIST, but I haven't tested it against e.g. logistic regression either.

Of course there is CIFAR, but those images hardly look like images; I can't classify half of them myself.



> The ideal toy dataset should be procedurally generated so that researchers can smoothly vary parameters such as background noise, translation, and resolution.

It's been a minute since I last touched ML, but that seems like a fairly extreme claim. Am I wrong in thinking this?


What's extreme about it? I'm new to ML, but this seems great from a testing and verification perspective. I actually feel like Christmas came early, here, because I'm eager to explore novel model architectures, and having a small and easily manipulable dataset to experiment with seems perfect for that.
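As a sketch of what "procedurally generated with tunable knobs" might look like: a hypothetical two-class shapes task (the names and task are made up for illustration; the article's actual generator is different), where noise, translation, and resolution are all parameters.

```python
import numpy as np

def make_sample(resolution=16, max_shift=2, noise=0.1, rng=None):
    """Render a filled (class 0) or hollow (class 1) square with a random
    translation and Gaussian background noise. Every knob is a parameter."""
    rng = rng or np.random.default_rng()
    label = int(rng.integers(2))
    img = np.zeros((resolution, resolution))
    size = resolution // 2
    off = resolution // 4 + int(rng.integers(-max_shift, max_shift + 1))
    img[off:off + size, off:off + size] = 1.0
    if label == 1:  # hollow out the interior for class 1
        img[off + 1:off + size - 1, off + 1:off + size - 1] = 0.0
    img += rng.normal(0.0, noise, img.shape)
    return img, label

img, label = make_sample(resolution=16, noise=0.05,
                         rng=np.random.default_rng(0))
print(img.shape, label)
```

The appeal for testing is that you can dial one parameter at a time (say, raise `noise` or `max_shift`) and watch exactly when a model architecture breaks.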



