Scaling Down Deep Learning (greydanus.github.io)
122 points by caprock on Dec 22, 2022 | 16 comments



Well, MNIST is a trivial dataset; many simple methods work astonishingly well on it. For example, it takes some tuning for a neural network to beat k-Nearest Neighbors. Or t-SNE alone is enough to cluster the digits in an unsupervised way, without any preprocessing.
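For a sense of the bar kNN sets, here is a minimal sketch. It uses sklearn's small 8x8 digits set as a stand-in for full MNIST so it runs without a download; the exact accuracy on real MNIST will differ:

```python
# Rough kNN baseline sketch on an MNIST-like dataset.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Plain 3-nearest-neighbors, no preprocessing at all.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
acc = knn.score(X_test, y_test)
print(f"kNN test accuracy: {acc:.3f}")
```

Even this untuned baseline lands in the high nineties on digit data, which is the point: a neural network has to work to justify itself here.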

Fashion-MNIST is no better: it shares the same issues as MNIST. At the very least, use notMNIST (letters A-J from various fonts). Instead, I wholeheartedly recommend Google Quickdraw: hand-drawn doodles, with more samples, more engagement, and more diversity, and images of the same size as MNIST's.

See an example of usage for someone's first neural network: https://github.com/stared/thinking-in-tensors-writing-in-pyt...


This criticism is addressed right at the beginning:

The foundational discoveries in genetics centered on far simpler organisms such as peas, molds, fruit flies, and mice. To this day, biologists use these simpler organisms as genetic “minimal working examples” in order to save time, energy, and money. A well-designed experiment with Drosophila, such as Feany and Bender (2000), can teach us an astonishing amount about humans.

The deep learning analogue of Drosophila is the MNIST dataset. A large number of deep learning innovations including dropout, Adam, convolutional networks, generative adversarial networks, and variational autoencoders began life as MNIST experiments. Once these innovations proved themselves on small-scale experiments, scientists found ways to scale them to larger and more impactful applications.


Again, MNIST is not the Drosophila of deep learning. For deep learning, it is trivial, and often misleading, as noted by François Chollet:

> Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision].

It might be considered the Drosophila of machine learning, though. Take a look at this beautiful table of results from various classifiers: http://yann.lecun.com/exdb/mnist/.


> Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision].

Unless your dataset is more similar to MNIST than to “real computer vision” (e.g. galaxies or piano rolls), and you still want to use deep learning to classify it.


Then, could you give an example of a concrete MNIST-like dataset you worked with?

Galaxy images are not like that: more diverse, more noise, not centered to begin with, and usually more data (in size and channels).


All similarly sparse data samples would suffer from the batchnorm issue. I don’t remember if I tried a convnet with batchnorm on galaxy classification, but I did try it on piano rolls. It was bad, precisely because of batchnorm, and had I first tried the same model on MNIST I would have caught the issue much faster (I tested it on CIFAR).

I suspect a chess position evaluation would suffer from batchnorm just as much, if the intermediate feature maps remain sparse.
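A minimal NumPy sketch of the mechanism, with synthetic sparse data standing in for piano rolls and plain batch-norm math (no learned scale/shift): normalization shifts the zero background to a nonzero value, so the sparsity is destroyed for every downstream layer.

```python
import numpy as np

rng = np.random.default_rng(0)
# A batch of sparse inputs: ~5% "note on" spikes, the rest zeros,
# loosely imitating piano rolls.
x = np.zeros((64, 128))
x[rng.random(x.shape) < 0.05] = 1.0

# Per-feature batch normalization.
mean, var = x.mean(axis=0), x.var(axis=0)
x_bn = (x - mean) / np.sqrt(var + 1e-5)

print("sparsity before:", (x == 0).mean())    # ~0.95
print("sparsity after: ", (x_bn == 0).mean())  # far lower: zeros get shifted
```

On dense natural images the same shift is harmless, which is exactly why the failure does not show up on CIFAR-style data.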


It might “address” the criticism but it doesn’t rebut it. It’s absolutely true that MNIST can be misleading and the results on it might not be representative.


Okay, what would you rather start your experiments on to do a quick sanity check?


kNN with k=1 is a more complex model than a neural network, not a simpler one. With more data it would keep scaling, and if we reached the point of having a labeled image for every possible pixel combination in MNIST, it would be unbeatable.
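To illustrate the capacity argument (a sketch on sklearn's small digits set, not MNIST itself): a 1-NN classifier memorizes its training data outright, so its effective complexity grows with every sample added.

```python
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
nn1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Each training point is its own nearest neighbor, so training
# accuracy is perfect (barring exact duplicate images).
print(nn1.score(X, y))
```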


Sure, kNN scales very poorly. Yet MNIST is simple enough that it works. Try using kNN on ImageNet.


This is a beautiful blog; this post, for example, is one of my favorites: https://greydanus.github.io/2020/10/14/optimizing-a-wing/


Man, ever find someone who's interested in all the same things as you but has had time to explore them, correctly, and even publishes the results?

What an amazing find, this blog, especially wing optimization (as you pointed out). I hope this guy gets the resources to run free with his work and just create incredible things.


I think this is a wonderful dataset for teaching and will certainly try to include it in the assignments I write.

Other datasets I tried:

Fashion-MNIST has many mislabeled examples and is also pretty trivial to separate with UMAP alone.

Google's QuickDraw is better than MNIST, but I haven't tested it against e.g. logistic regression either.

Of course there is CIFAR, but those images hardly look like images; I can't classify half of them myself.



> The ideal toy dataset should be procedurally generated so that researchers can smoothly vary parameters such as background noise, translation, and resolution.

It's been a minute since I last touched ML, but that seems like a fairly extreme claim. Am I wrong in thinking this?


What's extreme about it? I'm new to ML, but this seems great from a testing and verification perspective. I actually feel like Christmas came early, here, because I'm eager to explore novel model architectures, and having a small and easily manipulable dataset to experiment with seems perfect for that.
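As a sketch of what "procedurally generated with tunable knobs" might look like: a hypothetical two-class shapes task (the names and task are made up for illustration; the article's actual generator is different), where noise, translation, and resolution are all parameters.

```python
import numpy as np

def make_sample(resolution=16, max_shift=2, noise=0.1, rng=None):
    """Render a filled (class 0) or hollow (class 1) square with a random
    translation and Gaussian background noise. Every knob is a parameter."""
    rng = rng or np.random.default_rng()
    label = int(rng.integers(2))
    img = np.zeros((resolution, resolution))
    size = resolution // 2
    off = resolution // 4 + int(rng.integers(-max_shift, max_shift + 1))
    img[off:off + size, off:off + size] = 1.0
    if label == 1:  # hollow out the interior for class 1
        img[off + 1:off + size - 1, off + 1:off + size - 1] = 0.0
    img += rng.normal(0.0, noise, img.shape)
    return img, label

img, label = make_sample(resolution=16, noise=0.05,
                         rng=np.random.default_rng(0))
print(img.shape, label)
```

The appeal for testing is that you can dial one parameter at a time (say, raise `noise` or `max_shift`) and watch exactly when a model architecture breaks.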



