Well, MNIST is a trivial dataset. Many simple methods work astonishingly well on it. For example, it takes some tuning for a neural network to beat k-nearest neighbors. And plain t-SNE is enough to cluster the digits in an unsupervised way, without any preprocessing.
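Something like the following scikit-learn sketch (my rough illustration, not tuned at all) is all it takes; the exact numbers depend on k and the distance metric, but k-NN on raw pixels lands around 97% on the standard test split:

```python
from sklearn.datasets import fetch_openml
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsClassifier

# Load MNIST: 70,000 28x28 grayscale digits, flattened to 784 features.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0

# Standard split: first 60,000 images for training, last 10,000 for testing.
X_train, y_train, X_test, y_test = X[:60000], y[:60000], X[60000:], y[60000:]

# k-NN on raw pixels, no tuning.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("k-NN test accuracy:", knn.score(X_test, y_test))

# Unsupervised t-SNE on a subsample of raw pixels already separates the digits.
embedding = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X[:5000])
print("t-SNE embedding shape:", embedding.shape)
```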
Fashion MNIST is not better - it shares the same issue with MNIST. At the very least, use notMNIST (letters A-J from various fonts). Instead, I wholeheartedly recommend Google Quickdraw - hand-drawn doodles; more samples, more engaging, and more diverse. And the images are the same size as MNIST. See an example of usage for someone's first neural network: https://github.com/stared/thinking-in-tensors-writing-in-pyt...
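For reference, here is a minimal sketch of loading one Quickdraw category, assuming you have already downloaded one of the "numpy bitmap" files (the file path below is a placeholder):

```python
import numpy as np

# Each "numpy bitmap" file holds one category of doodles, already rasterized
# to the same 28x28 grayscale format as MNIST and flattened to 784 values.
doodles = np.load("quickdraw/cat.npy")        # placeholder path; shape (N, 784), uint8
images = doodles.reshape(-1, 28, 28) / 255.0  # drop-in replacement for MNIST images
print(images.shape)
```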
This criticism is addressed right in the beginning:
> The foundational discoveries in genetics centered on far simpler organisms such as peas, molds, fruit flies, and mice. To this day, biologists use these simpler organisms as genetic “minimal working examples” in order to save time, energy, and money. A well-designed experiment with Drosophila, such as Feany and Bender (2000), can teach us an astonishing amount about humans.
> The deep learning analogue of Drosophila is the MNIST dataset. A large number of deep learning innovations including dropout, Adam, convolutional networks, generative adversarial networks, and variational autoencoders began life as MNIST experiments. Once these innovations proved themselves on small-scale experiments, scientists found ways to scale them to larger and more impactful applications.
Again, MNIST is not the Drosophila of deep learning.
For deep learning, it is trivial, and often misleading, as noted by François Chollet:
> Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision].
It might be considered the Drosophila of machine learning, though. Take a look at this beautiful table of results from various classifiers: http://yann.lecun.com/exdb/mnist/.
> Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision].
Unless your dataset is more similar to MNIST than to “real computer vision” (e.g. galaxies or piano rolls) and you still want to use deep learning to classify it.
All similarly sparse data samples would suffer from the batchnorm issue. I don’t remember if I tried a convnet with batchnorm on galaxy classification, but I did try it on piano rolls - it was bad, precisely because of batchnorm - and had I first tried the same model on MNIST I would have caught the issue much faster (I tested it on CIFAR).
I suspect a chess position evaluation would suffer from batchnorm just as much, if the intermediate feature maps remain sparse.
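For anyone who wants to poke at this, here is a minimal PyTorch sketch (my own illustration, not the actual models mentioned above): the same small convnet with BatchNorm toggled on or off, fed a mostly-zero batch that stands in for a sparse input like a piano roll.

```python
import torch
import torch.nn as nn

def make_convnet(in_channels=1, n_classes=10, use_batchnorm=True):
    def block(c_in, c_out):
        layers = [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)]
        if use_batchnorm:
            layers.append(nn.BatchNorm2d(c_out))
        layers += [nn.ReLU(), nn.MaxPool2d(2)]
        return layers

    return nn.Sequential(
        *block(in_channels, 16),
        *block(16, 32),
        nn.Flatten(),
        nn.LazyLinear(n_classes),  # infers the flattened size on the first forward pass
    )

# A mostly-zero batch: only a few active "pixels" per sample, like a piano roll.
sparse_batch = torch.zeros(8, 1, 28, 28)
sparse_batch[:, :, 10:12, 10:12] = 1.0

for use_bn in (False, True):
    model = make_convnet(use_batchnorm=use_bn)
    logits = model(sparse_batch)
    print(f"batchnorm={use_bn}, logits std={logits.std().item():.3f}")
```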
It might “address” the criticism but it doesn’t rebut it. It’s absolutely true that MNIST can be misleading and the results on it might not be representative.
kNN with k=1 is a more complex model than a neural network, not a simpler one. With more data it would keep scaling, and if we ever reached the point of having a labeled image for every possible pixel combination in MNIST, it would be unbeatable.
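To make the point concrete, here is a tiny numpy sketch (my own illustration): a 1-NN model is literally the stored training set, so it gets zero training error by construction and only grows as more labeled examples are memorized.

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_query):
    # Pairwise squared Euclidean distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2ab.
    d2 = ((X_query ** 2).sum(1)[:, None]
          + (X_train ** 2).sum(1)[None, :]
          - 2.0 * X_query @ X_train.T)
    return y_train[d2.argmin(axis=1)]

rng = np.random.default_rng(0)
X_train = rng.random((1000, 784))        # stand-in for 1,000 flattened images
y_train = rng.integers(0, 10, size=1000)

# Every training point is its own nearest neighbour, so training accuracy is 1.0:
# the "model" is just the memorized dataset, and it improves with every added label.
print((one_nn_predict(X_train, y_train, X_train) == y_train).mean())
```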
Man, ever find someone who's interested in all the same things as you but has had time to explore them, correctly, and even publishes the results?
What an amazing find, this blog, especially wing optimization (as you pointed out). I hope this guy gets the resources to run free with his work and just create incredible things.
>The ideal toy dataset should be procedurally generated so that researchers can smoothly vary parameters such as background noise, translation, and resolution.
It's been a minute since I last touched ML, but that seems like a fairly extreme claim. Am I wrong in thinking this?
What's extreme about it? I'm new to ML, but this seems great from a testing and verification perspective. I actually feel like Christmas came early, here, because I'm eager to explore novel model architectures, and having a small and easily manipulable dataset to experiment with seems perfect for that.
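As a rough sketch of what such a generator could look like (my own toy example, not the article's actual dataset): a procedural 1-D dataset where background noise, translation, and resolution are explicit knobs you can turn while testing a model.

```python
import numpy as np

def make_toy_sample(template, noise=0.1, max_shift=4, resolution=40, rng=None):
    rng = rng or np.random.default_rng()
    # Resample the class template to the requested resolution.
    x = np.interp(np.linspace(0, 1, resolution),
                  np.linspace(0, 1, len(template)), template)
    # Random translation (circular shift) plus additive background noise.
    x = np.roll(x, rng.integers(-max_shift, max_shift + 1))
    return x + rng.normal(0, noise, size=resolution)

# Ten fixed 1-D "digit" templates; every dataset draw perturbs them differently.
rng = np.random.default_rng(0)
templates = [np.sin(np.linspace(0, (k + 1) * np.pi, 20)) for k in range(10)]
X = np.stack([make_toy_sample(templates[k % 10], rng=rng) for k in range(1000)])
y = np.arange(1000) % 10
print(X.shape, y.shape)
```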