Nice. But for a change, I'd like to see a blog post that explains how the network was trained, including the fine details needed to reach good results.
No, the quality of the visualization and the amount of overfitting are uncorrelated. Both an overfit model and a perfectly generalizing one will return excellent dog faces. The method is, however, a way to tell whether the model is underfit (because then you won't see dog faces).
In the field of generative models, the way to tell whether a model is 'working' is to compute the probability that an image in the validation set could have been generated by the model. The higher the total likelihood of your validation set, the better your generative model.
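For concreteness, here is a minimal sketch of that held-out likelihood evaluation. The "generative model" is just a Gaussian KDE from scipy standing in for whatever model you actually trained, and the 2-D arrays are placeholders for flattened images:

    # Toy sketch of held-out likelihood evaluation. The generative model here is
    # a Gaussian KDE standing in for the real model; the data is a placeholder.
    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)

    # Pretend these are flattened, low-dimensional training and validation images.
    train = rng.normal(size=(500, 2))
    valid = rng.normal(size=(100, 2))

    model = gaussian_kde(train.T)  # "train" the density model on the training set

    # Average log-likelihood the model assigns to held-out data: higher is better.
    avg_loglik = model.logpdf(valid.T).mean()
    print(f"average validation log-likelihood: {avg_loglik:.3f}")

The same comparison works for any pair of generative models, as long as both can assign a (log-)probability to a held-out image.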
These 'deepdreaming' methods (and I know, because I created one of them before Google's code was out [1]) are nice and pretty, but it's essentially impossible to tell whether what they generate is a copy from the training set or truly original. And as long as there is no way to evaluate them on a validation set, there is no basis to assume that any of them is actually working rather than copying and melding images it has seen.
But then again, if they are copying and melding and we can't tell, maybe that's good enough to pass a Turing test? After all, aren't we humans just copying and melding the stuff we know, too?
And the stuff these networks create is definitely good enough to be 'interesting'.
Overfitting just means that the model is fitting noise in the training set, instead of the broad patterns of interest. (With a sufficiently generous definition of "noise" that includes "the random process that selected training data from an underlying distribution".) Just seeing recognizable images in a visualization of the model doesn't tell you anything about overfitting. As an extreme example, a k-nearest-neighbors classifier is a model that consists of nothing except verbatim examples from the training set, but that doesn't mean it's always overfit.
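A quick sanity check of that last point, using scikit-learn's bundled digits dataset (just an illustrative sketch, not a claim about image models):

    # A 1-nearest-neighbor classifier literally memorizes the training set
    # (100% training accuracy by construction), yet it can still generalize
    # fine on held-out data -- memorization alone isn't overfitting.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
    print("train accuracy:", knn.score(X_tr, y_tr))  # 1.0 -- the model *is* the training set
    print("test accuracy: ", knn.score(X_te, y_te))  # typically around 0.98 on digits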
Plus, it's not immediately obvious that the image you're seeing is actually memorized from the training set. Maybe the model is just really good at synthesizing plausible-looking sheepdog images from random noise.
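One rough (and admittedly imperfect) way to probe that question is a nearest-neighbor lookup against the training set; the arrays below are placeholders for your own data, and pixel space is used only for simplicity (a feature space would be better):

    # Find the generated image's nearest neighbor in the training set.
    # `train_images` and `generated` are stand-ins for real arrays.
    import numpy as np

    rng = np.random.default_rng(0)
    train_images = rng.random((1000, 64 * 64))  # stand-in: 1000 flattened training images
    generated = rng.random(64 * 64)             # stand-in: one flattened generated image

    dists = np.linalg.norm(train_images - generated, axis=1)
    nearest = int(np.argmin(dists))
    print(f"closest training image: #{nearest}, L2 distance {dists[nearest]:.3f}")
    # A distance near zero would suggest the 'generated' image is basically a
    # training example; a large distance doesn't prove originality, though.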
Seeing the face of a dog just means that the network has learned a good representation of dog faces. I don't think that has anything to do with the generalization ability of the network. I'm not clear on the details of this hallucination process, but it reminds me of earlier work by Zeiler & Fergus [1], where you can see the bottom layers of the network learning general low-level features and later layers learning higher-level representations of the objects being classified.
"computer (being shown Rorschach ink blot tests): I see...a dog and 14 eyeballs. a dog and 13 eyeballs. 42 eyeballs, and...a dog. 2 dogs and 7 eyeballs..."