Nice. But for a change, I'd like to see a blog post that explains how the network was trained, including the fine details needed to reach good results.
No, the quality of the visualization and the amount of overfitting are uncorrelated. Both an overfit model and a perfectly generalizing one will return excellent dog faces. The method is, however, a way to tell whether the model is underfit (because then you won't see dog faces).
In the field of generative models, the way to tell whether a model is 'working' is to compute the probability that an image in the validation set could have been generated by the model. The higher the total likelihood of your validation set, the better your generative model.
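For concreteness, here is a minimal sketch of that held-out likelihood evaluation. The "generative model" is just a Gaussian KDE from scipy standing in for whatever model you actually trained, and the 2-D arrays are placeholders for flattened images:

    # Toy sketch of held-out likelihood evaluation. The generative model here is
    # a Gaussian KDE standing in for the real model; the data is a placeholder.
    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)

    # Pretend these are flattened, low-dimensional training and validation images.
    train = rng.normal(size=(500, 2))
    valid = rng.normal(size=(100, 2))

    model = gaussian_kde(train.T)  # "train" the density model on the training set

    # Average log-likelihood the model assigns to held-out data: higher is better.
    avg_loglik = model.logpdf(valid.T).mean()
    print(f"average validation log-likelihood: {avg_loglik:.3f}")

The same comparison works for any pair of generative models, as long as both can assign a (log-)probability to a held-out image.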
These 'deepdreaming' methods (and I know, because I created one of them before Google's code was out [1]) are nice and pretty, but it's essentially impossible to tell whether what they generate is a copy from the training set or truly original. And as long as there is no way to evaluate them on a validation set, there is no basis to assume that any of them is actually working rather than copying and melding images it has seen.
But then again, if they are copying and melding and we can't tell, maybe that's good enough to pass a Turing test? After all, aren't we humans just copying and melding the stuff we know, too?
And the stuff these networks create is definitely good enough to be 'interesting'.
Overfitting just means that the model is fitting noise in the training set, instead of the broad patterns of interest. (With a sufficiently generous definition of "noise" that includes "the random process that selected training data from an underlying distribution".) Just seeing recognizable images in a visualization of the model doesn't tell you anything about overfitting. As an extreme example, a k-nearest-neighbors classifier is a model that consists of nothing except verbatim examples from the training set, but that doesn't mean it's always overfit.
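A quick sanity check of that last point, using scikit-learn's bundled digits dataset (just an illustrative sketch, not a claim about image models):

    # A 1-nearest-neighbor classifier literally memorizes the training set
    # (100% training accuracy by construction), yet it can still generalize
    # fine on held-out data -- memorization alone isn't overfitting.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
    print("train accuracy:", knn.score(X_tr, y_tr))  # 1.0 -- the model *is* the training set
    print("test accuracy: ", knn.score(X_te, y_te))  # typically around 0.98 on digits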
Plus, it's not immediately obvious that the image you're seeing is actually memorized from the training set. Maybe the model is just really good at synthesizing plausible-looking sheepdog images from random noise.
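One rough (and admittedly imperfect) way to probe that question is a nearest-neighbor lookup against the training set; the arrays below are placeholders for your own data, and pixel space is used only for simplicity (a feature space would be better):

    # Find the generated image's nearest neighbor in the training set.
    # `train_images` and `generated` are stand-ins for real arrays.
    import numpy as np

    rng = np.random.default_rng(0)
    train_images = rng.random((1000, 64 * 64))  # stand-in: 1000 flattened training images
    generated = rng.random(64 * 64)             # stand-in: one flattened generated image

    dists = np.linalg.norm(train_images - generated, axis=1)
    nearest = int(np.argmin(dists))
    print(f"closest training image: #{nearest}, L2 distance {dists[nearest]:.3f}")
    # A distance near zero would suggest the 'generated' image is basically a
    # training example; a large distance doesn't prove originality, though.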
Seeing the face of a dog just means that the network has learned a good representation of dog faces. I don't think that has anything to do with the generalization ability of the network. I'm not clear on the details of this hallucination process, but it reminds me of earlier work by Zeiler & Fergus [1], where you can see the bottom layers of the network learning general low-level features and later layers learning higher-level representations of the objects being classified.
"computer (being shown Rorschach ink blot tests): I see...a dog and 14 eyeballs. a dog and 13 eyeballs. 42 eyeballs, and...a dog. 2 dogs and 7 eyeballs..."