You could have the net predict an entire distribution for each pixel, like a mixture of Gaussians. But then sampling from it independently per pixel would be incorrect. E.g. it might not know whether the car is blue or red, so half the pixels would randomly come out red and the other half blue. It would look terrible.
Somewhere, the neural net needs to decide "this car is going to be blue" and then be consistent with that. Adversarial nets allow that, by having random inputs. One of the inputs to the NN is a random number, and that random number might determine if the car is going to be blue or red this time.
The cool thing about this is that it allows you to generate multiple samples. You can generate 10 different images and select the best one. And the adversarial nets should learn to approximate the true distribution as closely as possible. And I don't think there is any other method that can do that.
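As a minimal sketch of the two points above, here is a toy stand-in for a trained generator (hypothetical, not a real trained network): the random input makes the global "blue or red" choice once, every pixel follows it, and we can draw 10 candidates and keep the best.

```python
import numpy as np

def generate(z):
    """Toy stand-in for a trained generator: the noise input z decides the
    global property ('blue or red car') once, and every pixel follows it."""
    color = np.array([0.0, 0.0, 1.0]) if z < 0.5 else np.array([1.0, 0.0, 0.0])
    return np.tile(color, (8, 8, 1))  # 8x8 RGB image, one consistent color

rng = np.random.default_rng(42)
samples = [generate(rng.random()) for _ in range(10)]

# Every sample is internally consistent -- no half-red, half-blue mixes...
assert all((s == s[0, 0]).all() for s in samples)

# ...and we can score the 10 candidates and keep the one we like best.
target = np.tile([0.0, 0.0, 1.0], (8, 8, 1))  # suppose we wanted a blue car
best = min(samples, key=lambda s: np.abs(s - target).sum())
```

A real generator would of course map the noise through a deep net rather than a threshold, but the role of the noise input is the same.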
Another idea would be to just have a loss function that doesn't punish it for getting the color wrong, but rewards it only when it gets very close to the right color. This way the algorithm doesn't worry about producing muddy brown colors when it isn't sure; it just goes with its best guess.
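One way to make that loss concrete (a sketch of my own, with 1-D numbers standing in for colors): penalize only the distance to the *nearest* plausible answer, so committing to either mode beats hedging in the middle.

```python
red, blue = 1.0, 0.0  # 1-D stand-ins for the two plausible colors

def nearest_mode_loss(pred):
    """Only the distance to the closest plausible answer is penalized,
    so a committed guess at either color scores well."""
    return min(abs(pred - red), abs(pred - blue))

def l2_loss(pred):
    """Ordinary squared error averaged over both targets -- this is
    what drives a net toward the muddy in-between output."""
    return 0.5 * ((pred - red) ** 2 + (pred - blue) ** 2)

# Under plain L2, the 'muddy brown' midpoint 0.5 is optimal...
assert l2_loss(0.5) < l2_loss(red)
# ...but under the nearest-mode loss, committing to a real color wins.
assert nearest_mode_loss(red) < nearest_mode_loss(0.5)
```

The catch is that the net still has no way to be *consistent* across pixels with such a loss, which is what the random input solves.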
When I referred to sampling I was talking about the joint distribution over all pixels (hence "space of images"), which would work fine. But I suspect explicitly predicting distributions is impractical; it may be better to use methods that sample directly without ever explicitly representing the distribution.
You do need a source of entropy to perform the sampling. The amount needed has a minimum determined by how precisely you want to sample from the continuous distribution, related to the Kullback-Leibler divergence of the distribution.
Well, in theory the adversarial nets should learn to model the distribution perfectly. But there might be a way to do it directly. You could train an NN to produce samples from a random input source, just like the adversarial nets. But unlike the adversarial nets, the inputs don't need to be random: you could train another NN to predict a distribution over what they should be. Then, instead of adversarial training, use regular training to predict the exact pixels, backpropagating all the way through both nets.
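A minimal sketch of that direct idea, assuming linear one-parameter "nets" for readability: an encoder predicts what the input to the generator should have been for each training example, and a plain reconstruction loss is backpropagated through decoder and encoder together, no adversary needed. (This is essentially a tiny autoencoder, a hypothetical illustration rather than any particular published method.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.choice([0.0, 1.0], size=512)  # two-mode data: blue (0) or red (1) cars

w_enc, b_enc, w_dec = 0.1, 0.0, 0.1   # tiny linear encoder/decoder weights
lr = 0.1
for _ in range(2000):
    z = w_enc * x + b_enc             # encoder: predict the latent input z
    x_hat = w_dec * z                 # decoder/generator: reconstruct pixels
    err = x_hat - x
    # Gradients of 0.5*mean(err^2), backpropagated through both nets:
    g_dec = np.mean(err * z)
    g_enc = np.mean(err * w_dec * x)
    g_b   = np.mean(err * w_dec)
    w_dec -= lr * g_dec
    w_enc -= lr * g_enc
    b_enc -= lr * g_b

final_loss = np.mean((w_dec * (w_enc * x + b_enc) - x) ** 2)
assert final_loss < 1e-2              # the pair learns to reconstruct the data
```

At sampling time you would throw the encoder away and feed the decoder random z drawn from the distribution the encoder's outputs were trained to match, which is the part this sketch leaves out.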