the net comes pretrained. I haven't dug into the code much, but from the surface it seems it works by using the net as a scoring system for feature similarity, and then does a gradient descent on that (in very broad terms, I saw the style layer being used multiple times at multiple scales, probably to capture high and level patterns independently)