Hacker News

I'm pretty sure he wasn't talking about the Swiss roll specifically. Big gains in neural net performance have come from better initialization schemes. These aren't dataset specific; a typical scheme adapts the initial weight distribution to the number of hidden units in the next layer, for example. Smaller models are in general more sensitive to initialization.
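A minimal sketch of one such scheme, Glorot/Xavier uniform initialization, which scales the initial weight range by the fan-in and fan-out of the layer (the numpy-based helper name here is just for illustration):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Glorot/Xavier uniform init: draw weights from
    U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
    so wider layers get a tighter initial distribution and the
    activation variance stays roughly constant across layers."""
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# A 256 -> 128 layer gets a narrower initial range than a 16 -> 8
# layer would, purely as a function of the layer widths.
W = glorot_uniform(256, 128)
```

The point is that the recipe depends only on the architecture (layer widths), not on the dataset, which is why such schemes transfer across problems.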


