I hear this phrase everywhere: "Every layer is a change in representation of the previous layer." Is this mathematically proven, or is it an assumption based on looking at the layers of image-classification models?
That seems true by definition, as each layer is a functional transformation of the previous layer.
I think what they want to imply is "Each layer is a semantically meaningful representation, derived from the previous layer," which I don't think is a verifiable claim (what counts as semantically meaningful isn't objective).
Well, since the next layer's output is a linear transformation of the previous layer's output followed by a nonlinearity, it's a fact that it's a change in representation. But I guess the true question is broader: "Is it proven that each layer is preparing a feature representation for the next one?"
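To make the "true by definition" part concrete, here is a minimal sketch (shapes and ReLU chosen arbitrarily for illustration) of how each layer's output is just an affine map of the previous layer's output passed through a nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input ("layer 0" representation) and two layers' parameters
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def relu(z):
    return np.maximum(z, 0)

h1 = relu(W1 @ x + b1)   # layer 1: affine map of x, then nonlinearity
h2 = relu(W2 @ h1 + b2)  # layer 2: the same recipe applied to h1

# Each h_l is, by construction, a re-representation of the layer before it
print(x.shape, h1.shape, h2.shape)  # (3,) (4,) (2,)
```

Whether those re-representations are *useful* to the next layer is the empirical question; the transformation itself is just the architecture.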
I don't know if it is mathematically proven, but you can easily see it yourself by making an image classifier with one hidden layer and inspecting what it learns. It's more than a bare assumption at this point; I would call it at least empirically supported.
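A sketch of that experiment, assuming scikit-learn is available (the dataset, layer size, and iteration count are arbitrary choices for illustration): train a one-hidden-layer classifier, then recompute the hidden representation by hand from the learned weights to see that it is a new encoding of the same images.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)  # 1797 images of 8x8 digits, flattened to 64 features

# One hidden layer of 32 ReLU units; a short run just to illustrate
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
clf.fit(X, y)

# Recompute the hidden-layer representation from the learned parameters:
# relu(X @ W1 + b1) -- a different representation of the same images
W1, b1 = clf.coefs_[0], clf.intercepts_[0]
hidden = np.maximum(X @ W1 + b1, 0)
print(X.shape, "->", hidden.shape)  # 64-dim pixels re-encoded as 32-dim features
```

Inspecting the columns of `W1` (reshaped to 8x8) is where the "layers learn features" intuition comes from, though interpreting them as semantically meaningful is exactly the unverifiable part discussed above.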