> All your points are good ones and were knowable by any researcher at the time who wasn’t, idk, a new grad or new to CV.
I think you are radically overstating how obvious some of these things are.
What you call "just threw the VAE in there using the default options from the original VAE paper" is what another person might call "used a proven reference implementation, with the settings recommended by its creator"
Sure, there are design flaws with SD1.0 which feel obvious today - they've published SDXL and having read the paper, I wouldn't even consider going about such a project without "Conditioning the Model on Cropping Parameters". But the truth is this stuff is only obvious to me because someone else figured it out and told me.
I’m not criticizing them or the approach. That’s what I would have done most likely. But the things you mentioned aren’t particular to stable diffusion, or even VAEs. Yes, the best way to learn is to be told or to build up applied/implementation experience until you learn them directly. But almost any CV model will run into at least one of those issues, and I would expect someone with idk > 1y experience in applied work to know these things. Perhaps I am wrong to do that.