2) The problem I would expect with a hybrid method is that conv features are usually trained with dropout, which pushes them to be redundant, so they should be highly correlated with each other and thus have a high cosine similarity.
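To make the concern in 2) concrete, here is a minimal sketch of how one might check it: compute the mean pairwise cosine similarity across a set of feature vectors. The function names and the toy vectors are my own illustration, not real conv activations or anything from your method; in practice you would feed in flattened feature maps from the network.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(features):
    """Average cosine similarity over all distinct pairs of feature vectors."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    sims = [
        cosine_similarity(features[i], features[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sum(sims) / len(sims)


# Toy data (hypothetical): three nearly parallel vectors vs. three orthogonal ones.
redundant = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.1], [0.9, 2.1, 2.9, 4.2]]
orthogonal = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

print(mean_pairwise_cosine(redundant))
print(mean_pairwise_cosine(orthogonal))
```

If real conv features behave like the redundant case, with a mean close to 1, then a similarity-based criterion would have little to discriminate on.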

3) I agree that my argument is scientifically unfair. I was arguing from the perspective of a prospective user. My customers tend to have a budget limit on how much their classifier is allowed to cost. Training from scratch would be too expensive, but a chopped ResNet with some conv layers will work OK and be cheap enough.

So for me, as a user, the ecosystem around your architecture and the availability of pretrained models might make the critical difference in whether I'll use it or not.