Clustering also isn't a supervised learning technique. Even though you could argue that DNNs can be unsupervised (e.g. autoencoders), that is generally not how they are used in practical systems. So it's not a good comparison at all.
Well, feature learning is unsupervised by necessity; otherwise you're not learning features, you're learning a mapping between features and labels.
Deep nets used in the way you describe are first trained unsupervised to extract features; those features are then used in supervised learning, to learn a mapping from the new features to labels.
You can also do this "by hand" using unsupervised learning techniques like clustering, Principal Component Analysis, etc.: you build your own features that way, and then train a classifier on the features you extracted.
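
To make the "by hand" version concrete, here is a rough sketch in plain Python. It is only one possible illustration: the choice of k-means for the unsupervised step, distance-to-centroid features, the nearest-mean classifier, the toy data, and all function names (`kmeans`, `extract_features`, `train_nearest_mean`, `predict`) are my own assumptions, not anything prescribed above.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means. Labels are never seen here: this is the unsupervised step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids


def extract_features(point, centroids):
    """Hand-made features: the distance from the point to every centroid."""
    return [math.dist(point, c) for c in centroids]


def train_nearest_mean(features, labels):
    """Supervised step: store the mean feature vector of each label."""
    grouped = {}
    for f, y in zip(features, labels):
        grouped.setdefault(y, []).append(f)
    return {y: [sum(col) / len(rows) for col in zip(*rows)] for y, rows in grouped.items()}


def predict(model, feature):
    """Return the label whose mean feature vector is closest."""
    return min(model, key=lambda y: math.dist(feature, model[y]))


# Toy data (made up for illustration): two 2-D blobs with labels 0 and 1.
rng = random.Random(1)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
points += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(20)]
labels = [0] * 20 + [1] * 20

centroids = kmeans(points, k=2)                              # 1. features learned without labels
features = [extract_features(p, centroids) for p in points]
model = train_nearest_mean(features, labels)                 # 2. mapping from features to labels
print(predict(model, extract_features((4.8, 5.3), centroids)))
```

Note that the labels only show up in step 2. Step 1 would run exactly the same if you had no labels at all, which is the whole point of the distinction.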