To some extent it's true that "deep learning" is a buzzword, though it is less arbitrary than "data scientist" (which makes no sense whatsoever). In a nutshell, what people mean by "deep learning" is the collection of tips and tricks that make it possible to efficiently train multiple layers of non-linear hidden units (or feature extractors).
Sometimes these are multi-layer neural nets, but they don't have to be.
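To make "multiple layers of non-linear hidden units" concrete, here's a minimal sketch (plain NumPy, arbitrary layer sizes I picked for illustration, not anyone's canonical definition): each layer is just an affine map followed by a non-linearity, and each one acts as a learned feature extractor feeding the next.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Run x through a stack of (W, b) layers with sigmoid hidden units."""
    h = x
    for W, b in layers:
        h = sigmoid(W @ h + b)   # each layer: affine map + non-linearity
    return h

rng = np.random.default_rng(0)
sizes = [784, 256, 128, 10]      # e.g. an MNIST-sized input and two hidden layers
layers = [(rng.normal(0, 0.1, (m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
print(forward(rng.normal(size=784), layers).shape)   # -> (10,)
```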
That's an interesting (and relevant) question. In fact, if you're using something like stochastic gradient descent to optimize the weights of your network, it might be very hard for the network to escape the local minimum (or basin of attraction) in which it ended up after training on a large dataset, even if you present it with the same examples but with the labels flipped (which would be the easiest way to "unlearn").
In theory, stochastic gradient descent allows you to escape local minima: the noise in the stochastic error surface will likely be enough for the network to escape whatever minimum it is in. In practice, because the weights will tend to have a large magnitude and because most of the time you'll be using a saturating non-linearity (such as a sigmoid), the number of steps required to escape that local minimum might be prohibitively large.
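A quick numerical check of the saturation argument (assuming sigmoid units, as above): once the pre-activation gets large, which is what happens when weights grow in magnitude, the sigmoid's derivative is nearly zero, so each gradient step barely moves the weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [0.5, 2.0, 5.0, 10.0]:
    grad = sigmoid(z) * (1.0 - sigmoid(z))   # derivative of the sigmoid at z
    print(f"pre-activation {z:5.1f} -> derivative {grad:.6f}")
# At z = 0.5 the derivative is about 0.235; at z = 10 it is about 0.000045,
# i.e. the gradient signal (and hence each SGD step) is thousands of times smaller.
```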
Presumably, you could use second-order optimization methods to escape from minima -- because they allow you to make "bigger" steps -- but that comes with its own set of problems (negative curvature being one of them).
I encourage you to actually test these hypotheses: train a simple network on something stupid like MNIST, and make it achieve a reasonable error with many passes through the data. Then change the labels of 10, 20, or 50% of your inputs and continue training (with the same learning rate... or not!) to see how long it takes for the network to get to another minimum.
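Here's a rough sketch of that experiment, assuming PyTorch and torchvision are available (the hyperparameters are arbitrary choices of mine, not a recommendation): train to a reasonable loss, relabel a fraction of the examples at random, then keep training and watch how slowly the loss recovers.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_model():
    # Simple sigmoid MLP, in keeping with the saturating-non-linearity discussion.
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(784, 128), nn.Sigmoid(),
                         nn.Linear(128, 10))

def train(model, loader, epochs, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        total = 0.0
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
            total += loss.item()
        print(f"epoch {epoch}: mean loss {total / len(loader):.4f}")

data = datasets.MNIST("./data", train=True, download=True,
                      transform=transforms.ToTensor())
loader = DataLoader(data, batch_size=128, shuffle=True)

model = make_model()
train(model, loader, epochs=5)      # phase 1: reach a reasonable minimum

# Phase 2: randomly relabel 20% of the examples (some may keep their original
# label) and continue training from the minimum found in phase 1.
flip_frac = 0.2
idx = torch.randperm(len(data))[: int(flip_frac * len(data))]
data.targets[idx] = torch.randint(0, 10, (len(idx),))
train(model, loader, epochs=20)     # watch how long the loss takes to come back down
```

Comparing the loss curves of the two phases (and trying different flip fractions or learning rates) gives a direct feel for how "sticky" the original basin of attraction is.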