Just a note that you can interpret regularization as placing a prior on weights. L2 regularization is a Gaussian prior, and L1 is a Laplacian prior. I.e. this is doing Bayesian statistics rather than an arbitrary hack to improve predictions.
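Concretely, this is just MAP estimation. A rough sketch of the Gaussian case (the notation here is mine, not from the article):

\[
\hat{w}_{\text{MAP}} = \arg\max_{w} \; \log p(y \mid X, w) + \log p(w)
\]

With a Gaussian prior $p(w) = \mathcal{N}(0, \tau^{2} I)$, the prior term is $\log p(w) = -\tfrac{1}{2\tau^{2}} \lVert w \rVert_2^{2} + \text{const}$, so maximizing the posterior is the same as minimizing the negative log-likelihood plus an L2 penalty with weight $\lambda = \tfrac{1}{2\tau^{2}}$. Swap in a Laplace prior $p(w_i) \propto e^{-\lvert w_i \rvert / b}$ and you get the L1 penalty instead.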
Elements of Statistical Learning is firmly in the frequentist world from what I recall, so this might not be discussed in that book.
This is discussed in Chapter 1 (or maybe 2), I think, which suggests to me that the author should probably read a little bit more of it.
Mind you, it's a wonderful book, and I recommend that people just read it in general (you may not be able to do very many of the exercises, but it's still worth it).
Additionally, when he rails against introducing bias to improve generalization, I'm reminded of statistical learning theory: expected risk is typically bounded by empirical risk (the fit) plus a term that grows with model complexity, and deliberately restricting complexity (i.e. introducing bias) is exactly what keeps that second term small.
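One standard bound of that flavor (Vapnik's VC bound, roughly stated; $h$ is the VC dimension of the model class and $n$ the sample size) says that with probability at least $1 - \delta$,

\[
R(f) \;\le\; \hat{R}_n(f) + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) + \ln\frac{4}{\delta}}{n}}
\]

so driving the complexity term down, at the cost of some bias, is precisely how you control the gap between training fit and expected risk.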