If you mainly care about prediction, rather than inspecting the fitted parameters, a lot of this detail is usually overkill.
To generalize well, it's almost always a good idea to use some form of regularization, such as penalizing the sum of squares of the parameters (an L2/ridge penalty). The extra term in the cost function usually makes the naive "normal equations" approach work fine, and gives much the same predictions as fancier pivoted-QR approaches. On my machine it's also a lot faster (ballpark ~10x for large systems).
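A minimal NumPy sketch of the comparison above (the data, penalty value `lam`, and problem sizes are made up for illustration): the ridge term turns the normal equations into a well-conditioned solve, and the result matches what a stable least-squares solver (SVD-based in NumPy's `lstsq`) gives on the equivalent augmented system.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 50
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam = 1e-3  # ridge penalty strength (assumed value, purely for illustration)

# Naive normal equations with a ridge term: solve (X'X + lam*I) beta = X'y.
# The lam*I shift keeps the matrix positive definite and well-conditioned.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Stable reference: the same ridge problem written as ordinary least squares
# on an augmented system, solved by lstsq (SVD under the hood in NumPy).
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
beta_ls, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

# The two solutions agree closely on a well-behaved problem like this one.
print(np.allclose(beta_ridge, beta_ls, atol=1e-6))
```

The gap between the two approaches only really opens up when `X` is badly conditioned and `lam` is tiny or zero, which is exactly the regime the "just regularize" advice steers you away from.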
I'm glad R has super-solid robust GLM implementations. And unless you're fitting many models, you should probably just use such a library routine. However, I wish more tutorials and textbooks would spend more time on the reasons for numerical stability, and when one should care, rather than pushing that detail off into a trail of citations.
Nice overview. I thought it was a bit odd that they pulled out a Stieltjes integral so early on to define the mean and variance (i.e. mean = ∫ x dF(x), where F is the CDF). I wouldn't expect most people reading this sort of introductory material to be familiar with it.
I suppose it's the one definition that works for both continuous and discrete variables. Although I actually read it as the more common Riemann definition (i.e. mean = ∫ x·f(x) dx) until you pointed it out.
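To spell out why the single Stieltjes definition covers both cases: when F is differentiable you have dF(x) = f(x) dx and recover the Riemann form, and when F is a step function the integral collapses to a sum over the jump points. A sketch:

```latex
\mu = \int x \, dF(x) =
\begin{cases}
  \displaystyle\int x \, f(x) \, dx & \text{continuous case, } F' = f \\[1ex]
  \displaystyle\sum_i x_i \, p(x_i) & \text{discrete case, } F \text{ jumps by } p(x_i) \text{ at } x_i
\end{cases}
```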
This is a great post! I tried implementing my own GLMs [1] a short while ago. But I ran into a lot of trouble with numerical instability and had a hard time tracking down ways to solve these edge cases.
Hopefully with this as a resource I'll be able to make some more progress on it!