
Thanks for providing more detail and context. I wonder how the clash in terminology came about! Who would have imagined naming things uniquely would be so hard (pun intended).

I understand that your example is entirely pedagogic, but a cautionary note for the unwary: (a) although it is tempting to fold the variance calculation into a single loop (accumulating the totals of x and x^2), (b) neither that obvious single-loop version nor the code above is a good way to compute variance if one cares about preserving precision, especially if x has a wide dynamic range. In large-scale problems this does rear its ugly head, and such bugs are difficult to catch because the code is mathematically correct. Using double mitigates the problem to an extent (but then floats are faster for SIMD vectorization).
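To make the cancellation concrete, here is a toy sketch in Julia (since that is the language under discussion); naive_var and twopass_var are made-up names for illustration, not anything from a library:

    # Naive single-pass variance: accumulate sum(x) and sum(x^2) in one loop.
    # Mathematically correct, but (s2 - s^2/n) cancels catastrophically when
    # the mean is large compared to the standard deviation.
    function naive_var(x)
        n  = length(x)
        s  = zero(eltype(x))
        s2 = zero(eltype(x))
        for xi in x
            s  += xi
            s2 += xi * xi
        end
        return (s2 - s * s / n) / (n - 1)
    end

    # Two-pass variance: compute the mean first, then sum squared deviations.
    # Much better behaved numerically.
    function twopass_var(x)
        n = length(x)
        m = sum(x) / n
        return sum((xi - m)^2 for xi in x) / (n - 1)
    end

    # With a large mean and unit variance, the naive version can lose all
    # significant digits (it may even come out negative), while the two-pass
    # version stays close to 1.0:
    x = 1.0e8 .+ randn(10^6)
    naive_var(x), twopass_var(x)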

Another side note: I have gradually come to realize and appreciate the unique position that Fortran holds. It is not often that you have compiler writers with a background in numerical analysis, or vice versa. I, BTW, have a background in neither and sorely miss it.




The way I compute the variance above is, to my knowledge, the standard algorithm implemented by most software packages (apparently including MATLAB). (The single-pass computation you mention is subject to catastrophic cancellation, and thus pretty terrible unless the mean of your data is very small relative to the variance.) However, the "real" implementation in the Julia standard library is indeed a bit better: it performs pairwise summation, which has O(log n) error growth instead of O(n), at negligible performance cost (see http://en.wikipedia.org/wiki/Pairwise_summation).
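For anyone curious what pairwise summation looks like, here is a minimal recursive sketch in Julia; it is not the actual Base implementation (which uses a much larger base case for speed), just an illustration of why the error grows like O(log n):

    # Pairwise (cascade) summation: recursively split the range in half and add
    # the two partial sums. The summation tree has depth O(log n), so rounding
    # error accumulates over O(log n) additions along any path instead of O(n).
    function pairwise_sum(x, lo=1, hi=length(x))
        if hi - lo < 8               # small base case: plain sequential loop
            s = zero(eltype(x))
            for i in lo:hi
                s += x[i]
            end
            return s
        end
        mid = (lo + hi) >>> 1
        return pairwise_sum(x, lo, mid) + pairwise_sum(x, mid + 1, hi)
    end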



