The hacker's guide to uncertainty estimates

BeniBoy · on Oct 10, 2018

Nitpick, but shouldn't you set the ddof of np.std at 1 when you estimate the variance in your plot_confidence_interval? If I recall correctly, the default is 0, which makes it a biased estimator. Weird default by the way!

Great article though, definitely learned some useful things!

Edit: Seems like it is, https://docs.scipy.org/doc/numpy/reference/generated/numpy.s...

mlthoughts2018 · on Oct 10, 2018

Unless your sample size is small, it makes no practical difference and isn’t worth splitting hairs over.

Also, the topic of correct degree of freedom adjustment can get quite complicated, for example see Satterthwaite degree of freedom calculation.

Sometimes you have to calibrate a confidence interval or standard error calculation by comparing with the result from some paper or library function, and you need to match the degree of freedom calculation even if it is not the ideal choice.

I had to do this once when migrating a legacy program from Stata into Python and matching the exact rounding conditions for percentiles (Stata’s xtile) and degrees of freedom for standard error in a few different significance test functions became a huge pain.

While I could get the source for NumPy and SciPy, I could not for all the relevant Stata functions, and spent a lot of time creating experiments that could decide which of these subtle choices were being used in Stata, and then match them in Python.

BeniBoy · on Oct 10, 2018

Definitely true!

But as in the precise situation you described, this can leads to discrepancies in result. So if you're coming from Matlab of R (or even Pandas), this can help to know that in Numpy the default is not the usual statistical one!

mlthoughts2018 · on Oct 10, 2018

I would not say there is any such thing as "the usual statistical" number for degrees of freedom. There are just different formulas for different estimators that have different properties (like bias) that sometimes matter in practical results and sometimes don't matter.

For example, in Bayesian statistics, summary statistics coming from a posterior distribution are _always_ biased, exactly according to the degree of bias encoded into the prior distribution, You _want_ results that are biased, so long as you believe your prior model of the bias accurately reflects the information you have available at the time.

If you chose to use a summary statistics like MAP in that setting, you would not care one bit whether it was biased or consistent or whatever. You'd just care that the posterior distribution is useful for a practical purpose.

In this sense, I think frequentist stats education falls short a lot of time time. There is nothing special about NHST as a framework for measuring significance of an effect. It's just one way to do things that sometimes is useful and other times isn't.

Similarly there is no special reason to ever care about BLUE estimators, unbiased estimators, efficient estimators, consistent estimators, etc. etc., or differences between different hypothesis tests like Wald test or Welch's t-test or Mann Whitney non-parametric test, yadda yadda yadda.

They are just different things with different properties and different formulas. None of them are "the usual statistical one." And any time you reference a software package using any of them, you should not expect the software package to have the same assumption about what default choices to make that you might believe from a textbook or something. There's no reason to expect them to be connected really at all.

rademacher · on Oct 10, 2018

But your estimate isn't very meaningful if you only have enough samples such that the N v (N-1) normalization matters, the variances on the estimator will be too large. The 1/N normalization comes from the MLE BTW, I'm sure most know this already. As an aside, to see the power of a biased estimator look at the James-Stein Estimator [1].

[1] https://en.wikipedia.org/wiki/James%E2%80%93Stein_estimator

im3w1l · on Oct 11, 2018

1. Kinda, but not quite. ddof=1 will give an unbiased estimate of the variance. It turns out that an unbiased estimate of the standard deviation isn't possible without making assumptions about the probability distribution.

2. If you have a finite population and know all the values, ddof=0 will give you an unbiased estimate of the variance. "These are the heights of every person in my class, what is the variance?"

If you have an infinite population and a finite sample and want to estimate the variance of the infinite population, then ddof=1 is correct.

Then there's also the possibility of correcting for the finite size of the population but that's even less important than correcting for the size of the sample.

clircle · on Oct 10, 2018

> Since all outcomes are 0 or 1, and drawn with the same (unknown) probability, we know that the number of ones and zeros follows a binomial distribution. This means that the confidence interval of a “k out of n” scenario is a Beta distribution.

Pretty confused by this sentence. The mean of a bunch of 0s and 1s cannot follow a beta distribution. The support set is not continuous. I think the author is making Clopper-Pearson intervals, but dropping some terms.

c_moscardi · on Oct 16, 2018

Yep, my read too. The code he provides is definitely the Clopper-Pearson formula. [1]

This confused me as well, since I've always used the normal central-limit-based confidence interval for binary outcomes.

[1] https://en.wikipedia.org/wiki/Binomial_proportion_confidence...

(I am no expert in the analytic underpinnings of the beta distribution or precisely how it is the conjugate prior to the binomial -- or, rigorously speaking, what conjugate prior means -- but the formula here lines up with his formula :P )

ezke · on Oct 10, 2018

The beta dist is conjugate to the binomial, it is a natural prior for the proportion parameter. The mean of a bunch of 0s and 1s will be between [0,1], the support of the beta dist.

clircle · on Oct 10, 2018

The mean of n 0s and 1s has support {0, 1/n, 2/n, ..., n/n}, not [0,1].

>The beta dist is conjugate to the binomial, it is a natural prior for the proportion parameter.

I don't think the author is doing Bayesian inference.

chewbaxxa · on Oct 10, 2018

The author probably means that the unknown probability follows a beta distribution. Then you can construct confidence intervals for p the usual way.