Frequentist Statistics are Frequently Subjective (lesswrong.com)
55 points by yummyfajitas on Dec 5, 2009 | 28 comments


This debate has been raging for 250 years. Brad Efron, one of the greatest living statisticians and one of the few who can transcend the debate, has VERY interesting things to say about it. He believes that Empirical Bayes, or using some of your data to model your prior, will win the day.

http://www-stat.stanford.edu/~ckirby/brad/papers/2005NEWMode...
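
For a rough sense of what "using some of your data to model your prior" means in practice, here's a minimal Python sketch -- not Efron's actual procedure, just the basic flavor: fit a Beta prior to a batch of small binomial samples by moment matching, then shrink each group's raw rate toward it. All the counts below are made up.

    from statistics import mean, pvariance

    # Hypothetical success/trial counts for 8 groups (made-up data).
    successes = [3, 7, 4, 9, 2, 6, 5, 8]
    trials    = [10, 12, 9, 15, 8, 11, 10, 14]
    rates = [s / n for s, n in zip(successes, trials)]

    # Crude method-of-moments fit of a Beta(alpha, beta) prior to the raw rates.
    m, v = mean(rates), pvariance(rates)
    common = m * (1 - m) / v - 1
    alpha, beta = m * common, (1 - m) * common

    # Each group's estimate is pulled toward the pooled prior mean,
    # more strongly for groups with fewer trials.
    for s, n in zip(successes, trials):
        print(f"raw {s / n:.2f} -> shrunk {(alpha + s) / (alpha + beta + n):.2f}")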


> or using some of your data to model your prior

I think that's kinda missing the point.


I beg to differ (not sure if you actually read the Efron talk I just posted). I get the thrust of the Less Wrong article, though I think that frankly his language and vitriol are direly misplaced. He's railing against some class of Frequentists who attack Bayesians. This is sort of a Fox News tactic: pick someone, in this case a commenter on a blog, and generalize their comment to the entire population. It's fairly unstatistical, if you ask me. I don't get why anyone would do this. A good Frequentist is not a lost idiot, nor does she rail against Bayesians. In fact, a good Frequentist understands the beauty of Bayesian methods, too. I'd even wager that a good Bayesian understands why Frequentist methods have dominated the 20th century.

I'm attempting to achieve that rare event in these vitriolic philosophical academic debates: adding a bit of middle ground. In this case, the middle ground of Empirical Bayes has proven to be very, very useful in large-scale simultaneous inference problems. It's proof that there IS such a thing as convergence in this debate. Taking the shrill tone the author adopts as a signal of how little regard the two sides have for each other, it's obvious that we need some convergence.


> it's obvious that we need some convergence.

Why? This isn't politics. Maybe one approach IS better than the other. Or maybe there are better methods we don't know about yet.

In fact I think most Bayesians would consider Frequentist methods either plain "wrong" (like, at best works in special cases) or as approximations to the full Bayesian way.

And that doesn't mean they are useless, because going fully Bayesian normally requires "a lot" of computation, or maybe we have issues with elicitation (I suspect this is the biggest hurdle for frequentists). But Bayes is still the gold standard.


I don't think an upvote does this enough justice.

At the end of the day, we're all just trying to be better. Shaking strawmen at progress just because it's not perfect (you'll note that some of the Less Wrong/Overcoming Bias arguments leave small disclaimers of assumptions of infinite computing power; not all, but some) is just as foolish as refusing to see when the ground moves beneath your feet.


Hamilton, I'd like to email you about your work at Stanford if you don't mind. Could you drop me a line?


To my (limited) understanding, Bayesian statistics are just as subjective. If you and I start with different priors, and are then fed the same evidence, we can end up with vastly different conclusions.

Since priors are defined as a function of my state of mind (or state of ignorance, as it were), it seems pretty subjective that the 'result' should depend so heavily on my initial state of mind.

E.T. Jaynes' book has an example of this with regard to E.S.P. research (I believe http://omega.albany.edu:8008/ETJ-PS/cc5d.ps is the relevant chapter - but it's been a year or two since I've read the book).

A real statistician should feel free to correct me though - I'm more of an algebra guy ...


The argument is that Bayesian statistics are less subjective because while we won't come to the same conclusion, we will perform the same update to our prior probabilities.

An example: consider a medical diagnostic test with a 1% false positive rate and no false negatives.

If I understand him correctly, Eliezer is arguing that the scientist (i.e. the lab running the test) should present an update rule, not the posterior probability. I.e., the lab report should say: "Whatever your prior estimate of P(sick) is, your posterior should now be P(sick)/(0.01 P(not sick) + P(sick))" instead of saying "the patient has an x% chance of being sick".

Suppose before the test I believe the probability the patient is sick is 1% and you think it is 10%. Once the test comes out positive, I've updated my probability estimate to 50%, you to 92%.

The update rule provided by Bayesian statistics is objective, even if the posterior is subjective. The frequentists have nothing comparable which is objective.
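
A quick sanity check on those numbers (a small Python sketch, using the 1% false-positive / zero false-negative figures from the example above):

    def posterior(prior_sick, false_pos=0.01, true_pos=1.0):
        """P(sick | positive test) by Bayes' rule."""
        p_positive = true_pos * prior_sick + false_pos * (1 - prior_sick)
        return true_pos * prior_sick / p_positive

    for prior in (0.01, 0.10):
        print(f"prior {prior:.0%} -> posterior {posterior(prior):.0%}")
    # prior 1% -> posterior 50%
    # prior 10% -> posterior 92%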

See also this link:

http://www.overcomingbias.com/2009/02/share-likelihood-ratio...


Ah, that makes sense - thanks.


Your understanding is incomplete. Bayesian statistics are not subjective at all. Instead they objectively describe the relationship between prior belief and conclusion. People may draw different conclusions for subjective reasons, but that subjectivity is in the people, not the statistics.

Now it seems you are objecting to the fact that your initial state of mind affects your conclusion. But there is simply no way to make sense of observation except through the lens of prior beliefs. You can be explicit about it and draw conclusions with a correct methodology, as in Bayesian statistics. Or you can be implicit about it and draw conclusions with an incorrect methodology, as in frequentist statistics. (See my other post in this thread for an explanation of why frequentist methodology is incorrect.)

Let me offer a simple example. Suppose a pregnant woman you know gives birth to a boy. Is that evidence that babies are more likely to be boys than girls? Obviously it is. Is it strong evidence? Obviously not. Should we upon observing that conclude that boys are more likely than girls? Obviously not.

Now suppose that you have no prior knowledge other than that we see roughly similar numbers of boys and girls. I know that of the last 100,000 babies born in the USA, 51,157 were boys. Suppose we are both told that at the local hospital, 92 of the last 200 babies born were boys. I submit that we both will and should wind up with different conclusions about the relative likelihood of boys and girls. Why? Because different prior knowledge leads to different prior beliefs, and those prior beliefs when modified by identical evidence lead to different conclusions.
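
To make the arithmetic concrete, here's a small sketch of that update with conjugate Beta priors (the particular Beta parameters are my own illustrative choices, not anything canonical):

    # Two observers see the same local data (92 boys out of 200 births) but
    # start from different priors on P(boy).
    def posterior_mean(alpha, beta, boys, girls):
        return (alpha + boys) / (alpha + beta + boys + girls)

    boys, girls = 92, 200 - 92

    # Knows only "roughly similar numbers of boys and girls": a weak Beta(5, 5) prior.
    print(f"{posterior_mean(5, 5, boys, girls):.3f}")          # ~0.462, pulled well below 1/2
    # Also knows 51,157 boys in the last 100,000 US births: a sharp prior.
    print(f"{posterior_mean(51157, 48843, boys, girls):.4f}")  # ~0.5115, barely moves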


You have to admit that your example is pretty contrived though - usually setting priors isn't so clean.

How do you 'properly' set the prior probability that someone really has E.S.P.? That a researcher is secretly colluding with a test subject? That a researcher is flat out fabricating their results? That this is just an instance of the Clever Hans effect? Or a thousand other possible hypotheses ...


I picked my example so that the proper influence of the prior would be clear. You are absolutely right that there are many cases where different people with equivalent information have different beliefs. But being faced with complications like this makes correct reasoning more important, and not less.

When you try to sweep the proper influence of the prior under the carpet, people will manipulate their statistics to draw the conclusions that they want. Worse still, if each is using subjective techniques that they believe are objective, the unexamined assumption will lead to them talking past each other.

By contrast, with correct reasoning you can show each side why they continue to believe what they believe, while showing them why the other side is not going to have its opinions changed. In my experience this opens up a bigger possibility of useful dialog and changed opinions.


Say a proper Bayesian who's a fan of 'da Bears' just saw the Bears lose again, badly. How is he supposed to update these two beliefs:

- this season 'da Bears' will tear shit up every Sunday (fwiw assume prior on this has it as likely)

- I am usually over-optimistic about 'da Bears''s performance (fwiw assume prior on this also has it as likely)

...in light of the new evidence?

(Side note: I'm specifically not talking about bayesian analysis or statistics in a formal setting with a prespecified, finite list of allowed hypotheses (like spam or ham).

If I wrote this out in extreme detail I'd get nothing done today, I think you can charitably fill in the gaps and missing pieces of what I'm getting at; if not I'll fill them out in a few.)


There are a ton of things to clarify, but I'll take a stab at it.

A proper Bayesian who is betting, doesn't want to lose badly, and still ends up as a fan of 'da Bears' has probably got a prior that gives some confident edge to them winning. Belief A could be stated using this joint distribution as the product of their predicted chances of winning each game in order. If it's actually 'likely' then that means you've got a pretty incredible edge on them winning (the size proportional to how much of the season is left).

Belief B is a weird one though. It's a meta-hypothesis about the calibration of your own personal beliefs. The evidence you use to update on this belief is the discrepancy between what you would honestly predict and what actually happened. A proper Bayesian with money on the line would want to recalibrate as best as they could using data already available in order to get the probability of B as low as possible before starting to look at A.

So our proper Bayesian first looks over old predictions he has made about the Bears' performance, reworking whatever internal understanding he has of the factors that go into winning in football, until he is well calibrated. At this point, his probability for belief A has almost certainly dropped, because it's a pretty unlikely thing for a team to just take things apart at every single game for the whole season, but if he still ends up with a strong prior on them winning then a single loss, even if it's pretty bad, won't shift it around a lot.

In short, he'll think about it a lot, cancel out whatever personal biases he can manage, then bet conservatively unless he has some sort of knowledge that provides a really, really strong edge on them winning. IMO, he's got an inside line with some dirty, dirty men.


The specific example of 'da Bears' was a weak attempt at humor. I can't easily clarify what I'm getting at and keep this short, and I've got limited time so I'll do the best I can.

I see a lot of people using "informal" bayesian reasoning (meaning a lot of talk about priors and updating and references to theorems, but never any use of actual distributions beyond super-super-cursory examples applied to trivial situations like the boy/girl thing here or stuff like the Monty Hall problem).

I don't have any problem at all with bayesian analysis applied in a rigorous setting to a rigorously specified problem (like spam detection and so on).

In an informal setting I'm extremely skeptical of the uses I tend to see b/c there's no careful attempt to clearly delineate which informally-statable hypotheses are valid and which are "invalid" "meta-hypotheses" like the optimism thing.

What you've described here is a way in which someone reasonably smart would eliminate the meta-hypothesis, which is fine. In general I wouldn't expect it to be feasible to take a full mental inventory, do a topological sort on your beliefs, and then apply the same procedure; most people most of the time will be running around holding partially-inconsistent beliefs. (By "hold" I mean: if you asked them to estimate how many of their beliefs are likely to be revealed as significantly off in the future, or how often they expect to encounter evidence forcing significant revisions of their beliefs, they'd have an answer on offer which would still have "work to do", the way the unexamined belief "I'm too optimistic about the Bears" really has work to be done.)

What I'm curious about is whether there's either a clearly-specifiable criterion for which types of beliefs or hypotheses are workable and which are "too meta to work", or some kind of theorem guaranteeing that, starting out with "inconsistent" beliefs -- in the sense of "meta-hypotheses" like with da Bears -- you can apply this algorithm to process evidence and over time converge on beliefs that're at least more consistent than the ones you started with.

It's hard to say much more without getting formal and I'm out of time for now; since I'm mainly concerned with informal use of "bayesian" metaphors it's not hugely critical to formalize this stuff but later I could give it a proper whack.


That's definitely an interesting space. I think the highly principled side of the Bayesian boat would state that meta-hypotheses are tied into your prior on model building information (Pr(I), etc) and that it needs to be updated alongside everything else. So now if your hypothesis' posterior becomes something like Pr(H, theta, I) the whole business needs to be updated and will include all the meta level intellectual rigor. At this point I feel like I might be walking into the space of Structural Causal Modeling and I'm not too well versed there at all.

In the informal setting though you're only ever likely to be trying to "update" one belief at a time, so, yeah, it definitely requires intellectual care to make sure to follow dependencies. Worse, though, is that it should be possible to have two codependent estimations, and if you aren't aware of that codependency you won't ever be able to get them to agree.

I think that's all interesting, but I'm not sure it applies to informal situations as well as one might hope. Frequently, Bayesian techniques are only used informally in conjunction with strong rationalist heuristics which help to build these reductionist hierarchies of effects and then allow for clear(er) methodology to find an accurate answer.

Few people thinking carefully and rationally would be willing to bet on their beliefs so long as they know that they have an outstanding miscalibration. That's why scientists, good scientists anyway, will so often preface things with disclaimers. They want you to be aware of whatever biases they can identify before you start to judge their opinions.


The difference is that bayesians acknowledge this explicitly. You publish the way to calculate the posterior given someone's prior, so that everyone can fill in their own priors to determine their posteriors. This calculation is objective. With frequentist statistics everything is subjective, including the results you publish.


> To my (limited) understanding, Bayesian statistics are just as subjective.

I'm still reading, but that seems to be the point. As Yudkowsky puts it, "probability is ignorance and ignorance is a state of mind."


I get lost in the argument with these simple coin flipping examples. The way it's presented is so confusing for the weird second case of how many flips it takes to get a head.

In the first case, the probability of getting "five tails or more" from a fair coin is 11%, while in the second case, the probability of a fair coin requiring "at least five tails before seeing one heads" is 3%.

I didn't check the numbers here, but this seems perfectly reasonable to me in terms of hypothesis testing. He's playing on the fact that for cases like THXXXX you would get a result of 2 in the second experiment. You are taking the same results, but applying a different metric to them (position of first head versus count of tails). Of course p will be different.

I understand that the value chosen for significance of p is subjective, but the experiment itself makes perfect sense to me.
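
For what it's worth, the two quoted p-values check out for a fair coin. A quick Python sketch, assuming the two experiments are "flip exactly 6 times" and "flip until the first heads", as described elsewhere in this thread:

    from math import comb

    # Experiment 1: 6 fixed flips; p-value = P(5 or more tails).
    p_fixed = sum(comb(6, k) for k in (5, 6)) / 2**6

    # Experiment 2: flip until the first heads; p-value = P(at least 5 tails first),
    # i.e. the first five flips are all tails.
    p_until_heads = 0.5**5

    print(f"{p_fixed:.3f}")        # 0.109 -> about 11%
    print(f"{p_until_heads:.3f}")  # 0.031 -> about 3%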


The problem is that the conditional probability of the coin being biased given the observation of TTTTTH absolutely does not and cannot depend on whether you intended to flip until you saw a heads, or were going to flip 6 times. Therefore you've got a subjective factor influencing your conclusion that probability theory says should not be a factor if you are reasoning consistently.


To me, more interesting than Bayesianism versus Frequentism was the observation that scientific procedure as it currently stands allows a field like Parapsychology to survive.


The real problem with statistics is that people want an answer to an impossible problem. Namely they want to be told the probability of the world being a particular way. But anyone familiar with conditional probability can easily see that there is no way to come up with that answer, because the conditional probability of something after an observation depends on the assumed probability before the observation.

There are multiple approaches to this problem. What frequentist statistics does is answer a different question, namely: "What is the probability of getting a result this strongly against the null hypothesis if the null hypothesis is true?" This is appealing in that it is an objective probability that seems to say something about the problem under discussion. However people consistently read it as "What is the probability that the null hypothesis is true?", which is simply wrong.

There is a second problem with frequentist statistics, which is what this article is about: the objective-looking probability you get depends on the null hypothesis chosen in ways it shouldn't. Basic familiarity with Bayes' Theorem and conditional probability demonstrates that it is impossible for it to matter whether your intention was to flip a coin 6 times, flip it until you get both heads and tails, or flip it until you get heads. This factor cannot affect the conditional probability of the coin being biased. But those three different intentions translate into 3 different calculations, and 3 different answers in frequentist statistics.
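
A small sketch of that point: the probability of the exact sequence TTTTTH, as a function of the coin's tail-probability theta, is theta^5 * (1 - theta) under each of those three intentions (every one of those stopping rules stops exactly where this sequence stops), so nothing computed from the likelihood alone -- a likelihood ratio, a posterior -- can depend on which intention the experimenter had. The "biased" value below is just an arbitrary alternative for illustration.

    def likelihood(theta, tails=5, heads=1):
        # Probability of the observed sequence; identical under all three stopping rules.
        return theta**tails * (1 - theta) ** heads

    fair, biased = 0.5, 0.75
    print(f"{likelihood(biased) / likelihood(fair):.2f}")  # ~3.80, stopping rule irrelevant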

So that's the frequentist approach. What is the Bayesian approach? It is to come up with statistics that inform us on how prior beliefs before observation turn into posterior beliefs after observation. The advantage of this method is that it is intellectually honest. The disadvantages are that it is complicated and people notice that it is confusing. (The frequentist approach is confusing as well, but people don't notice their confusion. Instead they confidently draw the wrong conclusion that the null hypothesis has been proven.)

There are other approaches. The article touched on my favorite when it pointed out that we should report likelihood ratios rather than probabilities. This is absolutely right. The effect of an observation is to modify our prior beliefs, and likelihood ratios concisely describe how we should modify them. Plus they have the great ability to stack - you can take 3 experiments and combine their likelihood ratios to come up with the likelihood ratio for having seen all three experiments.
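
Combining them is just multiplication of odds. A toy example with made-up numbers:

    # Posterior odds = prior odds x product of the experiments' likelihood ratios.
    prior_odds = 1 / 4
    likelihood_ratios = [3.0, 0.8, 5.0]   # one per independent experiment (made up)

    posterior_odds = prior_odds
    for lr in likelihood_ratios:
        posterior_odds *= lr

    print(f"{posterior_odds / (1 + posterior_odds):.2f}")  # posterior probability: 0.75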

Unfortunately, though, everyone knows frequentist methods, people accept them, and it is very hard to get people to see what is wrong with them. So alternate approaches, though theoretically superior, face an uphill battle towards acceptance.


The difficulty I tend to encounter in promoting Bayesian statistics among scientists is the "sudden" appearance of a statistical model. Too many people go one step further than believing frequentist statistics answer "What is the probability that H0 is true": they skip over the analysis entirely and believe that frequentist methods tell you, simply, accurately, objectively, whether an experimental treatment is "significant".

If you press people on what they believe "significant" means, it gets ugly fast, but it's generally a good thing and definitely necessary to publish. If you don't get significance it's just because you need a bigger n. If you can think of some factors or covariates then you really need to use ANOVA.

Stating that there is anything more complex involved in looking at data and deciding what it means is practically unthinkable.

Likelihood ratios are definitely nice, though. I've sort of gotten people to think about it at a high level by talking about "information flows" and log likelihood values.


A separate problem, not dealt with in that particular essay, is the quite hideous degree to which average scientists don't understand the statistics they use.

I would put a good deal of the blame for this squarely on frequentism as well. Bayesianism isn't hard to understand; it just takes an effort by the teacher to explain it well - I've made certain notable efforts in that direction myself. Once you do get it, you get it.


I think, roughly, the blame goes out to Fisher and anyone else who promoted the "Recipe for Understanding the World" style of statistics. It's not that people are blocked by their understanding of complex frequentist methods; it's the idea that they don't need to understand anything more, because statistics is just a black box you use for verification.

Insert results, get a green or red "significance" light, move on.


I think you're on to something regarding "significance." Over in my dept. we like to say that significance is a measure of sample size. The question, then, hinges on whether or not something has practical significance. Because we've built the whole research reputation incentive structure on the .05 significance level, studies can be designed to get that.

A great article on the problem with statistical significance (by a Frequentist), called Why Most Published Research Findings Are False, is here: http://www.plosmedicine.org/article/info:doi/10.1371/journal...

The punchline seems to be, well, that there's also a large human element that contributes to the problem. I think it's one thing to rail on the Frequentist way-of-thinking; it's entirely another to state that the institution of scientific research built on it creates unwanted incentives.


If you plan experiments and the like, the definition of a significance level (always) comes with a definition of sample size.


The relationship between significance level and sample size relies on a complex set of assumptions, to say the least, and, when everything is stripped away, is perhaps best seen as a way of discovering just how difficult it will be to deblur the world: what power prescription we need.

Often (always?) these constraints are all so very much more complex than Gaussian power analysis assumes. You do it as a way of sketching the depth of a problem, I think, not much more.
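
For reference, the textbook Gaussian power calculation being referred to is only a few lines, and it leans on exactly the simplifying assumptions being complained about (known variance, normality, a single standardized effect size). A sketch for the per-group n of a two-sample, two-sided z-test:

    from math import ceil
    from statistics import NormalDist

    def n_per_group(effect_size, alpha=0.05, power=0.80):
        z = NormalDist().inv_cdf
        return ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)

    print(n_per_group(0.5))  # a "medium" effect: roughly 63 per group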

The linked paper is a pretty clear introduction of the high level problems. I think it's perhaps a little more grim than necessary, but then again that might just be my own bias.



