Statistical Rethinking (2022 Edition) (github.com/rmcelreath)
471 points by eternalban on Jan 16, 2022 | 124 comments



I've read this book and taken this course twice, and it is easily one of the best learning experiences I've ever had. Statistics is a fascinating subject and Richard helps bring it alive. I had studied lots of classical statistics texts, but didn't quite "get" Bayesian statistics until I took Richard's course.

Even if you aren't a data scientist or a statistician (I'm an infrastructure/software engineer, but I've dabbled as the "data person" in different startups), learning basic statistics will open your eyes to how easy it is to misinterpret data. My favorite part of this course, besides helping me understand Bayesian statistics, is the few chapters on causal relationships. I use that knowledge quite often at work and in my day-to-day life when reading the news; instead of crying "correlation is not causation!", you are armed with a more nuanced understanding of confounding variables, post-treatment bias, collider bias, etc.
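To make the collider-bias point concrete, here is a toy simulation (my own minimal R sketch, not code from the book):

  # x and y are independent, but both cause z (a collider).
  # Conditioning on z by adding it to the regression induces a
  # spurious association between x and y.
  set.seed(1)
  n <- 1e4
  x <- rnorm(n)
  y <- rnorm(n)
  z <- x + y + rnorm(n)
  coef(lm(y ~ x))["x"]       # ~0: no association, as expected
  coef(lm(y ~ x + z))["x"]   # ~ -0.5: conditioning on the collider biases it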

Lastly, don't be turned off by the use of R in this book. R is the programming language of statistics, and is quite easy to learn if you are already a software engineer and know a scripting language. It really is a powerful domain specific language for statistics, if not for the language then for all of the statisticians that have contributed to it.



Even if you don't like R, you can do the entire course with Julia/Turing, Julia/Stan, or Python; the course's GitHub page has a list of “code translations” for all the examples.


There are also other translations, for example in PyTorch/Pyro: https://fehiepsi.github.io/rethinking-pyro/

I would say Statistical Rethinking is a great way to compare and contrast different PPL implementations and languages. I've been using it with Turing, which is pretty great.


I frequently prefer R to python/pandas/numpy for data analysis--even if most of my other programming is in python.


What's the advantage, if you already know Python? (genuine interest)


I don't want to say "advantage", so much as preference. But a few things come to mind.

- Lots of high quality statistical libraries, for one thing.

- RStudio's R Markdown is great; I prefer it to Jupyter Notebook.

- I personally found the syntax more intuitive and easier to pick up. I don't usually find myself confused about the structure of the objects I'm looking at. For whatever reason, the "syntax" of pandas doesn't square well (in my opinion) with Python generally. I'd certainly prefer to just use Python. But, shrug.

- The tidyverse package, especially the pipe operator %>%, which afaik doesn't have an equivalent in Python. E.g.

    with_six_visits <- task_df %>%
      group_by(turker_id, visit) %>%
      summarise(n_trials = n_distinct(trial_num)) %>%
      mutate(completed_visit = n_trials>40) %>%
      filter(completed_visit) %>%
      summarise(n_visits = n_distinct(visit)) %>%
      mutate(six_visits = n_visits >= 6) %>%
      filter(six_visits) %>%
      ungroup()
Here I'm filtering participants in an mturk study by those who have completed more than 40 trials at least six times across multiple sessions. It's not that I couldn't do the same transformation in pandas, but it feels very intuitive to me doing it this way.

- ggplot2 for plotting; it's a really powerful data visualization package.
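For example (a minimal sketch using the built-in mtcars data, just to show the grammar-of-graphics style):

  library(ggplot2)
  # Scatter plot of weight vs. mpg, coloured by cylinder count,
  # with a per-group linear fit layered on top.
  ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE)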

Truthfully, I often do my text parsing in Python and then switch over to R for analysis; e.g., Python's JSON parsing works really well.


I can see how this is more intuitive. In pandas I'd assign the output of groupby to a variable, and then add the new column in a separate statement.

(The below is off topic, but I don't use R so I'd love to know whether I'm reading the code correctly)

"Here I'm filtering participants in an mturk study by those who have completed more than 40 trials at least six times across multiple sessions."

A user with this pattern of trials seems like they would fit the above definition:

Session 1: 82 trials
Session 2: 82 trials
Session 3: 82 trials

But the code seems to want 6 distinct sessions with >40 trials each. Have I misunderstood?

Also, is 'mutate' necessary before 'filter' or is that just to make the intent of the code clearer to your future self?


My initial wording was sloppy.

There were 50 trials in each session; so I counted a session completed if they did more than 40 in that session. They needed to have completed at least six sessions.

The mutate is unnecessary. I forget why I did that.


What it would take to recreate dplyr in Python:

https://mchow.com/posts/2020-02-11-dplyr-in-python/


Didn’t R introduce the native pipe operator?

%>% is now simply |>
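If I'm reading the release notes right, something like this should behave the same with either pipe (untested sketch, assuming R >= 4.1 and dplyr loaded):

  library(dplyr)
  # magrittr pipe
  mtcars %>% filter(cyl == 4) %>% summarise(mean_mpg = mean(mpg))
  # native pipe (R 4.1+)
  mtcars |> filter(cyl == 4) |> summarise(mean_mpg = mean(mpg))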


They did. I just haven't gotten around to using it yet!


Tabular data manipulation packages are better, easier to make nontrivial charts, many R stats packages have no counterparts in Python, less bureaucracy, more batteries-included.

R is a language by and for statisticians. Python is a programming language that can do some statistics.


For me, I use R's data.table a lot, and I see the main advantages as performance and the terse syntax. The terse syntax does come with a steep learning curve though.


Indeed, data.table is just awesome for productivity. When you're manipulating data for exploration you want the least number of keystrokes to bring an idea to life and data.table gives you that.


I totally agree. I often find myself wanting data.table as a standalone database platform or ORM-type interface for non-statistical programming too.


What do you mean by terse syntax? I can parse Lisp and C; how would this be different and challenging?


The syntax isn't self-describing and uses lots of abbreviations; it relies on some R magic that I found confusing when learning (unquoted column names and special builtin variables); and data.table is just a different approach from SQL and other dataframe libraries.

Here's an example from the docs

  flights[carrier == "AA",
    lapply(.SD, mean),
    by = .(origin, dest, month),
    .SDcols = c("arr_delay", "dep_delay")]
that's clearly less clear than SQL

  SELECT
    origin, dest, month,
    MEAN(arr_delay), MEAN(dep_delay)
  FROM flights
  WHERE carrier == "AA"
  GROUP BY arr_delay, dep_delay
or pandas

  flights[flights.carrier == 'AA'].groupby(['arr_delay', 'dep_delay']).mean()

But once you get used to it data.table makes a lot of sense: every operation can be broken down to filtering/selecting, aggregating/transforming, and grouping/windowing. Taking the first two rows per group is a mess in SQL or pandas, but is super simple in data.table

  flights[, head(.SD, 2), by = month]
That data.table has significantly better performance than any other dataframe library in any language is a nice bonus!


Taking the first two rows is a mess in pandas?

flights.groupby("month").head(2)

Not only does this have all the same keywords, but it is organized in a much clearer way for newcomers and labels things to look up in the API. Whereas your R code has a leading comma, .SD, and a mix of quoted and unquoted references to columns. You even admit the latter was confusing to learn. This can all be crammed into your head, but it's not what I would call thoughtfully designed.


I agree the example in GP is not convincing. Consider the following table of ordered events:

    | Date | EventType |
and I want to find the count, and the first and last date of an event of a certain type happening in 2020:

    events[
        year(Date) == 2020L, 
        .(first_date = first(Date), last_date = last(Date), count = .N),
        EventType
    ]
Using first and last on ordered data will be very fast thanks to something called GForce.

When exploring data, I wouldn't need or use any whitespace. What would your pandas approach look like?


To do that, the code would look something like:

  mask = events["Date"].dt.year == 2020
  events[mask].groupby("EventType").agg(
      first_date=("Date", min), last_date=("Date", max), count=("Date", len))

Anyway, I don't understand why terseness is even desirable. We're doing DS and ML; no project ever comes down to keystrokes, but the ability to search the docs and debug does matter.


It helps you quickly improve your understanding of the data by letting you answer simple but important questions faster. In this contrived example I would want to know:

- How many events by type

- When did they happen

- Are there any breaks in the count, why?

- Some statistics on these events like average, min, max

and so on. Terseness helps me in doing this fast.


You mean something like

    SELECT
    origin, dest, month, AVG(arr_delay), AVG(dep_delay)
    FROM flights
    WHERE carrier = 'AA'
    GROUP BY origin, dest, month
and

    flights[flights.carrier == 'AA'].groupby(['origin', 'dest', 'month'])[['arr_delay', 'dep_delay']].mean()


Yep thanks, you can tell I use a "guess and check" approach to writing sql and pandas...


On top of what has been said, if you want to do some more advanced statistical analyses (in the inference area, not ML/predictive field), then chances are that these algorithms are either published as R or STATA packages (usually R).

In Python, there is statsmodels. Here, you'll find a lot of GLM stuff, which is sort of an older approach. Modern inferential statistics, if not just Bayesian, is usually in the flavor of semi-parametric models that rely on asymptotics.

As R is used by professional researchers, it is simply more on the edge of things. Python has most of the "statistics course" schoolbook methods, but not much beyond that.

For example, it has become very common to have dynamic panel data which require dynamic models. Now if you want to do a Blundell-Bond type model in Python you have to... code it yourself using GMM, if an implementation even exists.

For statistics, that's pretty much like saying you have a Deep Learning package that maybe has GRU but no transformer modules at all. So yeah, you can code it yourself. Or you use the other one.


I have the opposite question. I have been programming in R since I was 19. I know no other programming languages.

Hence my question:

What's the advantage of Python if you already know R?

I've heard they have similarities. Is there anything Python does better than R in terms of statistical analysis, charting, etc.?


> What's the advantage of Python if you already know R?

AFAIK in statistical modelling Python is better only in neural networks, so if you do not need to do fancy things with images, text, etc. you do not need Python. R is still the king.

In terms of charting and dashboards, I would say that if you work high level R and Python are both pleasant. R has ggplot, but Python has Plotly Express. R has Shiny, but Python has Dash and Streamlit. You can do great with both.


One difference I've noticed is that R libraries are usually authored and maintained by academics in the associated field; the same can't always be said about equivalent Python libraries. This means that R library authors generally use their own libraries for publication and have an academic stake in its correctness.


R is used by many researchers and consequently has many more statistical libraries (e.g. try doing dynamic panel modelling in Python).


Self plug: if you're looking to continue after reading this book, I recently published a book in the same series with the same publisher.

The book is available for open access, though I appreciate folks buying a copy too! https://bayesiancomputationbook.com

https://www.routledge.com/Bayesian-Modeling-and-Computation-...


Awesome book, Ravin! I’m waiting for my physical copy to arrive (should be here tomorrow!) before really diving in, but what I’ve skimmed in the digital copy so far is great.

Btw I’ve been using PyMC2 since 2010 and contracted a bit with PyMC Labs, so I’m surprised we’ve never bumped into each other!


Thanks for ordering one.

Which company are you with? Perhaps we did bump into each other; I'm asking to see if that's the case.


Looks fun! Thanks for sharing. It seems like it covers complementary topics in a very concrete and clear way.


Thanks for checking it out and the feedback. I appreciate it!


Just received my copy, looking forward to tackling it after rethinking!


Thanks for getting a copy. Hope you enjoy it and learn a lot


Thank you!


Here is a direct link to the playlist: https://www.youtube.com/playlist?list=PLDcUM9US4XdMROZ57-OIR... Prof. McElreath has been adding two new videos every week.

Also, for anyone who prefers to use Python for the coding, I recommend the PyMC3 notebooks: https://github.com/pymc-devs/resources/tree/master/Rethinkin... There is also a discussion forum related to this repo here: https://gitter.im/Statistical-Rethinking-with-Python-and-PyM...


I'm one of the core devs for ArviZ and PyMC! Glad you found those resources useful. If anyone has any questions, feel free to ask them in the Gitter and we'd be happy to help.


I've been reading polemics and tutorials for at least 12 years now arguing for Bayesian methods over Frequentist methods. They all seem persuasive, and everyone seems to be convinced that Bayesianism is the future, and Frequentism was a mistake, and the change to a glorious future where Bayesian methods are the standard way to do stats is just round the corner. It hasn't happened.

Meanwhile I've never read a real argument for Frequentism and I don't know where I'd find one, short of going back to Fisher who is not well known for clear writing.

What is going on? Is the future of Bayesian statistics just more and more decades of books and articles and code notebooks and great presentations showing how great Bayesianism is? Is it just institutional inertia preventing Bayesian stats from becoming standard, or does Frequentism have a secret strength that keeps it hegemonic?


The biggest problem with Bayesian statistics in practice is the frequent reliance on relatively slow, unreliable methods such as MCMC.

The Bayesian methodology community loves to advocate for packages like Stan, claiming that they make Bayesian stats easy. This is true... relative to Bayesian stats without Stan. But these packages are often much, much harder to get useful results from than the methods of classical statistics. You have to worry about all sorts of technical issues specific to these methodologies, because these issues don't have general technical solutions. Knowing when you even have a solution is often a huge pain with sampling techniques like MCMC.

So you have to become an expert in this zoo of Bayesian technicalities above and beyond whatever actual problem you are trying to solve. And in the end, the results usually carry no more insight or reliability than you would have gotten from a simpler method.

I recommend learning about Bayesianism at a philosophical level. Every scientist should know how to evaluate their results from a Bayesian perspective, at least qualitatively. But don't get too into Bayesian methodology beyond simple methods like conjugate prior updating... unless you are lucky enough to have a problem that is amenable to a reliable, practical Bayesian solution.
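For what it's worth, the "simple methods like conjugate prior updating" I mean really are a few lines; here's a Beta-Binomial sketch with made-up numbers:

  # Beta(a, b) prior on a proportion, k successes out of n trials.
  # The posterior is Beta(a + k, b + n - k) -- no MCMC needed.
  a <- 2; b <- 2      # weakly informative prior (made-up choice)
  k <- 17; n <- 25    # hypothetical data
  post_a <- a + k; post_b <- b + n - k
  c(post_mean = post_a / (post_a + post_b),
    lower95   = qbeta(0.025, post_a, post_b),
    upper95   = qbeta(0.975, post_a, post_b))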


>>> Every scientist should know how to evaluate their results from a Bayesian perspective, at least qualitatively.

I'm a scientist. How do I do this? I took one stats course, more than 30 years ago, and it was mostly proofs. Of course we learned Bayes' Theorem, and solved some problems using it.

What I don't remember is being taught to choose an "ism" for doing math, so it could easily be something that I'm already doing and taking for granted.


At a minimum, recognize that not all p < 0.05 results have an equal claim on scientific truth. The study power and the prior probability of the hypothesis being true also influence the posterior probability [1]. In particular, underpowered studies of implausible hypotheses provide almost no scientific signal [2], and p < 0.05 is meaningless in such cases.

[1] https://journals.plos.org/plosmedicine/article?id=10.1371/jo...

[2] http://www.stat.columbia.edu/~gelman/research/unpublished/po...
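As a rough illustration of that point, following the same logic as the references (a sketch with made-up numbers):

  # Posterior probability that a hypothesis is true given a significant
  # result, as a function of its prior plausibility and the study's power.
  post_given_sig <- function(prior, power, alpha = 0.05) {
    power * prior / (power * prior + alpha * (1 - prior))
  }
  post_given_sig(prior = 0.5, power = 0.8)  # plausible, well powered: ~0.94
  post_given_sig(prior = 0.1, power = 0.2)  # implausible, underpowered: ~0.31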


Check. Thankfully I come from the physics world, where we don't trust ourselves with any analysis method unless we have some independent way to make sure our results make sense. Most physicists are uneasy about "fancy" statistics.

There's the famous quip: "If an experiment needs statistics, then one ought to have done a better experiment." While this is a bit extreme, it does represent a tendency to design experiments, when possible, that lend themselves to relatively basic data analysis. In fact, I haven't ever calculated a p value in my work.

From what I've read about Bayesian statistics, I'm not sure it's going to get the life sciences out of the mess they're in. That's just my personal hunch. The trouble is, for justifiable reasons, life science research is hard, and is conducted with urgency due to the immediate need for solutions (such as vaccines, therapies for mental illness, or cynically, advertising revenue).

A good rule of thumb seems to be: Multiply p by 10. In other words, p < 0.05 is a coin toss. This rule seems to replicate the entire replication crisis. ;-)


For those coming from a CS background a possible (crude) intuition sometimes given is that

frequentist :: Bayesian ~ worst-case analysis :: average-case analysis

There are good reasons why we don't usually do average-case analysis of algorithms, chief among them that we have no idea how inputs are distributed (another reason is computational difficulty). Worst-case bounds are pessimistic, but they hold.


In CS you're often dealing with adversarial inputs, for which worst-case analysis is obviously the right approach. Not sure that there's anything comparable to that in most statistical settings.


Institutional and community resistance shouldn’t be underestimated.

Until about a decade ago, it could be really hard to get papers accepted in my scientific field (geosciences, particularly geology) that used Bayesian methods because senior faculty and journal editors deemed them subjective and less rigorous than frequentist ‘letter tests’. This didn’t really change until more powerful computers and mathy scripting languages allowed the younger scientists to generate undeniable better results, particularly for very poorly conditioned inverse and parameter estimation problems. The wave of pro-Bayesian propaganda has been promulgated in no small part by the younger scientists who have been fighting for acceptance for a long time, but the resistance fought through editorial and grant proposal rejection letters, not blog posts, more powerful but less noisy. It’s also still unlikely to find Bayes stats classes in most geoscience departments, so practitioners are largely self-taught outside of Michigan, Caltech etc.

I imagine it’s been similar in many branches of science.


> Meanwhile I've never read a real argument for Frequentism and I don't know where I'd find one, short of going back to Fisher who is not well known for clear writing.

Check out the works of Deborah Mayo and her peers. She has several books and papers on the virtues of Frequentism that range from non technical to technical. Her blog is excellent as well.

errorstatistics.com

Also, check out modern formulations of frequentist models in terms of estimating equations, which add a certain level of coherence to frequentist models that approaches a Bayesian coherence.


I don't know much about statistical uses of Bayesianism but can say something opinionated about the underlying philosophy.

From a philosophical point of view, Bayesianism is fairly weak and lacks argumentative support. The underlying idea of probabilism - that degrees of belief have to be represented by probability measures - is in my opinion wrong for many reasons. Basically the only well-developed arguments for this view are Dutch book arguments, which make a number of questionable assumptions. Besides, priors are also often not known. As far as I can see, subjective utilities can only be considered rational as long as they match objective probabilities, i.e., if the agent responds in epistemically truth-conducive ways (using successful learning methods) to evidence and does not have strongly misleading and skewed priors.

I also reject the use of simple probability representations in decision theory, first because they do not adequately represent uncertainty, second because they make too strong rationality assumptions in the multiattribute case, and third because there are good reasons why evaluations of outcomes and states of affairs ought to be based on lexicographic value comparisons, not just on a simple expected utility principle. Generally speaking, Bayesians in this area tend to choose too simple epistemic representations and too simple value representations. The worst kind of Bayesians in philosophy are those who present Bayesian updating as if it was the only right way to respond to evidence. This is wrong on many levels, most notably by misunderstanding how theory discovery can and should work.

In contrast, frequentism is way more cautious and does not make weird normative-psychological claims about how our beliefs ought to be structured. It represents an overall more skeptical approach, especially when hypothesis testing is combined with causal models. A propensity analysis of probability may also sometimes make sense, but this depends on analytical models and these are not always available.

There are good uses of Bayesian statistics that do not hinge on subjective probabilities and any of the above philosophical views about them, and for which the priors are well motivated. But the philosophical underpinnings are weak, and whenever I read an application of Bayesian statistics I first wonder whether the authors haven't just used this method to do some trickery that might be problematic at a closer look.

I'd be happy if everyone would just use classical hypothesis testing in a pre-registered study with a p value below 1%.


> The underlying idea of probabilism - that degrees of belief have to be represented by probability measures - is in my opinion wrong for many reasons. Basically the only well-developed arguments for this view are Dutch book arguments, which make a number of questionable assumptions.

Why don't you consider Cox's theorem - and related arguments - well-developed?

https://en.wikipedia.org/wiki/Cox%27s_theorem


That's an excellent question. The answer is that I don't really count such kinds of theorems as positive arguments. They are more like indicators that carve out the space of possible representations of rational belief, and they basically amount to reverse-engineering when they are used as justifications. Savage does something similar in his seminal book: he stipulates some postulates for subjective plausibility that happen to amount to full probability (in a multicriteria decision-making setting). He motivates these postulates, including fairly technical ones, by finding intuitively compelling examples. But you can also find intuitively compelling counter-examples.

To mention some alternative epistemic representations that could or have also been axiomatized: Dempster-Shafer theory, possibility theory by Dubois/Prade, Halpern's generalizations (plausibility theory), Haas-Spohn ranking theory, qualitative representations by authors like Bouyssou, Pirlot, Vincke, convex sets of probability measures, Jøsang's "subjective logic", etc. Some of them are based on probability measures, others are not. (You can find various formal connections between them, of course.)

The problem is that presenting a set of axioms/postulates and claiming they are "rational" and others aren't is really just a stipulation. Moreover, in my opinion a good representation of epistemic states should at least account for uncertainty (as opposed to risk), because uncertainty is omnipresent. That can be done with probability measures, too, of course, but then the representation becomes more complicated. There is plenty of leeway for alternative accounts and a more nuanced discussion.


Thanks. I found it interesting that you like the Dutch book arguments more than the axiomatic ones.

> Moreover, in my opinion a good representation of epistemic states should at least account for uncertainty (as opposed to risk), because uncertainty is omnipresent.

Maybe I'm misunderstanding that remark, because the whole point of Bayesian epistemology is to address uncertainty - including (but definitely not limited to) risk. See for example Lindley's book: Understanding Uncertainty.

Now, we could argue that this theory doesn't help when the uncertainty is so deep that it cannot be modelled or measured in any meaningful way.

But it's useful in many settings which are not about risk. One couple of examples from the first chapter of the aforementioned book: "the defendant is guilty", "the proportion of HIV [or Covid!] cases in the population currently exceeds 10%".


Dutch book arguments are at least intended to provide a sufficient condition and are tied to interpretations of ideal human behavior, although they also make fairly strong assumptions about human rationality. The axiomatizations do not have accompanying uniqueness theorems. The situation is parallel in logic. Every good logic is axiomatized and has a proof theory, thus you cannot take the existence of a consistent axiom system as an argument for the claim that this is the one and only right logic (e.g. to resolve a dispute between an intuitionist and a classical logician).

The point about uncertainty was really just concerning the philosophical thesis that graded rational belief is based on a probability measure. A simple probability measure is not good enough as a general epistemic representation because it cannot represent lack of belief - you always have P(-A)=1-P(A). But of course there are many ways of using probabilities to represent lack of knowledge, plausibility theory and Dempster-Shafer theory are both based on that, and so are interval representations or Josang's account.

I'll check out Lindley's book, it sounds interesting.


> "degrees of belief have to be represented by probability measures", "the philosophical thesis that graded rational belief is based on a probability measure"

Of course it all depends on how we want to define things, we agree on that. There is some "justification" for Bayesian inference if we accept some constraints. And even if there are alternatives - or extensions - to Bayesian epistemology I don't think they have produced a better inference method (or any, really). [I know your comment was about the philosophical foundations, not about the statistical methods. But the alternative statistical methods do not have better philosophical foundations.]


Sorry, I can't agree with you on that one at all. It doesn't "...all depend on how we want to define things." Whether the representation of an epistemic state -- any state, really -- is suitable and adequate for a task is not just a matter of definition, it depends on the reality of what you want to describe. You cannot represent the throw of a six-sided die with a set {1, 2, 3, 4, 5}, for example. If you model in ideal rational agent's belief with a probability measure, then you cannot adequately represent lack of belief. Whether that's okay or not depends on the task.

> I know your comment was about the philosophical foundations, not about the statistical methods.

Absolutely, at the risk of sounding picky I have to say that you've answered to a comment I've never made.

> But the alternative statistical methods do not have better philosophical foundations.

Frequentism and the propensity view have better philosophical justifications, though. You may disagree, but that was the whole point of my first comment. We know that there are genuine stochastic processes with corresponding objective probabilities, for example. Frequentism also prevents incorrect applications of probability such as using statistics to predict the outcome of singular events based on mere conjecture about the priors. You can only do that with an analytic model.


> Frequentism and the propensity view have better philosophical justifications, though.

Not really if the knowledge we care about is related to a concrete situation (rather than the frequency of something under hypothetical replications defined in some ad-hoc way). As you said, whether that's okay or not depends on the task.

If we care about whether there was life on Mars or whether Aduhelm is an effective treatment for Alzheimer's, I don't think that frequentist inference has good philosophical support. Frequentist epistemology is not directly applicable.

Of course if you consider the frequentist methods themselves as genuine stochastic processes with corresponding objective probabilities (which also requires a valid model, by the way) you have good philosophical support to say things about those methods and their long-term frequency properties.

But this knowledge about the statistical methods used doesn't translate into knowledge about the existence of life on Mars or the efficacy of Aduhelm unless you are ready to make additional assumptions - 'philosophically unjustified' as they may be.


You're involuntarily confirming my negative criticism of Bayesianism by suggesting Bayesian methods could tell us whether there is life on Mars. Sometimes you really need to gather more information and/or develop an analytic model. It seems that a lot of Bayesianism consists of wishful thinking and trying to take shortcuts (e.g. trying to avoid randomized controlled trials for new drugs).


> suggesting Bayesian methods could tell us whether there is life on Mars.

What I suggest is that Bayesian methods provide a framework to reason about the plausibility of some statement about the world in a systematic way (unlike Frequentist methods, whatever the limitations in Bayesian methods).

> Sometimes you really need to gather more information and/or develop an analytic model.

Bayesian methods are definitely not a way to escape the need for an analytic model (including all the prior knowledge) and data gathering. What they provide is a mechanism to integrate the data using the model and calculate the impact of incremental information on our knowledge / uncertainty.

I’m not saying that it’s easy to have a good model and useful data for complex questions. But with Frequentist methods in addition to the model and the data you’d be missing the mechanism to use them in a meaningful way.

I wonder why you say that Bayesians try to avoid randomized controlled trials for new drugs, by the way. Bayesian methods are increasingly used in randomized clinical trials.


Dempster–Shafer theory is the obvious counterexample to "degrees of belief have to be represented by probability measures."

https://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory


Does it somehow imply that the Dutch book argument is better developed than Cox's argument?


You asked, "Why don't you consider Cox's theorem - and related arguments - well-developed?" I consider Cox's argument not well-developed because D–S theory shows the postulates miss useful and important alternatives. So it fails as an argument for a particular interpretation of probability.


I quoted 13415 saying that the only well-developed arguments were […] and asked him why didn’t he consider […] well-developed - compared to the former. I apologize if the scope of the question was not clear.


You can get to a frequentist technique from a given instance of a Bayesian 'trajectory', so I don't really understand what leg frequentism has left to stand on. How is frequentism more 'cautious'?

The anti-Bayesian frequentist argument, especially re. priors, has always reminded me of that story about Minsky and his student 'randomly wiring' his machine. http://magic-cookie.co.uk/jargon/mit_jargon.htm

" In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

"What are you doing?", asked Minsky.

"I am training a randomly wired neural net to play Tic-Tac-Toe" Sussman replied.

"Why is the net wired randomly?", asked Minsky.

"I do not want it to have any preconceptions of how to play", Sussman said.

Minsky then shut his eyes.

"Why do you close your eyes?", Sussman asked his teacher.

"So that the room will be empty."

At that moment, Sussman was enlightened. "


My impression is that philosophers and statisticians are often working with different focal examples. I think that in many fields important scientific knowledge essentially takes the form of a point estimate (e.g. the R0 of Covid is XXXX). It is also easy to come up with useful priors (e.g. the R0 is likely below 20) that arise more from characteristics of the model rather than theory.

Note that it is possible to reformulate the Covid example into a Null hypothesis test at the cost of being less informative (e.g. Is the R0 significantly above 1?) but then the knowledge becomes less useful for making certain important decisions.

Anyways, my general impression is that Bayesian statistics are probably more useful for making good decisions that require precise numerical knowledge of certain types of information but maybe less useful for many of the sorts of conceptual issues philosophers are often interested in.


Regarding "and third because there are good reasons why evaluations of outcomes and states of affairs ought to be based on lexicographic value comparisons, not just on a simple expected utility principle": do you have any suggested references that describe this in more detail?

Same question for "This is wrong on many levels, most notably by misunderstanding how theory discovery can and should work."

Also, do you have any suggestions for statistics books that you do like? Especially those with an applied bent (i.e. actually working with data, not philosophical discussions).


I guess it really depends on your discipline, but Bayesian methods have become more and more popular in a lot of academic communities, being published alongside papers using frequentist methods, so I wouldn't say it's hegemonic anymore.


This is a great book.

However, I really hate the "Golem of Prague" introduction. It presents an oversimplified caricature of modern frequentist methods, and is therefore rather misleading about the benefits of Bayesian modeling. Moreover, most practicing statisticians don't really view these points of view as incompatible. Compare to the treatment in Gelman et al.'s Bayesian Data Analysis. There are p-values all over the place.

Most importantly, this critique fails on basic philosophical grounds. Suppose you give me a statistical problem, and I produce a Bayesian solution that, upon further examination with simulations, gives the wrong answer 90% of the time on identical problems. If you think there's something wrong with that, then congratulations, you're a "frequentist," or at least believe there's some important insight about statistics that's not captured by doing everything in a rote Bayesian way. (And if you don't think there's something wrong with that, I'd love to hear why.)

Also, this isn't a purely academic thought experiment. There are real examples of Bayesian estimators, for concrete and practical problems such as clustering, that give the wrong estimates for parameters with high probability (even as the sample size grows arbitrarily large).


Gill's book, Bayesian Methods, is even more dismissive, and even hostile towards Frequentist methods. Whereas I've never seen a frequentist book dismissive of Bayes methods. (Counterexamples welcome!)

It boils down to whether you give precedence to the likelihood principle or the strong repeated sampling principle (Bayes prefers the likelihood principle and Frequentist prefers repeated sampling). See Cox and Hinkley's Theoretical Statistics for a full discussion, but basically the likelihood principle states that all conclusions should be based exclusively on the likelihood function; in layman's terms, on the data themselves. This specifically omits what a frequentist would call important contextual metadata, like whether the sample size is random, why the sample size is what it is, etc.
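The classic illustration of the difference (a quick R sketch of the textbook 9-successes-3-failures example, not from Cox and Hinkley's book itself):

  # Likelihood principle example: 9 successes and 3 failures observed.
  # Binomial (n = 12 fixed in advance) and negative binomial (sample until
  # the 3rd failure) give proportional likelihoods, so likelihood-based
  # inference is identical, even though frequentist p-values for the two
  # stopping rules differ.
  p <- seq(0.1, 0.9, by = 0.2)
  lik_binom  <- dbinom(9, size = 12, prob = p)
  lik_negbin <- dnbinom(9, size = 3, prob = 1 - p)  # 9 successes before 3rd failure
  lik_binom / lik_negbin                            # constant ratio (4) for every p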

The strong repeated sampling principle states that the goodness of a statistical procedure should be evaluated based on performance under hypothetical repetitions. Bayesians often dismiss this as: "what are these hypothetical repetitions? Why should I care?"

Well, it depends. If you're predicting the results of an election, it's a special 1 time event. It isn't obvious what a repetition would mean. If you're analyzing an A/B test it's easy to imagine running another test, some other team running the same test, etc. Frequentist statistics values consistency here, more so than Bayesian methods do.

That's not to come out in support of one vs the other. You need to understand the strengths and drawbacks of each and decide situationally which to use. (Disclaimer: I consider myself a Frequentist but sometimes use Bayesian methods.)


> Whereas I've never seen a frequentist book dismissive of Bayes methods

Nearly every Frequentist book I have that mentions Bayesian methods attempts to write them off pretty quickly as "subjective" (Wasserman comes immediately to mind, but there are others), which falsely implies that Frequentist methods are somehow more "objective" (ignoring the parts of your modeling that are subjective does not somehow make you more objective). The very name of the largely frequentist method "Empirical Bayes" is a great example of this: it's an ad hoc method whose name implies that Bayes is somehow not empirical (Gelman et al. specifically call this out).

Until very recently, Frequentist methods have near universally been the entrenched orthodoxy in most fields. Most Bayesians have spent a fair bit of their lives having their methods rejected by people who don't really understand the foundations of their testing tools, but rather think their testing tools come from divine inspiration and ought not to be questioned. Bayesian statistics generally does not rely on any ad hoc testing mechanism, and can all be derived pretty easily from first principles. It's funny you mentioned A/B tests as a good frequentist example, when most marketers absolutely prefer their results interpreted as the "probability that A > B", which is the more Bayesian interpretation. Likewise, the extension from A/B testing to multi-armed bandits falls trivially out of the Bayesian approach to the problem.

Your "likelihood" principle discussion is also a bit confusing here for me. In my experience Fisherian schools tend to be the highest champions of likelihood methods. Bayesians wouldn't need tools like Stan and PyMC if they were exclusively about likelihood since all likelihood methods can be performed strictly with derivatives.


This sounds to me very much like a political debate between people arguing for the best method, rather than focusing on the results that you can get with either method.

As long as this debate is still fuelled by emotional and political discourse, nothing useful will come out of it.

What is really needed is an assessment which method is best suited for which cases.

The practitioner wants to know “which approach should I use”, not “which camp is the person I’m listening to in?”


"Whereas I've never seen a frequentist book dismissive of Bayes methods. (Counterexamples welcome!)"

Indeed! There's a lot of Bayesian propaganda floating around these days. While I enjoy it, I would also love to see some frequentist propaganda (ideally with substantive educational content...).


All of Statistics by Larry Wasserman is a great introductory book from the frequentist tradition that includes some sections on Bayesian methods. It's definitely not frequentist propaganda - more like a sober look at the pros and cons of the Bayesian point of view.


My first year of grad school I ordered a textbook but what I got was actually All of Statistics with the wrong cover bound on.

I skimmed through a couple chapters before returning it for a refund. I sometimes regret not keeping it as a curio, but I was a poor grad student at the time and it was an expensive book.


https://archive.org/details/springer_10.1007-978-0-387-21736...

Statistics & machine learning book authors seem to be really good at providing a free, electronic copy.


> Indeed! There's a lot of Bayesian propaganda floating around these days. While I enjoy it, I would also love to see some frequentist propaganda

I think that frequentist statistics doesn't need marketing. It's the default way to do statistics for everyone and, frankly, Bayesian software is still quite far behind frequentist software in terms of speed and ease of use. Speed will be fixed by Moore's law and better software, and ease of use will also be fixed by better software at some point. McElreath, Gelman, and many others do a great job of getting more people into Bayesian statistics, which will likely result in better software in the long run.


A book by Deborah Mayo "Statistical Inference as Severe Testing" might fit.


I've read it. Unfortunately, I thought it was terribly written. Also, it's a philosophy book, not a guide for practitioners.


In my opinion, books for practitioners are not the place for such discussions. Deborah's book might be poorly written, but if we want to go to where the foundations of the disagreements are, we have to reach philosophy. Bayesian advocates are also often philosophers, e.g. Jacob Feldman.

Among theoretical statisticians, Larry Wasserman is more on the frequentist side. See for example his response on Deborah's blog [1]. But he doesn't advocate for it in his books. So yeah, besides Deborah, I am not aware of any other frequentist "propagandist".

[1] https://errorstatistics.com/2013/12/27/deconstructing-larry-...


> Gill's book, Bayesian Methods, is even more dismissive, and even hostile towards Frequentist methods.

I'm skeptical of this because Frequentist (likelihood) methods are a special case of Bayesian methods, with flat/uniform priors for parameters (and the "flatness" of a parameter is dependent on your chosen parameterization anyway; it's not a fixed fact about the model). So it's reasonably easy to figure out when frequentist methods will be effective enough (based on Bayesian principles), and when they won't.
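For a simple case, the point-estimate side of that correspondence is easy to check (a minimal R sketch with made-up binomial data; it only illustrates the special case, nothing more):

  # Binomial data with a flat Beta(1, 1) prior: the posterior is
  # Beta(k + 1, n - k + 1), whose mode equals the MLE k/n.
  k <- 7; n <- 20
  mle <- k / n
  post_mode <- optimize(function(p) dbeta(p, k + 1, n - k + 1),
                        interval = c(0, 1), maximum = TRUE)$maximum
  c(mle = mle, posterior_mode = post_mode)   # both ~0.35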


> I consider myself a Frequentist

Grab the pitchforks!


>Whereas I've never seen a frequentist book dismissive of Bayes methods.

I think it has more to do with the long history of anti-Bayesianism championed by Fisher. He was a powerhouse who did a lot to undermine its use. The Theory That Would Not Die went into some of these details.


Yeah, I mean...Fisher was a pretty big jerk. It seems like he got in fights with everyone!


Thank you! Comments like this are why I come here.


Hmm. He was comparing the Bayesian models to golems as well, not just the frequentist ones. It was an analogy for all statistical models.

Second, in the lectures he said that he uses frequentist techniques all the time and that it's often worth looking at a problem from each perspective.

I interpreted it as his problem is not with the methods themselves, but with how they are commonly used in science. To me this made a lot of sense.


I think I'm misremembering. I read through some of the introductory material in the second edition of his book and found it less critical than I recalled.

But in some places, it definitely comes across as hostile (e.g. footnote 107).

Also, the sentence "Bayesian probability is a very general approach to probability, and it includes as a special case another important approach, the frequentist approach" is pretty funny. I know the exact technical result he's referring to, but it's clearly wrong to gloss it like that.

He does mention consistency once, page 221, but (unconvincingly) handwaves away concerns about it. (Large N regimes exist that aren't N=infinity...)


Honestly I think it is a little hostile. Not towards frequentist directly, but towards the mis-use of frequentist methods in science. He works in ecology and I think he comes across a bunch of crap all the time. He talks at length about the statistical crisis in science and I can't really blame him.

But I could see how someone might take this as an attack on the methods themselves.


I agree. The golem is presented as an analogue to any statistical inference: powerful but ultimately dumb, in the sense that it won't think for you. That's in my opinion the major theme of the book---you have to think and not rely on algorithms/tools/machines...or golems to do that for you.


> If you think there's something wrong with that, then congratulations, you're a "frequentist".

And more than that - if you use bootstrap, or do cross-validation, you are being a frequentist.


I think the classes opt for starting with a simple mental model students can adopt, which is gradually replaced with a more robust and nuanced mental model.

In this case he wasn't talking just about frequentist methods, though; he's also talking about doing statistics without first doing science (and formulating a causal model).

I would be wary of jumping to conclusions from that introduction alone if you haven't seen the rest of the course or the book.


> There are real examples of Bayesian estimators, for concrete and practical problems such as clustering, that give the wrong estimates for parameters with high probability (even as the sample size grows arbitrarily large).

Could you give some specific examples, and/or references? This is new to me, and I would like to read deeper into it.



Thanks for the detail! I took a look at the first paper, the result was new to me.

In the vogue days of reversible jump MCMC I played with mixture estimation of the number of components under a basic prior (an approach which gives decent results in Figs 1 and 3), but I never used a Dirichlet process prior for this problem. This paper points out that even this simple approach is problematic because it’s only consistent if the true distribution is such a mixture, and in my case it definitely was not.

Anyway, one takeaway, esp. from sec 1.2.1, is that the Dirichlet process prior is not suitable for estimating #components in most cases; it favors small clusters. And indeed, the concept of estimating #components is tricky to begin with, as noted above.

Just because you can compute the posterior, doesn’t mean it’s saying what you think it is about the underlying true distribution!


Agreed. There are situations where frequentist guarantees are what you want rather than optimal estimates of parameters.

Frequentists consider the model parameters fixed and allow the data to vary. Bayesians consider the data fixed and allow the parameters to vary.

For example many software engineering, drug testing, and large scale pipelines want frequentist guarantees, because your system will have varying input data and you want theoretical bounds on what inferences you can make.


Suppose you give me a particle physics problem, and I produce a quantum mechanics solution that, upon further examination, is wrong.

If you think there's something wrong with that, then congratulations, you're a "quantum negationist," or at least believe there's some important insight about physics that's not captured by doing everything in a rote quantum way. (The important insight being that GIGO.)


The issue isn't that Bayesian methods used incorrectly can have bad frequentist properties. It's that, according to many flavors of Bayesianism, having bad frequentist properties isn't a valid line of critique.

You may not believe in the particular stances I'm calling out, but if so, we don't disagree.


Maybe we don't disagree. You wrote:

> a Bayesian solution that, upon further examination with simulations, gives the wrong answer 90% of time on identical problems

If "with simulations" means either

"with simulations using a probability distribution different from the prior used in the Bayesian analysis"

or

"with simulations using a model different from the one used in the Bayesian analysis"

are we expected to conclude that there is something wrong with the Bayesian way?


I mean "with simulations using a probability distribution [for the true parameter] different from the prior used in the Bayesian analysis." (The issue of model error is a separate question.)

Yes, in this case we should conclude there is something wrong with the Bayesian way. If you hand me a statistical method to e.g. estimate some parameter that frequently returns answers that are far from the truth, that is a problem. One cannot assume the prior exactly describes reality (or there would be no point in doing inference, because the prior already gives you the truth).
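The kind of check I have in mind looks roughly like this (a toy normal-normal sketch with a deliberately wrong prior, not one of the clustering examples):

  # Prior: theta ~ N(0, 1). Model: y ~ N(theta, 1), one observation.
  # Posterior: N(y/2, 1/2). If reality has theta = 5, the 95% credible
  # interval almost never contains the true value.
  set.seed(1)
  true_theta <- 5
  y <- rnorm(1e4, mean = true_theta, sd = 1)
  post_mean <- y / 2
  post_sd <- sqrt(1 / 2)
  mean(abs(true_theta - post_mean) < 1.96 * post_sd)  # coverage ~1%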


At least a Bayesian posterior tries to describe reality. In a way which is consistent with the prior and the data. But again, GIGO. Including prior information into the inferential process will be beneficial if it's correct but detrimental if it isn't. Hardly surprising.

On the other hand, Frequentist methods do not claim anything concrete about reality. Only about long-run frequencies in hypothetical replications.

You may think that makes them better, it's your choice.


Sure, I agree bad priors will give inaccurate inferences. My point is simply that to make a statement like, "an inaccurate prior generates many inaccurate inferences, and therefore it is garbage," one has to adopt a frequentist criterion for the quality of an estimator (like "gets good results most of the time").


Uh, dude. If you read the book, you'd see the Golem of Prague isn't a parable about frequentist models specifically, it's about all models, period. He calls his Bayesian models golems all the time.


I'm following the course using Julia/Turing.jl and it's simply awesome.

Richard McElreath clearly has a talent for teaching, and both the lectures and his book also give a very insightful discussion on the philosophy of science and common pitfalls of common statistical methods.

Last semester I took my first classical Probability and Statistics course at my uni, and this course has been positively refreshing in comparison.


There's so much great content on the internet that's not easily discoverable. This is one.

If you have discovered a great resource for intuitively learning the mathematics related to fat-tailed distributions, please share. I have fallen into the Taleb rabbit-hole and would really like to gain an *intuitive* understanding of what he's talking about when he mentions topics such as gamma distributions, lognormal distributions, and log-likelihood.


Does the course require prior statistical knowledge? Couldn't quite figure that out. It looks interesting, and there are python versions of the examples as well..


I got through it without any priors :).


Thanks! :)


I think you'll be fine, actually. I've read through the first edition, and it's kept pretty intuitive.


You'll probably need some basic notions of statistical distributions and data analysis; I recommend reading the first chapter of the book or the first lecture and seeing whether you're missing anything important.


Past related threads:

Statistical Rethinking [video] - https://news.ycombinator.com/item?id=29780550 - Jan 2022 (10 comments)

Statistical Rethinking: A Bayesian Course Using R and Stan - https://news.ycombinator.com/item?id=20102950 - June 2019 (14 comments)


I wanna buy the hardcopy textbook but still have access to an epub version - do any retailers allow this? Linked publisher site doesn't seem to.


From slide 5 of lecture 2: "Explanations with more ways to produce the data are more plausible." Sorry, but I disagree: the bias/variance trade-off implies that you should weigh the complexity of the model to avoid overfitting.

I don't know much about Bayesian data analysis, but I think the first idea to be introduced should not be the concept of likelihood; Bayes' theorem is the foundation, and I think it should be introduced before likelihood.

On slide 31, it is not clear whether you state a causal model before making observations. You should indicate a prior before making the observation so that you can then see how it updates; if done otherwise, you are using the observations to elaborate the model, and then the analysis is crookedly designed.

On slide 35, the concept of relative plausibility seems to me a little too sloppy a basis on which to develop a theory.

On slide 48, in the context of confidence intervals: "95% is obvious superstition". I think the 95% interval is a standard way to communicate results that gives useful information and avoids the question of why you computed, say, a 93.728% confidence interval.


> The unfortunate truth about data is that nothing much can be done with it

This is a fairly strong statement that goes against a lot of other work in data science and information visualization (John Tukey, Edward Tufte, Jacques Bertin, Hadley Wickham, ...). For example, see [0] and [1].

[0] https://en.wikipedia.org/wiki/Exploratory_data_analysis [1] https://courses.csail.mit.edu/18.337/2015/docs/50YearsDataSc...


You are leaving out a very important part of the sentence - "until we say what caused it". If you listen to the first few lectures you'll understand exactly what he intends with this sentence.


This cleaves very close to an aphorism I stole mercilessly many years ago: charts are for asking questions, not answering them.

“What caused it” is the answer, and a graph can reveal just as easily as it can conceal the cause. Lies, damn lies, and statistics.


To your point... data has context. It has a source. It likely has flaws and/or (so to speak) bias. To get anything out of it, it's essential to understand what went into it. Otherwise you'll deceive yourself or your stakeholders, and bad decisions will be made.


Thanks, though I actually meant to copy the entire thing (my fault).

My point was that a lot of people working in data analysis would (strongly) disagree with the idea that we need to model the data in order to do anything with it. Visualisations and tabulations can tell a lot without any mathematical formalism.


This is taking the quote completely out of context, it's not the data itself that conveys useful information, it's the data combined with a causal model!


> The unfortunate truth about data is that nothing much can be done with it until we say what caused it

Nonparametric methods say 'hi'.


This book is amazing. In my opinion the best book to get started with advanced statistics (all statistics, not just Bayesian statistics).


One problem, though: if you start with McElreath, you will likely find all the books that require you to wrangle your brain into sided p-values and confidence intervals stupid.


Not only is this a great book, but something that hasn’t been mentioned is that it’s also fantastic for absolute statistical beginners. My reasonably intelligent 12 year old nephew had no issues understanding much of it.


Will there be other course schedules? These dates don’t work for me unfortunately


Unless you're a student there, you won't be able to attend the classes and get a grade anyway. You just watch the YouTube videos; he makes new ones each year.


Also, having self-taught from the previous edition of his excellent book, I can say that it is very useful even if you aren't able to attend his class.


Oh I misunderstood that.



