
As a non-statistician with a lot of interest in statistics, I found the Book of Why frustrating. Modeling causation seems like an undeniably important step towards understanding the world better. But the biggest question I had was: how can you actually verify that your causal model is true? This is not clearly explained, or wasn't before I gave up on the book. Models are only useful if we can have some confidence that they correspond to reality.

I was especially interested in the answer to this question because my only exposure to the language of "causal chains" has been on Twitter, where it seemed to serve a distinctly ideological purpose. One (non-mathematical) person says "I think X is caused by Y", and then a statistician chimes in and says "you're missing other parts of the causal chain, the real causes are Z and Q." Where, of course, Z and Q are the things that one political perspective prefers to blame, and Y is the thing blamed by the other side.

For example: https://twitter.com/gztstatistics/status/1000914269188296709. Here's a great comment from today about the difficulty of establishing causality in practice: https://news.ycombinator.com/item?id=18886275

I want to know how causal chains can be actually proven or falsified, to be convinced that this isn't just highbrow ideological woo.




> how can you actually verify that your causal model is true?

This is addressed in the introduction. See box 4 in the flow-chart (“testable implications”).

“The listening pattern prescribed by the paths of the causal model usually results in observable patterns or dependencies in the data. [...] If the data contradict this implication, then we need to revise our model.”


The part you removed with ellipses undermines this point:

"These patterns are called "testable implications" because they can be used for testing the model. These are statements like "There is no path connecting D and L," which translates to a statistical statement, "D and L are independent," that is, finding D does not change the likelihood of L."

This says nothing about testing causality, or the direction of causality. If two things are uncorrelated, then there is probably not a causal relationship between them, granted. But this is not a very novel or useful observation.

However, if D and L are correlated, the test above says nothing about how to determine whether D caused L, L caused D, both were caused by a third thing (or set of things), or the correlation is just a coincidence.
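To make the quoted idea concrete, here is a minimal sketch (mine, not from the book) of what checking one such testable implication might look like, keeping the D and L names from the quote; everything else is invented for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # The hypothesized model has no path between D and L, so it implies
    # that D and L are statistically independent in the data.
    # (Simulated stand-in for real data; here it happens to agree with the model.)
    D = rng.normal(size=n)
    L = rng.normal(size=n)

    # A crude check of that implication: the correlation should be near zero.
    r = np.corrcoef(D, L)[0, 1]
    print(f"corr(D, L) = {r:.3f}")  # ~0 here, so the model is not contradicted

Note that passing this check only fails to falsify the model; as above, it says nothing about the direction of any arrow.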

For a book whose entire thesis is "causality is rigorous," I expect a much more rigorous treatment of how to validate causality using more than mere correlation.


From your previous comment I understand you didn't read the whole book, so I don't know if you got to chapter 4, section "The Skillful Interrogation of Nature: Why RCTs Work." In short, you can use interventions (i.e. a properly designed experiment) to verify that the "cause" does indeed produce the "effect".
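As a rough illustration of why randomization does the job (a toy simulation of my own, not anything from the book; all names and numbers are invented):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    true_effect = 2.0

    # Observational world: an unmeasured U drives both treatment X and outcome Y.
    U = rng.normal(size=n)
    X_obs = (U + rng.normal(size=n) > 0).astype(float)
    Y_obs = true_effect * X_obs + 3.0 * U + rng.normal(size=n)
    naive = Y_obs[X_obs == 1].mean() - Y_obs[X_obs == 0].mean()

    # RCT world: X is assigned by coin flip, which cuts the U -> X arrow.
    X_rct = rng.integers(0, 2, size=n).astype(float)
    Y_rct = true_effect * X_rct + 3.0 * U + rng.normal(size=n)
    rct = Y_rct[X_rct == 1].mean() - Y_rct[X_rct == 0].mean()

    print(f"naive observational estimate: {naive:.2f}")  # biased well above 2.0
    print(f"randomized estimate:          {rct:.2f}")    # close to 2.0

Randomizing X makes it independent of everything upstream, so the simple difference in means recovers the effect that the naive comparison gets wrong.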


RCTs indeed seem like a good way of establishing causality. But RCTs are well-established, so what is "The New Science of Cause and Effect" (as claimed by Pearl and MacKenzie) bringing to the table?

Intuitively, I might guess that RCTs are the only way of rigorously establishing cause and effect. I would have been very interested if the book had confirmed or denied this intuitive conjecture of mine.

Another comment in this thread claims that you can infer causality without intervention: https://news.ycombinator.com/item?id=18884104. Perhaps this is true?

This is the kind of discussion that I wish the book had focused on. I want to probe at the line between belief and established fact, and understand what we can rigorously say given the evidence we have. I have a strong aversion to reading extended flowery descriptions of big ideas if the speaker has not rigorously shown that the model maps to the real world. Otherwise it's like listening to just-so stories.


This is the kind of discussion the book focuses on; you should try to read it. RCTs are not the only way to answer these questions, and observational data can be used in some cases (but note that the validity of the inference is conditional on the model being correctly specified).


Maybe I should put some more effort into the book. But statements such as this make me extremely wary:

> but note that the validity of the inference is conditional on the model being correctly specified

This strikes me as begging the question. The model is exactly what I don't trust unless it is rigorously justified, so anything conditional on the model being correctly specified I also don't trust.

It all feels like a house of cards.


The sense I've gotten so far is that given a causal model with non-controversial causal assumptions, you can do algebra in some cases to come up with conclusions that otherwise (in the absence of do-calculus) would have required experimentation. And in other cases, you're still stuck.
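In the simplest such case the "algebra" is the back-door adjustment. A hedged toy version, assuming a hypothetical model Z -> X, Z -> Y, X -> Y where the confounder Z is actually observed (names and numbers are invented):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200_000

    Z = rng.integers(0, 2, size=n)                                # observed confounder
    X = (rng.random(n) < np.where(Z == 1, 0.8, 0.2)).astype(int)  # Z -> X
    Y = (rng.random(n) < 0.2 + 0.3 * X + 0.4 * Z).astype(int)     # X -> Y and Z -> Y

    # Naive conditioning on X alone is biased by Z.
    naive = Y[X == 1].mean() - Y[X == 0].mean()

    # Back-door adjustment: average the within-stratum contrasts, weighted by P(Z=z).
    adjusted = sum(
        (Y[(X == 1) & (Z == z)].mean() - Y[(X == 0) & (Z == z)].mean()) * (Z == z).mean()
        for z in (0, 1)
    )

    print(f"naive:    {naive:.2f}")     # noticeably above the true 0.3
    print(f"adjusted: {adjusted:.2f}")  # close to 0.3, what an experiment would find

The adjusted number uses only observational quantities, but it is only as good as the assumed graph: the same arithmetic applied to the wrong graph gives a confidently wrong answer.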


What kind of answer do you expect?

You can get no causality from data alone. You always need additional assumptions.

If you can do an intervention and manipulate a variable as you wish, the assumption of its independence is warranted. A correlation with the outcome indicates a causal path (or you're being [un]lucky). Even in that case a more complex causal model is useful to get better estimates, distinguish direct and mediated effects, etc.

If you have observational data only there is not much that can be done without a causal model. Given a model, the causal effect of one variable on another may be estimated in some cases. But if your model is wrong you may conclude that there is an effect when none exists or deny the existence of a real effect.
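To illustrate that last point with a toy example (entirely invented): suppose X has no effect on Y at all, but an unmeasured U drives both. An analyst whose model omits U will read the association off as causal.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000

    U = rng.normal(size=n)            # unmeasured common cause
    X = U + rng.normal(size=n)        # X <- U
    Y = 2.0 * U + rng.normal(size=n)  # Y <- U, and no arrow from X to Y

    # The slope the "U-free" model would report as the causal effect of X on Y.
    slope = np.polyfit(X, Y, 1)[0]
    print(f"estimated 'effect' of X on Y: {slope:.2f}")  # ~1.0, but the truth is 0.0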


I think I expected the book to better live up to its billing. Here is an excerpt from the in-sleeve summary:

"Beginning with simple scenarios, like whether it was rain or a lawn sprinkler that got a sidewalk wet, the authors show how causal thinking solves some of today's hardest problems, including: whether a drug cured an illness; whether an employer has discriminated against some applicants; and whether we can blame a heat wave on global warming."

In all of these "hard problems", it is the model itself that is the most contentious piece, and the most ideological. Some people have a mental model where CO2 produced by humans is causing climate change (which I agree with), and others believe that the changes can be explained by natural fluctuation. These beliefs are undoubtedly influenced by a person's biases. It's not very useful to say "once you have accepted a causal model, you can draw lots of useful inferences." Because the main point of contention is over what is causing what.

I found this statement of yours honestly more useful than anything I've read in the book so far: "You can get no causality from data alone. You always need additional assumptions." The downside is that different people can make different assumptions, which implies that this kind of causal analysis can't mediate disagreements between groups of people who see the world very differently.


> It's not very useful to say "once you have accepted a causal model, you can draw lots of useful inferences." Because the main point of contention is over what is causing what.

Well, accepting a causal model and drawing lots of useful inferences seems better than drawing lots of misleading inferences because no attention is paid to the model (or being unable to make any inference because it's not obvious how the data can be used).

Even if people may not agree on what the right model is, at least this approach makes the model explicit. And in many cases there is no reason for disagreement, but if there is no careful analysis the wrong model may be used by mistake. For example, chapter 8 has an extensive discussion of the potential-outcomes approach in the context of salary as a function of education and experience.



