This seems well done and well-researched. I appreciate the diagrams and art and references to the likes of Rubin and Pearl.
A few headings down:
> Causal inference provides us with tools that allow us to answer the question of why something happens.
This is not necessarily so.
Randomized controlled trials suffer from black box problems the same as models. This is clear enough when thinking about something like a tutoring program. Suppose I randomly assign a bunch of schools to learn algebra with curriculum X and the rest to continue business as usual.
Program X does better, so we infer the program has a causal impact on algebra learning.
However, we still do not know for sure why program X does better, only that it does better. That matters for figuring out how to take what works about the program, apply it to other circumstances, adapt it, and so on.
I suppose compared to a big data set, we have a better "why" answer to the variation between the outcome and the treatment. The difference is that we actually know the cause of the observed effect with a trial, whereas with correlational analyses we're not so sure. But that's a very deflationary view of "why." I don't mean to be too cynical here; we can always push "real" causality one more level down. For example, suppose we figure out the secret sauce to better algebra teaching relates to a specific pedagogical practice. We can then ask, "but why does that practice work? what does it do in the brain?" So I don't want to be too reductive.
But even gold standard RCTs don't always give us a "why?" answer. I remember attending a conference about a decade ago, among social researchers devoted to causal inference, specifically about "the black box" of causal inference as it pertains to RCTs.
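To make the black box concrete, here's a toy simulation (everything in it is invented for illustration): two different mechanisms, "the curriculum itself works" versus "novelty boosts teacher engagement," produce exactly the same difference-in-means estimate, so the RCT identifies the effect but not the why.

```python
# Toy sketch (hypothetical data generating process): an RCT recovers the
# average treatment effect, but two very different mechanisms can produce
# the same estimate, so the "why" stays hidden.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
treated = rng.integers(0, 2, n).astype(bool)  # randomized assignment

# Mechanism A: the curriculum itself adds 5 points.
scores_a = rng.normal(70, 10, n) + 5 * treated

# Mechanism B: novelty makes teachers more engaged, adding the same 5 points.
engagement = np.where(treated, 1.0, 0.0)  # driven by novelty, not the curriculum
scores_b = rng.normal(70, 10, n) + 5 * engagement

for label, y in [("curriculum", scores_a), ("engagement", scores_b)]:
    ate = y[treated].mean() - y[~treated].mean()
    print(f"{label}: estimated effect = {ate:.2f}")
# Both print ~5.0: the difference-in-means estimator cannot tell them apart.
```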
This is why experimentation tries to focus on a single change at a time -- in short, you're right, and we can iterate, change by change, to tease out the fully causal factors.
But you do know, based on randomized assignment (or at least representative assignment school by school), that the impact is determined by A versus B.
So you do know E(Y | do(X), Z) to a degree, at least partially.
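For reference, the identification result behind that expression is a textbook statement of Pearl's backdoor adjustment (not something from the article under discussion):

```latex
% Backdoor adjustment (Pearl): if Z blocks every backdoor path from X to Y,
% the interventional quantity reduces to purely observational ones:
\[
  \mathbb{E}\bigl[\,Y \mid do(X = x)\,\bigr]
    \;=\; \sum_{z} \mathbb{E}\bigl[\,Y \mid X = x,\, Z = z\,\bigr]\; P(Z = z)
\]
% Under randomization of X there are no backdoor paths, so Z may be empty.
```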
It’s like the old “5 Whys” technique for getting at the root of a problem. But as you suggest, there may not be a root: we can always push “real” causality one level down. Whys all the way down.
But that doesn’t mean we can’t answer the question of why something happens. A cause doesn’t cease to be a cause just because it also has a cause.
It tells us "why something happens" in that we observe differences in how good people are at algebra, and the "why" is that some people took this class that causally improved their scores.
What specifically about the class improved the scores? It could be something intrinsic, like a method that helps students remember things better. Or it could just be that, because the class was new, the teachers were more engaged than the ones teaching the same old curriculum. These causes suggest different changes: one suggests everyone should teach X; the other suggests the solution is to get teachers more involved.
Nobody said there couldn’t be follow-up questions after an RCT. But I disagree that the presence of additional whys undermines causal inference’s goal of establishing a “why.”
Right, what causal inference really gives you is a tool for specifying assumptions about a model of a data generating process and then estimating a parameter representing the effect size (and the uncertainty surrounding it), provided that your specified assumptions are approximately correct and your measurements are sufficiently accurate.
It never directly answers a "why" or "how" type question. You provide the why/how and then use data to estimate "by how much?"
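A toy sketch of that workflow (the DGP, coefficients, and variable names are all invented): you supply the assumption "Z confounds X and Y," and the data then answers "by how much?" -- here the assumed true effect is 2.0.

```python
# Hedged sketch of the workflow described above: declare a data generating
# process (a linear DGP with known confounder Z), then estimate the effect
# size under that assumption. All names and numbers are made up.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
z = rng.normal(size=n)                        # confounder
x = 0.8 * z + rng.normal(size=n)              # treatment influenced by Z
y = 2.0 * x + 1.5 * z + rng.normal(size=n)    # true effect of X on Y is 2.0

# Naive estimate ignores the assumption "Z confounds X and Y": biased.
naive = np.cov(x, y)[0, 1] / np.var(x)

# Estimate consistent with the assumed DGP: regress Y on X and Z (OLS).
X = np.column_stack([x, z, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"naive slope:    {naive:.2f}")    # ~2.7, biased upward
print(f"adjusted slope: {beta[0]:.2f}")  # ~2.0, recovers the assumed effect
```

Same data, two numbers: which one is "the effect" depends entirely on the model of the data generating process you brought to the table.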
"Why" is a philosophical question. We can only point to correlations; if time is relevant, we can assume that future data cannot change past data, and only then conclude that causation has occurred.
I'm a fan of the DoWhy library out of Microsoft. Even as a novice in the field of causal modeling, it can get you up and running estimating causal effects from your data. https://github.com/py-why/dowhy
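A minimal sketch of DoWhy's model/identify/estimate/refute workflow, with made-up data. Note that in the basic API you declare the causal assumptions yourself (an explicit graph, or the `common_causes` shortcut used here) rather than having the graph estimated for you:

```python
# Minimal DoWhy sketch on synthetic data (all column names and the DGP are
# invented). The declared assumptions are the point: DoWhy identifies,
# estimates, and then stress-tests the effect under them.
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(size=n)                              # confounder
x = (z + rng.normal(size=n) > 0).astype(int)        # binary treatment
y = 2.0 * x + z + rng.normal(size=n)                # true effect = 2.0
df = pd.DataFrame({"treatment": x, "outcome": y, "z": z})

model = CausalModel(
    data=df,
    treatment="treatment",
    outcome="outcome",
    common_causes=["z"],                            # declared assumption
)
estimand = model.identify_effect()                  # apply identification rules
estimate = model.estimate_effect(
    estimand, method_name="backdoor.linear_regression"
)
refutation = model.refute_estimate(                 # sanity-check the assumptions
    estimand, estimate, method_name="random_common_cause"
)
print(f"ATE estimate: {estimate.value:.2f}")        # ~2.0 if assumptions hold
```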
I like the way that this book is structured. There are no equations, which makes it approachable, and some of these concepts are better explained with language than equations, at least in the beginning.
A similar book is What If by Hernán and Robins. By the end they focus on time-varying treatments, but the first half introduces a lot of the same concepts as this book. What If is also available for free - https://www.hsph.harvard.edu/wp-content/uploads/sites/1268/2...
Causality has been the objective of econometrics for many years. I always wonder why people in machine learning overlook its contributions. They almost never discuss it (even to criticize it) and prefer reinventing the wheel.
I think the fact that machine learning is so hyped is a problem; there's the Dunning-Kruger effect. A lot of people entering the field don't understand the mathematics behind the algorithms, but they can make Keras models that run and train and seem to produce some promising results. They might not know about normalizing data, how to prepare data, how to deal with missing data, etc...
In academia and in "more reputable" projects, domain knowledge is vital to the success of an ML project. This doesn't seem to be the case for startups looking to make some quick cash. Lots of online articles tout the lack of domain knowledge needed to create models... just googling "machine learning without domain knowledge" brings up a ton of articles saying machine learning is "easy" even without expert knowledge.
I have a hunch that from this path of inquiry and others, we are going to identify and classify modes and methods of reasoning particular to ML systems. I think it seems overly simplistic to assume that a thinking machine would need to think exactly like a human.
I don't think we can trust AI until it bears a resemblance to "Thinking Fast and Slow", and it won't soar until it can do better than humans.
We react to situations and then rationalize why we reacted that way at leisure, and those often turn into excuses and not reasons. It's a story about why you got angry. Why you got angry was something only slightly related to your stated reason.
If an AI can narrow that gap then they will have exhibited the sort of capacity for reason that we expect from them but have yet to even glimpse.
I do not think that a resemblance to "Thinking Fast and Slow" will necessarily happen. The human mind is "designed" that way to overcome limitations of its substrate, a brain made of very slow neurons. AI, on the other hand, relies on "neurons" that are several orders of magnitude faster. Situations where such an AI could benefit from snap decisions, while also having the ability to decide by a slow, thoughtful process, could still pop up, but it will not be like the human mind. Chess engines have something like that: they can propose a move by a "snap" decision without trying to search all possible developments of the game from the current position. In real games they limit the depth of analysis by the time they have. It resembles "System 1" and "System 2" to some extent, but there is a gradual transition between them, from depth-0 analysis to infinite depth.
With AI designed for real tasks, not artificial games, it will probably be the same story: they will do a search of depth N in any case, but in situations of time pressure they will keep N very low.
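A toy sketch of that "depth as a dial" idea (the move generator and evaluator are hypothetical placeholders, not a real engine):

```python
# Toy iterative-deepening sketch: return the best move found so far when
# the time budget runs out. `legal_moves` and `evaluate` are hypothetical
# placeholders supplied by the caller, not a real chess engine.
import time

def best_move(position, budget_seconds, legal_moves, evaluate, max_depth=64):
    deadline = time.monotonic() + budget_seconds
    best = None
    for depth in range(1, max_depth + 1):        # depth 1 is the "snap" decision
        if time.monotonic() >= deadline:
            break                                # time pressure: keep N low
        scored = [(evaluate(position, move, depth), move)
                  for move in legal_moves(position)]
        if scored:
            best = max(scored, key=lambda pair: pair[0])[1]
    return best                                  # deeper answers replace shallower ones
```

The transition from "System 1" to "System 2" here really is a gradient: the same loop, just more or less time on the clock.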
Historically, attempts to replicate a high-level understanding of the human mind to build an AI didn't work. Simulations based on low-level understanding, like neurons and suchlike, did work. We can draw parallels between AI developments and human mind traits as we see them, but they are very shaky constructs. At least as shaky as all these "high level" psychology theories, which try to refine the naive human understanding of how the mind works.
> We react to situations and then rationalize why we reacted that way at leisure, and those often turn into excuses and not reasons.
I believe it is not because of limitations of the human mind, but due to training peculiarities. I believe that reasoning itself was "designed" with the goal of communicating inner states of mind to others, by producing an "explanation" that fits the current social situation and helps reach current social goals. I agree with the idea that politics was a driver of the evolution of human intelligence, and it probably still is the driver. Humans learned to apply these new abilities to other kinds of problems, like engineering ones, but only after their intelligence had evolved a great deal under the pressure of natural selection driven by politics.
People could probably do better than that: understand themselves far better and explain their anger by its real causes. But we learn to seek excuses from a very young age. We are punished when we have "wrong reasons" for our behaviour and reinforced for "good reasons." The idea of such an education is to eliminate wrong reasons and remove their power to influence a person's behaviour, but the unintended consequence is a preference for excuses over reasons.
In other words, it is a cultural thing, I believe, not some genetic limitation.
What if the answer is a high-dimensional vector, because the problem space is also high-dimensional? Our brains are only adept at 3D space plus time; all our metaphors are spatial or temporal. Trying to compress answers down so we might understand them is the real problem.
I'm sure that understanding high-dimensional spaces is the easy part, and getting insights from the AI's answers as high-dimensional vectors is the hard (or impossible) part. The math "just works like that"... but this is unusable for invention.
Also worthless for the evolution of solutions. If I can't evaluate the solution, I can't manually improve it. It's just more magical thinking hiding behind a veneer of rationalism.