Guy Freeman, 9th September 2015
Pearl claims that genuine causal relationships can be inferred from purely observational (nonexperimental) data, given suitable assumptions about the process that generated the data.
His claims are controversial, to say the least.
Obviously we are able to infer causality, because we do it all the time, every day, naturally.
What are the "clues" in what we observe that lead us to perceive causal relationships?
Temporal precedence is one such obvious "clue": A can only cause B if A happens before B.
However, the converse does not hold: A can happen before B without causing B. Can you think of any examples?
What if A and C are independent of each other, but B is dependent on both? The obvious network that fits these facts is A → B ← C, with both arrows pointing into B.
It's natural and tempting to say that A and C cause B. So could dependency patterns strongly indicate causal patterns?
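To make this concrete, here is a minimal simulation sketch (the variable names, effect sizes, and sample size are my own illustrative choices, not from Pearl) of data exhibiting exactly this dependency pattern:

```python
# Simulate the collider pattern A -> B <- C: A and C are independent,
# while B depends on both. Linear effects are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

a = rng.normal(size=n)          # A: exogenous
c = rng.normal(size=n)          # C: exogenous, independent of A
b = a + c + rng.normal(size=n)  # B: depends on both A and C

print(np.corrcoef(a, c)[0, 1])  # ~0.0  (A and C independent)
print(np.corrcoef(a, b)[0, 1])  # ~0.58 (A and B dependent)
print(np.corrcoef(c, b)[0, 1])  # ~0.58 (C and B dependent)
```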
We assume that
nature possesses stable causal mechanisms that [...] are deterministic functional relationships between variables
We are trying to identify these mechanisms. We will only consider causal models that reflect this assumption, with the addition that each variable may be perturbed by an independent random disturbance.
For example, in one of Pearl's networks, Z is independent of A, B, C, and D (and all their disturbances, represented by u terms) given the values of its parents X and Y.
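As an invented illustration of what such a model looks like (the graph and functional forms below are mine; only the modelling assumption itself comes from Pearl), each variable is a deterministic function of its parents plus its own independent disturbance:

```python
# A tiny structural causal model with graph X -> Y, X -> Z, Y -> Z.
# Each variable = deterministic function of its parents + its own
# independent disturbance u (the key assumption in the text).
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

u_x = rng.normal(size=n)  # disturbances: mutually independent
u_y = rng.normal(size=n)
u_z = rng.normal(size=n)

x = u_x             # X has no parents
y = 2.0 * x + u_y   # Y is a deterministic function of X, plus u_y
z = x - y + u_z     # Z is a deterministic function of X and Y, plus u_z
```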
Unfortunately Nature only allows the observation of some of the variables. Can we recover the causal model from the probability distribution over the observable variables?
If we don't make any assumptions, infinitely many possible causal models would fit an observed distribution.
However, we can make the assumption named after the 14th-century philosopher William of Ockham: Occam's Razor. This will help us progress.
If we restrict our set of possible causal models to those that satisfy Occam's Razor (which Pearl calls minimal models), then
C has a causal influence on E if a directed path from C to E exists in every minimal model consistent with the data
For example, suppose observations lead us to conclude that, for variables a, b, c, and d: a and b are each dependent on c and on d; a and b are independent of each other; and d is independent of a and b given c.
An example of such a situation: a and b are two independent conditions (say, a cold and hay fever), c is a symptom both can cause (sneezing), and d is a reaction to that symptom (wiping one's nose).
It turns out that all minimal models consistent with this independence pattern have an arrow from c to d.
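As a rough check, one can simulate data with this structure and test the independence pattern; the sketch below uses linear relationships and partial correlation as a stand-in for proper conditional-independence tests (all modelling choices are mine):

```python
# Simulate a -> c <- b, c -> d and verify the independence pattern.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

a = rng.normal(size=n)              # a: "cold"
b = rng.normal(size=n)              # b: "hay fever", independent of a
c = a + b + rng.normal(size=n)      # c: "sneezing"
d = c + rng.normal(size=n)          # d: "wiping one's nose"

def partial_corr(x, y, given):
    """Correlation of x and y after linearly regressing out `given`."""
    g = np.column_stack([given, np.ones_like(given)])
    rx = x - g @ np.linalg.lstsq(g, x, rcond=None)[0]
    ry = y - g @ np.linalg.lstsq(g, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

print(np.corrcoef(a, b)[0, 1])  # ~0: a independent of b
print(np.corrcoef(a, d)[0, 1])  # ~0.5: a and d dependent
print(partial_corr(a, d, c))    # ~0: a independent of d given c
print(partial_corr(a, b, c))    # ~-0.33: a, b dependent given c (collider)
```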
To help in our search for a causal model, minimality is not enough; we also need stability. The example Pearl gives helps us to understand this concept:
Suppose we see a picture of a chair. Which of these theories do you prefer?
Theory 1 (T1): the object in the picture is a chair.
Theory 2 (T2): the object in the picture is either a chair, or two chairs positioned so that one hides the other.
We prefer Theory 1, of course! Why?
Minimality: T1 describes fewer possibilities, but still explains the picture
Stability: It's very unlikely one chair would exactly hide another; a slight change would reveal the other chair.
With the minimality and stability assumptions, every distribution has a unique minimal causal structure (up to d-separation equivalence), as long as there are no hidden (unmeasured) variables.
Pearl provides the IC algorithm to recover the pattern (the set) of DAGs that are consistent with the distribution implied by the data.
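Pearl's full algorithm is too long to reproduce here, but the following simplified sketch (my own; `indep` is an assumed conditional-independence oracle, and the exhaustive subset search is exponential, so this is illustrative only) captures its first two phases: recover the skeleton, then orient the colliders.

```python
# A heavily simplified sketch of the IC algorithm's first two phases.
from itertools import combinations

def ic_pattern(variables, indep):
    # Phase 1: connect x - y unless some set s of other variables
    # separates them; remember the separating set when one is found.
    sepset = {}
    adj = {v: set() for v in variables}
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        separated = False
        for k in range(len(others) + 1):
            for s in combinations(others, k):
                if indep(x, y, set(s)):
                    sepset[frozenset((x, y))] = set(s)
                    separated = True
                    break
            if separated:
                break
        if not separated:
            adj[x].add(y)
            adj[y].add(x)

    # Phase 2: for each x - z - y with x, y non-adjacent, orient the
    # collider x -> z <- y if z is NOT in the set that separated x, y.
    colliders = set()
    for x, y in combinations(variables, 2):
        if y in adj[x]:
            continue
        for z in adj[x] & adj[y]:
            if z not in sepset.get(frozenset((x, y)), set()):
                colliders.add((x, z))
                colliders.add((y, z))
    return adj, colliders
```

For the a, b, c, d pattern above, this recovers the skeleton a—c, b—c, c—d and orients the collider a → c ← b; Pearl's remaining orientation phase (omitted here) is what then forces c → d.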
Of course, Nature usually likes to hide some important variables. How can we find the causal model then?
Thanks to Theorem 2.6.2, we can now reduce the unbounded space of possibilities by considering only DAGs in which every unobservable variable is a parentless common cause of exactly two observable variables. Pearl replaces such variables with bidirectional arrows (i.e. a ⟷ b means there is an unobservable common cause of a and b).
A modified version of the IC algorithm (which Pearl calls IC*) can then be used to find the set of minimal DAGs that fit the data.
The class of DAGs consistent with the data is represented by graphs with several kinds of edges, each expressing a different relationship: a marked arrow a →* b (genuine causation), an unmarked arrow a → b (potential causation), a bidirected edge a ⟷ b (spurious association due to a latent common cause), and an undirected edge a — b (the relationship is undetermined).
We can now try to identify causes from these graphs. Notice that in all cases a third variable is needed, a kind of control provided by Nature.
X is a potential cause of Y if X and Y are dependent, and there exists a variable Z and a context S (a set of other variables) such that, conditional on S, X and Z are independent but Z and Y are still dependent.
Here, B is a potential cause of D because, given A, C is independent of B but still dependent on D.
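Expressed as a predicate, and again assuming a conditional-independence oracle `indep(x, y, s)` (the names and the single-context simplification are mine), the test looks something like:

```python
def is_potential_cause(x, y, z, context, indep):
    """Potential-cause test in one context (the full definition
    quantifies over contexts): given `context`, x and y are
    dependent, x and z are independent, and z and y are
    still dependent."""
    return (not indep(x, y, context)
            and indep(x, z, context)
            and not indep(z, y, context))
```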
A variable X is a genuine cause of Y if there exists another variable Z such that either: (i) X and Y are dependent in every context, and there is a context S in which Z is a potential cause of X, Z and Y are dependent given S, and Z and Y become independent given S together with X; or (ii) X and Y are related through a chain of relationships satisfying (i).
Here, D is a genuine cause of E because, letting B be the Z variable, B is a potential cause of D, B and E are dependent, and B and E become independent once we also condition on D.
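Criterion (i) can likewise be sketched as a predicate (building on `is_potential_cause` above; establishing that z is a potential cause of x needs its own witness variable, so it is passed in here as an assumed helper):

```python
def is_genuine_cause(x, y, z, context, indep, potential_cause):
    """Criterion (i) of the genuine-cause test: z is a potential
    cause of x, z and y are dependent given `context`, and z and y
    become independent once x is added to the conditioning set."""
    return (potential_cause(z, x)
            and not indep(z, y, context)
            and indep(z, y, context | {x}))
```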
We haven't used temporal information at all so far, so could we theoretically find a causal relation in our data where the cause occurs after the effect?
In theory, yes, but apparently we naturally describe variables in such a way that the causes we infer will generally be from the past to the future.
Assuming minimality and stability, genuine causal relationships can be inferred from statistical dependencies.
How safe are these inferences? As safe as inferring three dimensions from two-dimensional perceptions or images...?
One big assumption was that the unmeasured noise terms are mutually independent, which is not very realistic... But it's up to us to define our models so that this becomes a reasonable assumption.
Although we can infer when arrows must point in a particular direction given all the assumptions, it's not clear that such an arrow is a cause, or indeed what a cause actually is.
Another assumption brushed aside is that the probability distribution over the variables is known; in practice it too must be estimated from the data, with potential pitfalls.
I highly recommend reading "Are There Algorithms That Discover Causal Structure?" by Freedman and Humphreys, 1999, which is also much easier to understand than Pearl's book!