Guy Freeman, 9th September 2015
Pearl claims that genuine causal relationships can be inferred from purely observational (nonexperimental) data, given suitable assumptions about the process that generated the data.
His claims are controversial, to say the least.
Obviously we are able to infer causality, because we do it all the time, every day, naturally.
What are the "clues" in what we observe that lead us to perceive causal relationships?
Temporal precedence is one such obvious "clue": A can only cause B if A happens before B.
However, the converse does not hold: A can happen before B without causing B. Can you think of any examples?
What if A and C are independent of each other, but B is dependent on both? The obvious network that fits these facts is A → B ← C, with both arrows pointing into B.
It's natural and tempting to say that A and C cause B. So could dependency patterns strongly indicate causal patterns?
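To make this concrete, here is a minimal simulation sketch (the variable names, effect sizes, and sample size are my own illustrative choices, not from Pearl) of data exhibiting exactly this dependency pattern:

```python
# Simulate the collider pattern A -> B <- C: A and C are independent,
# while B depends on both. Linear effects are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

a = rng.normal(size=n)          # A: exogenous
c = rng.normal(size=n)          # C: exogenous, independent of A
b = a + c + rng.normal(size=n)  # B: depends on both A and C

print(np.corrcoef(a, c)[0, 1])  # ~0.0  (A and C independent)
print(np.corrcoef(a, b)[0, 1])  # ~0.58 (A and B dependent)
print(np.corrcoef(c, b)[0, 1])  # ~0.58 (C and B dependent)
```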
We assume that
nature possesses stable causal mechanisms that [...] are deterministic functional relationships between variables
We are trying to identify these mechanisms. We will only consider causal models that reflect this assumption, with the addition that each variable may be perturbed by an independent random disturbance.
For example, in one of Pearl's networks, Z is independent of A, B, C, and D (and all their disturbances, represented by u terms) given the values of its parents X and Y.
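As an invented illustration of what such a model looks like (the graph and functional forms below are mine; only the modelling assumption itself comes from Pearl), each variable is a deterministic function of its parents plus its own independent disturbance:

```python
# A tiny structural causal model with graph X -> Y, X -> Z, Y -> Z.
# Each variable = deterministic function of its parents + its own
# independent disturbance u (the key assumption in the text).
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

u_x = rng.normal(size=n)  # disturbances: mutually independent
u_y = rng.normal(size=n)
u_z = rng.normal(size=n)

x = u_x             # X has no parents
y = 2.0 * x + u_y   # Y is a deterministic function of X, plus u_y
z = x - y + u_z     # Z is a deterministic function of X and Y, plus u_z
```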
Unfortunately Nature only allows the observation of some of the variables. Can we recover the causal model from the probability distribution over the observable variables?
If we don't make any assumptions, infinitely many possible causal models would fit an observed distribution.
However, we can make the assumption named after the 14th-century philosopher William of Ockham: Occam's Razor. This will help us progress.
If we restrict our set of possible causal models to those that satisfy Occam's Razor (which Pearl calls minimal models), then
C has a causal influence on E if a directed path from C to E exists in every minimal model consistent with the data
For example, suppose observations lead us to conclude that, for variables a, b, c, and d: a and b are each dependent on c and on d; a and b are independent of each other; and d is independent of a and b given c.
An example of such a situation: a and b are two independent conditions (say, a cold and hay fever), c is a symptom both can cause (sneezing), and d is a reaction to that symptom (wiping one's nose).
It turns out that all minimal models consistent with this independence pattern have an arrow from c to d.
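As a rough check, one can simulate data with this structure and test the independence pattern; the sketch below uses linear relationships and partial correlation as a stand-in for proper conditional-independence tests (all modelling choices are mine):

```python
# Simulate a -> c <- b, c -> d and verify the independence pattern.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

a = rng.normal(size=n)              # a: "cold"
b = rng.normal(size=n)              # b: "hay fever", independent of a
c = a + b + rng.normal(size=n)      # c: "sneezing"
d = c + rng.normal(size=n)          # d: "wiping one's nose"

def partial_corr(x, y, given):
    """Correlation of x and y after linearly regressing out `given`."""
    g = np.column_stack([given, np.ones_like(given)])
    rx = x - g @ np.linalg.lstsq(g, x, rcond=None)[0]
    ry = y - g @ np.linalg.lstsq(g, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

print(np.corrcoef(a, b)[0, 1])  # ~0: a independent of b
print(np.corrcoef(a, d)[0, 1])  # ~0.5: a and d dependent
print(partial_corr(a, d, c))    # ~0: a independent of d given c
print(partial_corr(a, b, c))    # ~-0.33: a, b dependent given c (collider)
```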
To help in our search for a causal model, minimality is not enough; we also need stability. The example Pearl gives helps us to understand this concept:
Suppose we see a picture of a chair. Which of these theories do you prefer?
Theory 1 (T1): the object in the picture is a chair.
Theory 2 (T2): the object in the picture is either a chair, or two chairs positioned so that one hides the other.
We prefer Theory 1, of course! Why?
Minimality: T1 describes fewer possibilities, but still explains the picture
Stability: It's very unlikely one chair would exactly hide another; a slight change would reveal the other chair.
With the minimality and stability assumptions, every distribution has a unique minimal causal structure (up to d-separation equivalence), as long as there are no hidden (unmeasured) variables.
Pearl provides the IC algorithm to recover the pattern (the set) of DAGs that are consistent with the distribution implied by the data.
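Pearl's full algorithm is too long to reproduce here, but the following simplified sketch (my own; `indep` is an assumed conditional-independence oracle, and the exhaustive subset search is exponential, so this is illustrative only) captures its first two phases: recover the skeleton, then orient the colliders.

```python
# A heavily simplified sketch of the IC algorithm's first two phases.
from itertools import combinations

def ic_pattern(variables, indep):
    # Phase 1: connect x - y unless some set s of other variables
    # separates them; remember the separating set when one is found.
    sepset = {}
    adj = {v: set() for v in variables}
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        separated = False
        for k in range(len(others) + 1):
            for s in combinations(others, k):
                if indep(x, y, set(s)):
                    sepset[frozenset((x, y))] = set(s)
                    separated = True
                    break
            if separated:
                break
        if not separated:
            adj[x].add(y)
            adj[y].add(x)

    # Phase 2: for each x - z - y with x, y non-adjacent, orient the
    # collider x -> z <- y if z is NOT in the set that separated x, y.
    colliders = set()
    for x, y in combinations(variables, 2):
        if y in adj[x]:
            continue
        for z in adj[x] & adj[y]:
            if z not in sepset.get(frozenset((x, y)), set()):
                colliders.add((x, z))
                colliders.add((y, z))
    return adj, colliders
```

For the a, b, c, d pattern above, this recovers the skeleton a—c, b—c, c—d and orients the collider a → c ← b; Pearl's remaining orientation phase (omitted here) is what then forces c → d.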
Of course, Nature usually likes to hide some important variables. How can we find the causal model then?
Thanks to Theorem 2.6.2, we can now reduce the unbounded space of possibilities by considering only DAGs in which every unobservable variable is a parentless common cause of exactly two observable variables. Pearl replaces such variables with bidirectional arrows (i.e. a ⟷ b means there is an unobservable common cause of a and b).
A modified version of the IC algorithm (which Pearl calls IC*) can then be used to find the set of minimal DAGs that fit the data.
The class of DAGs consistent with the data is represented by graphs with several kinds of edges, each expressing a different relationship: a marked arrow a →* b (genuine causation), an unmarked arrow a → b (potential causation), a bidirected edge a ⟷ b (spurious association due to a latent common cause), and an undirected edge a — b (the relationship is undetermined).
We can now try to identify causes from these graphs. Notice that in all cases a third variable is needed, a kind of control provided by Nature.
X is a potential cause of Y if X and Y are dependent, and there exists a variable Z and a context S (a set of other variables) such that, conditional on S, X and Z are independent but Z and Y are still dependent.
Here, B is a potential cause of D because, given A, C is independent of B but still dependent on D.
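Expressed as a predicate, and again assuming a conditional-independence oracle `indep(x, y, s)` (the names and the single-context simplification are mine), the test looks something like:

```python
def is_potential_cause(x, y, z, context, indep):
    """Potential-cause test in one context (the full definition
    quantifies over contexts): given `context`, x and y are
    dependent, x and z are independent, and z and y are
    still dependent."""
    return (not indep(x, y, context)
            and indep(x, z, context)
            and not indep(z, y, context))
```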
A variable X is a genuine cause of Y if there exists another variable Z such that either: (i) X and Y are dependent in every context, and there is a context S in which Z is a potential cause of X, Z and Y are dependent given S, and Z and Y become independent given S together with X; or (ii) X and Y are related through a chain of relationships satisfying (i).
Here, D is a genuine cause of E because, letting B be the Z variable, B is a potential cause of D, B and E are dependent, and B and E become independent once we also condition on D.
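Criterion (i) can likewise be sketched as a predicate (building on `is_potential_cause` above; establishing that z is a potential cause of x needs its own witness variable, so it is passed in here as an assumed helper):

```python
def is_genuine_cause(x, y, z, context, indep, potential_cause):
    """Criterion (i) of the genuine-cause test: z is a potential
    cause of x, z and y are dependent given `context`, and z and y
    become independent once x is added to the conditioning set."""
    return (potential_cause(z, x)
            and not indep(z, y, context)
            and indep(z, y, context | {x}))
```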
We haven't used temporal information at all so far, so could we theoretically find a causal relation in our data where the cause occurs after the effect?
In theory, yes, but apparently we naturally describe variables in such a way that the causes we infer will generally be from the past to the future.
Assuming minimality and stability, genuine causal relationships can be inferred from statistical dependencies.
How safe are these inferences? As safe as inferring three dimensions from two-dimensional perceptions or images...?
One big assumption was that the unmeasured noise terms are mutually independent, which is not very realistic... But it's up to us to define our models so that this becomes a reasonable assumption.
Although we can infer when arrows must point in a particular direction given all the assumptions, it's not clear that such an arrow is a cause, or indeed what a cause actually is.
Another assumption brushed aside is that the probability distribution over the variables is known; in practice it too must be estimated from the data, with potential pitfalls.
I highly recommend reading "Are There Algorithms That Discover Causal Structure?" by Freedman and Humphreys, 1999, which is also much easier to understand than Pearl's book!