
Causal Inference

Causal inference studies how to identify causal relationships from data, rather than mere correlations. It is an increasingly important area in data science and AI. While "correlation does not imply causation" is a fundamental principle in statistics, extracting causal information from observational data has long lacked a systematic methodology.

For frontier research on causal learning in AI, see Causal Learning (因果学习).


Why We Need Causal Inference

Correlation vs. Causation

A classic example: ice cream sales and drowning rates are highly correlated, but eating ice cream does not cause drowning. The real cause is "hot weather," which drives both.

Hot weather → Ice cream sales↑
Hot weather → Drowning rate↑
Ice cream sales ↔ Drowning rate (correlated but not causal)
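
A minimal simulation of this confounding pattern (all numbers are made up for illustration) shows how a common cause produces a strong correlation that vanishes once temperature is adjusted for:

```python
# Both variables are driven by temperature, so they correlate strongly
# even though neither causes the other.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
temperature = rng.normal(25, 5, n)                    # common cause (confounder)
ice_cream = 2.0 * temperature + rng.normal(0, 5, n)   # sales driven by heat
drowning = 0.3 * temperature + rng.normal(0, 1, n)    # drownings driven by heat

print("raw correlation:", np.corrcoef(ice_cream, drowning)[0, 1])

# Remove the part explained by temperature; the leftover correlation is ~0,
# i.e. there is no direct causal link between the two variables.
resid_ice = ice_cream - np.polyval(np.polyfit(temperature, ice_cream, 1), temperature)
resid_drown = drowning - np.polyval(np.polyfit(temperature, drowning, 1), temperature)
print("correlation after adjusting for temperature:",
      np.corrcoef(resid_ice, resid_drown)[0, 1])
```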

The Core Problem of Causal Inference

The Counterfactual Question: "What would have happened if X had not been done?" This is the most fundamental question in causal inference. For example:

  • What would have happened to the patient if they had not taken the medication?
  • What would sales have been if the advertisement had not been placed?

We can never simultaneously observe the outcomes of both "did X" and "did not do X" — this is known as the Fundamental Problem of Causal Inference.


Theoretical Frameworks for Causal Inference

Rubin Causal Model (Potential Outcomes Framework)

The Rubin Causal Model, proposed by Donald Rubin and also known as the potential outcomes framework, defines two potential outcomes for each individual:

  • \(Y_i(1)\): the outcome for individual \(i\) under treatment
  • \(Y_i(0)\): the outcome for individual \(i\) without treatment

Individual Treatment Effect (ITE):

\[ \tau_i = Y_i(1) - Y_i(0) \]

However, we can only observe one of these (corresponding to whether the individual actually received treatment); the other is counterfactual and unobservable.

Average Treatment Effect (ATE):

\[ \text{ATE} = E[Y(1) - Y(0)] = E[Y(1)] - E[Y(0)] \]
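
The toy sketch below illustrates this bookkeeping on synthetic data where, unlike in reality, both potential outcomes are generated, so the true ATE can be checked against a simple difference in means under random assignment (all quantities are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
y0 = rng.normal(10, 2, n)            # potential outcome without treatment, Y(0)
y1 = y0 + 1.5 + rng.normal(0, 1, n)  # potential outcome under treatment, Y(1)

ite = y1 - y0        # individual treatment effects (unobservable in practice)
print("true ATE:", ite.mean())       # ATE = E[Y(1) - Y(0)]

# In observed data we only see one potential outcome per unit. Under random
# assignment, the simple difference in means still recovers the ATE.
t = rng.integers(0, 2, n)            # random treatment assignment
y_obs = np.where(t == 1, y1, y0)
print("difference in means:", y_obs[t == 1].mean() - y_obs[t == 0].mean())
```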

Pearl Causal Model (Structural Causal Model)

The Structural Causal Model (SCM), proposed by Judea Pearl, uses directed acyclic graphs (DAGs) to represent causal relationships among variables.

Core Concepts:

  • Causal Graph (DAG): Nodes represent variables; directed edges represent causal relationships
  • do-operator: \(P(Y | do(X=x))\) denotes "the distribution of Y after actively setting X to x," as distinct from passively observing \(P(Y | X=x)\)
  • Back-door Criterion: Determines which variables must be controlled to identify causal effects
  • Front-door Criterion: An alternative approach when unobserved confounders are present
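
As a hypothetical illustration of the do-operator, the back-door adjustment formula \(P(Y \mid do(X=x)) = \sum_z P(Y \mid X=x, Z=z)\,P(Z=z)\) can be evaluated directly on simulated discrete data with a single observed binary confounder \(Z\) that satisfies the back-door criterion:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
z = rng.binomial(1, 0.5, n)                       # confounder
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))   # Z influences treatment X
y = rng.binomial(1, 0.1 + 0.2 * x + 0.5 * z)      # Z and X influence outcome Y

def p_y_do_x(xval):
    # Back-door adjustment: sum_z P(Y=1 | X=x, Z=z) * P(Z=z)
    total = 0.0
    for zval in (0, 1):
        p_y_given_xz = y[(x == xval) & (z == zval)].mean()
        p_z = (z == zval).mean()
        total += p_y_given_xz * p_z
    return total

print("naive P(Y=1 | X=1) - P(Y=1 | X=0):",
      y[x == 1].mean() - y[x == 0].mean())        # biased by confounding
print("adjusted P(Y=1|do(X=1)) - P(Y=1|do(X=0)):",
      p_y_do_x(1) - p_y_do_x(0))                  # close to 0.2, the true effect
```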

Pearl's Causal Hierarchy:

  • Association (observation): "How does Y behave when X is observed?"; requires observational data
  • Intervention (action): "What happens to Y if we do X?"; requires experiments or a causal model
  • Counterfactual (imagination): "What would Y have been if X had been done?"; requires a complete causal model

Common Causal Inference Methods

Randomized Controlled Trial (RCT)

The Randomized Controlled Trial (RCT) is the "gold standard" of causal inference. It eliminates the influence of confounders through random assignment:

  • Treatment group: Receives the intervention (e.g., medication, advertisement exposure)
  • Control group: Does not receive the intervention
  • Because of random assignment, the two groups are comparable in expectation on all other characteristics, observed and unobserved

A/B testing in the internet industry is essentially an RCT.
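
A minimal A/B-test analysis might look like the sketch below, assuming users are randomly assigned to the two variants (synthetic conversion rates; scipy is used only for the significance test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
control = rng.binomial(1, 0.10, 5_000)    # baseline conversion rate 10%
treatment = rng.binomial(1, 0.12, 5_000)  # new variant converts at 12%

# Under randomization, the difference in means is an unbiased ATE estimate.
effect = treatment.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"estimated lift: {effect:.3f}, p-value: {p_value:.4f}")
```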

Propensity Score Matching (PSM)

When randomized experiments are infeasible, propensity score matching uses the propensity score (the probability of receiving treatment) to match "similar" individuals between the treatment and control groups:

\[ e(X) = P(T = 1 | X) \]

Each individual in the treatment group is paired with the individual in the control group whose propensity score is closest, and then the outcome differences are compared.
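
A rough sketch of this procedure on synthetic data, using scikit-learn for the propensity model and the nearest-neighbour matching (in practice one would also check overlap and covariate balance):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
n = 5_000
x = rng.normal(size=(n, 3))                                  # observed covariates
p_treat = 1 / (1 + np.exp(-(x[:, 0] + 0.5 * x[:, 1])))       # treatment depends on X
t = rng.binomial(1, p_treat)
y = 2.0 * t + x[:, 0] + 0.5 * x[:, 2] + rng.normal(size=n)   # true effect = 2.0

# Step 1: estimate propensity scores e(X) = P(T=1 | X)
e_hat = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]

# Step 2: match each treated unit to the control with the closest score
treated, control = np.where(t == 1)[0], np.where(t == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(e_hat[control].reshape(-1, 1))
_, idx = nn.kneighbors(e_hat[treated].reshape(-1, 1))
matched_control = control[idx.ravel()]

# Step 3: average outcome difference over matched pairs (an ATT estimate)
att = (y[treated] - y[matched_control]).mean()
print("matched ATT estimate:", att)   # should be close to 2.0
```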

Instrumental Variables (IV)

When unobserved confounders exist, an "instrumental variable" \(Z\) is sought that satisfies:

  1. \(Z\) is correlated with the treatment variable \(X\) (relevance condition)
  2. \(Z\) affects the outcome \(Y\) only through \(X\) (exclusion restriction)
  3. \(Z\) is independent of the unobserved confounders (exogeneity condition)

A classic example: using "distance to a college" as an instrumental variable to estimate the causal effect of education on income.
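
A hand-rolled two-stage least squares (2SLS) sketch on simulated data, where the instrument shifts the treatment but reaches the outcome only through it (variable names and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
u = rng.normal(size=n)                       # unobserved confounder
z = rng.normal(size=n)                       # instrument
x = 0.8 * z + u + rng.normal(size=n)         # treatment: driven by Z and U
y = 1.5 * x + 2.0 * u + rng.normal(size=n)   # outcome: true causal effect of X is 1.5

# Naive OLS of Y on X is biased because U affects both X and Y.
ols = np.polyfit(x, y, 1)[0]

# Stage 1: predict X from Z.  Stage 2: regress Y on the predicted X.
x_hat = np.polyval(np.polyfit(z, x, 1), z)
iv = np.polyfit(x_hat, y, 1)[0]

print("biased OLS slope:", ols)   # noticeably above 1.5
print("2SLS estimate:   ", iv)    # close to 1.5
```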

Difference-in-Differences (DiD)

This method compares the change in outcomes between treatment and control groups before and after an intervention:

\[ \text{DiD} = (Y_{\text{treatment,post}} - Y_{\text{treatment,pre}}) - (Y_{\text{control,post}} - Y_{\text{control,pre}}) \]

Key Assumption (Parallel Trends Assumption): In the absence of the intervention, the two groups would have followed the same trend over time.
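
With the four group/period means in hand, the estimate is plain arithmetic; the toy numbers below assume a shared +5 time trend and a true effect of +3:

```python
# Difference-in-differences from the four group/period means (toy numbers).
treat_pre, treat_post = 20.0, 28.0   # treatment group: trend (+5) plus effect (+3)
ctrl_pre, ctrl_post = 15.0, 20.0     # control group: trend (+5) only

did = (treat_post - treat_pre) - (ctrl_post - ctrl_pre)
print("DiD estimate:", did)          # 3.0 under the parallel-trends assumption
```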

Regression Discontinuity (RD)

When treatment assignment is determined by a threshold on a continuous variable (e.g., a scholarship awarded only to students scoring above 60 on an exam), the causal effect can be estimated by comparing individuals just above and just below the threshold.
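
A crude sketch of a sharp design, assuming treatment switches on exactly at a score of 60 and comparing units within a small bandwidth around the cutoff (synthetic data; a real analysis would fit local regressions on each side):

```python
import numpy as np

rng = np.random.default_rng(6)
score = rng.uniform(0, 100, 50_000)                 # running variable
treated = score >= 60                               # sharp assignment rule
income = 30 + 0.2 * score + 4.0 * treated + rng.normal(0, 2, 50_000)  # true jump = 4

bandwidth = 2.0                                     # only compare units near the cutoff
above = income[(score >= 60) & (score < 60 + bandwidth)]
below = income[(score < 60) & (score >= 60 - bandwidth)]
print("RD estimate:", above.mean() - below.mean())  # about 4, plus a small trend bias
```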


Causal Inference and Machine Learning

ML Methods for Causal Inference

Traditional causal inference methods often rely on linear or otherwise simple parametric models, whereas machine learning can capture more complex nonlinear relationships:

  • Causal Forest: Estimates heterogeneous treatment effects based on random forests
  • Double/Debiased ML: Uses ML models to predict the treatment and the outcome from covariates, then estimates the causal effect from the residuals
  • CATE Estimation: Conditional Average Treatment Effect — estimates differentiated treatment effects across subpopulations
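
A stripped-down sketch of the Double/Debiased ML idea for a partially linear model, using random forests and 2-fold cross-fitting via scikit-learn (the data-generating process and hyperparameters are illustrative):

```python
# Partially linear model: Y = theta*T + g(X) + noise, T = m(X) + noise.
# Nuisance functions g and m are fit with out-of-fold (cross-fitted) predictions,
# then theta is estimated by regressing Y-residuals on T-residuals.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n = 5_000
x = rng.normal(size=(n, 5))
t = np.sin(x[:, 0]) + 0.5 * x[:, 1] + rng.normal(size=n)           # treatment depends on X
y = 1.0 * t + np.cos(x[:, 0]) + x[:, 1] ** 2 + rng.normal(size=n)  # true theta = 1.0

# Cross-fitted nuisance predictions of E[Y|X] and E[T|X]
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), x, y, cv=2)
t_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), x, t, cv=2)

# Residual-on-residual regression gives the debiased estimate of theta
y_res, t_res = y - y_hat, t - t_hat
theta = (t_res @ y_res) / (t_res @ t_res)
print("estimated treatment effect:", theta)   # should be close to 1.0
```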

The Significance of Causal Inference for AI

  • Fairness: Determining whether AI decisions exhibit causal discrimination against specific groups
  • Interpretability: Moving from "this feature is correlated with the prediction" to "this feature caused the prediction"
  • Robustness: Models built on causal relationships are more robust under distribution shift
  • Counterfactual Explanations: Informing users, for example, "if your income had been 5,000 yuan higher, your loan would have been approved"

References

  • Pearl, "Causality: Models, Reasoning, and Inference", 2nd Edition, 2009
  • Rubin, "Causal Inference Using Potential Outcomes", JASA, 2005
  • Peters et al., "Elements of Causal Inference", MIT Press, 2017
  • Hernán & Robins, "Causal Inference: What If", Chapman & Hall/CRC, 2020
