Patrick Power
Economics PhD @ Boston University
Applied Econometrics is concerned with the interpretation of statistical results in various contexts
Conceptualize the Math
Simulate it on a computer
Read papers that apply the technique(s)
Listing just a couple:
At a High Level, Causal Inference doesn't work as well as we might hope
I don't think this is emphasized as much as it should be in introductory econometrics classes (which partly makes sense: why demotivate the class?!)
Example
Using only experimental variation, we cannot determine whether the use of experimental vouchers has a higher long-run impact (however measured) than the use of standard vouchers
Research
Credible
Important
Choice Set
What This Means:
(1) We'll have to make "Approximations"
(2) The data alone doesn't provide a unique answer to our question
In practice, we often cannot provide guarantees for the performance of our approach under plausible assumptions
Example
Voucher or No Voucher
Annual Earnings
***Optional Material - For personal interest***
Motivation
Probability Spaces
A key concept that we want to be able to conceptualize is "a subset of a set"
Set
This is a subset
Here's another subset
Probability Spaces
In our work this semester on housing policy, the underlying set could be a collection of Eviction Complaints filed in Housing Court against tenants
Eviction Complaints
One subset of interest might be all of the evictions filed against HCV tenants
Another subset of interest might be evictions filed against tenants who failed to pay landlords in their first month
Probability Spaces
*We cannot always define probability on all subsets.
Probability Spaces
Set
We can also assign probability to this subset
And this subset as well
The probability that we assign to the entire set
Probability Spaces
So now we have the fundamentals
Probability Spaces
The only thing we know so far is that probability spaces allow us to represent the probability of subsets of a set (by definition)
When someone introduces something new, one thing we can ask is: what can we do with it?
To do something new, we can define a random variable on the probability space
Probability Spaces
One random variable of interest will be an estimator
In this context, the underlying set (also known as the sample space) will be the set of possible samples that could be realized
A key idea is that a random variable "pushes" the probability measure forward onto the space we care about (the pushforward measure)
[Figure: three panels labeled "All Eviction Cases", "All Possible Samples", and "All Eviction Cases", each shown with "Housing Stability" and "Legal Aid"]
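To make the pushforward idea concrete, here is a minimal sketch with a hypothetical population of four eviction cases (the case labels and the 0/1 "remained housed" outcome are made up for illustration). The sample space is every size-2 sample drawn without replacement, each equally likely. The estimator (the sample mean) maps each sample to a number, and the probability of each estimator value is the total probability of the samples mapped to it.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Hypothetical population: 1 = tenant remained housed, 0 = did not
population = {"case_A": 1, "case_B": 0, "case_C": 1, "case_D": 1}

# Sample space: every sample of size 2 drawn without replacement, each equally likely
sample_space = list(combinations(population, 2))
prob_of_sample = Fraction(1, len(sample_space))

def sample_mean(sample):
    """The estimator: a function from a sample (a point in the sample space) to a number."""
    return Fraction(sum(population[c] for c in sample), len(sample))

# Pushforward: P(estimator = v) = total probability of the samples the estimator maps to v
estimator_distribution = defaultdict(Fraction)
for s in sample_space:
    estimator_distribution[sample_mean(s)] += prob_of_sample

for value, prob in sorted(estimator_distribution.items()):
    print(f"sample mean = {value}: probability {prob}")
```

Here the sample mean takes the values 1/2 and 1, each with probability 1/2, so its expectation is 3/4, the population share.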
Conditional Expectation (Event)
Conditioning on an Event (With Independence)
With respect to this conditional distribution
The event of interest is the set of outcomes that X maps into some set in the sigma algebra on X's range and whose treatment value is 1
We will assume the following
Conditioning on an Event (With Independence)
This is where the assumption kicks in
Conditioning on a Random Variable (With Independence)
Let's first discuss what conditional independence with respect to a random variable is not!
Does not imply the following:
Why? Because
Which would imply unconditional independence!
Aim
We would like to understand the conditions under which
Left Side
Left Side
Right Side
Then Left Side equals Right Side
By Definition
Left Side
Idea
Approximate the Average Treatment Effect by comparing the average in the treated group to that in the control group
Average Treatment Effect
Approximation
Average outcome over those individuals in the treated group
Thought Experiment
Treated
Control
Population
Difference-in-Means
Summary
Difference-in-Means
Average Treatment on the Treated
Selection Bias
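The decomposition of the difference-in-means into the Average Treatment on the Treated plus Selection Bias can be checked by simulation. This is a minimal sketch with a made-up data-generating process: an unobserved "ability" raises both annual earnings without a voucher and the chance of receiving one, and the voucher raises earnings by exactly 2 for everyone (all numbers hypothetical). Because we simulate both potential outcomes, we can compute the ATT and the selection bias directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical DGP: unobserved ability drives both earnings and treatment
ability = rng.normal(size=n)
y0 = 20 + 5 * ability + rng.normal(size=n)          # earnings without a voucher
y1 = y0 + 2.0                                       # earnings with a voucher (effect = 2)
d = (ability + rng.normal(size=n) > 0).astype(int)  # higher ability -> more likely treated
y = np.where(d == 1, y1, y0)                        # we only observe one potential outcome

diff_in_means = y[d == 1].mean() - y[d == 0].mean()
att = (y1 - y0)[d == 1].mean()                          # Average Treatment on the Treated
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()  # needs the unobservable y0 of the treated

print(f"difference-in-means: {diff_in_means:.2f}")
print(f"ATT: {att:.2f}, selection bias: {selection_bias:.2f}")
```

With this positive selection the difference-in-means overstates the effect; replacing `ability` with `-ability` in the treatment rule produces negative selection bias instead.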
Example
Exercise: Develop a story for positive/negative selection bias in this context
Randomized Control Trial
ATE
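Under random assignment the selection bias term is zero in expectation, so the difference-in-means targets the ATE. A minimal sketch with a hypothetical DGP (ability raises earnings, the true effect is 2 for everyone) where treatment is assigned by a fair coin flip, independent of ability:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

ability = rng.normal(size=n)
y0 = 20 + 5 * ability + rng.normal(size=n)  # hypothetical earnings without treatment
y1 = y0 + 2.0                               # true ATE = 2
d = rng.integers(0, 2, size=n)              # randomized: independent of ability
y = np.where(d == 1, y1, y0)

diff_in_means = y[d == 1].mean() - y[d == 0].mean()
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()  # close to 0 under randomization
print(f"difference-in-means: {diff_in_means:.3f}, selection bias: {selection_bias:.3f}")
```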
Idea # 2
Instead of taking the difference between treated and control groups, let's average local differences between treated and control groups
Summary (thus far)
How much of this gap is selection bias?
Tsembris (2000)
(1) Take difference-in-means within each group
(2) Take the average of the differences
Idea # 2
Example
Let's assume we observe a categorical variable
Housing Court
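The two-step recipe above, applied to a categorical control like Housing Court, can be sketched as follows. Everything here is hypothetical: three courts with different treatment rates and different baseline outcomes, a constant effect of 2, and random assignment within each court. The helper `stratified_diff_in_means` is an illustrative name; it weights each court's difference by that court's share of the sample, which targets the ATE (weighting by each court's share of treated units would target the ATT instead).

```python
import numpy as np

def stratified_diff_in_means(y, d, group):
    """(1) Difference-in-means within each group, (2) average weighted by group share."""
    y, d, group = np.asarray(y), np.asarray(d), np.asarray(group)
    estimate = 0.0
    for g in np.unique(group):
        in_g = group == g
        treated, control = y[in_g & (d == 1)], y[in_g & (d == 0)]
        if treated.size == 0 or control.size == 0:
            raise ValueError(f"group {g!r} lacks treated or control units")
        estimate += in_g.mean() * (treated.mean() - control.mean())
    return estimate

rng = np.random.default_rng(2)
n = 90_000
court = rng.integers(0, 3, size=n)                # hypothetical Housing Court id
p_treat = np.array([0.2, 0.5, 0.8])[court]        # treatment rate differs by court
d = (rng.random(n) < p_treat).astype(int)         # random within each court
y = np.array([0.0, 5.0, 10.0])[court] + 2.0 * d + rng.normal(size=n)

naive = y[d == 1].mean() - y[d == 0].mean()
stratified = stratified_diff_in_means(y, d, court)
print(f"naive: {naive:.2f}, stratified: {stratified:.2f}")
```

The naive comparison is contaminated because courts with higher baseline outcomes also treat more often; comparing within courts removes that.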
Under what conditions is this a good idea?
Key Assumption
"Within each bin, treatment is as good as randomly assigned"
Within Bins
We are assuming that Treatment is randomly assigned within each bin
Local Randomized Control Trials
Continued...
By the Law of Iterated Expectations
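One way to write out this argument, using our own notation (not necessarily the slides'): Y(1) and Y(0) are the outcomes with and without treatment, D is treatment, and X is the bin. It assumes conditional independence, (Y(1), Y(0)) independent of D given X, and that every bin contains both treated and control units (0 < P(D = 1 | X) < 1).

```latex
\begin{aligned}
\mathbb{E}[Y \mid D=1, X] - \mathbb{E}[Y \mid D=0, X]
  &= \mathbb{E}[Y(1) \mid D=1, X] - \mathbb{E}[Y(0) \mid D=0, X] \\
  &= \mathbb{E}[Y(1) \mid X] - \mathbb{E}[Y(0) \mid X]
     && \text{(conditional independence)} \\
  &= \mathbb{E}[Y(1) - Y(0) \mid X], \\
\mathbb{E}\big[\,\mathbb{E}[Y(1) - Y(0) \mid X]\,\big]
  &= \mathbb{E}[Y(1) - Y(0)]
     && \text{(law of iterated expectations)}
\end{aligned}
```

So averaging the within-bin differences over the distribution of X recovers the Average Treatment Effect.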
Selection on Observables Assumption
Locally in the feature space, treatment is as good as randomly assigned
Interpretation
Implication
The Conditional Expectation Function has a Causal interpretation
Example
Exercise: Develop a story for positive/negative selection bias in this context
Conditional on the textual document, treatment is as good as randomly assigned
The Conditional Expectation Function has a Causal interpretation
A worthwhile question to reflect on: why do Economists (with several years of graduate training) use linear models for causal inference?
Local Variation of Treatment
Variation in Density of Treatment
Curse of Dimensionality
We can perfectly predict the treatment variable in the finite sample
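A quick simulation of why many controls make treatment perfectly predictable in a finite sample. Assumptions, all hypothetical: 500 observations, binary controls, and treatment assigned by a fair coin independently of the controls. Each distinct combination of controls is a bin; an observation's treatment is perfectly predicted when its bin contains only treated or only control units, which leaves nothing to compare within that bin.

```python
import numpy as np

def share_perfectly_predicted(n_obs: int, n_controls: int, seed: int = 0) -> float:
    """Share of observations whose bin (unique control pattern) is all treated or all control."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_obs, n_controls))  # binary controls
    d = rng.integers(0, 2, size=n_obs)                # treatment independent of X
    _, bin_id = np.unique(X, axis=0, return_inverse=True)
    bin_id = bin_id.ravel()                           # flatten for NumPy-version robustness
    treated_share = np.bincount(bin_id, weights=d) / np.bincount(bin_id)
    share = treated_share[bin_id]
    return float(np.mean((share == 0.0) | (share == 1.0)))

for k in (1, 5, 10, 20):
    print(k, round(share_perfectly_predicted(500, k), 3))
```

Even though treatment is truly random here, with 20 binary controls almost every observation sits alone in its bin, so the data alone cannot separate treatment from the controls.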
Reflection
*This doesn't apply to every situation, e.g., Cluster Randomized Control Trials
In Practice
Big Picture
Causal Inference is a Missing Data Problem
Local Identification
Curse of Dimensionality
*It's not by claiming that such an estimation approach has the lowest asymptotic variance
What notion of similarity are we using to define Conditional Independence?
What notion of similarity are we using to form predictions from the training data?
The Essence of Causal Inference is Similarity
The notion of similarity in point 2 should "extend" the notion of similarity in point 1
Continuous transformations of the controls don't necessarily preserve conditional independence
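A small simulation of this point, under hypothetical assumptions: the control x takes values in {-2, -1, 1, 2}, treatment is randomly assigned within each value of x but is more likely when x > 0, and the untreated outcome rises with x. Conditional on x, treatment is as good as random; conditional on the continuous transformation x^2, which merges -1 with 1 and -2 with 2, it is not. The estimator is a stratified difference-in-means weighted by bin share.

```python
import numpy as np

def stratified_diff_in_means(y, d, bins):
    """Average of within-bin treated-minus-control means, weighted by bin share."""
    total = 0.0
    for b in np.unique(bins):
        in_b = bins == b
        diff = y[in_b & (d == 1)].mean() - y[in_b & (d == 0)].mean()
        total += in_b.mean() * diff
    return total

rng = np.random.default_rng(3)
n = 100_000
x = rng.choice(np.array([-2, -1, 1, 2]), size=n)
d = (rng.random(n) < np.where(x > 0, 0.8, 0.2)).astype(int)  # random given x
y = 5.0 * x + 2.0 * d + rng.normal(size=n)                    # true effect = 2

est_x = stratified_diff_in_means(y, d, x)
est_x2 = stratified_diff_in_means(y, d, x ** 2)
print(f"conditioning on x:   {est_x:.2f}")
print(f"conditioning on x^2: {est_x2:.2f}")
```

Within each x^2 bin, treated units are mostly those with positive x, which also have higher untreated outcomes, so selection bias reappears.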
(1) Are we learning the appropriate kernel?
(2) Are the observations unbiased?
Complexity of Estimand
Model Complexity
Because we have a finite amount of data, we must make the following decisions
(1) What information do we want to condition on?
(2) Where do we want to land on the following continuum?
Structure
The Ability of the Model to Generalize
What are we betting on?
[Figure: Lasso, Difference-in-Means, Fine-tuned LLMs, Feed-Forward Neural Net, OLS; axis: Information in Controls]
This isn't exact. It's meant to help you conceptualize your own viewpoint
Model Complexity
This is very subjective. It's meant to help you conceptualize your own viewpoint
Possible Situations
True Model has unknown simple structure
[Figure: Performance (two panels)]
True Model has unknown complex structure
[Figure: Performance (two panels)]
True Model has known simple structure
[Figure: Performance, LLM vs. OLS (two panels)]
Inner Product Space
Metric Space
Topology
Mathematical Structures for Representation Similarity
Defined with respect to a Topology
Conditional Independence
Causal Framework
Model
We previously showed
Idea
Can we use pre-treatment data to approximate the selection bias?
Average Treatment on the Treated
Selection Bias
Derivation
Key Assumption
Selection Bias
Parallel Trends Interpretation
We observe this initial difference
Parallel Trends
We also observe this difference
But we know this captures
Parallel Trends
Estimated Selection Bias
Estimated ATT
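The parallel-trends logic can be sketched in a two-period simulation. Assumptions, all hypothetical: the treated group starts 4 units higher (selection bias), both groups would change by the same 1.5 units absent treatment (parallel trends), and treatment adds 3 in the post period. The pre-period gap estimates the selection bias, and subtracting it from the post-period gap estimates the ATT.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
treated = rng.integers(0, 2, size=n)               # group membership
baseline = 10 + 4 * treated + rng.normal(size=n)   # treated start 4 higher
common_trend = 1.5                                 # same change for both groups
y_pre = baseline + rng.normal(size=n)
y_post = baseline + common_trend + 3.0 * treated + rng.normal(size=n)  # true ATT = 3

post_gap = y_post[treated == 1].mean() - y_post[treated == 0].mean()   # ATT + selection bias
est_selection_bias = y_pre[treated == 1].mean() - y_pre[treated == 0].mean()
est_att = post_gap - est_selection_bias

print(f"post-period gap: {post_gap:.2f}")
print(f"estimated selection bias: {est_selection_bias:.2f}, estimated ATT: {est_att:.2f}")
```

If the untreated trend differed across groups, the pre-period gap would no longer equal the post-period selection bias and the estimate would be off by that difference.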
Summary
With Controls
With controls, we're correcting for local selection bias
Conceptually, it doesn't make sense to include individual-level fixed effects
We often cannot randomly assign treatment
We cannot randomly assign having a lawyer because that requires consent and follow-through by the tenant
We randomize the next best thing which is access to a free lawyer
Instrumental Variables
Ex: We're interested in the impact of having a lawyer on eviction case outcomes
Instrument → Treatment → Outcome
If the Instrument is Randomly Assigned
At the population level, we observe two treatment effects
The impact of an offer of free legal representation on legal representation
The impact of an offer of free legal representation on Judgements of Possession
But we're not primarily interested in either of these two effects
As we'll show, we can only observe the following
The impact of legal representation on Judgements of Possession for the compliers
LATE
We're interested in the following
The impact of legal representation on Judgements of Possession
Classifying Individuals
We can capture the effect of lawyers on Judgements of Possession for this subset of the population
Derivation
Assume this group doesn't exist
Exclusion Restriction
Continued
Assume this group doesn't exist
By Definition
Continued
Exclusion Restriction
We're still interested in the effect that a lawyer has on Judgement of Possession
Intention-to-Treat
First Stage
LATE
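A simulation of the LATE as the ratio of the Intention-to-Treat to the First Stage (the Wald estimator). Assumptions, all hypothetical: 20% always-takers, 30% never-takers, 50% compliers, no defiers, a randomized offer of free legal representation, and an offer that affects Judgements of Possession only through having a lawyer (exclusion restriction). Lawyers reduce the probability of a Judgement of Possession by 0.3 for compliers and by 0.1 for always-takers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
types = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.3, 0.5])
z = rng.integers(0, 2, size=n)  # randomized offer of free legal representation
d = np.where(types == "always", 1, np.where(types == "never", 0, z))  # has a lawyer

# Hypothetical P(Judgement of Possession) without a lawyer, and the lawyer effect, by type
p0 = np.where(types == "always", 0.4, np.where(types == "never", 0.7, 0.6))
effect = np.where(types == "always", -0.1, np.where(types == "never", 0.0, -0.3))
# Exclusion restriction: z enters the outcome only through d
y = (rng.random(n) < p0 + effect * d).astype(int)

itt = y[z == 1].mean() - y[z == 0].mean()          # Intention-to-Treat
first_stage = d[z == 1].mean() - d[z == 0].mean()  # share of compliers
late = itt / first_stage                            # Wald estimator
print(f"ITT: {itt:.3f}, first stage: {first_stage:.3f}, LATE: {late:.3f}")
```

The ratio recovers the compliers' effect (about -0.3), not the average effect among everyone who has a lawyer, which here also includes always-takers with a smaller effect.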
Motivation
Up to this point in our class, we have emphasized a nonparametric approach to Causal Inference
But in the papers we've read (Diamond et al. 2019; Chetty et al. 2016), Economists tend to fit linear models to the data
We want to understand what these linear models are capturing. It's certainly different from our nonparametric approach
Focus
The Linear Model
Outcome
Treatment
Controls
Chetty 2016 (Example)
Use Experimental Voucher for at least one year
Offer of Experimental Voucher
Site Fixed Effect
The Linear Model
Interested in the effect that an offer of a voucher has on the Neighborhood Poverty Rate
Chetty 2016
Statistics Question
Overview
Explain Residualized Regression
Show that it's a useful way to interpret Coefficients in Linear Models (including linear IV models!)
(Helpful for reading papers)
Flexible Enough to use "Text-based Controls"
(Potentially an interesting Research Direction for your Final Paper)
Residualized Regression
Not This!
Residualized Regression (Chetty 2016)
Synthetic Data
Residualized Regression (Chetty 2016)
(Based on Controls: Site Location)
Residualized Regression (Chetty 2016)
Notice that Treated Individuals have Positive Residuals!
Does this make sense to you?
Key Takeaway
The Coefficient of Interest in the Linear Model is the same as the coefficient in the Residualized Model
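A minimal check of this claim (the Frisch-Waugh-Lovell result) on synthetic data in the spirit of the Chetty 2016 example. All numbers are made up: the treatment is a voucher offer whose probability differs by site, the outcome is a neighborhood poverty rate, and the controls are site fixed effects. The code fits the long regression and the residualized regression and compares the two coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
site = rng.integers(0, 5, size=n)                # hypothetical site locations
site_dummies = np.eye(5)[site]                   # site fixed effects (absorb the intercept)
p_offer = np.array([0.2, 0.3, 0.5, 0.6, 0.8])[site]
offer = (rng.random(n) < p_offer).astype(float)  # treatment: offer of a voucher
site_effect = np.array([30.0, 25.0, 20.0, 15.0, 10.0])[site]
poverty = site_effect - 4.0 * offer + rng.normal(scale=3.0, size=n)

# Long regression: outcome on treatment and site fixed effects
X_long = np.column_stack([offer, site_dummies])
beta_long = np.linalg.lstsq(X_long, poverty, rcond=None)[0][0]

# Residualized regression: outcome on (treatment - treatment predicted from the controls)
gamma = np.linalg.lstsq(site_dummies, offer, rcond=None)[0]
offer_resid = offer - site_dummies @ gamma
beta_resid = (offer_resid @ poverty) / (offer_resid @ offer_resid)

print(f"long regression: {beta_long:.4f}, residualized: {beta_resid:.4f}")
```

With site dummies as controls, each treated individual's residual is 1 minus their site's offer rate, which is why treated individuals have positive residuals.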
Conceptual Understanding
When we're running linear regression, we are regressing the outcome variable on differences between the treatment and the expected treatment (where the expected treatment is a linear function of the controls)
If treatment is as good as randomly assigned conditional on the controls, then it's essentially random who has a positive residual and who has a negative residual. The only difference is that individuals with positive residuals received treatment.
Therefore the relationship between the outcome variable and the residuals captures a relationship between the outcome and the treatment variable that isn't contaminated by selection bias
Extension
Partially Linear Models
The difference between treatment and the predicted treatment based only on the controls
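A sketch of the partially linear idea under hypothetical assumptions: a single discrete control x with 20 levels, a treatment probability and a baseline outcome that are both nonlinear in x, and a constant treatment effect of 2. Here the predicted treatment based only on the controls is the within-level mean of d, a nonparametric estimate of E[d | x] that is feasible because x is discrete; continuous controls would need some other flexible predictor. Regressing the outcome residual on the treatment residual recovers the effect.

```python
import numpy as np

def within_level_mean(values, levels):
    """Nonparametric E[values | levels] for a discrete control, returned per observation."""
    sums = np.bincount(levels, weights=values)
    counts = np.bincount(levels)
    return (sums / counts)[levels]

rng = np.random.default_rng(5)
n = 50_000
x = rng.integers(0, 20, size=n)                     # discrete control with 20 levels
propensity = 0.5 + 0.4 * np.sin(x)                  # treatment probability, nonlinear in x
d = (rng.random(n) < propensity).astype(float)
y = 2.0 * d + 5.0 * np.cos(x) + rng.normal(size=n)  # partially linear: effect = 2

d_resid = d - within_level_mean(d, x)  # treatment minus treatment predicted from controls
y_resid = y - within_level_mean(y, x)
theta_hat = (d_resid @ y_resid) / (d_resid @ d_resid)
print(f"theta_hat: {theta_hat:.3f}")
```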
Linear Models
How to Interpret the Coefficient of Interest
The Linear IV Model
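The residualized view carries over to the linear IV model with controls. A sketch on synthetic data (hypothetical numbers: a randomized voucher offer as the instrument, voucher use as the treatment, an unobserved confounder that makes voucher use endogenous, and site fixed effects as controls). The just-identified IV coefficient equals the ratio obtained after residualizing only the instrument on the controls.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
site = rng.integers(0, 5, size=n)
W = np.eye(5)[site]                                      # site fixed effects (controls)
z = (rng.random(n) < np.array([0.3, 0.4, 0.5, 0.6, 0.7])[site]).astype(float)  # offer
u = rng.normal(size=n)                                   # unobserved confounder
d = ((0.8 * z + 0.3 * u + rng.normal(scale=0.5, size=n)) > 0.5).astype(float)  # voucher use
y = W @ np.array([30.0, 25.0, 20.0, 15.0, 10.0]) - 4.0 * d + 3.0 * u + rng.normal(size=n)

# Just-identified IV with controls: solve Z_full' (y - X_full b) = 0
Z_full = np.column_stack([z, W])
X_full = np.column_stack([d, W])
beta_iv = np.linalg.solve(Z_full.T @ X_full, Z_full.T @ y)[0]

# Residualized version: residualize only the instrument on the controls
z_resid = z - W @ np.linalg.lstsq(W, z, rcond=None)[0]
beta_resid = (z_resid @ y) / (z_resid @ d)

print(f"IV with controls: {beta_iv:.4f}, residualized: {beta_resid:.4f}")
```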
Example: Humphries et al. (2024)
[Figure labels: Evicted, Time, Landlord, Tenure, Balance, Months Behind]