Empirical Research Design
with Difference-in-differences Estimation
20 February 2019
Hiro Ishise
Shuhei Kitamura
Masa Kudamatsu
Tetsuya Matsubayashi
Takeshi Murooka
(OSIPP Osaka University)
For references and more detail,
please see the accompanying document for this lecture
What kind of research should public policy school students conduct?
Our answer:
Policy evaluation with difference-in-differences (DID) estimation
Road Map
Why DID?
How to find a research question
What outcome dataset to look for
What policy to look for
Bad research proposals
An example (what Masa actually saw)
Question: What determines the subjective well-being of people over 65 years old in China? Income? Health? Age?
Data: cross-sectional survey of Chinese people in 2005
Equation to estimate: $\text{Wellbeing}_i = \alpha + \beta_1 \text{Income}_i + \beta_2 \text{Health}_i + \beta_3 \text{Age}_i + \varepsilon_i$
Choice of regressors: guided by the existing literature
We strongly discourage this type of research
Why bad?
1. Omitted variable bias
2. No concrete policy implication
3. Lack of originality
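A small simulation can illustrate point 1. It is a sketch under a made-up data-generating process: a hypothetical unobserved trait (called `optimism` here purely for illustration) raises both health and well-being. A regression of well-being on health alone then attributes part of optimism's effect to health.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

rng = random.Random(0)  # fixed seed so the simulation is reproducible
n = 100_000
true_effect_of_health = 0.5  # hypothetical true coefficient

# Hypothetical unobserved trait: affects both health and well-being
optimism = [rng.gauss(0, 1) for _ in range(n)]
health = [o + rng.gauss(0, 1) for o in optimism]
wellbeing = [true_effect_of_health * h + 1.0 * o + rng.gauss(0, 1)
             for h, o in zip(health, optimism)]

# Regressing well-being on health while omitting optimism
naive = ols_slope(health, wellbeing)
print(f"true effect: {true_effect_of_health}, naive estimate: {naive:.2f}")
```

Under these assumptions the naive slope converges to 0.5 + 1 × Cov(health, optimism) / Var(health) = 0.5 + 0.5 = 1.0, about double the true effect. Adding more regressors from the literature does not help if the omitted trait is never measured.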
DID mitigates these problems
1. Omitted variable bias
2. No concrete policy implication
3. Lack of originality
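One-page summary of DID estimation
[Figure: outcome over time for the control group and the treatment group; the treatment effect is the gap between the treatment group's actual outcome after the policy and its counterfactual outcome (what it would have been without the policy)]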
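The one-page summary reduces to arithmetic on four group means. Below is a minimal sketch, assuming the control group's change over time equals the change the treatment group would have experienced without the policy (the parallel-trends assumption). The function name and all numbers are hypothetical.

```python
def did_estimate(treat_before, treat_after, control_before, control_after):
    """Difference-in-differences from four group-by-period mean outcomes."""
    # Counterfactual: treatment group's starting level plus the control group's change
    counterfactual_after = treat_before + (control_after - control_before)
    # Treatment effect: actual outcome minus counterfactual outcome
    return treat_after - counterfactual_after

# Hypothetical mean outcomes, made up for illustration only
effect = did_estimate(treat_before=10.0, treat_after=15.0,
                      control_before=8.0, control_after=11.0)
print(effect)  # → 2.0
```

Equivalently, (15 − 10) − (11 − 8) = 2: the treatment group's change minus the control group's change.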
Benefits of DID
1. Less omitted variable bias
2. Concrete policy implications
3. Easier to propose original research
Plus
4. More feasible than randomized controlled trials (RCT), regression discontinuity design (RDD), and instrumental variables (IV)
Road Map
Why DID?
How to find a research question
What outcome dataset to look for
What policy to look for
Find both a policy and outcomes
X  What determines ...?
X  What's the impact of ...?
O  Does a policy improve outcomes?
How?
We propose two approaches: outcome-driven & policy-driven
There are other ways to find a research question.
Ask your supervisor for suggestions
Outcome-driven: step 1
Find an outcome of your interest
Poverty
Gender equality
Student performance at school
People's health
Subjective well-being
Firm productivity
etc.
Outcome-driven: step 2
List all the possible determinants of the outcome of your interest, using a literature review, mass media, and your educated guess
e.g. Determinants of subjective well-being:
Personal income
Health
Marital status
Number of children
Age
etc.
Outcome-driven: step 3
Think of a policy that affects any of these determinants
e.g. Determinants of school attendance:
Parents' income
Student's health
Teen pregnancy
Distance to school (can be changed by school construction)
etc.
Outcome-driven: step 4
Find such a policy that was actually implemented somewhere
e.g. Distance to school, a determinant of school attendance, can be changed by school construction:
Indonesia constructed schools on a massive scale during 1973-78 (Duflo 2001)
Outcome-driven: step 5
Now you have a research question.
e.g. Did school construction increase school attendance in Indonesia?
Policy-driven: step 1
Start with a policy of your interest
e.g.
New tax
Reduction in unemployment benefits
Expansion of eligibility for child allowance
Construction of schools
Reform of the college admission system
etc.
Policy-driven: step 2
List what could be affected by the policy of your interest, using a literature review, mass media, and your educated guess
e.g. Subsidy for child care:
labor market participation of mothers
occupational choice of women
workload of workers at child care centers
etc.
Policy-driven: step 3
Figure out which possible outcomes of the policy are (1) more important for people's living standards and (2) not yet studied by other researchers
e.g. Installing air-conditioners at primary schools in Japan:
Profits of air-conditioner makers
School pupils' health / learning outcomes (more important)
School teachers' health (more original, perhaps)
Ask your supervisor if not sure
Policy-driven: step 4
Now you have a research question.
e.g. Does installing air-conditioners at primary schools improve school teachers' health in Japan?
Once you've found a research question...
Talk to your supervisor
to check if it's original and important/interesting
Review the related literature
to check if it's original in terms of methodology
A good research question satisfies 3 conditions:
1. Original
2. Important / Interesting
3. Feasible
This is what Steve Pischke taught me during my first year of PhD study at LSE
Now it's time to check the feasibility
Road Map
Why DID?
How to find a research question
What outcome dataset to look for
What policy to look for
Panel data is a must
Outcomes need to be observed at least twice: at least once before and once after the policy
Panel data 1: Longitudinal data
[Figure: the same individuals observed repeatedly in 2001, 2002, 2003, 2004, ..., 2018]
Longitudinal data is expensive to collect and thus difficult to find
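When longitudinal data is available, each person's own before/after change can be computed first, which removes any characteristic of that person that does not change over time. The DID is then the average change in the treatment group minus the average change in the control group. A minimal sketch with made-up records; the record layout and the assumption that the policy took effect between 2001 and 2004 are hypothetical.

```python
from statistics import mean

# Hypothetical longitudinal records: (person_id, treated, year, outcome)
records = [
    (1, True, 2001, 5.0), (1, True, 2004, 9.0),
    (2, True, 2001, 7.0), (2, True, 2004, 10.0),
    (3, False, 2001, 6.0), (3, False, 2004, 7.0),
    (4, False, 2001, 4.0), (4, False, 2004, 6.0),
]
PRE_YEAR, POST_YEAR = 2001, 2004  # assumed: policy took effect between these years

def individual_changes(rows, treated):
    """Each person's post-minus-pre change in the outcome, for one group."""
    by_person = {}
    for pid, is_treated, year, y in rows:
        if is_treated == treated:
            by_person.setdefault(pid, {})[year] = y
    return [v[POST_YEAR] - v[PRE_YEAR] for v in by_person.values()]

did = mean(individual_changes(records, True)) - mean(individual_changes(records, False))
print(did)  # → 2.0
```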
Panel data 2: Repeated cross-sections
[Figure: a new sample surveyed in each of 2000, 2005, 2010, 2015, ...; respondents can be grouped by birth year (born in 1965, 1966, 1967, 1968, ..., 1985)]
Panel data 2: Repeated cross-sections (cont.)
Examples (for developing countries)
Living Standards Measurement Study (LSMS) surveys by the World Bank
Demographic and Health Surveys (DHS) by USAID
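With repeated cross-sections, different people are surveyed in each round, so individual changes cannot be computed. Instead, one can average the outcome within each group-by-period cell and take the difference-in-differences of those cell means. A minimal sketch; the rows, the treated region "A", and the policy year 2007 are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical repeated cross-sections: different respondents in each survey round.
# Each row: (survey_year, region, outcome)
rows = [
    (2005, "A", 3.0), (2005, "A", 5.0), (2005, "B", 4.0), (2005, "B", 6.0),
    (2010, "A", 9.0), (2010, "A", 7.0), (2010, "B", 5.0), (2010, "B", 7.0),
]
POLICY_YEAR, TREATED_REGION = 2007, "A"  # assumed for illustration

# Group outcomes into (treated?, after policy?) cells
cells = defaultdict(list)
for year, region, y in rows:
    cells[(region == TREATED_REGION, year > POLICY_YEAR)].append(y)
m = {key: mean(vals) for key, vals in cells.items()}

did = (m[(True, True)] - m[(True, False)]) - (m[(False, True)] - m[(False, False)])
print(did)  # → 3.0
```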
Panel data 3: Cross-sectional survey
[Figure: respondents in one survey grouped by district (District A, B, C, D, ..., Z) and by birth year (born in 1965, 1966, 1967, ..., 2003)]
Panel data 3: Cross-sectional survey
Example: Duflo (2001)
Cross-sectional survey of men in Indonesia in 1995
used as panel data of districts × birth cohorts
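In this setting, birth cohort plays the role of time: cohorts too old to benefit from a policy form the "before" group, and younger cohorts form the "after" group. A minimal sketch in this spirit, not Duflo's actual specification; the survey rows, the set of high-program districts, and the cutoff birth year are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows from ONE cross-sectional survey:
# (district, birth_year, years_of_schooling)
survey = [
    ("A", 1960, 6.0), ("A", 1962, 6.0), ("A", 1970, 9.0), ("A", 1972, 9.0),
    ("B", 1960, 8.0), ("B", 1961, 8.0), ("B", 1969, 9.0), ("B", 1971, 9.0),
]
HIGH_PROGRAM_DISTRICTS = {"A"}  # assumed: districts where many schools were built
EXPOSED_FROM_BIRTH_YEAR = 1968  # assumed: cohorts young enough to use the new schools

# Group outcomes into (high-program district?, exposed cohort?) cells
cells = defaultdict(list)
for district, birth_year, schooling in survey:
    key = (district in HIGH_PROGRAM_DISTRICTS, birth_year >= EXPOSED_FROM_BIRTH_YEAR)
    cells[key].append(schooling)
m = {k: mean(v) for k, v in cells.items()}

# (young - old) in high-program districts minus (young - old) in the others
did = (m[(True, True)] - m[(True, False)]) - (m[(False, True)] - m[(False, False)])
print(did)  # → 2.0
```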
Panel data 4: Recall data from cross-section
[Figure: one survey in which respondents recall what happened in each year: 2001, 2002, 2003, 2004, ..., 2018]
Panel data 4: Recall data
Example: Kudamatsu (2012)
Cross-sectional fertility surveys of women aged 15-49 across Africa
used as panel data of childbirths
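Turning recall data into panel data amounts to expanding each respondent's history into one row per year. A minimal sketch, assuming each woman reports her birth year and the years in which she gave birth; the data, the 1990-2000 window, and the helper name `to_panel` are hypothetical, and the minimum age of 15 mirrors the surveyed age range of 15-49.

```python
# Hypothetical birth histories recalled in a single survey
birth_histories = {
    "woman_1": {"born": 1970, "birth_years": [1992, 1996]},
    "woman_2": {"born": 1975, "birth_years": [1999]},
}
FIRST_YEAR, LAST_YEAR = 1990, 2000  # assumed observation window

def to_panel(histories, first_year, last_year, min_age=15):
    """Expand recall data into one row per woman per year: (woman, year, gave_birth)."""
    panel = []
    for woman, info in histories.items():
        # Start no earlier than the year she reached min_age
        start = max(first_year, info["born"] + min_age)
        for year in range(start, last_year + 1):
            panel.append((woman, year, int(year in info["birth_years"])))
    return panel

panel = to_panel(birth_histories, FIRST_YEAR, LAST_YEAR)
print(len(panel), sum(row[2] for row in panel))  # → 22 3
```

Each woman-year row can then be matched to the policy status of her location in that year.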
Road Map
Why DID?
How to find a research question
What outcome dataset to look for
What policy to look for
When and where was the policy implemented?
Where: defines the treatment and control groups
When: defines the periods before and after the policy
Example 1: Card and Krueger (1994)
Minimum wage raised
in the state of New Jersey
in April 1992,
but not in Pennsylvania
When and where was the policy implemented?
Example 2: Richardson and Troost (2009)
The central bank (the Federal Reserve) provided credit to troubled banks
in the part of Mississippi within Federal Reserve District 6
in 1931,
but not in the part within District 8
When should the policy have been implemented for your DID estimation?
[Figure: timeline comparing the period covered by your panel data with the date of your policy]
Policy date before your panel data starts: not OK, because there's no control group
Policy date after your panel data ends: not OK, because there's no treatment group
So the policy date should fall within the period covered by your panel data
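Before committing to a policy, it is worth checking mechanically that your data has observations on both sides of the policy date. A minimal sketch; the function name `timing_ok` is hypothetical, and it assumes the policy takes effect in `policy_year`, so observations dated that year or later count as after the policy.

```python
def timing_ok(survey_years, policy_year):
    """Return (ok, reason): does the data cover periods both before and after the policy?"""
    before = [y for y in survey_years if y < policy_year]
    after = [y for y in survey_years if y >= policy_year]  # assumed: effective in policy_year
    if not before:
        return False, "no observations before the policy"
    if not after:
        return False, "no observations after the policy"
    return True, f"{len(before)} period(s) before, {len(after)} after"

print(timing_ok([2000, 2005, 2010, 2015], 2008))
print(timing_ok([2000, 2005, 2010, 2015], 1995))
```

The first call reports two periods on each side; the second fails because every survey year comes after the policy.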
Where should the policy have been implemented for your DID estimation?
Ideally, in only some parts of a country, so that treatment and control groups will be (relatively) comparable
e.g. Japan and China: differ a lot
Provinces in China: relatively similar
Prefectures in Japan: relatively similar
Where should the policy have been implemented for your DID estimation?
Example: Wang (2013)
on Special Economic Zones
Source: Figure 2 of Wang (2013)
Nationwide policies for DID estimation?
In countries with centralized policy-making (e.g. Japan), most policies are implemented nationwide
You can still use nationwide policies for DID estimation in 3 ways
Nationwide policies for DID estimation #1
Example: Kondo and Shigeoka (2013)
Universal health insurance in Japan since 1961
Prefecture A: many insured before 1961; everyone insured since 1961
Prefecture B (treatment group): few insured before 1961; everyone insured since 1961
Nationwide policies for DID estimation #1
Everyone gets treated by the policy, but some citizens were already treated before the policy date
Area A: many treated before; everyone treated after
Area B (treatment group): few treated before; everyone treated after
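With more than two areas, the pre-policy share already treated becomes a treatment intensity: areas where fewer people were covered before the policy are treated more. One common way to use this (sometimes called an intensity or continuous-treatment DID) is to regress each area's before/after change in the outcome on that intensity. This is a minimal sketch of that generic idea, not necessarily what Kondo and Shigeoka (2013) estimate; all area data are hypothetical.

```python
# Hypothetical area-level data: share already covered before the nationwide
# policy, and the mean outcome before and after it.
areas = {
    "Area A": {"covered_before": 0.8, "y_before": 10.0, "y_after": 11.0},
    "Area B": {"covered_before": 0.3, "y_before": 9.0, "y_after": 11.5},
    "Area C": {"covered_before": 0.5, "y_before": 9.5, "y_after": 11.5},
}

# Treatment intensity: share of the area newly treated by the policy
x = [1 - a["covered_before"] for a in areas.values()]
# Before/after change in the outcome for each area
dy = [a["y_after"] - a["y_before"] for a in areas.values()]

# OLS slope of the change on intensity (simple regression with intercept)
n = len(x)
mx, my = sum(x) / n, sum(dy) / n
slope = sum((a - mx) * (b - my) for a, b in zip(x, dy)) / sum((a - mx) ** 2 for a in x)
print(round(slope, 2))  # → 3.03
```

A positive slope says the outcome rose more in areas where the policy newly treated a larger share of people; with these made-up numbers, 10 percentage points more newly covered goes with about 0.3 more improvement.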
Nationwide policies for DID estimation #2
Example: Meyer et al. (1995)
Increases in benefits for work-related injuries in Kentucky in 1980 (also in Michigan in 1982)
High-earning workers (treatment group): benefits increased since 1980
Low-earning workers: no increase
Nationwide policies for DID estimation #2
Only certain groups of citizens get treated by the policy
Citizen group A (treatment group): not treated before; treated after
Citizen group B: not treated before; not treated after
This type of policy may allow you to conduct RDD!
Nationwide policies for DID estimation #3
Example: Baland and Robinson (2008)
Secret ballots in national elections in Chile since 1958
Areas with many landless farmers (treatment group): before 1958, landlords control their votes; since 1958, free to vote
Areas with few landless farmers: free to vote both before 1958 and since 1958
Nationwide policies for DID estimation #3
Your theory predicts that the policy affects certain groups of areas / citizens more than others
Group A (treatment group): not affected before; affected by the policy after
Group B: not affected before or after
An example of how DID tests your theoretical predictions
Searching for policies for DID estimation may allow you to find policies that are
1. randomly allocated (e.g. corruption audits of Brazilian mayors)
2. appropriate for RDD (e.g. policies applied only to people aged 65+)
If this happens to you, congratulations!
Non-policy treatments
Everything discussed today
can be applied to non-policy treatments
Weather shocks
Political events
Commodity price shocks
etc.
DID estimation techniques
See Section 5 of the accompanying document for this lecture
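As a pointer, one common way to write the basic two-group, two-period DID as a regression is shown below. It is a standard textbook form, not necessarily the specification used in Section 5; $\text{Treat}_i$ (1 for the treatment group) and $\text{Post}_t$ (1 for periods after the policy) are notation chosen for this sketch. With only these four group-by-period cells, the OLS estimate of $\delta$ equals the difference-in-differences of the cell means.

```latex
Y_{it} = \alpha + \beta\,\text{Treat}_i + \gamma\,\text{Post}_t
       + \delta\,(\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it}

\hat{\delta} = \left(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\right)
             - \left(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\right)
```

Here $\beta$ captures fixed differences between the groups, $\gamma$ the common change over time, and $\delta$ the treatment effect.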
Good luck with your thesis!
Copy of Empirical research design with difference-in-differences estimation
By Masayuki Kudamatsu