Empirical Research Design

with Difference-in-differences Estimation

20 February, 2019

Hiro Ishise

Shuhei Kitamura

Masa Kudamatsu

Tetsuya Matsubayashi

Takeshi Murooka

(OSIPP Osaka University)

For references and more detail,

please see the accompanying document for this lecture

What kind of research should public policy school students conduct?

Our answer:

Policy evaluation with difference-in-differences (DID) estimation

Road Map

Why DID?

How to find a research question

What outcome dataset to look for

What policy to look for

Bad research proposals

An example (what Masa actually saw)

Question: What determines subjective wellbeing of people over 65 years old in China?

Data: cross-sectional survey of Chinese people in 2005

Equation to estimate:

y = c + \alpha x_1 + \beta x_2 + \gamma x_3 + ... + \varepsilon

where x_1 is income, x_2 is health, x_3 is age, and so on

Choice of regressors: guided by the existing literature

We strongly discourage this type of research

Why bad?

Omitted variable bias

No concrete policy implication

Lack of originality

DID mitigates these problems

Omitted variable bias

No concrete policy implication

Lack of originality

One-page summary of DID estimation

[Figure: DID diagram plotting the outcome for the control group and the treatment group, with the counterfactual outcome for the treatment group and the treatment effect marked]
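In equation form, the diagram on this slide can be sketched as follows. The notation is an assumption introduced here for illustration: \bar{y}_{g,t} is the mean outcome of group g (T for treatment, C for control) in period t (0 before the policy, 1 after).

```latex
\begin{align*}
\text{Counterfactual outcome for treatment group}
  &= \bar{y}_{T,0} + (\bar{y}_{C,1} - \bar{y}_{C,0}) \\
\text{Treatment effect}
  &= \bar{y}_{T,1} - \left[ \bar{y}_{T,0} + (\bar{y}_{C,1} - \bar{y}_{C,0}) \right] \\
  &= (\bar{y}_{T,1} - \bar{y}_{T,0}) - (\bar{y}_{C,1} - \bar{y}_{C,0})
\end{align*}
```

The first line is valid only if, without the policy, the treatment group would have followed the same trend as the control group (the parallel trends assumption).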

Benefits of DID

Less omitted variable bias

Concrete policy implications

Easier to propose original research

Plus

More feasible than RCT, RDD, IV
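To illustrate the first benefit (less omitted variable bias), here is a minimal sketch with made-up numbers. It assumes the outcome is the sum of an unobserved, time-invariant group effect, a time effect common to both groups, and the policy effect. All names and values are hypothetical.

```python
# Hypothetical data-generating process (all numbers are made up):
# outcome = unobserved group effect + common time effect + policy effect
GROUP_EFFECT = {"treated": 5.0, "control": 2.0}  # omitted variable, fixed over time
TIME_EFFECT = {"before": 0.0, "after": 1.5}      # shared by both groups
TRUE_EFFECT = 2.0                                # effect of the policy


def outcome(group, period):
    """Mean outcome of a group in a period under the assumed process."""
    policy_on = group == "treated" and period == "after"
    return GROUP_EFFECT[group] + TIME_EFFECT[period] + (TRUE_EFFECT if policy_on else 0.0)


# Cross-sectional comparison after the policy: contaminated by the group effect
cross_section = outcome("treated", "after") - outcome("control", "after")

# DID: the change in the treated group minus the change in the control group
did = (outcome("treated", "after") - outcome("treated", "before")) - (
    outcome("control", "after") - outcome("control", "before")
)

print(cross_section)  # 5.0 = true effect (2.0) + group difference (3.0)
print(did)            # 2.0 = true effect
```

DID removes only omitted variables that are constant over time within each group. An omitted variable that changes differently across the two groups would still bias the estimate.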

Road Map

Why DID?

How to find a research question

What outcome dataset to look for

What policy to look for

Find both a policy and outcomes

What determines ... ?

What's the impact of ... ?

Does a policy improve outcomes ?

How?

We propose two approaches: outcome-driven & policy-driven

There are other ways to find a research question.

Ask your supervisor for suggestions

Outcome-driven: step 1

Find an outcome of your interest

Poverty

Gender equality

Student performance at school

People's health

Subjective well-being

Firm productivity

etc.

Outcome-driven: step 2

List all the possible determinants of the outcome of your interest

with literature review, mass media, your educated guess

e.g. Determinants of subjective well-being:

Personal income

Health

Marital status

Number of children

Age

etc.

Outcome-driven: step 3

Think of a policy that affects any of these determinants

e.g. Determinants of school attendance:

Parents' income

Student's health

Teen pregnancy

Distance to school (can be changed by school construction)

etc.

Outcome-driven: step 4

Find such a policy that was actually implemented somewhere

e.g. Determinants of school attendance:

Parents' income

Student's health

Teen pregnancy

Distance to school (can be changed by school construction)

etc.

Indonesia constructed schools on a massive scale during 1973-78 (Duflo 2001)

Outcome-driven: step 5

Now you have a research question.

e.g. Did school construction increase school attendance in Indonesia?

Policy-driven: step 1

Start with a policy of your interest

e.g.

New tax

Reduction in unemployment benefits

Expansion of eligibility for child allowance

Construction of schools

Reform in college admission system

etc.

Policy-driven: step 2

List what could be affected by the policy of your interest

with literature review, mass media, your educated guess

e.g. Subsidy for child care:

labor market participation of mothers

occupational choice of women

workload of workers at child care centers

etc.

Policy-driven: step 3

Figure out which possible outcomes of the policy are

(1) more important in terms of people's living standards

(2) less studied by other researchers

Ask your supervisor if not sure

e.g. Installing air-conditioners at primary schools in Japan:

Profits of air-conditioner makers

School pupils' health / learning outcomes: more important

School teachers' health: more original (perhaps)

Policy-driven: step 4

Now you have a research question.

e.g. Does installing air-conditioners at primary schools improve school teachers' health in Japan?

Once you've found a research question...

Talk to your supervisor

to check if it's original and important/interesting

Review the related literature

to check if it's original in terms of methodology

A good research question satisfies 3 conditions:

Original

Important / Interesting

Feasible

Now it's time to check the feasibility

This is what Steve Pischke taught me during the first year of my PhD study at LSE

Road Map

Why DID?

How to find a research question

What outcome dataset to look for

What policy to look for

[Figure: DID diagram plotting the outcome for the control group and the treatment group, with the counterfactual outcome for the treatment group and the treatment effect marked]

Panel data is a must

Outcomes need to be observed at least twice

Panel data 1: Longitudinal data

[Figure: the same respondents surveyed in 2001, 2002, 2003, 2004, ..., 2018]

Longitudinal data is expensive to collect and thus difficult to find

Panel data 2: Repeated cross-sections

[Figure: surveys conducted in 2000, 2005, 2010, 2015, ...; in each survey, respondents are grouped by birth cohort (born in 1965, 1966, 1967, 1968, ..., 1985)]

Panel data 2: Repeated cross-sections (cont.)

Examples (for developing countries)

Living Standards Measurement Study (LSMS) by World Bank

Demographic and Health Surveys (DHS) by USAID
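One common way to use repeated cross-sections as panel data is to average the outcome within each birth cohort in each survey year, often called a pseudo-panel. Below is a minimal sketch. The field names (`survey_year`, `birth_year`, `outcome`) and the records are hypothetical, not taken from LSMS or DHS.

```python
from collections import defaultdict


def cohort_panel(records):
    """Average the outcome within each (birth_year, survey_year) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        key = (r["birth_year"], r["survey_year"])
        sums[key] += r["outcome"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up respondents: different people are interviewed in each survey year
records = [
    {"survey_year": 2000, "birth_year": 1965, "outcome": 4.0},
    {"survey_year": 2000, "birth_year": 1965, "outcome": 6.0},
    {"survey_year": 2005, "birth_year": 1965, "outcome": 7.0},
    {"survey_year": 2000, "birth_year": 1966, "outcome": 3.0},
    {"survey_year": 2005, "birth_year": 1966, "outcome": 5.0},
]

panel = cohort_panel(records)
print(panel[(1965, 2000)])  # 5.0: mean of the two respondents born in 1965
```

Each birth cohort is then observed at least twice, even though no individual is.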

Panel data 3: Cross-sectional survey

[Figure: respondents in a single survey arranged by district (District A, B, C, D, ..., Z) and by birth cohort (born in 1965, 1966, 1967, ..., 2003)]

Example: Duflo (2001)

Cross-sectional survey of men in Indonesia in 1995

used as district-level panel data
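A single cross-section can support DID when birth cohort plays the role of time: older cohorts were too old to be affected by the policy, younger cohorts were exposed to it. The sketch below is only loosely in the spirit of this design. The districts, the cutoff birth year, the field names and the schooling values are all made up for illustration.

```python
from statistics import mean


def cohort_did(records, exposed_from, high_districts):
    """DID across districts (high vs low policy intensity) and birth cohorts."""

    def cell_mean(high, exposed):
        return mean(
            r["schooling"]
            for r in records
            if (r["district"] in high_districts) == high
            and (r["birth_year"] >= exposed_from) == exposed
        )

    change_high = cell_mean(True, True) - cell_mean(True, False)
    change_low = cell_mean(False, True) - cell_mean(False, False)
    return change_high - change_low


# Hypothetical respondents from one survey; schooling is in years
records = [
    {"district": "A", "birth_year": 1960, "schooling": 6.0},
    {"district": "A", "birth_year": 1961, "schooling": 7.0},
    {"district": "A", "birth_year": 1970, "schooling": 9.0},
    {"district": "A", "birth_year": 1971, "schooling": 10.0},
    {"district": "B", "birth_year": 1960, "schooling": 8.0},
    {"district": "B", "birth_year": 1961, "schooling": 8.0},
    {"district": "B", "birth_year": 1970, "schooling": 9.0},
    {"district": "B", "birth_year": 1971, "schooling": 10.0},
]

effect = cohort_did(records, exposed_from=1968, high_districts={"A"})
print(effect)  # 1.5 = (9.5 - 6.5) - (9.5 - 8.0)
```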

Panel data 4: Recall data from cross-section

[Figure: respondents in a single survey recall outcomes for each past year: 2001, 2002, 2003, 2004, ..., 2018]

Example: Kudamatsu (2012)

Cross-sectional fertility surveys of women aged 15-49 across Africa

used as panel data on child births
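As a rough sketch of how recall data becomes a panel, the function below expands each woman's recalled birth history into one row per woman per year. The field names (`woman_id`, `survey_year`, `birth_years`) and the data are hypothetical, and this is not a description of how any particular study processed its data.

```python
def mother_year_panel(women, first_year):
    """Expand recalled birth histories into one row per woman per year."""
    rows = []
    for woman in women:
        # Only years up to the interview year can be recalled
        for year in range(first_year, woman["survey_year"] + 1):
            rows.append(
                {
                    "woman_id": woman["woman_id"],
                    "year": year,
                    "births": woman["birth_years"].count(year),
                }
            )
    return rows


# Made-up respondents from cross-sectional surveys
women = [
    {"woman_id": 1, "survey_year": 2003, "birth_years": [2001, 2003]},
    {"woman_id": 2, "survey_year": 2002, "birth_years": [2002]},
]

panel = mother_year_panel(women, first_year=2001)
for row in panel:
    print(row)
```

Recall data relies on respondents remembering past events accurately, so it suits salient events such as child births better than outcomes that are easily forgotten.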

Road Map

Why DID?

How to find a research question

What outcome dataset to look for

What policy to look for

When and where was the policy implemented?

[Figure: DID diagram of outcomes for the treatment and control groups, annotated with "Where" and "When"]

Example 1: Card and Krueger (1994)

Minimum wage raised

in the state of New Jersey

in April 1992,

but not in Pennsylvania

When and where was the policy implemented?

Example 2: Richardson and Troost (2009)

Central bank provided

credits to troubled banks

in Federal Reserve District 6 within the state of Mississippi

in 1931,

but not in District 8

When should the policy have been implemented for your DID estimation?

[Figure: timeline comparing the period covered by your panel data with the date of your policy]

The date of your policy should fall within the period covered by your panel data

If the policy was implemented before your panel data begins: no control group

If the policy was implemented after your panel data ends: no treatment group
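The timing check on this slide can be sketched as a small function. It assumes yearly data and treats the policy year itself as "after". Both are conventions chosen here for illustration, and the function name is hypothetical.

```python
def check_policy_timing(panel_years, policy_year):
    """Check whether the panel covers years both before and after the policy."""
    before = [y for y in panel_years if y < policy_year]
    after = [y for y in panel_years if y >= policy_year]
    if not before:
        # Every observation is already under the policy
        return "no control group"
    if not after:
        # No observation is under the policy yet
        return "no treatment group"
    return "ok"


panel_years = range(2001, 2019)  # hypothetical panel covering 2001-2018

print(check_policy_timing(panel_years, 2010))  # ok
print(check_policy_timing(panel_years, 1995))  # no control group
print(check_policy_timing(panel_years, 2020))  # no treatment group
```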

Where should the policy have been implemented for your DID estimation?

Ideally, only some parts of a country

Treatment and control groups will be (relatively) comparable

e.g.

Japan and China: differ a lot

Provinces in China: relatively similar

Prefectures in Japan: relatively similar

Where should the policy have been implemented for your DID estimation?

Example: Wang (2013)

on Special Economic Zones

Source: Figure 2 of Wang (2013)

Nationwide policies for DID estimation?

In countries with centralized policy-making (e.g. Japan)

most policies are implemented nationwide

You can still use nationwide policies for DID estimation in 3 ways

Nationwide policies for DID estimation #1

Example: Kondo and Shigeoka (2013)

Universal health insurance in Japan since 1961

Prefecture A: many insured before 1961, everyone insured since 1961

Prefecture B (Treatment Group): few insured before 1961, everyone insured since 1961

Everyone gets treated by the policy

but some citizens were already treated before the policy date

Area A: many treated before, everyone treated after

Area B (Treatment Group): few treated before, everyone treated after
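A design of this kind is often written as a regression in which the treatment varies in intensity across areas. The equation below is a generic sketch under assumed notation, not necessarily the specification used by Kondo and Shigeoka (2013): y_{at} is the outcome in area a and period t, s_a is the share of citizens in area a already treated before the policy date, and After_t equals 1 from the policy date onward.

```latex
\begin{equation*}
y_{at} = \alpha_a + \lambda_t + \beta \, (1 - s_a) \times \text{After}_t + \varepsilon_{at}
\end{equation*}
```

Here \alpha_a and \lambda_t are area and period fixed effects, and \beta measures the effect of moving an area from no one treated to everyone treated. Areas with a small s_a play the role of the treatment group.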

Nationwide policies for DID estimation #2

Example: Meyer et al. (1995)

Increases in benefits for work-related injuries in Kentucky in 1980

(also in Michigan in 1982)

High-earning workers (Treatment Group): benefits increased since 1980

Low-earning workers: no increase

Only certain groups of citizens get treated by policy

Citizen group A (Treatment Group): not treated before, treated after

Citizen group B: not treated before or after

This type of policy may allow you to conduct RDD!!!

Nationwide policies for DID estimation #3

Example: Baland and Robinson (2008)

Secret ballots in national elections in Chile since 1958

Areas with many landless farmers (Treatment Group): landlords control their votes before 1958, free to vote since 1958

Areas with few landless farmers: free to vote

Your theory predicts that

the policy affects certain groups of areas / citizens more than others

Group A (Treatment Group): not affected before, affected by the policy after

Group B: not affected

An example of how DID tests your theoretical predictions

Searching for policies for DID estimation...

may allow you to find policies that are

randomly allocated (e.g. corruption audits for Brazilian mayors)

appropriate for RDD (e.g. policies applied only to people aged 65 and over)

If this happens to you, congratulations!

Non-policy treatments

Everything discussed today

can be applied to non-policy treatments

Weather shocks

Political events

Commodity price shocks

etc.

DID estimation techniques

See Section 5 of the accompanying document for this lecture

[Figure: DID diagram plotting the outcome for the control group and the treatment group, with the counterfactual outcome for the treatment group and the treatment effect marked]

Good luck with your thesis!

By Masayuki Kudamatsu

sites.google.com/site/mkudamatsu
