Economics 155: Intro Microeconomics
Finding & Preparing Data
Ryan Clement | Middlebury Libraries | Spring 2020
What are we covering?
- How to think about your data search
- A few major data sources
- Codebooks
- Dirty, unprepared data
Before you start your data search
- What variables do I need?
  - Independent variable
  - Dependent variable
- What unit of observation do I need?
  - Microdata vs. macrodata
- What time period/frequency do I need?
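The microdata vs. macrodata question can be made concrete with a small sketch. Everything below is invented for illustration (the field names, the people, the incomes): each microdata row is one person, and the macrodata is what is left after aggregating to the state level.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical microdata: one row per person (all values are made up).
microdata = [
    {"person_id": 1, "state": "VT", "income": 42000},
    {"person_id": 2, "state": "VT", "income": 58000},
    {"person_id": 3, "state": "NH", "income": 61000},
    {"person_id": 4, "state": "NH", "income": 39000},
    {"person_id": 5, "state": "NH", "income": 53000},
]

def aggregate_mean(rows, group_key, value_key):
    """Collapse unit-level rows into one mean value per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}

# Macrodata: one number per state instead of one row per person.
macrodata = aggregate_mean(microdata, "state", "income")
print(macrodata)
```

Aggregating microdata up to macrodata is easy, but the reverse is not possible, so a question about individuals or households calls for microdata from the start.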
Before you start your data search
- "Who would care about this?"
  - And who would care about keeping it?
- What type of organization are they?
  - Educational institution, government organization, private company, etc.
- If not government, how valuable is the data?
  - And who would pay for it?
- Are there privacy/confidentiality issues?
Types of studies
Cross-Sectional
- data that are collected only once
- many public opinion surveys are cross-sectional
Time Series
- studies the same variables over time
- the Census and the National Health Interview Survey are examples
- the questions generally remain the same over time, but the individual respondents vary
Longitudinal Studies
- conducted repeatedly, with the same group of respondents surveyed each time
- allows for examining changes over the life course
- Add Health is an example
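One rough way to tell the three designs apart in a file is to look at which respondent IDs show up in which wave. The sketch below is a simplification under stated assumptions: the function name and the `(wave, respondent_id)` input shape are hypothetical, and real longitudinal studies lose some respondents between waves, so an exact match of ID sets is stricter than real data would allow.

```python
def study_design(records):
    """Guess the design from (wave, respondent_id) pairs.

    one wave only                      -> "cross-sectional"
    several waves, same respondents    -> "longitudinal"
    several waves, respondents differ  -> "time series"
    """
    ids_by_wave = {}
    for wave, respondent_id in records:
        ids_by_wave.setdefault(wave, set()).add(respondent_id)
    if not ids_by_wave:
        raise ValueError("no records to classify")
    if len(ids_by_wave) == 1:
        return "cross-sectional"
    id_sets = list(ids_by_wave.values())
    # Simplifying assumption: a panel has exactly the same IDs in every wave.
    if all(ids == id_sets[0] for ids in id_sets):
        return "longitudinal"
    return "time series"

print(study_design([(1, "a"), (1, "b"), (2, "a"), (2, "b")]))
```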
Searching Google for Data
- Don't start with Google
- Be as specific as possible in search terms (e.g., "microdata")
- Remember the "who would care" rule
- Some access points:
  - Social Explorer (for tables and many different geographies)
  - IPUMS (historical, harmonized microdata)
  - NHGIS (historical spatial data)
- When working with historical/time series data:
  - Watch for changing values
  - Watch for changing geographic coverage
  - Watch for changing questions
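A quick check for "changing values" is to list which codes a variable takes in each year and flag any that appear or disappear. This is a minimal sketch: the `marital` variable, its codes, and the years are invented, and the function names are hypothetical.

```python
def codes_by_year(rows, variable):
    """Collect the set of codes observed for one variable in each year."""
    seen = {}
    for row in rows:
        seen.setdefault(row["year"], set()).add(row[variable])
    return seen

def report_code_changes(rows, variable):
    """List codes that appear or disappear between consecutive years."""
    seen = codes_by_year(rows, variable)
    years = sorted(seen)
    changes = []
    for earlier, later in zip(years, years[1:]):
        added = seen[later] - seen[earlier]
        dropped = seen[earlier] - seen[later]
        if added or dropped:
            changes.append((earlier, later, sorted(added), sorted(dropped)))
    return changes

# Made-up rows: code 3 first shows up in 2010.
rows = [
    {"year": 1990, "marital": 1}, {"year": 1990, "marital": 2},
    {"year": 2000, "marital": 1}, {"year": 2000, "marital": 2},
    {"year": 2010, "marital": 1}, {"year": 2010, "marital": 2},
    {"year": 2010, "marital": 3},
]
changes = report_code_changes(rows, "marital")
print(changes)
```

A flagged change is only a prompt to go back to the codebook: the new code may reflect a new response option, a reworded question, or a recoding of an old category.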
Which ACS is right for you?
- The 3-year ACS has been discontinued (the last release covered 2011-2013); choose between the 1-year and 5-year estimates
General Social Survey (GSS)
- Sociological survey on demographic, behavioral, and attitudinal topics
- Annually from 1972-1994, then biennially since 1994
- Randomly selected sample of adults (18+) in United States
- Two samples of ~1500 respondents each
- Some questions appear every year; some come and go; some come and then never return
Add Health (National Longitudinal Study of Adolescent to Adult Health)
- Longitudinal study of students in grades 7-12 in 1994-95 (most recent follow-up in 2008)
- Survey data on social, economic, psychological and physical well-being
- Contextual data on family, neighborhood, community, school, friendships, peer groups, and romantic relationships
- Public use and restricted versions of the data; public use available through ICPSR
ICPSR (Inter-university Consortium for Political and Social Research)
- Part of the Institute for Social Research at the University of Michigan
- First attempt at openly sharing data amongst researchers (started with election studies data)
- Curated, digitized, diverse historical data sets
go/icpsr/
IPUMS Project Goals
- Collect and preserve data and documentation
- Harmonize data
- Disseminate the data absolutely free!
- Use it for GOOD -- never for EVIL
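"Harmonize data" means recoding each year's original codes into one consistent scheme. The sketch below shows the general idea with a lookup table (a crosswalk). The codes, years, and categories are invented for illustration and are not IPUMS's actual variables or method.

```python
# Hypothetical crosswalk: (year, original code) -> harmonized category.
# The 1990 file is assumed to use numbers and the 2000 file letters.
CROSSWALK = {
    (1990, 1): "employed",
    (1990, 2): "not employed",
    (2000, "E"): "employed",
    (2000, "U"): "not employed",
    (2000, "N"): "not employed",
}

def harmonize(year, original_code):
    """Return the harmonized category, or None if the code is unmapped."""
    return CROSSWALK.get((year, original_code))

harmonized = [harmonize(1990, 1), harmonize(2000, "E"), harmonize(2000, "X")]
print(harmonized)
```

In this made-up crosswalk, two distinct 2000 codes collapse into a single category. Harmonized variables often trade detail for comparability in this way, which is one reason to keep the original codes alongside them.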
Other government sources
Other data repositories
Codebooks
What's in a codebook?
- Column locations and widths for each variable (if necessary)
- Definitions of different record types
- Response codes for each variable
- Codes used to indicate nonresponse and missing data
- Exact questions and skip patterns used in a survey
- Other indications of the content and characteristics of each variable
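To see why column locations, response codes, and missing-data codes matter, here is a minimal sketch of reading one line of a fixed-width file. The layout, labels, and missing codes below are a made-up codebook, not taken from any real survey.

```python
# Hypothetical codebook: name -> (start column, width), columns counted from 1.
LAYOUT = {"age": (1, 3), "sex": (4, 1), "income": (5, 6)}
# Response codes for categorical variables.
VALUE_LABELS = {"sex": {"1": "male", "2": "female"}}
# Codes the (made-up) codebook reserves for nonresponse or missing data.
MISSING_CODES = {"age": {"999"}, "income": {"999999"}}

def parse_record(line):
    """Slice one fixed-width line into labeled values using the codebook."""
    record = {}
    for name, (start, width) in LAYOUT.items():
        raw = line[start - 1 : start - 1 + width].strip()
        if raw in MISSING_CODES.get(name, set()):
            record[name] = None  # missing, not a real value
        elif name in VALUE_LABELS:
            record[name] = VALUE_LABELS[name].get(raw, raw)
        else:
            record[name] = int(raw)
    return record

print(parse_record("0342052000"))
print(parse_record("9991999999"))
```

Without the codebook, the second line would read as a 999-year-old with an income of 999,999, which is exactly the kind of error that skipping the documentation produces.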
Dirty, Unprepared Data
Missing data
Bad data
Unclear data
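A first pass over a dirty column can simply count how many values fall into each of the three problem types above. This is a sketch under assumptions: the missing-value markers and the valid range are hypothetical defaults that should come from the codebook, and the function name is made up.

```python
def audit_column(values, valid_range, missing_markers=("", "NA", "-9")):
    """Count usable, missing, out-of-range (bad), and unparseable (unclear) values."""
    low, high = valid_range
    counts = {"ok": 0, "missing": 0, "bad": 0, "unclear": 0}
    for value in values:
        text = str(value).strip()
        if text in missing_markers:
            counts["missing"] += 1
            continue
        try:
            number = float(text)
        except ValueError:
            # Not a number at all: needs a human to decide what was meant.
            counts["unclear"] += 1
            continue
        if low <= number <= high:
            counts["ok"] += 1
        else:
            counts["bad"] += 1
    return counts

# Made-up age column with each kind of problem.
ages = ["34", "", "NA", "212", "forty", "-9", "61"]
report = audit_column(ages, valid_range=(0, 120))
print(report)
```

Counting before cleaning helps show how much of the sample would be lost, which is worth knowing before deciding whether to drop, recode, or look up the problem values.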