Economics 402: Junior Seminar
Finding & Preparing Data
Ryan Clement | Reed College Library | Spring 2015
What are we doing?
- How to think about your data search
- Major data sources
- Codebooks
- Dirty, unprepared data
- Open searching time
Before you start your data search
- "Who would care about this?"
- And who would care about keeping it?
- What type of organization are they?
- Educational institutions, government organizations, private companies, etc.
- If not government, how valuable is the data?
- And who would pay for it?
- Are there privacy/confidentiality issues?
- And at what level of observation do you need the data?
Searching Google for Data
- Don't start with Google
- Be as specific as possible in search terms (e.g., "microdata")
- Remember the "who would care" rule
- Access points:
  - ICPSR (including historical)
  - Social Explorer (for quick overview)
  - American FactFinder (only last two decennial censuses)
  - IPUMS (historical, harmonized, + microdata)
  - NHGIS (historical spatial data)
- When working with historical/time series data:
  - Watch for changing values
  - Watch for changing geographic coverage
  - Watch for changing questions
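A minimal sketch of the "watch for changing values" problem: when a question's response codes shift between survey years, a per-year crosswalk can map each raw code to a common label before the years are pooled. The years, codes, and labels below are invented for illustration and do not come from any real survey.

```python
# Hypothetical crosswalk: one labor-force question coded differently in two years.
# All years, codes, and labels are invented for illustration.
CROSSWALK = {
    1990: {1: "employed", 2: "unemployed", 3: "not in labor force"},
    2000: {1: "employed", 2: "not in labor force", 3: "unemployed"},
}


def harmonize(year, code):
    """Map a year-specific response code to a common label (None if unmapped)."""
    return CROSSWALK.get(year, {}).get(code)


# (survey year, raw code) pairs; code 2 means different things in 1990 and 2000.
records = [(1990, 2), (2000, 2), (2000, 3), (2010, 1)]
harmonized = [(year, harmonize(year, code)) for year, code in records]
print(harmonized)
# [(1990, 'unemployed'), (2000, 'not in labor force'), (2000, 'unemployed'), (2010, None)]
```

A year missing from the crosswalk (2010 here) comes back as None, which flags a year whose codebook still needs checking instead of silently reusing another year's codes.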
Which ACS (American Community Survey) is right for you?
- Note: the 3-year ACS estimates are being discontinued
General Social Survey (GSS)
- Sociological survey on demographic, behavioral, and attitudinal topics
- Annual from 1972 to 1994, then biennial since 1994
- Randomly selected sample of adults (18+) in the United States
- Two samples of ~1500 respondents each
- Some questions appear every year; some come and go; some come and then never return
Add Health (National Longitudinal Study of Adolescent to Adult Health)
- Longitudinal study of students in grades 7-12 in 1994-95 (most recent follow-up in 2008)
- Survey data on social, economic, psychological and physical well-being
- Contextual data on family, neighborhood, community, school, friendships, peer groups, and romantic relationships
- Public-use and restricted-use versions of the data; the public-use version is available through ICPSR
ICPSR (Inter-university Consortium for Political and Social Research)
- Part of the Institute for Social Research at the University of Michigan
- First attempt at openly sharing data amongst researchers (started with election studies data)
- Curated, digitized, diverse historical data sets
IPUMS Project Goals
- Collect and preserve data and documentation
- Harmonize data
- Disseminate the data absolutely free!
Use it for GOOD -- never for EVIL
Other government sources
Other data repositories
Codebooks
- Column locations and widths for each variable (needed for fixed-width files)
- Definitions of different record types
- Response codes for each variable
- Codes used to indicate nonresponse and missing data
- Exact questions and skip patterns used in a survey
- Other indications of the content and characteristics of each variable
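To show why column locations and response codes matter, here is a minimal sketch that reads fixed-width records using a codebook-style layout and then translates one variable's codes into labels. The layout, the sample lines, and the code values are hypothetical, not taken from any real codebook.

```python
# Hypothetical layout in the form a codebook lists it: (variable, start column, width).
# Codebooks usually number columns from 1, so we subtract 1 when slicing.
LAYOUT = [("year", 1, 4), ("state", 5, 2), ("age", 7, 3), ("sex", 10, 1)]

# Hypothetical response codes for the "sex" variable, including a missing-data code.
SEX_CODES = {"1": "male", "2": "female", "9": "missing"}


def parse_record(line):
    """Slice one fixed-width line into named fields using LAYOUT."""
    row = {}
    for name, start, width in LAYOUT:
        row[name] = line[start - 1:start - 1 + width].strip()
    row["age"] = int(row["age"])
    row["sex"] = SEX_CODES.get(row["sex"], "undocumented code")
    return row


raw_lines = ["201541 352", "201506 799"]
rows = [parse_record(line) for line in raw_lines]
print(rows)
# [{'year': '2015', 'state': '41', 'age': 35, 'sex': 'female'}, {'year': '2015', 'state': '06', 'age': 79, 'sex': 'missing'}]
```

If the layout is off by even one column, every later variable shifts, which is why the codebook's column locations have to be copied exactly. pandas users can do the same with `pandas.read_fwf` and its `colspecs` argument, which takes 0-based, half-open intervals.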
What's in a codebook?
What else is a codebook good for?
Dirty, Unprepared Data
Missing data
Bad data
Unclear data
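One way to handle missing and bad data in a single pass, sketched under assumptions: check each value against the codebook's nonresponse codes first, then against the variable's valid range, and record why any value was dropped. The codes (-1, 98, 99) and the 0-97 valid range for an "hours worked per week" item are hypothetical.

```python
# Hypothetical nonresponse codes for an "hours worked per week" item.
MISSING_CODES = {-1: "inapplicable", 98: "don't know", 99: "refused"}
VALID_RANGE = (0, 97)  # hypothetical valid range documented in the codebook


def clean_hours(value):
    """Return (cleaned value or None, reason) for one raw response."""
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]  # missing data: record why
    low, high = VALID_RANGE
    if not low <= value <= high:
        return None, "out of range"  # bad data: flag it instead of guessing
    return value, "ok"


raw = [40, 99, -1, 200, 98, 0]
cleaned = [clean_hours(v) for v in raw]
valid = [v for v, reason in cleaned if v is not None]

print(cleaned)
print(sum(raw) / len(raw))      # naive mean treats the codes as real hours
print(sum(valid) / len(valid))  # mean of valid responses only
```

Here the naive mean is about 72.7 hours while the mean of the valid responses is 20.0, which is why nonresponse codes have to come out before any calculation.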
Questions?