Economics 402: Junior Seminar 

Finding & Preparing Data

Ryan Clement | Reed College Library | Spring 2015

What are we doing?

  • How to think about your data search
  • Major data sources
  • Codebooks
  • Dirty, unprepared data
  • Open searching time

Before you start your data search

  • "Who would care about this?"
    • And who would care about keeping it?
  • What type of organization are they?
    • Educational institution, government organization, private company, etc.
  • If not government, how valuable is the data?
    • And who would pay for it?
  • Are there privacy/confidentiality issues?
    • And at what level of observation do you need the data?

Searching Google for Data

  • Don't start with Google
  • Be as specific as possible in search terms (e.g. "microdata")
  • Remember the "who would care" rule
  • Access points:
  • When working with historical/time series data:
    • Watch for changing values
    • Watch for changing geographic coverage
    • Watch for changing questions
  • Which ACS is right for you?
    • Except the 3-year ACS is going away

General Social Survey (GSS)

  • Sociological survey on demographic, behavioral, and attitudinal topics
  • Annually from 1972-1994, then biennially since 1994
  • Randomly selected sample of adults (18+) in United States
    • Two samples of ~1500 respondents each
  • Some questions appear every year; some come and go; some come and then never return

Add Health

  • Longitudinal study of students in grades 7-12 in 1994-95 (most recent follow-up in 2008)
  • Survey data on social, economic, psychological and physical well-being
    • Contextual data on family, neighborhood, community, school, friendships, peer groups, and romantic relationships
  • Public use and restricted versions of the data; public use available through ICPSR

ICPSR

  • Part of the Institute for Social Research at University of Michigan
  • First attempt at openly sharing data amongst researchers (started with election studies data)
  • Curated, digitized, diverse historical data sets

IPUMS Project Goals

  • Collect and preserve data and documentation
  • Harmonize data
  • Disseminate the data absolutely free!
  • Use it for GOOD -- never for EVIL

Other government sources

Other data repositories

Codebooks

  • Column locations and widths for each variable (if necessary)
  • Definitions of different record types
  • Response codes for each variable
  • Codes used to indicate nonresponse and missing data
  • Exact questions and skip patterns used in a survey
  • Other indications of the content and characteristics of each variable
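
To make the first and fourth items concrete, here is a minimal sketch of reading a fixed-width file with a codebook in hand. The variable names, column positions, and nonresponse codes are all invented for illustration; a real codebook supplies its own.

```python
# Hypothetical codebook layout: variable -> (start column, width).
# Columns are 1-indexed, the way codebooks usually print them.
LAYOUT = {
    "year": (1, 4),
    "age": (5, 2),
    "income": (7, 6),
}

# Hypothetical nonresponse codes, as a codebook might list them per variable.
MISSING_CODES = {
    "age": {"99"},
    "income": {"999998", "999999"},
}


def parse_record(line):
    """Slice one fixed-width line into a dict, turning missing codes into None."""
    record = {}
    for name, (start, width) in LAYOUT.items():
        raw = line[start - 1:start - 1 + width].strip()
        if raw == "" or raw in MISSING_CODES.get(name, set()):
            record[name] = None
        else:
            record[name] = int(raw)
    return record


# Two invented raw lines: one complete, one with nonresponse codes.
rows = [parse_record(line) for line in ["201034 52000", "201099999999"]]
print(rows)
```

Without the codebook, the second line's age of 99 and income of 999999 would look like real values and distort any average.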

What's in a codebook?

What else is a codebook good for?

Dirty, Unprepared Data

Missing data
Bad data
Unclear data
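
One possible way to check a file for all three problems before analysis, sketched with invented sample data. The valid age range and the labels for the employed variable are assumptions standing in for what a real codebook would document.

```python
import csv
import io

# Invented sample data standing in for a downloaded file.
SAMPLE = """id,age,employed
1,34,1
2,,2
3,212,1
4,51,7
"""

# Hypothetical codebook entries: a plausible range for age, labels for employed.
AGE_RANGE = (18, 110)
EMPLOYED_LABELS = {"1": "yes", "2": "no"}


def audit(rows):
    """Return the ids of rows with missing, out-of-range, or unlabeled values."""
    problems = {"missing": [], "bad": [], "unclear": []}
    for row in rows:
        if row["age"] == "":
            problems["missing"].append(row["id"])
        elif not AGE_RANGE[0] <= int(row["age"]) <= AGE_RANGE[1]:
            problems["bad"].append(row["id"])
        if row["employed"] not in EMPLOYED_LABELS:
            problems["unclear"].append(row["id"])
    return problems


report = audit(csv.DictReader(io.StringIO(SAMPLE)))
print(report)
```

The sketch only flags rows. Whether to drop, recode, or keep them is a judgment call that depends on the research question.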

Questions?

