Finding data and documents  

Sarah Cohen / Cronkite School of Journalism / February 2022

Take a tour

Look for

  • Small office and program names
  • IT initiatives
  • Officials' names

Another example: long-term care complaints

Lookups that might be scraped
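One way a lookup page might be turned into data, assuming it returns its results as a plain HTML table. This is only a sketch: the sample HTML, the column names and the `TableRows` and `parse_rows` helpers are all invented for illustration, and a real site may need a different approach (and a check of its terms of use).

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical lookup result, standing in for a page fetched from an agency site.
SAMPLE_HTML = """
<table>
  <tr><th>License</th><th>Name</th></tr>
  <tr><td>1001</td><td>Example Care Home</td></tr>
  <tr><td>1002</td><td>Sample Nursing Center</td></tr>
</table>
"""


def parse_rows(html):
    """Return the table rows found in an HTML string."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = parse_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Repeating this over each lookup result (one license number at a time, for example) is what builds a full dataset out of a one-record-at-a-time search form.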

Public data portals

Statistical reports = counting up items!
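The idea here is that a published statistical table is usually a tally of individual records, which suggests the item-level data exists somewhere and can be asked for. A minimal sketch, using made-up complaint records (the field names, values and the `tally` helper are hypothetical):

```python
from collections import Counter

# Hypothetical item-level records; a real agency file would have many more fields.
complaints = [
    {"facility": "Home A", "category": "staffing"},
    {"facility": "Home A", "category": "neglect"},
    {"facility": "Home B", "category": "staffing"},
    {"facility": "Home C", "category": "staffing"},
]


def tally(records, field):
    """Count how many records share each value of `field`."""
    return Counter(record[field] for record in records)


# The "statistical report" is just the count of items in each group.
by_category = tally(complaints, "category")
for category, n in by_category.most_common():
    print(f"{category}: {n}")
```

With the underlying records in hand, the same items can be recounted by facility, by month or by any other field the agency's own report left out.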

Audits and inspections

Look elsewhere

  • muckrock.com has a stash of previously submitted public records requests. Not all of them are good, but they can lead you to a good request.
  • data.world wants to be the GitHub for data -- a way to collaborate on data-related projects. Most of the data is unverified and poorly documented, but it can give you a sense of what's available from official sources elsewhere.
  • Google's new beta dataset search seems pretty haphazard, but it could point you to good datasets elsewhere that you can use as a model.

Data aggregation sites

A NEWS21 EXAMPLE

Start with a story!

  • Research to see what's already known
  • Don't repeat what others have done - use their work instead
  • Consider difficulty and impact of an effort
  • Get expert help
  • Define what you want to know or say - write out the sentence to make sure your data will actually answer an interesting question

Success! 

  • National Crime Victimization Survey (Bureau of Justice Statistics)
  • Scraped press releases from USDOJ
  • Home-made database of two weeks' worth of postings from far-right and neo-Nazi groups on Twitter, Facebook, Gab and VK
  • Partnership with ProPublica's "Documenting Hate" project
  • FBI UCR Hate crime series

#Fail

  • US Attorneys' case management system
  • Databases of graffiti maintained by local police departments (gang tags along with hate symbols)
  • Historical public opinion polls from the Roper Center for Public Opinion Research
  • Couldn't get images of hate symbols from major mapping companies like Google

Possible sources: Easier

  • Government agencies and open gov. sites
  • Hobbyists and interest groups
  • Academic researchers
  • Microdata from government programs like Census, Office of Justice Programs
  • Social data through APIs - Spotify and Twitter are easiest

Possible sources: Harder

  • Public records requests
  • Whistleblower leaks
  • Surveys of local governments / departments
  • Homemade data

sarah.h.cohen@asu.edu

@sarahcnyt

https://slides.com/sarahcnyt/news21

 

Right to know - News 21

By Sarah Cohen

USC health fellows presentation on public records, edited October 2018
