Finding data and documents
Sarah Cohen / Cronkite School of Journalism / February 2022
Take a tour




Look for
- Small office and program names
- IT initiatives
- Officials' names



Another example: long-term care complaints


Lookups that might be scraped
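A lookup form usually returns one page of results per query value, so scraping it often starts with generating the full list of query URLs. The sketch below assumes a hypothetical endpoint (`example.gov/facility-lookup`) and made-up parameter names (`county`, `per_page`); a real lookup will use its own, which you can usually read from the browser's address bar after submitting the form once.

```python
from urllib.parse import urlencode

# Hypothetical lookup endpoint and parameter names, for illustration only.
BASE_URL = "https://example.gov/facility-lookup"


def lookup_urls(counties, page_size=50):
    """Build one query URL per county for a lookup form."""
    urls = []
    for county in counties:
        # urlencode escapes spaces and punctuation in the values
        query = urlencode({"county": county, "per_page": page_size})
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = lookup_urls(["Maricopa", "Pima", "Santa Cruz"])
for url in urls:
    print(url)
```

Each URL can then be fetched and parsed in turn, ideally with a pause between requests and after checking the site's terms of use.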
Public data portals

Statistical reports = counting up items!
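If a statistical report is just a count of items, then the item-level records behind it can be re-counted any way you like. A minimal sketch, using invented complaint records (the facility names and categories are made up for illustration):

```python
from collections import Counter

# Invented item-level records; a real file would come from the agency.
complaints = [
    {"facility": "Facility A", "category": "staffing"},
    {"facility": "Facility A", "category": "medication"},
    {"facility": "Facility B", "category": "staffing"},
    {"facility": "Facility C", "category": "hygiene"},
]

# The "statistical report": how many complaints fall in each category
by_category = Counter(row["category"] for row in complaints)

# The same items, counted a different way than the agency might publish
by_facility = Counter(row["facility"] for row in complaints)

for category, n in by_category.most_common():
    print(category, n)
```

The point of asking for the underlying items is that you choose the grouping, rather than accepting only the categories in the published table.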

Audits and inspections


Look elsewhere


Data aggregation sites
- muckrock.com has a stash of previously submitted public records requests. Not all of them are good, but they can lead you to a good request.
- data.world wants to be the GitHub for data -- a way to collaborate on data-related projects. Most data is unverified and not well documented, but it can give you a sense of what's available from official sources elsewhere.
- Google's new beta dataset search seems pretty haphazard - but it could point you to good datasets elsewhere that you can use as a model.
A NEWS21 EXAMPLE
Start with a story!
- Research to see what's already known
- Don't repeat what others have done - use their work instead
- Consider difficulty and impact of an effort
- Get expert help
- Define what you want to know or say - write out the sentence to make sure your data will actually answer an interesting question
Success!
- National Crime Victimization Survey (Bureau of Justice Statistics)
- Scraped press releases from USDOJ
- Homemade database of two weeks' worth of postings from far-right and neo-Nazi groups on Twitter, Facebook, Gab and VK
- Partnership with ProPublica's "Documenting Hate" project
- FBI UCR Hate crime series
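Scraping press releases typically means pulling the link and headline for each release out of a listing page. The sketch below uses only Python's standard library and a made-up HTML snippet; the markup and URLs are assumptions for illustration, not the structure of any real agency site.

```python
from html.parser import HTMLParser

# Made-up listing markup; a real page will be structured differently.
SAMPLE_HTML = """
<ul class="releases">
  <li><a href="/news/release-1">First sample release</a></li>
  <li><a href="/news/release-2">Second sample release</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> tag with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link we are tracking
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
releases = collector.links

for href, title in releases:
    print(href, "|", title)
```

Once the links are collected, each release page can be fetched and its text saved to build a searchable database.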
#Fail
- US Attorneys' Case management system
- Databases of graffiti maintained by local police departments (gang tags along with hate symbols)
- Historical public opinion polls from the Roper Center for Public Opinion Research
- Couldn't get images of hate symbols from major mapping companies like Google
Possible sources: Easier
- Government agencies and open-government sites
- Hobbyists and interest groups
- Academic researchers
- Microdata from government programs like Census, Office of Justice Programs
- Social data through APIs - Spotify and Twitter are easiest
Possible sources: Harder
- Public records requests
- Whistleblower leaks
- Surveys of local governments or departments
- Homemade data
sarah.h.cohen@asu.edu
@sarahcnyt
https://slides.com/sarahcnyt/news21
Right to know - News 21
By Sarah Cohen
USC health fellows presentation on public records, edited October 2018

