APP Data Training Extravaganza
Finding data and what to avoid
Numbers can suck hard
Data isn't this
It's this
Data
Can be the backbone of your reporting, from dailies to investigations
Where do you find data?
EVERYWHERE
Much of it is hiding in plain sight
Start with creative Googling
- Search for the topic you're looking for and "data"
- Search for other places data might hide: "Reports", "publications", "statistics"
- Try multiple combos and search deep on the results
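The combos above can be generated instead of typed one at a time. A minimal sketch, assuming a hypothetical helper named `build_queries` and a made-up example topic; the optional `site` argument assumes your search engine supports the `site:` operator for limiting results to one domain.

```python
from itertools import product

# Keywords taken from the slide; extend the list as needed.
DATA_WORDS = ["data", "reports", "publications", "statistics"]


def build_queries(topics, data_words=DATA_WORDS, site=None):
    """Return one search string per (topic, keyword) pair.

    `site` is optional and assumes the search engine supports the
    `site:` operator to limit results to a single domain.
    """
    queries = []
    for topic, word in product(topics, data_words):
        query = f'{topic} "{word}"'
        if site:
            query += f" site:{site}"
        queries.append(query)
    return queries


if __name__ == "__main__":
    # Illustrative topic and domain only.
    for q in build_queries(["bias incidents New Jersey"], site="nj.gov"):
        print(q)
```

Paste each line into a search box and dig past the first page of results.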
Search government websites
- Tons of data is required to be published by law
- Investigate/explore government websites deeply. Click on things.
- Menus are your friend: Search for things like "data", "reports", "publications", "statistics", "dashboard"
Use URLs to find more
Work backwards to see where things are stored
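Working backwards means trimming a URL one folder at a time and checking what lives at each level. A rough sketch, assuming a hypothetical helper named `parent_urls` and a placeholder address on `example.com`; not every folder will load, but the ones that do often list files nobody linked to.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """List every parent folder of `url`, from the deepest up to the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing path segment at a time.
    for end in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:end])
        if not path.endswith("/"):
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    for candidate in parent_urls("https://www.example.com/reports/2023/q2/summary.pdf"):
        print(candidate)
```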
Finding the database is great.
Finding the system that built that database is better
The appendix is your friend
If it doesn't exist, build it
Building your own database creates journalism no one else has
Example: Bias data
- Original database story: Bias incidents against Jewish people are on the rise
- Custom database story: 1 in 7 bias incidents against Jewish people occur in 4 of NJ's 564 towns: Lakewood, Howell, Jackson and Toms River
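The arithmetic behind the custom-database story is worth spelling out. A small sketch using only the figures on the slide (1 in 7 incidents, 4 of 564 towns); the helper name `concentration` is made up for illustration, and the comparison assumes an even spread across towns, which ignores population differences.

```python
from fractions import Fraction

TOTAL_TOWNS = 564                # NJ towns, per the slide
CLUSTER_TOWNS = 4                # Lakewood, Howell, Jackson, Toms River
INCIDENT_SHARE = Fraction(1, 7)  # "1 in 7" incidents, per the slide


def concentration(incident_share, cluster_towns, total_towns):
    """How over-represented the cluster is versus an even spread across towns."""
    town_share = Fraction(cluster_towns, total_towns)
    return incident_share / town_share


town_share = Fraction(CLUSTER_TOWNS, TOTAL_TOWNS)
ratio = concentration(INCIDENT_SHARE, CLUSTER_TOWNS, TOTAL_TOWNS)

print(f"Share of towns:     {float(town_share):.1%}")
print(f"Share of incidents: {float(INCIDENT_SHARE):.1%}")
print(f"Concentration:      {float(ratio):.1f}x")
```

Less than 1% of towns account for about 14% of incidents, roughly 20 times what an even spread would produce.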
OPRA Requests
Make them often, fail repeatedly — it's worth it
- When you request data, state your format: Ask for data in a "machine-readable, non-PDF format" such as a CSV or Microsoft Excel file
- Only ask for records that exist: OPRA custodians only have to give you records that are officially kept. Find the name of the database or report you want before you file your OPRA request
Avoiding bad data
USE OFFICIAL GOVERNMENT DATA WHEN POSSIBLE
In a capitalistic society, non-government sources typically want to sell you something, and that can make bad data.
Gov't data often has legal requirements to remain non-partisan. Approach all 3rd party data with skepticism.
Gov't vs. Private Data
Gov't data: Use freely!
Private data: Use with caution
Check Methodology!
- Whenever you are considering using data from a 3rd party, check their work.
- No methodology = 🚩
- Look for robust methodology that actually explains how data was used. READ IT.
- If you don't think you could replicate the analysis with enough time, don't use it
- If something seems off, it probably is.
RTFD
(Read the ... ahem ... documents)
- Look for any supporting material for the data you are using. READ IT.
- "README" files, "Data dictionaries" and "record layouts" often accompany data. These are often your guides to using the data.
- These files can be the difference between data glory and you getting sued.
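Data dictionaries matter because many datasets store codes, not words. A minimal sketch of applying one, assuming a hypothetical `status` column and made-up codes; the real codes come from the documentation that ships with your data.

```python
import csv
import io

# Hypothetical data dictionary: column name -> {code: meaning}.
DATA_DICTIONARY = {
    "status": {"1": "Open", "2": "Closed"},
}

# Made-up sample standing in for a downloaded CSV file.
SAMPLE_CSV = "id,status\n1,1\n2,2\n3,9\n"


def decode_rows(rows, dictionary):
    """Replace coded values with their documented meanings."""
    decoded = []
    for row in rows:
        new_row = dict(row)
        for column, codes in dictionary.items():
            if column in new_row:
                raw = new_row[column]
                # Keep unknown codes visible instead of silently dropping them.
                new_row[column] = codes.get(raw, f"UNKNOWN({raw})")
        decoded.append(new_row)
    return decoded


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    for row in decode_rows(rows, DATA_DICTIONARY):
        print(row)
```

Any value flagged as unknown is a question for the agency, not something to guess at.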
Keep a data log!
Document everything you're doing in its own spreadsheet
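A spreadsheet you fill in by hand works fine. If you already work in scripts, the log can be appended to automatically. A sketch under stated assumptions: the column names, the helper `log_step`, and the file name `data_log.csv` are all invented for illustration.

```python
import csv
from datetime import datetime
from pathlib import Path

# Hypothetical log columns; adjust to whatever you need to retrace your steps.
LOG_FIELDS = ["timestamp", "source", "action", "notes"]


def log_step(log_path, source, action, notes=""):
    """Append one row to the data log, writing the header on first use."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "source": source,
            "action": action,
            "notes": notes,
        })


if __name__ == "__main__":
    # Illustrative entries only.
    log_step("data_log.csv", "agency website", "downloaded file")
    log_step("data_log.csv", "agency website", "filtered rows", notes="kept 2023 only")
```

The resulting file opens in Excel or Google Sheets like any other spreadsheet.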
Do you understand the methodology or documentation?
Be honest with yourself! If you don't, talk to someone who does.
Avoid subjective terms if possible
Things like the "best" cities or the "safest" towns
Provable info
- What town has the most crime in NJ?
- These towns have the most home sales
- The schools with the highest test scores in NJ
Approach with skepticism
- What are the best towns to live in NJ?
- What are the safest towns in NJ?
- What are the best schools in NJ?
Ask questions of the data and provider
- Why haven't we done this reporting ourselves?
- Who is providing it? Who do they represent? Who funds them?
Talk to someone
Talk to someone who knows the data you are seeking or looking at. All of these databases were built by people. Find them.
Academic experts also often work extensively with data. They can warn you of potential pitfalls and problems. Look for people who have written about your data.
EXERCISE TIME!
Find the number of individual people from Monmouth County who sought treatment for alcohol abuse from Jan. 2023 to June 2023.
Hint: Look for the name of the system NJ uses to log this information to find a pathway to this data.
NJ's largest tree is in Blairstown
Finding data and what to avoid
By sstirling