APP Data Training Extravaganza

Finding data and what to avoid

Numbers can suck hard

Data isn't this

It's this

Data

Can be the backbone of your reporting, from dailies to investigations

Where do you find data?

EVERYWHERE

Much of it is hiding in plain sight

Start with creative Googling 

  • Search for the topic you're looking for and "data" 
  • Search for other places data might hide: "reports", "publications", "statistics"
  • Try multiple combos and search deep on the results
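If you want to be systematic about "multiple combos," here is a minimal Python sketch that pairs each way of phrasing your topic with each hiding-place word from the list above. The topic phrases are hypothetical examples, and the function name `build_queries` is just made up for illustration. It only builds the search strings; you still paste them into Google yourself.

```python
from itertools import product

# Words from the slide that often sit next to data on a page
HIDING_PLACES = ["data", "reports", "publications", "statistics"]


def build_queries(topics, hiding_places=HIDING_PLACES):
    """Pair every topic phrase with every 'hiding place' keyword."""
    return [f'{topic} "{word}"' for topic, word in product(topics, hiding_places)]


# Hypothetical topic phrasings; swap in your own
queries = build_queries(["NJ bias incidents", "New Jersey hate crimes"])
for q in queries:
    print(q)
```

Two phrasings times four keywords gives eight searches to try, which is usually enough to surface a report or dashboard the first search missed.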

Search government websites

  • Tons of data is required to be published by law
  • Investigate/explore government websites deeply. Click on things. 
  • Menus are your friend: Search for things like "data", "reports", "publications", "statistics", "dashboard"

Use URLs to find more

Work backwards to see where things are stored
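One way to work backwards: take the address of a file you already found and peel off one folder at a time. The rough Python sketch below lists those parent addresses so you can try each in a browser. The URL is a made-up example, not a real site, and `parent_urls` is a hypothetical helper name. Not every parent folder will load, but the ones that do often show sibling files.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return the URL's parent folders, closest first, ending at the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one path segment at a time: /a/b/c.pdf -> /a/b/ -> /a/ -> /
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


# Hypothetical address, for illustration only
example = "https://www.example.gov/health/reports/2023/summary.pdf"
for candidate in parent_urls(example):
    print(candidate)
```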

Finding the database is great.

Finding the system that built that database is better.

The appendix is your friend

If it doesn't exist, build it

Building your own database creates journalism no one else has

Example: Bias data

  • Original database story: Bias incidents against Jewish people are on the rise

  • Custom database story: 1 in 7 bias incidents against Jewish people occur in 4 of NJ's 564 towns: Lakewood, Howell, Jackson and Toms River
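A finding like "1 in 7 incidents occur in 4 towns" comes from counting rows in your own database by town. Here is a minimal Python sketch of that arithmetic. The rows below are made-up placeholders with generic town names, NOT real incident data, and `share_in_towns` is a hypothetical helper; the real analysis may well have been done differently.

```python
from collections import Counter
from fractions import Fraction

# Made-up placeholder rows, NOT real incident counts.
# In practice you would load one row per incident from your own spreadsheet.
incidents = [
    {"town": "Town A"}, {"town": "Town A"}, {"town": "Town B"},
    {"town": "Town C"}, {"town": "Town D"}, {"town": "Town E"},
    {"town": "Town F"}, {"town": "Town G"},
]


def share_in_towns(rows, towns):
    """Fraction of rows whose 'town' value is one of the given towns."""
    if not rows:
        raise ValueError("no rows to analyze")
    by_town = Counter(row["town"] for row in rows)
    in_focus = sum(by_town[t] for t in towns)
    return Fraction(in_focus, len(rows))


share = share_in_towns(incidents, {"Town A", "Town B"})
print(f"{share.numerator} in {share.denominator} incidents ({float(share):.1%})")
```

Keeping the result as a fraction makes it easy to write the "X in Y" phrasing readers understand, and to check it against the raw counts before publishing.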

OPRA Requests

Make them often, fail repeatedly — it's worth it

  • When you request data, state your format: Ask for data in a "machine-readable, non-PDF format" such as a CSV or Microsoft Excel file
  • Only ask for records that exist: OPRA custodians only have to give you records that are officially kept. Find the name of the database or report you want before you file your request

Avoiding bad data

USE OFFICIAL GOVERNMENT DATA WHEN POSSIBLE

In a capitalistic society, non-government sources typically want to sell you something, and that can produce bad data.

Gov't data often has legal requirements to remain non-partisan. Approach all 3rd party data with skepticism.

Gov't vs. Private Data

  • Gov't data: Use freely!
  • Private data: Use with caution

Check Methodology!

  • Whenever you are considering using data from a 3rd party, check their work. 
  • No methodology = 🚩
  • Look for robust methodology that actually explains how data was used. READ IT.
  • If you don't think you could replicate the analysis with enough time, don't use it
  • If something seems off, it probably is.

RTFD

(Read the ... ahem ... documents)

  • Look for any supporting material for the data you are using. READ IT.
  • "README" files, "Data dictionaries" and "record layouts" often accompany data. These are often your guides to using the data. 
  • These files can be the difference between data glory and you getting sued. 
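One quick check once you have both the data and its data dictionary: do the column names match? The Python sketch below compares the two lists. The column names are hypothetical, and `compare_to_dictionary` is an invented helper, not part of any library. A column the dictionary doesn't explain is a column you shouldn't report on yet.

```python
def compare_to_dictionary(data_columns, dictionary_fields):
    """Report columns missing documentation and documented fields missing from the data."""
    # Ignore case and stray spaces so "County " and "county" count as the same name
    data_set = {c.strip().lower() for c in data_columns}
    dict_set = {f.strip().lower() for f in dictionary_fields}
    return {
        "undocumented": sorted(data_set - dict_set),
        "missing_from_data": sorted(dict_set - data_set),
    }


# Hypothetical column names, for illustration only
result = compare_to_dictionary(
    ["County", "Year", "Admissions", "SUBST_CD"],
    ["county", "year", "admissions"],
)
print(result)
```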

Keep a data log!

Document everything you're doing in its own spreadsheet
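A plain spreadsheet you fill in by hand works fine. If you are already doing your analysis in code, a small sketch like this one can append each step to a CSV log for you. The column names and the file name `data_log.csv` are assumptions chosen for illustration; use whatever your team agrees on.

```python
import csv
from datetime import datetime
from pathlib import Path

LOG_COLUMNS = ["timestamp", "dataset", "step", "notes"]  # assumed columns


def log_step(log_path, dataset, step, notes=""):
    """Append one row to the data log, writing the header if the file is new."""
    path = Path(log_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(LOG_COLUMNS)
        timestamp = datetime.now().isoformat(timespec="seconds")
        writer.writerow([timestamp, dataset, step, notes])


if __name__ == "__main__":
    # Hypothetical entries showing the kind of detail worth recording
    log_step("data_log.csv", "treatment admissions", "downloaded", "saved original file untouched")
    log_step("data_log.csv", "treatment admissions", "filtered", "kept Monmouth County rows only")
```

The log opens in Excel or Google Sheets like any other CSV, so an editor can retrace your steps without running any code.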

Do you understand the methodology or documentation?

Be honest with yourself! If you don't, talk to someone who does. 

Avoid subjective terms if possible

Things like the "best" cities or the "safest" towns

Provable info 

  • What town has the most crime in NJ? 
  • These towns have the most home sales 
  • The schools with the highest test scores in NJ 

Approach with skepticism

  • What are the best towns to live in NJ?
  • What are the safest towns in NJ?
  • What are the best schools in NJ?

Ask questions of the data and provider

  • Why haven't we done this reporting ourselves?

  • Who is providing it? Who do they represent? Who funds them? 

Talk to someone

Talk to someone who knows the data you are seeking or looking at. All of these databases were built by people. Find them.

Academic experts also often work extensively with data. They can warn you of potential pitfalls and problems. Look for people who have written about your data.

EXERCISE TIME! 

Find the number of individual people from Monmouth County who sought treatment for alcohol abuse from Jan. 2023 to June 2023. 

Hint: Look for the name of the system NJ uses to log this information to find a pathway to this data. 

NJ's largest tree is in Blairstown
