finding the bad guys
Tricks for joining lists of people
Adam Playford, Newsday (@adamplayford)
Michael LaForgia, Tampa Bay Times (@laforgia_)
NICAR 2013
Why?
- Gives your story sweep, precision
- Great stories built on this technique
Whiz-bang tech is cool, but we got into this
to hold people accountable
WHY HARD?
Lack of whiz-bang tech:
Existing tools are not great at this.
Two competing masters:
- Don't want to miss great examples.
- Also don't want to screw up.
What we want to do
Not going to teach you how to use everything
- Access, Python classes start 9 a.m. tomorrow
- IRE boot camps, codecademy.com
Let's talk about principles.
not a technology problem
a journalism problem
that requires technology
EXCEPT...
ONE TECHNICAL PROBLEM YOU CAN'T AVOID:
A JOIN CAN ALWAYS ADD ROWS.
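A minimal sketch of how a join adds rows, using Python's built-in sqlite3 module. The table names, column names and records are all hypothetical. Two employees go in, but three rows come out, because one name appears twice on the other list.

```python
import sqlite3

# Hypothetical data: two employees, and a court list where one name appears twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first TEXT, last TEXT)")
conn.execute("CREATE TABLE defendants (first TEXT, last TEXT, case_no TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("John", "Smith"), ("Ana", "Diaz")],
)
conn.executemany(
    "INSERT INTO defendants VALUES (?, ?, ?)",
    [("John", "Smith", "A-1"), ("John", "Smith", "B-2")],
)

# Count the rows before the join.
rows_before = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# LEFT JOIN keeps every employee, and repeats an employee once per match.
joined = conn.execute(
    """
    SELECT e.first, e.last, d.case_no
    FROM employees AS e
    LEFT JOIN defendants AS d
      ON e.first = d.first AND e.last = d.last
    """
).fetchall()
rows_after = len(joined)

print(rows_before, rows_after)
```

Comparing the row count before and after every join is a cheap way to catch this.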
FIRST STEPS:
ZERO CLICKING
WHAT DOES EACH
ROW REPRESENT?
WHAT FIELDS DOES EACH ROW
CONTAIN?
UNIQUE ID
Social Security # (SSN)
Gov't employee ID
Great if you have it in both data sets.
But that never happens.
:(
FIRST/LAST
- John Smith
- Nicknames
- Married women/Ron Artest
middle
- So many wrinkles.
- In Access/SQL, lose-lose.
- In Excel, sometimes easier.
=IF(NOT(ISBLANK(B1)),B1=C1,"")
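The same idea as the Excel formula, sketched in Python: only compare middle names when there is something to compare. The function name is hypothetical. Two things go beyond the formula and are assumptions: it checks both sides for blanks, and it treats an initial as agreeing with a full name that starts with that letter.

```python
def middle_names_agree(a, b):
    """Return True or False when both middles are present, None when either is blank."""
    a = (a or "").strip().rstrip(".").upper()
    b = (b or "").strip().rstrip(".").upper()
    if not a or not b:
        return None  # nothing to compare, so no verdict
    # Assumption: "Q" agrees with "Quincy" because only the initial is known.
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b
```

A None result means "no evidence either way", which is different from a mismatch.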
ADDRESS
- 31 Main Avenue
- 31 Main Ave
- 31 Main Av Apt. 33
- 31 Main Av.#33
- First digits.
- If you can script, cleaning program.
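A rough sketch of what a small cleaning program might look like, assuming addresses shaped like the four examples above. The function names and the suffix table are hypothetical, and the table is deliberately incomplete. Real data will need more rules.

```python
import re

# Hypothetical, incomplete suffix table covering only the examples on this slide.
SUFFIXES = {"AVENUE": "AVE", "AV": "AVE", "AVE": "AVE"}

def house_number(address):
    """Return the leading digits of an address, or '' if there are none."""
    match = re.match(r"\s*(\d+)", address)
    return match.group(1) if match else ""

def clean_address(address):
    """Uppercase, drop the unit, strip punctuation, standardize the suffix."""
    text = address.upper()
    # Drop unit designators and everything after them: "APT. 33", "#33".
    text = re.sub(r"\s*(#|\bAPT\b|\bUNIT\b).*$", "", text)
    # Replace remaining punctuation with spaces.
    text = re.sub(r"[^A-Z0-9 ]", " ", text)
    words = [SUFFIXES.get(word, word) for word in text.split()]
    return " ".join(words)

examples = ["31 Main Avenue", "31 Main Ave", "31 Main Av Apt. 33", "31 Main Av.#33"]
cleaned = [clean_address(a) for a in examples]
print(cleaned)
```

Dropping the unit is a choice: it helps find people in the same building, but it also makes neighbors look like housemates.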
DOB
- Great if you have it (move to Florida)
- But not enough to avoid John Smith
race/sex
Not usually very useful
Except when it is.
Zip
Actually quite helpful.
Zip structure: More digits, more specific
first, last, DOB
- Handles most John Smiths but not all
first, last & address
- Father/Son things.
- This happens a LOT.
first, last, address & dob
- The closest you get to perfect.
- Still may be imperfect.
- Only as good as your data.
partial matches
can be interesting, too
first, address & dob
- Women who've changed names.
lots of combinations.
all do different things.
journalism problem.
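One way to try several combinations and see what each one catches, sketched in plain Python. The records, field names and function name are hypothetical, and the data is assumed to be cleaned already. Each combination is run separately so every hit carries a label saying how it matched.

```python
from collections import defaultdict

# The combinations from the slides above.
KEY_SETS = {
    "first+last+dob": ("first", "last", "dob"),
    "first+last+address": ("first", "last", "address"),
    "first+last+address+dob": ("first", "last", "address", "dob"),
    "first+address+dob": ("first", "address", "dob"),
}

def match_by_keys(left, right, fields):
    """Pair records whose values agree on every field; skip blank keys."""
    index = defaultdict(list)
    for record in right:
        key = tuple(record[f] for f in fields)
        if all(key):
            index[key].append(record)
    pairs = []
    for record in left:
        key = tuple(record[f] for f in fields)
        if all(key):
            pairs.extend((record, other) for other in index.get(key, []))
    return pairs

# Hypothetical records: a changed last name, and a father/son at one address.
employees = [
    {"first": "JANE", "last": "DOE", "address": "31 MAIN AVE", "dob": "1980-01-02"},
    {"first": "JOHN", "last": "SMITH", "address": "5 OAK ST", "dob": "1950-03-04"},
]
other_list = [
    {"first": "JANE", "last": "ROE", "address": "31 MAIN AVE", "dob": "1980-01-02"},
    {"first": "JOHN", "last": "SMITH", "address": "5 OAK ST", "dob": "1975-06-07"},
]

counts = {
    label: len(match_by_keys(employees, other_list, fields))
    for label, fields in KEY_SETS.items()
}
print(counts)
```

Here first+address+dob finds the changed last name, while first+last+address pairs two John Smiths with different birth dates. Deciding which of those hits means anything is the reporting.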
address by digits & first-3 zip
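A sketch of that key in Python. The function name is hypothetical, and it assumes zips are stored as text so leading zeros survive.

```python
import re

def address_zip_key(address, zip_code):
    """Return (house number, first three zip digits), or None if either is missing."""
    digits = re.match(r"\s*(\d+)", address or "")
    zip_digits = re.sub(r"\D", "", zip_code or "")
    if not digits or len(zip_digits) < 3:
        return None
    return (digits.group(1), zip_digits[:3])

print(address_zip_key("31 Main Avenue", "11747"))
print(address_zip_key("31 Main Av.#33", "11747-1234"))
```

Both calls produce the same key even though the street text differs. That makes the key forgiving, so treat its hits as leads to check, not as matches.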
name matches by commonness
Weight uncommon names
higher than common names.
- Social Security Administration: First names [link]
- Census: Last names [link]
We are surprisingly bad at guessing
whether a name is common.
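A minimal sketch of weighting by commonness. The shares below are invented placeholders, not real figures. In practice they would come from the first-name and last-name lists above. The function name, the default share for unlisted names, and the choice to treat first and last names as independent are all assumptions.

```python
import math

# Invented placeholder shares of the population, for illustration only.
FIRST_SHARE = {"JOHN": 0.03, "ZEBULON": 0.00002}
LAST_SHARE = {"SMITH": 0.01, "QUACKENBUSH": 0.00001}
DEFAULT_SHARE = 0.00001  # assumption: names missing from the lists count as rare

def name_weight(first, last):
    """Higher score means a rarer name, so a match on it is stronger evidence."""
    p_first = FIRST_SHARE.get(first.upper(), DEFAULT_SHARE)
    p_last = LAST_SHARE.get(last.upper(), DEFAULT_SHARE)
    # Assumption: first and last names are independent, so shares multiply.
    return -math.log10(p_first * p_last)

common = name_weight("John", "Smith")
rare = name_weight("Zebulon", "Quackenbush")
print(round(common, 2), round(rare, 2))
```

Sorting matches by this score puts the rare-name hits at the top of the list to check.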
inmate visitors
CANTEEN
to the bar!
adam.playford@newsday.com
@adamplayford
mlaforgia@tampabay.com
@laforgia_
http://bit.ly/badguys2013