The Five Stages of Terrible Data

(or Steven and the Terrible, Horrible, No Good, Very Bad Data)

#NICAR15

@dataeditor

 

http://bit.ly/terribledata

  1. Denial
  2. Anger
  3. Bargaining
  4. Depression
  5. Acceptance

We've all been through these stages.

If not, you're probably stuck at Denial.

Good data doesn't exist on every subject.

Don't simply use data because it's the best out there.

Using data because it's the best that exists on a subject is like eating the cleanest food you can find in the trash can.

You're still eating trash.

What does bad data look like?

Bad data can often look a lot like good data.

What can bad data really look like?

  • Missing data
  • Missing fields
  • Produced by a company (sometimes)
  • Field definitions are fuzzy or undefined
  • It's super dirty
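Some of these warning signs, like missing data and missing fields, can be spotted mechanically before you get attached to a dataset. Below is a minimal sketch that assumes the data arrives as CSV. The function `profile_rows` and its `expected_fields` argument are hypothetical names for illustration, not from any particular tool. It won't tell you whether field definitions are fuzzy or who produced the data. That still takes a phone call.

```python
import csv
import io
from collections import Counter


def profile_rows(text, expected_fields):
    """Count blank values per column and list expected columns that are absent.

    `text` is CSV content as a string; `expected_fields` is a hypothetical
    list of the columns you believe the dataset should have.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    # Columns you expected (e.g. from documentation) that the file lacks.
    missing_fields = [f for f in expected_fields if f not in header]

    blanks = Counter()
    total = 0
    for row in reader:
        total += 1
        for field in header:
            value = row.get(field)
            # DictReader fills short rows with None; treat that and
            # empty or whitespace-only strings as missing values.
            if value is None or value.strip() == "":
                blanks[field] += 1
    return {"rows": total, "missing_fields": missing_fields, "blanks": dict(blanks)}


sample = "name,city,amount\nA,Denver,10\nB,,\nC,Austin\n"
report = profile_rows(sample, ["name", "city", "amount", "date"])
```

A clean-looking report doesn't mean the data is good. It only means these particular problems didn't show up.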

You can't always blame the data

It's not always the data that's terrible

  • You might not understand:
    • How the data is collected
    • Why the data was collected
    • Every column in the data
    • The purpose of the data

There are two types of bad data

Bad data

You can mostly use this data if you can fill in some of the blanks or clean it.

Irredeemably bad data

You should probably just put this data on a rocket ship and fly it straight into the sun.

DENIAL

"The data can't be that bad."

ANGER

"The data isn't that bad!"

BARGAINING

"What if the data isn't that bad?"

DEPRESSION

"The data is bad and I feel bad."

ACCEPTANCE

"The data is bad and I'll move past it."

Most people don't make it through all five stages.

(That's not a good thing)

If you don't get through all of the stages, everyone suffers.

How can we stop this cycle?

Slow. Down.

Pick up the phone.

Try to poke holes.

Help your neighbor.

(Friends don't let friends use bad data)

Bad data leads to bad stories.

Bad stories lead to bad policies.

Bad policies lead to bad outcomes.

Bad outcomes lead to the world becoming a post-apocalyptic wasteland.

(probably)

Moral of the story:

KNOW.

THY.

DATA.
