APP Data Training Extravaganza
What to do with data once you have it
Clean data is good data
But all data has a human imprint.
Humans make a lot of misteaks
OpenRefine is a free, powerful, safe data-cleaning program

Gannett considers it too dangerous to download. It is not.
PDFs are a part of data life

There are tools that can help liberate data from them
PDF check

Is it an image? Or can you highlight text?
If you can highlight text, you can extract information more easily.
If your data includes tables, Adobe has a free tool that can pull it into Microsoft Excel:
PDF Image? You need more power
(And patience)


These tools were developed specifically for journalists
Have a boatload of PDFs? Try Pinpoint to organize and categorize them

Cleaning data in a spreadsheet

Data cleaning basics
- Fix misspellings
- Standardize categories
- Ensure number columns are all numbers
- Trim extra space
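
As a rough illustration outside the spreadsheet, two of the basics above (standardizing categories and making sure number columns hold only numbers) could be sketched in Python. The sample rows, the column names and the CATEGORY_FIXES mapping are all made up for this example; they are not from the class data.

```python
# Hypothetical sample rows; the column names and values are invented.
rows = [
    {"city": "Mount Holly", "type": "Township", "count": "12"},
    {"city": "New York", "type": "city ", "count": "1,204"},
    {"city": "Newark", "type": "CITY", "count": "n/a"},
]

# Assumed mapping from messy category spellings to one standard label.
CATEGORY_FIXES = {"township": "Township", "city": "City"}


def standardize_category(value):
    """Return the standard label for a category, ignoring case and stray spaces."""
    key = value.strip().lower()
    return CATEGORY_FIXES.get(key, value.strip())


def to_number(value):
    """Convert text like '1,204' to an int, or return None if it is not a number."""
    cleaned = value.strip().replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        return None


cleaned_rows = [
    {
        "city": r["city"],
        "type": standardize_category(r["type"]),
        "count": to_number(r["count"]),
    }
    for r in rows
]

# Rows whose count could not be read as a number need a human look.
needs_review = [r["city"] for r in cleaned_rows if r["count"] is None]
```

Anything that lands in needs_review is a cell to check by hand, not something to fix automatically.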

Class participation!
GO HERE:
https://t.ly/appdata1


The good stuff is always at the bottom
Copy everything after "metadata" for Mount Holly

Copy it into a spreadsheet.
Find the same data for New York, NY. Copy it directly below Mount Holly.
Select "Text to Columns" under the DATA tab, then...


Select "Delimited", and click next
Select "Comma" and click Finish
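
For the curious, the "Delimited" plus "Comma" choice does roughly what this Python sketch does: it splits each pasted line into cells at the commas. The two sample lines are invented placeholders, not the real data from the exercise.

```python
import csv
import io

# Invented placeholder text standing in for the two pasted lines.
pasted = "Mount Holly,NJ,100,200\nNew York,NY,300,400\n"

# csv.reader splits each line into cells at the commas,
# much like Text to Columns with the Comma delimiter.
rows = list(csv.reader(io.StringIO(pasted)))

for row in rows:
    print(row)
```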

Hooray! Now we have structured data
Now create a header row.
Find and Replace
Use it to replace errors in the data en masse, like spelling mistakes
Remove unwanted characters
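
A minimal Python sketch of the same two ideas, replacing a repeated misspelling and stripping unwanted characters. The sample values and the choice of * and # as the unwanted characters are assumptions made for illustration.

```python
import re

# Invented sample values with a repeated misspelling and stray characters.
values = ["Mont Holly*", "Mont Holly", "New York, NY#"]

# Replace every occurrence of the misspelling, like "Replace All".
fixed = [v.replace("Mont Holly", "Mount Holly") for v in values]

# Remove unwanted characters; here the unwanted set is assumed to be * and #.
fixed = [re.sub(r"[*#]", "", v) for v in fixed]

print(fixed)
```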

Filter is your friend



Search for irregularities in each column
Fixing these types of errors allows you to analyze your data
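
The filter dropdown is useful here because it lists every distinct value in a column. A rough Python equivalent, using an invented column of values, counts each value exactly as typed so that variants stand out:

```python
from collections import Counter

# Invented column of values with inconsistent spellings.
column = ["Ocean", "ocean", "Ocean ", "Monmouth", "Monmoth", "Ocean"]

# Count each distinct value exactly as typed, so variants show up separately.
counts = Counter(column)

# repr() makes trailing spaces visible in the printout.
for value, n in sorted(counts.items()):
    print(repr(value), n)
```

Values that appear only once next to a near-identical, more common value are usually the errors worth fixing.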

Use "trim" to ditch white space
A lot of databases have extra white space hiding in cells.
Type =TRIM(NameOfCell) to edit out the extra space.
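
To show what TRIM is doing, here is a small Python sketch of the same behavior: it drops spaces at the start and end and squeezes runs of spaces between words down to one. The function name trim and the sample text are made up for this example, and like TRIM it only handles the ordinary space character.

```python
def trim(text):
    """Drop leading/trailing spaces and squeeze runs of spaces down to one."""
    return " ".join(part for part in text.split(" ") if part)


print(repr(trim("  Mount   Holly ")))
```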

Data not how you like? Try a pivot table
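
As a sketch of what a pivot table does under the hood, this Python example groups rows and sums a value for each group. The rows and the column meanings (county, category, amount) are invented for illustration.

```python
from collections import defaultdict

# Invented rows: (county, category, amount).
rows = [
    ("Ocean", "Fire", 3),
    ("Ocean", "Police", 5),
    ("Monmouth", "Fire", 2),
    ("Ocean", "Fire", 4),
]

# Sum the amounts for each county/category pair, the way a pivot table
# with county as rows, category as columns and SUM as the value would.
pivot = defaultdict(lambda: defaultdict(int))
for county, category, amount in rows:
    pivot[county][category] += amount

for county, by_category in pivot.items():
    print(county, dict(by_category))
```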

An intense checklist for bulletproofing your data:
https://t.ly/datachecklist
What to do with data once you have it
By sstirling