APP Data Training Extravaganza
What to do with data once you have it
Clean data is good data
But all data has a human imprint.
Humans make a lot of misteaks
OpenRefine is a free, powerful, safe data-cleaning program
Gannett considers it too dangerous to download. It is not.
PDFs are a part of data life
There are tools that can help liberate data from them
PDF check
Is it an image? Or can you highlight text?
If you can highlight text, you can extract information more easily.
If your data includes tables, Adobe has a free tool that can pull it into Microsoft Excel:
PDF Image? You need more power
(And patience)
These tools were developed specifically for journalists
Have a boatload of PDFs? Try Pinpoint to organize and categorize them
Cleaning data in a spreadsheet
Data cleaning basics
- Fix misspellings
- Standardize categories
- Ensure number columns are all numbers
- Trim extra space
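Two of these checks, standardizing categories and making sure number columns hold only numbers, can also be sketched outside a spreadsheet. This is a rough Python sketch, not a required step. The rows and the `CATEGORY_FIXES` lookup are made up for illustration.

```python
# Hypothetical rows for illustration only: (category, amount)
rows = [
    ("Police", "1200"),
    ("police dept.", "950"),
    ("Fire", "1,100"),
    ("FIRE", "n/a"),
]

# Hypothetical lookup: messy spelling (lowercased) -> the one label we want
CATEGORY_FIXES = {
    "police": "Police",
    "police dept.": "Police",
    "fire": "Fire",
}

def standardize_category(value):
    """Return the standard label, or the trimmed original if it is unknown."""
    key = value.strip().lower()
    return CATEGORY_FIXES.get(key, value.strip())

def to_number(value):
    """Return a float, or None when the cell does not hold a number."""
    try:
        return float(value.replace(",", "").strip())
    except ValueError:
        return None

cleaned = [(standardize_category(c), to_number(a)) for c, a in rows]

# Rows whose amount could not be read as a number need a human look.
bad_rows = [row for row in cleaned if row[1] is None]
```

Anything that lands in `bad_rows` is a cell to check by hand rather than something to fix automatically.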
Class participation!
GO HERE: https://t.ly/appdata1
The good stuff is always at the bottom
Copy everything after "metadata" for Mount Holly
Copy it into a spreadsheet.
Find the same data for New York, NY. Copy it directly below Mount Holly.
Select "Text to Columns" under the DATA tab, then...
Select "Delimited", and click next
Select "Comma" and click Finish
Hooray! Now we have structured data
Now create a header row.
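For the curious, the same split can be sketched in Python. The pasted lines and the column names below are placeholders, not the real data from the exercise.

```python
import csv
import io

# Hypothetical pasted lines; the real ones come from the page in the exercise.
pasted = "Mount Holly,NJ,10,20\nNew York,NY,30,40\n"

# csv.reader splits each line on commas, like choosing "Delimited" then "Comma".
rows = list(csv.reader(io.StringIO(pasted)))

# Hypothetical column names, placed on top as the header.
header = ["place", "state", "value_a", "value_b"]
table = [header] + rows
```

Each inner list in `table` is one spreadsheet row, and each item in it is one cell.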
Find and Replace
Use it to replace errors in the data en masse, like spelling mistakes
Remove unwanted characters
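A rough Python equivalent of Find and Replace, assuming made-up cell values and an asterisk as the unwanted character:

```python
import re

# Hypothetical cell values for illustration only.
cells = ["Mont Holly", "New York*", "Mont Holly "]

# Find and Replace: swap a known misspelling everywhere it appears.
fixed = [cell.replace("Mont Holly", "Mount Holly") for cell in cells]

# Remove unwanted characters: here, drop asterisks (an assumed nuisance character).
fixed = [re.sub(r"\*", "", cell) for cell in fixed]
```

Note the trailing space in the last cell survives. Replacing text does not trim white space.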
Filter is your friend
Search for irregularities in each column
Fixing these types of errors allows you to analyze your data
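A filter's dropdown is really a list of every distinct value in a column. The same idea as a small Python sketch, using a made-up column:

```python
from collections import Counter

# Hypothetical column for illustration only.
state_column = ["NJ", "NY", "nj", "NJ", "N.J.", "NY"]

# Count every distinct value, much as a filter dropdown lists them.
counts = Counter(state_column)

# Values that appear only once are often (not always) typos worth a look.
suspects = sorted(value for value, n in counts.items() if n == 1)
```

Here `suspects` holds "N.J." and "nj", two spellings that should probably be "NJ". A one-off value can also be legitimate, so check before changing it.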
Use "trim" to ditch white space
A lot of databases have extra white space hiding in cells.
Type =TRIM(NameOfCell) to edit out the extra space.
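TRIM removes spaces at the start and end of a cell and squeezes repeated spaces between words down to one. A rough Python imitation, where `excel_style_trim` is a name made up for this sketch:

```python
def excel_style_trim(text):
    """Drop leading and trailing spaces and squeeze runs of spaces to one."""
    return " ".join(part for part in text.split(" ") if part)

# Hypothetical cell value for illustration only.
messy = "  Mount   Holly "
clean = excel_style_trim(messy)
```

This sketch only handles ordinary spaces. Tabs and non-breaking spaces copied from web pages would need separate handling.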
Data not how you like it? Try a pivot table
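A pivot table groups rows and summarizes them, for example by summing a number for each combination of two categories. A minimal Python sketch of that idea, with made-up rows:

```python
from collections import defaultdict

# Hypothetical rows for illustration only: (city, category, amount)
rows = [
    ("Mount Holly", "Fire", 100),
    ("Mount Holly", "Police", 250),
    ("New York", "Fire", 400),
    ("Mount Holly", "Fire", 50),
]

# Pivot rows are cities, pivot columns are categories, cells are summed amounts.
pivot = defaultdict(lambda: defaultdict(int))
for city, category, amount in rows:
    pivot[city][category] += amount
```

The two Mount Holly "Fire" rows collapse into one cell holding their total, which is what a spreadsheet pivot table does with "Sum of" values.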
An intense checklist for bulletproofing your data:
https://t.ly/datachecklist
By sstirling