APP Data Training Extravaganza

What to do with data once you have it

Clean data is good data

But all data has a human imprint.

 

Humans make a lot of misteaks 

 

 

OpenRefine is a free, powerful, safe data cleaning program

Gannett considers it too dangerous to download. It is not.

PDFs are a part of data life

There are tools that can help liberate data from them

PDF check

Is it an image? Or can you highlight text? 

 

If you can highlight text, you can extract information more easily.

 

If your data includes tables, Adobe has a free tool that can pull it into Microsoft Excel:

PDF Image? You need more power 

(And patience)

These tools were developed specifically for journalists 

Have a boatload of PDFs? Try Pinpoint to organize and categorize them

Cleaning data in a spreadsheet

Data cleaning basics

  • Fix misspellings

  • Standardize categories 

  • Ensure number columns are all numbers

  • Trim extra space
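
For anyone who would rather script these checks than click through them, here is a minimal Python sketch of two items on the list: standardizing categories and making sure a number column holds only numbers. The sample values and the CATEGORY_FIXES lookup are made up for illustration and are not part of the class data.

```python
# Hypothetical lookup: map each variant spelling to one standard category.
CATEGORY_FIXES = {
    "n.j.": "NJ",
    "new jersey": "NJ",
    "nj": "NJ",
}

def standardize(value):
    """Return the standard category for a value, or the trimmed value if unknown."""
    key = value.strip().lower()
    return CATEGORY_FIXES.get(key, value.strip())

def non_numeric(values):
    """Return the entries that cannot be read as numbers."""
    bad = []
    for v in values:
        try:
            float(v.replace(",", ""))  # allow thousands separators like 1,200
        except ValueError:
            bad.append(v)
    return bad

states = ["N.J.", "new jersey ", "NJ", "NY"]
print([standardize(s) for s in states])

amounts = ["1,200", "35", "n/a", "12.5", ""]
print(non_numeric(amounts))
```

Anything non_numeric returns (here the "n/a" and the empty cell) is a cell to fix by hand before doing math on the column.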

Class participation!

GO HERE:

https://t.ly/appdata1

The good stuff is always at the bottom

Copy everything after "metadata" for Mount Holly

Copy it into a spreadsheet.

Find the same data for New York, NY. Copy it directly below Mount Holly.

Select "Text to Columns" under the DATA tab, then...

Select "Delimited", and click next

Select "Comma" and click Finish

Hooray! Now we have structured data
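
The Text to Columns steps above have a rough equivalent in Python's csv module. This is only a sketch: the two pasted lines below are invented placeholders, not the real values from the class page.

```python
import csv
import io

# Placeholder text standing in for two pasted comma-separated lines.
pasted = 'Mount Holly,NJ,"1,234",clear\nNew York,NY,"5,678",rain\n'

# csv.reader splits each line on commas, like choosing "Delimited" then "Comma".
rows = list(csv.reader(io.StringIO(pasted)))

for row in rows:
    print(row)
```

Commas inside quotation marks stay inside their cell, so a quoted value like "1,234" is not split in two.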

Now create a header row.

Find and Replace

Use it to replace errors in the data en masse, like spelling mistakes

Remove unwanted characters
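
The same two jobs, replacing a misspelling everywhere and removing unwanted characters, can be sketched in Python. The cell values below are made up for illustration.

```python
import re

# Made-up examples of the kinds of errors Find and Replace handles.
cells = ["Mont Holly", "Mont Holly", "New York", "$1,200*", "35*"]

# Replace one misspelling everywhere it appears.
fixed = [c.replace("Mont Holly", "Mount Holly") for c in cells]

# Strip unwanted characters (here: dollar signs and asterisks).
fixed = [re.sub(r"[$*]", "", c) for c in fixed]

print(fixed)
```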

Filter is your friend

Search for irregularities in each column 

 

Fixing these types of errors allows you to analyze your data
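
A filter's dropdown is essentially a list of every distinct value in a column. A minimal Python sketch of the same idea, using a made-up column, counts each value so the oddballs stand out:

```python
from collections import Counter

# Made-up column with the inconsistencies a filter list would reveal.
city_column = [
    "Mount Holly", "Mount Holly", "Mont Holly",
    "New York", "new york", "New York ",
]

counts = Counter(city_column)

# Rare values are often typos or variants of a more common value.
for value, n in counts.most_common():
    print(repr(value), n)
```

Printing with repr() puts quotes around each value, which makes a trailing space visible.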

Use "trim" to ditch white space

A lot of databases have extra white space hiding in cells. 

 

Type =TRIM(NameOfCell) to edit out the extra space. 
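
Outside a spreadsheet, a close Python equivalent is a one-line helper. The function name trim is just an illustration. One difference to be aware of: Excel's TRIM targets ordinary spaces, while this sketch also collapses tabs and line breaks.

```python
def trim(text):
    """Drop leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(text.split())

print(repr(trim("  Mount   Holly ")))
```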

Data not arranged how you like it? Try a pivot table
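
At its simplest, a pivot table groups rows by one column and totals another. A minimal Python sketch of that idea, with made-up rows and column names:

```python
from collections import defaultdict

# Made-up rows: (city, category, amount).
rows = [
    ("Mount Holly", "rain", 2),
    ("Mount Holly", "snow", 1),
    ("New York", "rain", 3),
    ("Mount Holly", "rain", 4),
]

# Sum the amount for each city, like a pivot table with city in Rows
# and Sum of amount in Values.
totals = defaultdict(int)
for city, _category, amount in rows:
    totals[city] += amount

print(dict(totals))
```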

An intense checklist for bulletproofing your data: 

 

https://t.ly/datachecklist
