Working With Data

All historians work with data! Written text is unstructured data, but some data is structured, like this account book of expenses (much like a credit card bill).

 

Historians are very good at finding patterns and relationships. Working with data is all about finding patterns and relationships.

Understand your data 

Content model: how the data is organized

Table: the whole spreadsheet

Columns contain one content type: the individual parts of the data

Rows contain one record: an individual unit described by the content types
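
To make the content model concrete, here is a minimal Python sketch of a table, a record, and a content type. The column names and values are invented for illustration and are not taken from any real account book.

```python
import csv
import io

# Hypothetical sample data: the header row names each column (content type).
raw = """date,item,amount
1755-06-01,Powder,12
1755-06-03,Blankets,30
"""

# The table: the whole spreadsheet, read as a list of records.
table = list(csv.DictReader(io.StringIO(raw)))

# A row is one record, described by every content type.
first_record = table[0]

# A column is one content type, collected across all records.
amounts = [row["amount"] for row in table]
```

Note that everything read from a CSV arrives as text, so "12" would still need converting before doing any math with it.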

Data best practices

  • Keep an original, clean version
  • Keep a backup off-site
  • Keep your data in a non-proprietary format (csv good, Excel and Access bad)
  • One sheet, one purpose
  • One column, one purpose
  • Zero is not the same as absent
  • Formatting is not data
  • Break up dates
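
Two of these practices are easier to see in code. The sketch below, using invented rows, shows why a blank cell should stay distinct from a zero, and how splitting a date into year, month, and day columns makes filtering simple. The helper name `parse_amount` is hypothetical.

```python
import csv
import io

# Hypothetical rows: the second amount is blank (not recorded),
# the third is a true zero.
raw = """year,month,day,amount
1755,6,1,12
1755,6,3,
1755,7,9,0
"""

def parse_amount(text):
    """Return None for a blank cell so 'absent' stays distinct from 0."""
    text = text.strip()
    return int(text) if text else None

rows = list(csv.DictReader(io.StringIO(raw)))
amounts = [parse_amount(row["amount"]) for row in rows]

# Only the values that were actually recorded, zero included.
recorded = [a for a in amounts if a is not None]
missing_count = amounts.count(None)

# With the date broken into columns, filtering by month needs no date parsing.
june_rows = [row for row in rows if row["month"] == "6"]
```

Averaging `recorded` gives a different answer than treating the blank as 0, which is the whole point of keeping them apart.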

BAD

GOOD

Content type

Discrete: a category or type of thing. Gender, occupation, educational level, and race are all discrete data because there are separate sub-types which do not overlap.

 

Continuous: a connected spectrum. Dates, ages, counts, and money are all continuous data because values can fall at any point along the spectrum.
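
The content type decides which questions make sense. As a rough sketch with invented values: discrete data is usually counted by category, while continuous data is usually summarized with averages and ranges.

```python
from collections import Counter
import statistics

# Hypothetical values, invented for illustration.
occupations = ["farmer", "trader", "farmer", "soldier"]  # discrete
ages = [34, 51, 28, 40]                                  # continuous

# Discrete: count how many records fall in each category.
occupation_counts = Counter(occupations)

# Continuous: summarize where values fall along the spectrum.
mean_age = statistics.mean(ages)
age_range = (min(ages), max(ages))
```

An "average occupation" has no meaning, but an average age does.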

Right Tool for the Right Job

 

  • Spreadsheets: good for entry, bad for cleaning and analysis
  • OpenRefine: good for cleaning, bad for entry and analysis
  • Tableau: good for analysis, bad for entry and cleaning
  • Python: good for analysis and cleaning, bad for visualization
  • D3: good for visualization, bad for entry and analysis

 

Open up Sir William Johnson's 1755-1756 expenses!

 

Handy shortcuts: ctrl + down (Windows) / cmd + down (Mac) to jump to the last filled cell in a column

 

View or Window > Freeze to keep one row or column at the top of your sheet

Math time!

Ranges

G2:G9 : column G, rows 2 through 9

A9:J9 : row 9, columns A through J
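
A range names a block of cells by its first and last cell. As a rough illustration of what the two ranges above cover, this Python sketch expands a range into its individual cells. The function `expand_range` is hypothetical and only handles single-letter columns (A to Z).

```python
import re

def expand_range(a1_range):
    """Expand a simple range like 'G2:G9' into a list of cell names.

    Sketch only: supports single-letter columns A-Z.
    """
    match = re.fullmatch(r"([A-Z])(\d+):([A-Z])(\d+)", a1_range)
    if match is None:
        raise ValueError(f"unsupported range: {a1_range}")
    col1, row1, col2, row2 = match.groups()
    columns = [chr(c) for c in range(ord(col1), ord(col2) + 1)]
    rows = range(int(row1), int(row2) + 1)
    # List cells row by row, left to right.
    return [f"{col}{row}" for row in rows for col in columns]

column_cells = expand_range("G2:G9")  # part of one column
row_cells = expand_range("A9:J9")     # part of one row
```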

Important formulas

=sum( range ) : total up the range

=average( range ) : find the average of the range

=median( range ) : find the median of the range

=counta( range ) : count the number of items in the range

=countunique( range ) : count the number of unique items in the range (if "cat" occurs three times in a list, it will only be counted once)

=countif( range,"string") : count the number of items that match the string

=countif('Sheet1'!range, "string") : reference another sheet
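
For anyone who later moves from spreadsheets to Python, the formulas above have close equivalents. This is a minimal sketch on invented values, not a replacement for the spreadsheet. One assumption to flag: the Python version matches text exactly, while a spreadsheet's countif may treat upper and lower case differently.

```python
import statistics

# Hypothetical column values, invented for illustration.
amounts = [12, 30, 5, 30]
items = ["cat", "dog", "cat", "", "cat"]  # "" stands for an empty cell

total = sum(amounts)                  # =sum( range )
average = statistics.mean(amounts)    # =average( range )
median = statistics.median(amounts)   # =median( range )

# Skip empty cells before counting.
filled = [x for x in items if x != ""]
count_a = len(filled)                 # =counta( range )
count_unique = len(set(filled))       # =countunique( range )
count_cat = filled.count("cat")       # =countif( range, "cat" )
```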

When you're done

Share a link to your spreadsheet using the green "Share" button and set it to Anyone > Commenter

Share this link in the comments to Module 1