Working With Data
All historians work with data! Written text is unstructured data, but some data is structured, like an account book of expenses (similar to a credit card bill).
Historians are very good at finding patterns and relationships. Working with data is all about finding patterns and relationships.
Understand your data
Content model: how the data is organized
Table: the whole spreadsheet
Columns contain one content type: the individual parts of the data
Rows contain one record: an individual unit described by the content types
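If you later open a table in Python (one of the tools listed under Right Tool for the Right Job), the content model carries over directly. This is a minimal sketch, assuming the table has been saved as CSV: the header row names the content types (columns), and every other line becomes one record (row). The sample rows are made up for illustration and the helper name load_table is hypothetical.

```python
import csv
import io

# Hypothetical sample data invented for illustration (NOT from any real account book).
SAMPLE_CSV = """date,payee,amount
1755-06-01,Supplier A,2.5
1755-06-03,Supplier B,4.0
"""

def load_table(text):
    """Read CSV text into a table.

    Returns (content_types, records): the column names, and a list of
    rows where each row is a dict mapping content type -> value.
    """
    reader = csv.DictReader(io.StringIO(text))
    records = list(reader)
    return reader.fieldnames, records

content_types, records = load_table(SAMPLE_CSV)
print(content_types)        # ['date', 'payee', 'amount']
print(len(records))         # 2
print(records[1]["payee"])  # Supplier B
```

Note that csv reads every value as text ("2.5", not 2.5), so numbers still need converting before any math.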
Data best practices
- Keep an original, clean version
- Keep a backup off-site
- Keep your data in a non-proprietary format (CSV good; Excel and Access files bad)
- One sheet, one purpose
- One column, one purpose
- Zero is not the same as absent
- Formatting is not data
- Break up dates
[Figure: side-by-side BAD and GOOD examples]
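A rough Python sketch of two of these practices, assuming dates are written as YYYY-MM-DD (possibly with the month or day missing) and amounts as plain numbers. The helper names split_date and parse_amount are hypothetical. Splitting a date into year, month, and day lets a partial date still be recorded, and reading an empty cell as None rather than 0 keeps "absent" separate from "zero".

```python
def split_date(text):
    """Break a date written as YYYY, YYYY-MM, or YYYY-MM-DD into separate
    year, month, and day fields. Missing parts become None, so a partial
    date (e.g. only the year is known) can still be recorded."""
    parts = text.strip().split("-")
    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 else None
    day = int(parts[2]) if len(parts) > 2 else None
    return {"year": year, "month": month, "day": day}

def parse_amount(cell):
    """Keep 'absent' distinct from zero: an empty cell becomes None, not 0."""
    cell = cell.strip()
    return None if cell == "" else float(cell)

print(split_date("1755-06-01"))  # {'year': 1755, 'month': 6, 'day': 1}
print(split_date("1756"))        # {'year': 1756, 'month': None, 'day': None}

amounts = [parse_amount(c) for c in ["0", "", "3.5"]]
print(amounts)  # [0.0, None, 3.5]

# Average only the recorded values; treating the blank as 0 would
# wrongly pull the average down (3.5 / 3 instead of 3.5 / 2).
recorded = [a for a in amounts if a is not None]
print(sum(recorded) / len(recorded))  # 1.75
```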
Content type
Discrete: a category or type of thing. Gender, occupation, educational level, and race are all discrete data because they divide into separate sub-types that do not overlap.
Continuous: a connected spectrum. Dates, ages, counts, and money are all continuous data because each value falls somewhere along a single ordered scale rather than into a separate category.
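One practical consequence: discrete data is summarized by counting how many records fall in each category, while continuous data is summarized by where values sit on the scale (lowest, highest, average). A small Python sketch with invented records:

```python
from collections import Counter
from statistics import mean

# Hypothetical records, invented for illustration.
records = [
    {"occupation": "farmer", "age": 34},
    {"occupation": "clerk", "age": 27},
    {"occupation": "farmer", "age": 41},
]

# Discrete: tally how many records fall into each category.
occupations = Counter(r["occupation"] for r in records)
print(occupations.most_common())  # [('farmer', 2), ('clerk', 1)]

# Continuous: describe where the values fall along the scale.
ages = [r["age"] for r in records]
print(min(ages), max(ages), mean(ages))  # 27 41 34
```

Averaging the occupations would make no sense, and tallying every distinct age is rarely useful; the content type tells you which summary fits.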
Right Tool for the Right Job
Spreadsheets: good for entry, bad for cleaning and analysis
OpenRefine: good for cleaning, bad for entry and analysis
Tableau: good for analysis, bad for entry and cleaning
Python: good for analysis and cleaning, bad for visualization
D3: good for visualization, bad for entry and analysis
Open up Sir William Johnson's 1755-1756 expenses!
Handy shortcuts: Ctrl + Down (Cmd + Down on a Mac) to jump to the end of a block of data
View or Window > Freeze to keep a row at the top, or a column at the left, of your sheet while you scroll
Math time!
Ranges
G2:G9 (column G, rows 2 through 9)
A9:J9 (row 9, columns A through J)
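To see exactly which cells a range covers, here is a hypothetical helper (expand_range is not a spreadsheet or library function) that lists them, assuming simple A1-style ranges without sheet names or dollar signs:

```python
import re

def column_to_number(letters):
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def number_to_column(n):
    """Convert a 1-based column number back to letters: 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def expand_range(rng):
    """List every cell in a rectangular A1-style range such as 'G2:G9'."""
    start, end = rng.upper().split(":")
    start_col, start_row = re.fullmatch(r"([A-Z]+)(\d+)", start).groups()
    end_col, end_row = re.fullmatch(r"([A-Z]+)(\d+)", end).groups()
    cells = []
    for row in range(int(start_row), int(end_row) + 1):
        for col in range(column_to_number(start_col), column_to_number(end_col) + 1):
            cells.append(f"{number_to_column(col)}{row}")
    return cells

print(expand_range("G2:G9"))  # ['G2', 'G3', 'G4', 'G5', 'G6', 'G7', 'G8', 'G9']
print(expand_range("A9:J9"))  # ['A9', 'B9', 'C9', 'D9', 'E9', 'F9', 'G9', 'H9', 'I9', 'J9']
```

So G2:G9 runs down one column (8 cells), while A9:J9 runs across one row (10 cells).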
Important formulas
=sum( range ) : total up the range
=average( range ) : find the average (mean) of the range
=median( range ) : find the median of the range
=counta( range ) : count the number of non-empty cells in the range
=countunique( range ) : count the number of unique items in the range (if "cat" occurs three times in a list, it is only counted once)
=countif( range, "string") : count the number of items that match the string
=countif('Sheet1'!range, "string") : count matches in a range on another sheet (here, Sheet1)
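For comparison, roughly the same calculations in Python using only the standard library. counta, countunique, and countif are hypothetical helpers written to mimic the spreadsheet formulas; they are simplified (exact matches only) and are not part of any library. The cell values are invented.

```python
from statistics import mean, median

# Hypothetical cell values standing in for spreadsheet ranges.
amounts = [2.5, 4.0, 1.0]                   # a numeric column
animals = ["cat", "dog", "cat", "", "cat"]  # a text column with one empty cell

def counta(values):
    """Like =counta(range): count cells that are not empty."""
    return sum(1 for v in values if v not in (None, ""))

def countunique(values):
    """Like =countunique(range): count distinct non-empty values."""
    return len({v for v in values if v not in (None, "")})

def countif(values, target):
    """Rough analogue of =countif(range, "string"), exact matches only.
    Spreadsheet COUNTIF also accepts wildcards and comparisons like ">10"."""
    return sum(1 for v in values if v == target)

print(sum(amounts))             # 7.5  (=sum)
print(mean(amounts))            # 2.5  (=average)
print(median(amounts))          # 2.5  (=median)
print(counta(animals))          # 4
print(countunique(animals))     # 2
print(countif(animals, "cat"))  # 3

# Referencing another sheet: model the workbook as a dict of sheets
# (a hypothetical structure, not a real spreadsheet API).
workbook = {"Sheet1": {"B": ["cat", "mouse", "cat"]}}
print(countif(workbook["Sheet1"]["B"], "cat"))  # 2
```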
When you're done
Share a link to your spreadsheet using the green "Share" button and set it to Anyone > Commenter
Share this link in the comments to Module 1
596 Working with Data 1
By mkane