From Event to Data Set to Replicant

Perspective, Structure, and the Problem of Representation in Data-Driven Digital History

Sharon M. Leon

Collect & Connect, November 23, 2020

@sharonmleon

Data-Driven Interpretation

Time on the Cross

Data as Capta

Differences in the etymological roots of the terms data and capta make the distinction between constructivist and realist approaches clear. Capta is “taken” actively while data is assumed to be a “given” able to be recorded and observed. From this distinction, a world of differences arises. Humanistic inquiry acknowledges the situated, partial, and constitutive character of knowledge production, the recognition that knowledge is constructed, taken, not simply given as a natural representation of pre-existing fact.

Johanna Drucker, Digital Humanities Quarterly (2011).

The Events

The Record Creators

Thomas Mulledy, SJ

The 1838 Sale

The Inventory

The Archives

The Archival Turn

Duties of an Archivist

  • Appraisal and selection
  • Collection and arrangement
  • Description
  • Preservation and access

Traditional Practice

Sir Hilary Jenkinson

T. R. Schellenberg

Critical Archival Studies

critical archival studies broadens the field’s scope beyond an inward, practice-centered orientation and builds a critical stance regarding the role of archives in the production of knowledge and different types of narratives, as well as identity construction.

Michelle Caswell, Ricardo Punzalan, and T-Kay Sangwand, Journal of Critical Library and Information Studies (2017).

The Data Sets

The Records

Capturing Data

  • Research notes
  • Digital imaging
  • Transcription
  • Structured, rectangular data

Derived Data

  • Hand generated from document transcriptions
  • Individuals and relationships processed into People with unique IDs, then deduplicated
  • Appearances processed to Events with participants
  • Event types: birth, baptism, marriage, death, inventory, health, sale, legal, labor, commerce, conditions, travel, punishment, run away
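The derivation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual workflow, which the slides describe as hand generated. The names `Person`, `Event`, `normalize`, and `build` are hypothetical, the sample rows are invented placeholders rather than transcribed records, and matching on a normalized name is a deliberately naive stand-in for the careful judgment that real deduplication requires.

```python
from dataclasses import dataclass, field
from itertools import count

# Event types as listed on the slide.
EVENT_TYPES = {
    "birth", "baptism", "marriage", "death", "inventory", "health", "sale",
    "legal", "labor", "commerce", "conditions", "travel", "punishment",
    "run away",
}


@dataclass(frozen=True)
class Person:
    person_id: int  # unique ID assigned on first appearance
    name: str       # name as first transcribed


@dataclass
class Event:
    event_type: str
    source: str  # the document the event was derived from
    participant_ids: list[int] = field(default_factory=list)


def normalize(name: str) -> str:
    """Collapse case, commas, periods, and spacing so name variants match."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def build(appearances):
    """Turn (name, event_type, source) appearances into People and Events."""
    ids = count(1)
    people = {}  # normalized name -> Person (naive deduplication key)
    events = {}  # (event_type, source) -> Event
    for name, event_type, source in appearances:
        if event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event_type!r}")
        key = normalize(name)
        if key not in people:
            people[key] = Person(next(ids), name)
        person = people[key]
        event = events.setdefault((event_type, source), Event(event_type, source))
        if person.person_id not in event.participant_ids:
            event.participant_ids.append(person.person_id)
    return list(people.values()), list(events.values())


# Invented placeholder rows, not drawn from the archival records.
appearances = [
    ("Ann Example", "inventory", "doc-1"),
    ("ANN  EXAMPLE", "sale", "doc-2"),
    ("Ben Example", "sale", "doc-2"),
]
people, events = build(appearances)
```

Here the two spellings of the first placeholder name resolve to one person, who then participates in both events. In practice a match like this would be reviewed by hand before the records were merged.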

Research Data

The Data Model

Tidy Data

Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.

Hadley Wickham, Journal of Statistical Software (2014).
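As a rough illustration of the tidy structure Wickham describes, the sketch below reshapes a table that packs several participants into one cell into separate tables, one per observational unit. The column names (`event_id`, `event_type`, `participants`, `person_id`) and the rows are hypothetical placeholders, not fields from the project's data model.

```python
from collections import Counter

# Untidy: the "participants" cell holds several values at once.
untidy = [
    {"event_id": "e1", "event_type": "sale", "participants": "p1; p2"},
    {"event_id": "e2", "event_type": "baptism", "participants": "p2"},
]

# Tidy table 1: one row per event, one column per variable.
events = [
    {"event_id": row["event_id"], "event_type": row["event_type"]}
    for row in untidy
]

# Tidy table 2: one row per (event, person) observation.
participation = [
    {"event_id": row["event_id"], "person_id": pid.strip()}
    for row in untidy
    for pid in row["participants"].split(";")
    if pid.strip()
]

# With one observation per row, counting participants per event is trivial.
per_event = Counter(row["event_id"] for row in participation)
```

Splitting the participants out into their own table is what makes later steps, such as counting, joining, or filtering, straightforward.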

Documentation

  • Variable Choice
  • Transcribed Fields
  • Controlled Vocabularies
  • Imputed Fields
  • Calculated Fields
  • Provenance

The Linked Data

The Semantic Web

  1. Use URIs as names for things.

  2. Use HTTP URIs so that people can look up those names.

  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

  4. Include links to other URIs, so that they can discover more things.

Tim Berners-Lee, “Linked Data,” 2006.

Subject-Predicate-Object

URI-LOD Property-URI

Linked Open Vocabularies

Isaac Hawkins, II

Spouse of

Catherine Harrison
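The statement above can be written as a subject-predicate-object triple in which each part is a URI. The sketch below is a minimal illustration, assuming made-up identifiers: the `https://example.org/` URIs and the `spouseOf` property are hypothetical placeholders, not the project's published identifiers or a real vocabulary term. The output line follows the general shape of the N-Triples syntax for URI-only statements.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI naming the thing described
    predicate: str  # URI of the property, ideally from a shared vocabulary
    obj: str        # URI of the related thing


# Hypothetical URIs for illustration only.
PERSON = "https://example.org/person/"
SPOUSE_OF = "https://example.org/vocab/spouseOf"

triple = Triple(
    subject=PERSON + "isaac-hawkins-ii",
    predicate=SPOUSE_OF,
    obj=PERSON + "catherine-harrison",
)


def to_ntriples(t: Triple) -> str:
    """Serialize a URI-only triple as one N-Triples style line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


line = to_ntriples(triple)
```

Because every part of the statement is an HTTP URI, another data set can point at the same person or reuse the same property, which is what allows the data to be linked.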

The Replicants

Data Sharing

Functional Requirements for Bibliographic Records

Aggregation

Integration over Replication