From Event to Data Set to Replicant
Perspective, Structure, and the Problem of Representation in Data-Driven Digital History
Sharon M. Leon
Digitorium, October 12, 2019
@sharonmleon
Data-Driven Interpretation
Time on the Cross
Data as Capta
Differences in the etymological roots of the terms data and capta make the distinction between constructivist and realist approaches clear. Capta is “taken” actively while data is assumed to be a “given” able to be recorded and observed. From this distinction, a world of differences arises. Humanistic inquiry acknowledges the situated, partial, and constitutive character of knowledge production, the recognition that knowledge is constructed, taken, not simply given as a natural representation of pre-existing fact.
Johanna Drucker, Digital Humanities Quarterly (2011).
The Events
The Record Creators
Thomas Mulledy, SJ
The 1838 Sale
The Inventory
The Archives
"The Archive"
The Archival Turn
Duties of an Archivist
- Appraisal and selection
- Collection and arrangement
- Description
- Preservation and access
Traditional Practice
Sir Hilary Jenkinson
T. R. Schellenberg
Critical Archival Studies
critical archival studies broadens the field’s scope beyond an inward, practice-centered orientation and builds a critical stance regarding the role of archives in the production of knowledge and different types of narratives, as well as identity construction.
Michelle Caswell, T-Kay Sangwand, and Ricardo Punzalan, Journal of Critical Library and Information Studies (2017).
The Data Sets
The Records
Capturing Data
- Research notes
- Digital imaging
- Transcription
- Structured, rectangular data
Derived Data
- Hand generated from document transcriptions
- Individuals and relationships processed to People with unique IDs, then de-duplicated
- Appearances processed to Events with participants
- Event types: birth, baptism, marriage, death, inventory, health, sale, legal, labor, commerce, conditions, travel, punishment, run away
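The derivation steps above can be sketched in a few lines of Python. This is a minimal illustration only: the slides describe the derived data as hand generated, so the automatic name matching here stands in for what was a process of human judgment. The appearance rows, the `normalize` helper, and the `P0001`-style identifiers are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical transcribed appearances: (name as written, document, event type).
# These rows are invented placeholders, not project data.
appearances = [
    ("Ann Example", "doc-001", "inventory"),
    ("ann  example", "doc-002", "sale"),
    ("Ben Example", "doc-002", "sale"),
]

def normalize(name: str) -> str:
    """Collapse case and whitespace so simple spelling variants compare equal."""
    return " ".join(name.lower().split())

@dataclass
class Event:
    event_type: str
    document: str
    participants: list = field(default_factory=list)

people = {}  # normalized name -> unique person ID
events = {}  # (document, event type) -> Event

for name, doc, event_type in appearances:
    key = normalize(name)
    # A new ID is assigned only the first time a normalized name is seen,
    # which is the (naive) de-duplication step.
    person_id = people.setdefault(key, f"P{len(people) + 1:04d}")
    event = events.setdefault((doc, event_type), Event(event_type, doc))
    if person_id not in event.participants:
        event.participants.append(person_id)

for event in events.values():
    print(event.event_type, event.document, event.participants)
```

Matching on a normalized name would wrongly merge two different people who share a name, which is one reason a hand-built process with documented decisions may be preferable for records like these.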
Research Data
The Enslaved Group
- 1,132 individuals owned by the Jesuits (1717-1840)
- 598 individuals with birth years
- 48 enslaved people owned by others
- 34 free Blacks
Relationships in the Records
- 108 inferred partnerships
- 13 sacramental marriages
- 400 identified parental relationships
- 87 baptisms
- 141 births
- 56 deaths indicated
- 26 deaths with specific date
The Data Model
Tidy Data
Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.
Hadley Wickham, Journal of Statistical Software, 2014
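To make Wickham's three rules concrete, here is a small sketch of how people, events, and participation might be kept as three separate tables, one per type of observational unit. The column names, identifiers, and rows are assumptions made for illustration and do not reproduce the project's actual data model.

```python
import csv
import io

# Hypothetical rows; every value here is an invented placeholder.
people = [
    {"person_id": "P0001", "name": "Ann Example"},
    {"person_id": "P0002", "name": "Ben Example"},
]
events = [
    {"event_id": "E0001", "event_type": "baptism"},
]
# One row per observation: a single person taking part in a single event.
participation = [
    {"event_id": "E0001", "person_id": "P0001"},
    {"event_id": "E0001", "person_id": "P0002"},
]

def to_csv(rows):
    """Write a list of dict rows as CSV text, one column per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def participants_of(event_id):
    """Join participation to people to list the names attached to one event."""
    names = {p["person_id"]: p["name"] for p in people}
    return [names[r["person_id"]] for r in participation if r["event_id"] == event_id]

print(to_csv(participation))
print(participants_of("E0001"))
```

Keeping participation in its own table means an event with many participants does not need repeated or multi-valued columns, and each table can be exported as a plain rectangular file.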
Documentation
- Variable Choice
- Transcribed
- Controlled Vocabularies
- Imputed Fields
- Calculated Fields
- Provenance
The Linked Data
The Semantic Web
- Use URIs as names for things
- Use HTTP URIs so that people can look up those names.
- When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
- Include links to other URIs, so that they can discover more things.
Tim Berners-Lee, “Linked Data,” 2006.
Subject-Predicate-Object
URI-LOD Property-URI
Linked Open Vocabularies
Isaac Hawkins, II
Spouse of
Catherine Harrison
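The statement above can be written as a single subject-predicate-object triple. The sketch below uses only the Python standard library and serializes the triple as one N-Triples line. The `https://example.org/person/` base URI is a hypothetical placeholder, since the slides do not give the project's identifiers, and the use of schema.org's `spouse` property is an assumption about which published vocabulary term might fit.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Hypothetical base URI; the project's real identifiers are not shown in the slides.
BASE = "https://example.org/person/"

triple = Triple(
    subject=BASE + "isaac-hawkins-ii",
    # Assumption: schema.org's `spouse` property stands in for "Spouse of".
    predicate="https://schema.org/spouse",
    obj=BASE + "catherine-harrison",
)

def to_ntriples(t: Triple) -> str:
    """Serialize one triple whose three parts are all URIs as an N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."

print(to_ntriples(triple))
```

Because all three parts are URIs, anyone who encounters the statement can look up the people and the relationship term independently, which is the point of the four rules quoted earlier.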
The Replicants
Data Sharing
Lots of Copies Keeps Stuff Safe?
- Proliferation of copies
- Proliferation of variants
- Provenance not embedded
- Syncing system
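One possible response to the problems listed above is to ship provenance inside the shared file and to fingerprint the data so that variants can be told apart from faithful copies. The following is a sketch under stated assumptions, not a description of any system used in the project: the `package` and `is_unmodified` helpers, the JSON layout, and the sample values are all hypothetical.

```python
import hashlib
import json

def fingerprint(csv_text: str) -> str:
    """Return a SHA-256 digest of the data; identical copies share a digest."""
    return hashlib.sha256(csv_text.encode("utf-8")).hexdigest()

def package(csv_text: str, source: str, version: str) -> str:
    """Bundle the data with its provenance so the two travel together."""
    record = {
        "provenance": {
            "source": source,
            "version": version,
            "sha256": fingerprint(csv_text),
        },
        "data": csv_text,
    }
    return json.dumps(record, indent=2)

def is_unmodified(packaged: str) -> bool:
    """Check whether the data still matches the digest recorded at packaging."""
    record = json.loads(packaged)
    return fingerprint(record["data"]) == record["provenance"]["sha256"]

# Hypothetical data and source label, invented for illustration.
original = "person_id,name\nP0001,Ann Example\n"
shared = package(original, source="example-project", version="1.0")

# A downstream copy that edits the data without updating the provenance.
variant = json.loads(shared)
variant["data"] += "P0002,Ben Example\n"
variant_text = json.dumps(variant)

print(is_unmodified(shared))        # the faithful copy
print(is_unmodified(variant_text))  # the altered variant
```

A digest only shows that a copy has diverged; it does not say how or why, so a versioned source of record would still be needed to reconcile variants.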
Functional Requirements for Bibliographic Records
Aggregation
Integration over Aggregation