A Complete Frankenstein Variorum:
Bridging digital resources and sharing the theory of edition
DH 2024 Reinvention & Responsibility, Washington, DC
Panel: Unpacking the Past, Building the Future: Navigating the Complexities of Textual Analysis and Editions
8 August 2024, 8:30 - 10am, George Mason U.: Van Metre Hall 121
Link to these slides: https://bit.ly/fv-dh24
Elisa Beshero-Bondar | Raffaele Viglianti | Yuying Jin
@ebeshero | @raffazizzi | @yuying-jin
Objectives of the Frankenstein Variorum
- to “upcycle” and connect previous digital editions of Frankenstein
- to share a nonlinear, divergent edition history
- to encourage exploration from one edition to the others
Variorum - for tracking and comparing versions
Most immediate context: Darwin Online (ed. Barbara Bordalejo), except...
- Frankenstein Variorum compares five versions (to Darwin Online's six)
- Frankenstein Variorum incorporates MS witnesses
- Frankenstein Variorum integrates earlier digital editions made by others
FV as a Variorum
- Visualizes a collation, or comparison of versions, working with digital editions that were encoded very differently
- Designed as a static website for serendipitous browsing and intensive research
- Applies the TEI in a JavaScript context to store comparison data and pointers to variant passages
Critical and Diplomatic Editions Leading to the Frankenstein Variorum Project
Legend: print edition | digital edition
- 1974 (print): James Rieger, ed., first new edition of 1818 in 141 years: inline collation of "Thomas" w/ 1818, 1831 variants in endnotes
- ~mid-1990s (digital): Stuart Curran and Jack Lynch, PA Electronic Edition (PAEE): collation of 1818 and 1831, HTML
- 1996 (print): Nora Crook, crit. ed. of 1818, variants of "Thomas", 1823, and 1831 in endnotes (P&C MWS collected works)
- 1996 (print): Charles Robinson, The Frankenstein Notebooks (Garland): print facsimile of 1816 ms drafts
- 2007 (digital): Romantic Circles TEI conversion of PAEE; separates the texts of 1818 and 1831; collation via Juxta
- 2013 (digital): Shelley-Godwin Archive publishes diplomatic edition of 1816 ms drafts
- 2017: Frankenstein Variorum Project begins: assembly/proof-correcting of PAEE files; OCR/proof-correcting 1823; "bridge" TEI edition of S-GA notebook files; automated collation; incorporating "Thomas" copy text. Collation project completed in 2023, Variorum viewer officially launches in 2024.
New digital editions in the FV
- New encoding of the 1823 edition, based on OCR from Google Books
- New preparation of the "Thomas copy"
New digital edition of “1823”
- 1823: prepared by William Godwin, the first published edition bearing the name “Mary Wollstonecraft Shelley” on the title page
- Carnegie Mellon University librarians prepared OCR of the 1823 edition for our project
- Our XML encoding for 1823 matches that of the 1818 and 1831 editions (structure of letters, chapters, paragraphs, poems, annotations).
New digital edition of “Thomas copy”
- Our edition responds to James Rieger's and Nora Crook's print editions interpreting Mary Shelley's marginalia.
- Prepared after EBB's personal consultation of the Thomas copy at the Morgan Library & Museum.
- Added insertions, deletions, + margin-notes to the 1818 edition
- Prepared new XML from 1818 edition, with <add>, <del>, <note> elements showing Thomas marginalia
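A minimal sketch of how <add>, <del>, and <note> markup can yield two readings from one file: the text as printed in 1818 and the text as revised by hand. The sample fragment and its wording are hypothetical (not a transcription of the Thomas copy), and the function name `reading` is ours, not the project's.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: invented wording, not a transcription of the Thomas copy.
SAMPLE = (
    "<p>I was <del>now</del><add>then</add> alone"
    "<note>hypothetical margin note</note> in the room.</p>"
)

def reading(elem, skip):
    """Concatenate the text of elem, leaving out any child whose tag is in skip."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in skip:
            parts.append(reading(child, skip))
        # Tail text follows the child but belongs to the parent, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)

p = ET.fromstring(SAMPLE)
printed_1818 = reading(p, skip={"add", "note"})  # drop handwritten additions
thomas = reading(p, skip={"del", "note"})        # drop what the annotator struck out
print(printed_1818)
print(thomas)
```

Keeping both layers in one file means the printed text and the marginalia never drift apart.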
Preparing for collation
Collating when the editions are so different (1)
Align and “chunk”
- Best not to collate entire novel files in one pass: smaller units prevent severe alignment errors!
- We prepared 33 collation units (or "chunk files") sharing common starting and ending points.
- Edition files of the same chunk are collated together
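A rough sketch of the "chunk" idea: group edition files by a shared chunk identifier so that each group can be sent to the collation step together. The file-naming pattern (`<edition>_<chunk>.xml`) and the function name `group_by_chunk` are assumptions for illustration, not the project's actual conventions.

```python
import re
from collections import defaultdict

# Hypothetical file names following an assumed <edition>_<chunk>.xml pattern.
FILES = [
    "1818_C01.xml", "1823_C01.xml", "1831_C01.xml", "Thomas_C01.xml",
    "MS_C02.xml", "1818_C02.xml",
]

def group_by_chunk(filenames):
    """Map each chunk id to the sorted list of editions that have a file for it."""
    groups = defaultdict(list)
    for name in filenames:
        m = re.fullmatch(r"(?P<edition>[^_]+)_(?P<chunk>C\d+)\.xml", name)
        if m is None:
            continue  # ignore files that do not follow the assumed pattern
        groups[m["chunk"]].append(m["edition"])
    return {chunk: sorted(eds) for chunk, eds in sorted(groups.items())}

chunks = group_by_chunk(FILES)
for chunk, editions in chunks.items():
    print(chunk, editions)
```

A chunk need not include every edition; in this sample only two witnesses cover C02.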
Collating when the editions are so different (2)
Prescribe rules to direct the machine-assisted collation
- Extensive Python collation script
  - to work around differences (identify and unite words split around line-endings in S-GA)
  - to identify what features can be ignored/skipped over for collation purposes (e.g. markup of pagination, line-by-line encoding in S-GA)
  - to normalize: identify what apparently different features are the same:
    - <milestone type='paragraph'> is same as <p>
    - "&" is not different from "and"
- Prescribe output in form of TEI critical apparatus:
  - coordinate information on which editions align and what normalized tokens/strings they share at this point.
  - (See Parallel Segmentation encoding in TEI Guidelines)
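The normalization rules above might be sketched with a few regular expressions. This is an illustration under stated assumptions, not the project's script: the tag names `<lb/>` and `<pb/>`, the hyphen-before-line-break convention, the sample strings, and the function name `normalize` are all hypothetical.

```python
import re

def normalize(raw):
    """Reduce one witness string to a form meant only for comparison."""
    text = raw
    # Unite words split around a line ending (assumed form: hyphen + <lb/>).
    text = re.sub(r"-\s*<lb\s*/>\s*", "", text)
    # Skip pagination and line-break markup so it never counts as a variant.
    text = re.sub(r"<(?:pb|lb)\b[^>]*/>", " ", text)
    # Treat a paragraph milestone as the same thing as a <p> start tag.
    text = re.sub(r"<milestone\s+type=['\"]paragraph['\"]\s*/>", "<p>", text)
    # "&" is not different from "and" (escaped entity or a free-standing ampersand).
    text = re.sub(r"&amp;|(?<=\s)&(?=\s)", "and", text)
    # Collapse runs of whitespace left behind by the removals.
    return " ".join(text.split())

# Hypothetical inputs: a manuscript-style line and a print-style line.
ms = "<milestone type='paragraph'/>My father &amp; Eliza-<lb/>beth <pb n='12'/>were happy"
printed = "<p>My father and Elizabeth were happy"
print(normalize(ms))
print(normalize(ms) == normalize(printed))
```

The original strings are left untouched; only the normalized copies are compared.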
- Markup of text structure compared across Variorum:
  - Volume (print editions only), letter, chapter
  - Paragraph, poetry line-groups and lines
  - Notes
- Markup of manuscript events included in Variorum comparison: deletion, insertion, gap
- Normalizing algorithm:
  - Decide what marks are equivalent
  - Ignore but preserve other markup in collation process, also abbreviations, capitalization.
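One way to "ignore but preserve" is to compare witnesses on a normalized token while storing the original token in the output. The sketch below groups witnesses that agree after normalization and writes a small TEI-style apparatus entry with <app>, <rdgGrp>, and <rdg>. The witness ids, sample tokens, and the exact arrangement of attributes are assumptions for illustration, not the project's data.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def normalize_token(token):
    """Comparison form only: ignore capitalization and the '&' abbreviation."""
    return token.lower().replace("&", "and")

def build_app(readings):
    """readings maps a witness id to its original token at one aligned position."""
    groups = defaultdict(list)
    for wit, original in readings.items():
        groups[normalize_token(original)].append((wit, original))
    app = ET.Element("app")
    for norm, members in groups.items():
        # One reading group per normalized form; originals stay inside <rdg>.
        grp = ET.SubElement(app, "rdgGrp", n=norm)
        for wit, original in members:
            rdg = ET.SubElement(grp, "rdg", wit="#" + wit)
            rdg.text = original
    return app

# Hypothetical witness ids and tokens.
app = build_app({"f1818": "Creature", "f1823": "creature", "fMS": "wretch"})
print(ET.tostring(app, encoding="unicode"))
```

Here the 1818 and 1823 tokens fall into one group despite the capitalization difference, while the manuscript token forms its own group.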
Background image created by the author from a loom on Reddit and the frontispiece illustration of Frankenstein (1831)
CAUTION: Collation of heavily altered documents leads to many tangles and snags.
Completing this project was not possible without students!
Students helped with...
- Exploring the development of contextual annotations (Stephen, Jack and Avery at CMU)
- Tracking the kinds of errors we would find in the collation in our collationWorkspace (Nate and Rachel)
- Finding algorithmic ways to debug collation tangles (Mia, Jackie, Nate, and Yuying)
- Defining “long-tokens” to pull heavily revised passages and long deletions away from the collation machinery! (ask us about this) (Yuying for the win!, with Nate and Rachel)
- Developing and testing our shell-script to run our postCollation pipeline (Yuying)
- Finalizing the Interface in React + Astro (Yuying's senior design project)
- Roll credits: People page on the Variorum website
- Collation projects take much longer to debug than you ever expected
- Correct the input machinery, not the output.
  - Minimize brittle hand-correction!
  - Work on the pre-processing.
  - Refine post-processing to correct output errors!
- Machine-assisted processes need a lot of documentation
  - for project sustainability
  - for reproducibility of data
- Look for ways to involve students!
  - especially undergrads unfamiliar with the tech
  - forces clear communication from everyone!
  - best way to simplify overly complicated processes
  - major skill building for all!
Spine and data coordination
From collation data to spine
- “Spine” = data model (dynamic nerve plexus?) holding the variorum together
  - standoff use of TEI critical apparatus
  - coordinates data on variance, including normalized tokens and maximum edit-distance values
  - points to specific locations in the variorum edition files
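A sketch of what a "maximum edit-distance value" could look like for one apparatus entry: the largest pairwise distance among the normalized readings at that point. The slides do not say which distance measure or pairing the project uses, so Levenshtein distance over all pairs is an assumption here, and both function names are ours.

```python
from itertools import combinations

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def max_edit_distance(readings):
    """Largest pairwise distance among readings; 0 when fewer than two are given."""
    return max((levenshtein(a, b) for a, b in combinations(readings, 2)), default=0)

print(max_edit_distance(["cat", "cart", "dog"]))
```

A high value flags a point where the versions diverge heavily, which an interface could use to set the intensity of a highlight.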
How do the five editions “stack up” by collation chunk?
Legend: MS | 1818 | Thm | 1823 | 1831
[Figure: gaps, alignments, relative string-length for each “chunk”]