The Frankenstein Variorum:

An Orientation to the “Back End” of the Collation

Elisa Beshero-Bondar

Twitter: @epyllia | GitHub: @ebeshero

14 May 2021

Link to these slides: http://bit.ly/fv-backend

A Digital Variorum

Our use of the term: A digital edition that investigates change to a work by comparing distinct versions of it.

Print Publications of Frankenstein in MWS's lifetime:

1818 Edition (3 volumes) published anonymously
1823 Edition (2 volumes) printed by MWS's father William Godwin, the first to include her name as the author.
1831 Edition (1/2 of a volume) extensively revised by MWS, bound with Friedrich von Schiller's The Ghost Seer in Bentley's Standard Series of novels)

Manuscript versions:

Fair copy MS drafts (~1816) at the Bodleian Library, Oxford: Abinger c56, c57, c58
Thomas Copy made sometime between 1818 and 1822: MWS's marginal comments on a print copy of 1818

James Rieger, ed., first new edition of 1818 in 141

years : inline collation of "Thomas" w/ 1818,

1831 variants in endnotes

Legend:

Stuart Curran and Jack Lynch: PA Electronic Edition (PAEE) , collation of 1818 and 1831: HTML

Nora Crook crit. ed of 1818, variants of "Thomas", 1823, and 1831 in endnotes (P&C MWS collected works)

Romantic Circles TEI conversion of PAEE ; separates the texts of 1818 and 1831; collation via Juxta

1974

~mid-1990s

1996

Charles Robinson, The Frankenstein Notebooks (Garland): print facsimile of 1816 ms drafts

2007

Shelley-Godwin Archive publishes diplomatic edition of 1816 ms drafts

print edition

digital edition

Legend:

2013

2017

Critical and Diplomatic Editions Leading to the Frankenstein Variorum Project

Frankenstein Variorum Project :

assembly/proof-correcting of PAEE files; OCR/proof-correcting 1823; "bridge" TEI edition of S-GA notebook files; automated collation; incorporating "Thomas" copy text

Who benefits from a holistic digital variorum of Frankenstein?

scholars of British Romanticism and the Shelley-Godwin circle
instructors of 19c literature and science fiction
undergraduate and high school students
fans interested in the history of the novel

None of these readers have seen a comprehensive view of the novel's changes from the manuscript of 1816 to 1831 without expensive and intensive study

Multi-institutional collaboration

Me (Pitt-Greensburg and TEI)
Raffaele Viglianti (MITH: Shelley-Godwin Archive and TEI)
Rikk Mulligan, Jon Klancher and team at Carnegie Mellon University

Can we make an edition that conveniently compares the manuscripts to the print publications?
Can we make a comprehensive collation to show changes to the novel over time, from 1816 to 1831?
- How many versions? (5 and a bit?)
- Which editorial interventions persist from 1816 to 1831?
  - MWS in the "Thomas" copy: how much of this persists into 1831?
  - PBS's additions: which/how many of these persist to 1831?
  - What parts of the novel were most mutable?

Motivating Questions

Working with MS versions

Thomas copy marginalia

prepared new XML from 1818 edition, with <add>, <del>, <note> elements

Shelley-Godwin Archive’s diplomatic edition of the 1816 Notebooks at http://shelleygodwinarchive.org

encoded surface-by-surface, line-by-line
required resequencing to include in the collation (margin notes not included at point of insertion but at the end of each page file)

Shelley-Godwin Archive: sample page surface:

Variorum challenge

1. Make visible and accessible a nonlinear, divergent edition history

1816 notebooks to 1818: uneven (gaps in notebooks)
Thomas divergence:
- copy with margin notes was left in Italy before 1823, apparently not consulted later
1823 edits: largely retained in 1831
1831 major revisions:
- alteration of character relationships, added chapter and several lengthened passages

2. Introduce textual scholarship to students, fans of Frankenstein as well as text scholars, 19c specialists:

Recruit next generation of text scholars
Not marginalizing variants in print model of endnotes/footnotes
Tell the story of Frankenstein’s ”hot” or ”cool” alterations inline

Progress Report

Frankenstein Variorum = my most technically challenging project
One third of the collation is being displayed at https://frankensteinvariorum.github.io/
This (first) portion of the novel gave us the basis for developing the ”spine” data model and the Variorum reader web interface
Future work with my colleagues to complete the project. This includes
- refining my Python machine collation algorithms to complete the variorum.
- UX testing of the navigation and reading interface (and recursive cycles of development)
- publications and documentation of the ”spine” data model for the TEI Guidelines and other projects

Frankenstein Variorum: The Collation Process

Collation is weaving. . .

Threading the machine...

We need to prepare the same consistent “threads”
Make the text streams consistent with each other
collateX = weaving machine
- needs to be able to find where the “threads” run together, and where they diverge
- Markup functions as signals to the collation weaving machine

Guiding the machine reading

Markup is not the same in all of the editions
That's okay, because we mask some of it from collateX
Python script instructs on strings of text that collateX can just skip over
Some of this is markup that is not helpful for comparing the documents

ignore = ['sourceDoc', 'xml', 'comment', 'w', 'mod', 'anchor', 'include', 'delSpan', 
          'addSpan', 'add', 'handShift', 'damage', 'restore', 'zone', 'surface',
          'graphic', 'unclear', 'retrace', 'damage', 'restore']

inlineEmpty = ['pb', 'lb', 'gap', 'del', 'p', 'div', 'milestone', 'lg', 'l',
               'note', 'cit', 'quote', 'bibl', 'ab', 'hi', 'head']
# 2018-05-12 ebb: I'm setting a white space on either side of the inlineEmpty elements in line 76
# 2018-07-20: ebb: CHECK: are there white spaces on either side of empty elements in the output?

inlineContent = ['metamark', 'mdel', 'shi']

In the Python script:

creating lists of XML element names for special treatment

Ignored whole elements need to be screened out of the collation entirely.

Other whole elements need to be preserved.

XML Pulldom library helps us with special handling of XML elements.

The “base markup” for collation

print books: early old simple structural markup for 1818 and 1831 eds.
MS versions
- Thomas copy prepped from 1818 files with some simple manuscript coding of handwritten insertions, deletions, and notes
- 1816 MS Notebook: It's complicated...

PA Electronic Edition (mid 1990s): 1818 vs 1831

started from base HTML 1.0 files
up-converted to clean, simple XML
- ”on its way” to TEI (structural elements in text)
- prepared for machine-assisted collation (via CollateX): including element tags
- deep hierarchy of novel ”flattened” to milestones: <div type="volume"/>, <p/>, etc.
corrected against photofacsimiles of 1818 and 1831 print publications

Prepared from OCR new XML of 1823 edition

prepared by William Godwin, the first edition bearing the name ”Mary Wollstonecraft Shelley” on the title page
XML syntax matches that of 1818 and 1831 editions

Added new XML: Thomas Copy

Added insertions, deletions, + margin-notes to 1818 edition
checked against and respond to/update Nora Crook and James Reiger editions

S-GA: resequenced / compressed for collation

<surface lrx="3847" lry="5342" 
partOf="#ox-frankenstein_volume_i" 
ulx="0" uly="0" folio="21r" shelfmark="MS. Abinger c. 56" base="ox-ms_abinger_c56/ox-ms_abinger_c56-0045.xml" 
id="ox-ms_abinger_c56-0045" sID="ox-ms_abinger_c56-0045"/>
      <graphic url="http://shelleygodwinarchive.org/images/ox/ms_abinger_c56/ms_abinger_c56-0045.jp2"/>
      <zone type="main" sID="c56-0045__main"/> 

<lb n="c56-0045__main__17"/> 
         <del rend="strikethrough" sID="c56-0045__main__d2e9811"/>But how<del eID="c56-0045__main__d2e9811"/> How can I describe
      my <lb n="c56-0045__main__18"/> emotion at this catastrophe; or how 

<w ana="start"/>deli<lb n="c56-0045__main__19"/>neate<w ana="end"/> 

the wretch whom with such <lb n="c56-0045__main__20"/> infinite pains and care I had endeavoured <lb n="c56-0045__main__21"/> to form. His limbs were in proportion <lb n="c56-0045__main__22"/> and I had selected his features <del rend="strikethrough" sID="c56-0045__main__d2e9830"/>h<del eID="c56-0045__main__d2e9830"/> as <lb n="c56-0045__main__23"/> 
         <mod sID="c56-0045__main__d2e9835"/>
            <del rend="strikethrough" sID="c56-0045__main__d2e9837"/>handsome<del eID="c56-0045__main__d2e9837"/>
            <mdel>.</mdel>
            <anchor xml:id="c56-0045.01"/>
            <zone corresp="#c56-0045.01" type="left_margin" sID="c56-0045__left_margin"/> 
               <lb n="c56-0045__left_margin__1"/> 
               <add sID="c56-0045__left_margin__d2e9849"/>
                  <mod sID="c56-0045__left_margin__d2e9851"/>
                     <del rend="strikethrough" sID="c56-0045__left_margin__d2e9853"/>handsome<del eID="c56-0045__left_margin__d2e9853"/>
                     <add hand="#pbs" place="superlinear" sID="c56-0045__left_margin__d2e9856"/>beautiful.<add eID="c56-0045__left_margin__d2e9856"/>
                  <mod eID="c56-0045__left_margin__d2e9851"/>
               <add eID="c56-0045__left_margin__d2e9849"/>
            <zone eID="c56-0045__left_margin"/>
         <mod eID="c56-0045__main__d2e9835"/>
         <mod sID="c56-0045__main__d2e9863"/>
            <del rend="strikethrough" sID="c56-0045__main__d2e9865"/>Handsome<del eID="c56-0045__main__d2e9865"/>
            <add hand="#pbs" place="superlinear" sID="c56-0045__main__d2e9868"/>Beautiful<add eID="c56-0045__main__d2e9868"/>
         <mod eID="c56-0045__main__d2e9863"/>; Great God! His <lb n="c56-0045__main__24"/>

added word boundary markup to indicate whole words spanning lines
resequenced margin zone content: (followed S-GA's pointers to represent semantic reading order for collation)

Gothenburg model : algorithm for computer-assisted collation, developed in 2009 workshop of collateX and Juxta developers.

Tokenization :
- Break down the smallest unit of comparison: (words--with punctuation, or character-by-character): FV tokenizes words and includes punctuation
Normalization
- ('&' = 'and')
Alignment
- Identify comparable divergence: what makes text sequences comparable units?
- “Chunking” text into comparable passages (chapters/paragraphs that line up with identifiable start and end points). Collation proceeds chunk by chunk.
Analysis
- (study output, correct, and re-align after machine process, AND refine automated processing)
Visualization
- critical edition apparatus, graph displays

Computer-aided collation: Gothenburg Model

A ”panpipe” view of Frankenstein’s five versions

Legend

1818

Thm

1823

1831

Alignments, gaps, and comparative lengths of each collation unit

chapter heading or other structural boundary

Working with the Output of Collation

Making a “stand-off“ Spine (info + pointers to collation data)
Generating the edition files with collation data marked “inline”

“Spine 2” by Buzz Spector:

polaroid of 33 books aligned at the spines, one per human vertebra

Thinking about a spine for a variorum edition. . .

express a holistic view structured according to variant locations
serve as ”nerve plexus” of data pointers for dynamic coordination of multiple editions

built up from computer-aided collation

Constructing a spine for The Frankenstein Variorum

“Spine” data model = standoff use of TEI critical apparatus:
- coordinates data on variance: piece by piece (vertebra) which specific passages line up and where they differ.
- can include processed data, like maximum edit-distance, at each location
- can include data on normalization: e.g. normalized tokens used in collation process
- points to specific locations in separate edition files

”Heatmap” view, showing variation intensity as blocks with circles color-coded by edition. Selecting a circle on the heatmap view displays the edition and its variants.

The Variorum Viewer and Its Options for Display

The visitor chooses an edition to read and a section aligned with the other editions, in this case the 1818 in section 10. Sections are usually chapter boundaries.

Variant passages are highlighted based on a three-part scale of intensity defined by maximum edit distance of any version from the others at this point. The darker the shade, the greater the divergence from at least one of the other editions. The colored dot beneath a passage indicates which edition(s) hold a variant at this location, following the legend provided above.

The presence of a number with a manicule indicates here that two contextual annotations are available (as shown below). These annotations were written by a team of scholars to offer commentary on content in this paragraph.

Selecting a variant passage opens a panel to show how all the editions read at this point. Contextual annotations (signalled by the manicule) would open in the same space as this variant display panel, so the two are not currently displayed together. The visitor may choose which to view.

A heavily revised passage, showing the MS notebook view

Selecting a manicule symbol reveals a contextual annotation on a passage. Such annotations often highlight an especially significant revision that affects our view of the characters, as with the one highlighted here.

Viewing a contextual annotation

Where to find our files

Frankenstein Variorum GitHub Organization
Collation work:
- fv-collation repo: https://github.com/FrankensteinVariorum/fv-collation
  - collatexPrep directory
    - collChunks and collChunkFrags directories:
      - contain “chunked“ files with parallel names and prepped aligned “flattened“ markup, intended to be read together by collateX
    - python subdirectory: python scripts for preparing sets of “chunk“ files to be “read“ by collateX
      - collatex directory: contains collatex installation, (which Dr. B needs to update!)
    - ..._xmlOutput directories: contain aligned collation outputs for all parallel chunks read together. One output file for every set of input "chunk" files.

Post-processing pipeline

Output files are copied into the post-collation repo:
https://github.com/FrankensteinVariorum/fv-postCollation
Files in the postColl-workspace/P\d-output/ directories are corrected, finalized versions determined to be ready for publishing in the Variorum digital edition
Prior to getting here, we need to:
- try to ensure accuracy of collation markup!
- check and correct by:
  - correcting the Python script feeding the collation to handle repeated kinds of problems
  - correcting by hand the errors that are difficult to handle programmatically

Revisiting the Gothenberg Model

Error correction is a cyclical process!

We need to:

Inspect our collation results
Understand and categorize the kinds of errors we see
Revise the Python script
Re-run the collation
Re-inspect the results
Decide when it's okay to hand-correct
Minimize hand-correction as much as possible

Frankenstein Variorum: Orientation to the "Back End"

By Elisa Beshero-Bondar

Frankenstein Variorum: Orientation to the "Back End"

An orientation to the collation preparation work of the Frankenstein Variorum project.

2,249

Elisa Beshero-Bondar PRO

Professor of Digital Humanities and Chair of the Digital Media, Arts, and Technology Program at Penn State Erie, The Behrend College.

The Frankenstein Variorum:

An Orientation to the “Back End” of the Collation

A Digital Variorum

Print Publications of Frankenstein in MWS's lifetime:

Manuscript versions:

Critical and Diplomatic Editions Leading to the Frankenstein Variorum Project

Who benefits from a holistic digital variorum of Frankenstein?

Multi-institutional collaboration

Motivating Questions

Working with MS versions

Thomas copy marginalia

Shelley-Godwin Archive: sample page surface:

Variorum challenge

Progress Report

Frankenstein Variorum: The Collation Process

Collation is weaving. . .

Threading the machine...

Guiding the machine reading

In the Python script:

The “base markup” for collation

PA Electronic Edition (mid 1990s): 1818 vs 1831

Added new XML: Thomas Copy

S-GA: resequenced / compressed for collation

Computer-aided collation: Gothenburg Model

A ”panpipe” view of Frankenstein’s five versions

Working with the Output of Collation

Thinking about a spine for a variorum edition. . .

Constructing a spine for The Frankenstein Variorum

Where to find our files

Post-processing pipeline

Revisiting the Gothenberg Model

Error correction is a cyclical process!

We need to:

Frankenstein Variorum: Orientation to the "Back End"

More from Elisa Beshero-Bondar