The Frankenstein Variorum:
An Orientation to the “Back End” of the Collation
Twitter: @epyllia | GitHub: @ebeshero
14 May 2021
Link to these slides: http://bit.ly/fv-backend
A Digital Variorum
Our use of the term: A digital edition that investigates change to a work by comparing distinct versions of it.
Print Publications of Frankenstein in MWS's lifetime:
1818 Edition (3 volumes) published anonymously
1823 Edition (2 volumes) printed by MWS's father William Godwin, the first to include her name as the author.
1831 Edition (1/2 of a volume) extensively revised by MWS, bound with Friedrich von Schiller's The Ghost Seer in Bentley's Standard Series of novels)
- Fair copy MS drafts (~1816) at the Bodleian Library, Oxford: Abinger c56, c57, c58
Thomas Copy made sometime between 1818 and 1822: MWS's marginal comments on a print copy of 1818
Can we make an edition that conveniently compares the manuscripts to the print publications?
Can we make a comprehensive collation to show changes to the novel over time, from 1816 to 1831?
How many versions? (5 and a bit?)
Which editorial interventions persist from 1816 to 1831?
MWS in the "Thomas" copy: how much of this persists into 1831?
PBS's additions: which/how many of these persist to 1831?
What parts of the novel were most mutable?
Working with MS versions
Thomas copy marginalia
- prepared new XML from 1818 edition, with <add>, <del>, <note> elements
Shelley-Godwin Archive’s diplomatic edition of the 1816 Notebooks at http://shelleygodwinarchive.org
- encoded surface-by-surface, line-by-line
- required resequencing to include in the collation (margin notes not included at point of insertion but at the end of each page file)
Shelley-Godwin Archive: sample page surface:
1. Make visible and accessible a nonlinear, divergent edition history
- 1816 notebooks to 1818: uneven (gaps in notebooks)
- Thomas divergence:
- copy with margin notes was left in Italy before 1823, apparently not consulted later
- 1823 edits: largely retained in 1831
- 1831 major revisions:
- alteration of character relationships, added chapter and several lengthened passages
2. Introduce textual scholarship to students, fans of Frankenstein as well as text scholars, 19c specialists:
- Recruit next generation of text scholars
- Not marginalizing variants in print model of endnotes/footnotes
- Tell the story of Frankenstein’s ”hot” or ”cool” alterations inline
Frankenstein Variorum = my most technically challenging project
- One third of the collation is being displayed at https://frankensteinvariorum.github.io/
- This (first) portion of the novel gave us the basis for developing the ”spine” data model and the Variorum reader web interface
- Future work with my colleagues to complete the project. This includes
- refining my Python machine collation algorithms to complete the variorum.
- UX testing of the navigation and reading interface (and recursive cycles of development)
- publications and documentation of the ”spine” data model for the TEI Guidelines and other projects
Frankenstein Variorum: The Collation Process
Collation is weaving. . .
Threading the machine...
- We need to prepare the same consistent “threads”
- Make the text streams consistent with each other
- collateX = weaving machine
- needs to be able to find where the “threads” run together, and where they diverge
- Markup functions as signals to the collation weaving machine
Guiding the machine reading
- Markup is not the same in all of the editions
- That's okay, because we mask some of it from collateX
- Python script instructs on strings of text that collateX can just skip over
- Some of this is markup that is not helpful for comparing the documents
ignore = ['sourceDoc', 'xml', 'comment', 'w', 'mod', 'anchor', 'include', 'delSpan', 'addSpan', 'add', 'handShift', 'damage', 'restore', 'zone', 'surface', 'graphic', 'unclear', 'retrace', 'damage', 'restore'] inlineEmpty = ['pb', 'lb', 'gap', 'del', 'p', 'div', 'milestone', 'lg', 'l', 'note', 'cit', 'quote', 'bibl', 'ab', 'hi', 'head'] # 2018-05-12 ebb: I'm setting a white space on either side of the inlineEmpty elements in line 76 # 2018-07-20: ebb: CHECK: are there white spaces on either side of empty elements in the output? inlineContent = ['metamark', 'mdel', 'shi']
In the Python script:
creating lists of XML element names for special treatment
Ignored whole elements need to be screened out of the collation entirely.
Other whole elements need to be preserved.
XML Pulldom library helps us with special handling of XML elements.
The “base markup” for collation
- print books: early old simple structural markup for 1818 and 1831 eds.
- MS versions
- Thomas copy prepped from 1818 files with some simple manuscript coding of handwritten insertions, deletions, and notes
- 1816 MS Notebook: It's complicated...
PA Electronic Edition (mid 1990s): 1818 vs 1831
- started from base HTML 1.0 files
up-converted to clean, simple XML
- ”on its way” to TEI (structural elements in text)
- prepared for machine-assisted collation (via CollateX): including element tags
- deep hierarchy of novel ”flattened” to milestones: <div type="volume"/>, <p/>, etc.
- corrected against photofacsimiles of 1818 and 1831 print publications
Prepared from OCR new XML of 1823 edition
- prepared by William Godwin, the first edition bearing the name ”Mary Wollstonecraft Shelley” on the title page
- XML syntax matches that of 1818 and 1831 editions
Added new XML: Thomas Copy
- Added insertions, deletions, + margin-notes to 1818 edition
- checked against and respond to/update Nora Crook and James Reiger editions
S-GA: resequenced / compressed for collation
<surface lrx="3847" lry="5342" partOf="#ox-frankenstein_volume_i" ulx="0" uly="0" folio="21r" shelfmark="MS. Abinger c. 56" base="ox-ms_abinger_c56/ox-ms_abinger_c56-0045.xml" id="ox-ms_abinger_c56-0045" sID="ox-ms_abinger_c56-0045"/> <graphic url="http://shelleygodwinarchive.org/images/ox/ms_abinger_c56/ms_abinger_c56-0045.jp2"/> <zone type="main" sID="c56-0045__main"/> <lb n="c56-0045__main__17"/> <del rend="strikethrough" sID="c56-0045__main__d2e9811"/>But how<del eID="c56-0045__main__d2e9811"/> How can I describe my <lb n="c56-0045__main__18"/> emotion at this catastrophe; or how <w ana="start"/>deli<lb n="c56-0045__main__19"/>neate<w ana="end"/> the wretch whom with such <lb n="c56-0045__main__20"/> infinite pains and care I had endeavoured <lb n="c56-0045__main__21"/> to form. His limbs were in proportion <lb n="c56-0045__main__22"/> and I had selected his features <del rend="strikethrough" sID="c56-0045__main__d2e9830"/>h<del eID="c56-0045__main__d2e9830"/> as <lb n="c56-0045__main__23"/> <mod sID="c56-0045__main__d2e9835"/> <del rend="strikethrough" sID="c56-0045__main__d2e9837"/>handsome<del eID="c56-0045__main__d2e9837"/> <mdel>.</mdel> <anchor xml:id="c56-0045.01"/> <zone corresp="#c56-0045.01" type="left_margin" sID="c56-0045__left_margin"/> <lb n="c56-0045__left_margin__1"/> <add sID="c56-0045__left_margin__d2e9849"/> <mod sID="c56-0045__left_margin__d2e9851"/> <del rend="strikethrough" sID="c56-0045__left_margin__d2e9853"/>handsome<del eID="c56-0045__left_margin__d2e9853"/> <add hand="#pbs" place="superlinear" sID="c56-0045__left_margin__d2e9856"/>beautiful.<add eID="c56-0045__left_margin__d2e9856"/> <mod eID="c56-0045__left_margin__d2e9851"/> <add eID="c56-0045__left_margin__d2e9849"/> <zone eID="c56-0045__left_margin"/> <mod eID="c56-0045__main__d2e9835"/> <mod sID="c56-0045__main__d2e9863"/> <del rend="strikethrough" sID="c56-0045__main__d2e9865"/>Handsome<del eID="c56-0045__main__d2e9865"/> <add hand="#pbs" place="superlinear" sID="c56-0045__main__d2e9868"/>Beautiful<add eID="c56-0045__main__d2e9868"/> <mod eID="c56-0045__main__d2e9863"/>; Great God! His <lb n="c56-0045__main__24"/>
- added word boundary markup to indicate whole words spanning lines
- resequenced margin zone content: (followed S-GA's pointers to represent semantic reading order for collation)
Gothenburg model : algorithm for computer-assisted collation, developed in 2009 workshop of collateX and Juxta developers.
Break down the smallest unit of comparison: (words--with punctuation, or character-by-character): FV tokenizes words and includes punctuation
('&' = 'and')
Identify comparable divergence: what makes text sequences comparable units?
“Chunking” text into comparable passages (chapters/paragraphs that line up with identifiable start and end points). Collation proceeds chunk by chunk.
(study output, correct, and re-align after machine process, AND refine automated processing)
critical edition apparatus, graph displays
Computer-aided collation: Gothenburg Model
A ”panpipe” view of Frankenstein’s five versions
Alignments, gaps, and comparative lengths of each collation unit
chapter heading or other structural boundary
Working with the Output of Collation
Making a “stand-off“ Spine (info + pointers to collation data)
Generating the edition files with collation data marked “inline”
“Spine 2” by Buzz Spector:
polaroid of 33 books aligned at the spines, one per human vertebra
Thinking about a spine for a variorum edition. . .
express a holistic view structured according to variant locations
- serve as ”nerve plexus” of data pointers for dynamic coordination of multiple editions
- built up from computer-aided collation
Constructing a spine for The Frankenstein Variorum
“Spine” data model = standoff use of TEI critical apparatus:
- coordinates data on variance: piece by piece (vertebra) which specific passages line up and where they differ.
- can include processed data, like maximum edit-distance, at each location
- can include data on normalization: e.g. normalized tokens used in collation process
- points to specific locations in separate edition files
”Heatmap” view, showing variation intensity as blocks with circles color-coded by edition. Selecting a circle on the heatmap view displays the edition and its variants.
The Variorum Viewer and Its Options for Display
The visitor chooses an edition to read and a section aligned with the other editions, in this case the 1818 in section 10. Sections are usually chapter boundaries.
Variant passages are highlighted based on a three-part scale of intensity defined by maximum edit distance of any version from the others at this point. The darker the shade, the greater the divergence from at least one of the other editions. The colored dot beneath a passage indicates which edition(s) hold a variant at this location, following the legend provided above.
The presence of a number with a manicule indicates here that two contextual annotations are available (as shown below). These annotations were written by a team of scholars to offer commentary on content in this paragraph.
Selecting a variant passage opens a panel to show how all the editions read at this point. Contextual annotations (signalled by the manicule) would open in the same space as this variant display panel, so the two are not currently displayed together. The visitor may choose which to view.
A heavily revised passage, showing the MS notebook view
Selecting a manicule symbol reveals a contextual annotation on a passage. Such annotations often highlight an especially significant revision that affects our view of the characters, as with the one highlighted here.
Viewing a contextual annotation
Where to find our files
- Frankenstein Variorum GitHub Organization
- Collation work:
fv-collation repo: https://github.com/FrankensteinVariorum/fv-collation
collChunks and collChunkFrags directories:
- contain “chunked“ files with parallel names and prepped aligned “flattened“ markup, intended to be read together by collateX
python subdirectory: python scripts for preparing sets of “chunk“ files to be “read“ by collateX
- collatex directory: contains collatex installation, (which Dr. B needs to update!)
- ..._xmlOutput directories: contain aligned collation outputs for all parallel chunks read together. One output file for every set of input "chunk" files.
- collChunks and collChunkFrags directories:
- collatexPrep directory
- fv-collation repo: https://github.com/FrankensteinVariorum/fv-collation
- Output files are copied into the post-collation repo:
- Files in the postColl-workspace/P\d-output/ directories are corrected, finalized versions determined to be ready for publishing in the Variorum digital edition
Prior to getting here, we need to:
- try to ensure accuracy of collation markup!
check and correct by:
- correcting the Python script feeding the collation to handle repeated kinds of problems
- correcting by hand the errors that are difficult to handle programmatically
Revisiting the Gothenberg Model
Error correction is a cyclical process!
We need to:
- Inspect our collation results
- Understand and categorize the kinds of errors we see
- Revise the Python script
- Re-run the collation
- Re-inspect the results
- Decide when it's okay to hand-correct
- Minimize hand-correction as much as possible
Frankenstein Variorum: Orientation to the "Back End"
By Elisa Beshero-Bondar