Elisa Beshero-Bondar
Twitter: @epyllia | GitHub: @ebeshero
Presentation for ATNU (Animating Text Newcastle University)
14 June 2022
Link to these slides: https://bit.ly/fv-atnu
Our use of the term: A digital edition that investigates change to a work by comparing distinct versions of it.
1818 Edition (3 volumes) published anonymously
1823 Edition (2 volumes) printed by MWS's father William Godwin, the first to include her name as the author.
1831 Edition (1/2 of a volume), extensively revised by MWS (bound with Friedrich von Schiller's The Ghost Seer in Bentley's Standard Series of novels)
Thomas Copy made sometime between 1818 and 1822: MWS's marginal comments on a print copy of 1818
Legend: print edition | digital edition
1974: James Rieger, ed., first new edition of 1818 in 141 years: inline collation of "Thomas" w/ 1818, 1831 variants in endnotes
~mid-1990s: Stuart Curran and Jack Lynch: PA Electronic Edition (PAEE), collation of 1818 and 1831: HTML
1996: Nora Crook crit. ed. of 1818, variants of "Thomas", 1823, and 1831 in endnotes (P&C MWS collected works)
1996: Charles Robinson, The Frankenstein Notebooks (Garland): print facsimile of 1816 ms drafts
2007: Romantic Circles TEI conversion of PAEE; separates the texts of 1818 and 1831; collation via Juxta
2013: Shelley-Godwin Archive publishes diplomatic/documentary edition of 1816 ms drafts
2017: Frankenstein Variorum Project: assembly/proof-correcting of PAEE files; OCR/proof-correcting 1823; "bridge" TEI edition of S-GA notebook files; automated collation; incorporating "Thomas" copy text
None of these readers have seen a comprehensive view of the novel's changes from the manuscript of 1816 to 1831 without expensive and intensive study
Can we make an edition that conveniently compares the manuscripts to the print publications?
Can we make a comprehensive collation to show changes to the novel over time, from 1816 to 1831?
How many versions? (5 and a bit?)
Which editorial interventions persist from 1816 to 1831?
MWS in the "Thomas" copy: how much of this persists into 1831?
PBS's additions: which/how many of these persist to 1831?
What parts of the novel were most mutable?
Shelley-Godwin Archive’s diplomatic edition of the 1816 Notebooks at http://shelleygodwinarchive.org
1. Make visible and accessible a nonlinear, divergent edition history
2. Introduce textual scholarship to students and fans of Frankenstein, as well as to textual scholars and 19c specialists
ignore = ['sourceDoc', 'xml', 'comment', 'w', 'mod', 'anchor', 'include', 'delSpan',
          'addSpan', 'add', 'handShift', 'damage', 'restore', 'zone', 'surface',
          'graphic', 'unclear', 'retrace']
inlineEmpty = ['pb', 'lb', 'gap', 'del', 'p', 'div', 'milestone', 'lg', 'l',
               'note', 'cit', 'quote', 'bibl', 'ab', 'hi', 'head']
# 2018-05-12 ebb: I'm setting a white space on either side of the inlineEmpty elements in line 76
# 2018-07-20: ebb: CHECK: are there white spaces on either side of empty elements in the output?
inlineContent = ['metamark', 'mdel', 'shi']
Creating lists of XML element names for special treatment
Ignored whole elements need to be screened out of the collation entirely.
Other whole elements need to be preserved.
Python's XML pulldom library (xml.dom.pulldom) helps us with special handling of XML elements.
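A minimal sketch of how element lists like these can drive a pulldom pass. The function name `extract` and the shortened lists are illustrative assumptions, not the project's actual script; the sketch only shows the three kinds of handling named above.

```python
from xml.dom import pulldom

# Shortened, illustrative lists (assumption: the real lists are longer).
IGNORE = {'xml', 'surface', 'graphic', 'zone', 'anchor', 'mod', 'add'}
INLINE_EMPTY = {'pb', 'lb', 'gap', 'del', 'p', 'milestone'}
INLINE_CONTENT = {'metamark', 'mdel', 'shi'}


def extract(xml_text):
    """Stream through the XML and build one flat string for collation."""
    out = []
    doc = pulldom.parseString(xml_text)
    for event, node in doc:
        if event == pulldom.START_ELEMENT:
            name = node.tagName
            if name in IGNORE:
                # Drop the tag itself. Any text that follows still streams
                # through as CHARACTERS events.
                continue
            if name in INLINE_EMPTY:
                # Keep the marker as an empty tag, padded with white space.
                out.append(' ' + node.toxml() + ' ')
            elif name in INLINE_CONTENT:
                # Pull in the whole element, text content included.
                doc.expandNode(node)
                out.append(node.toxml())
        elif event == pulldom.CHARACTERS:
            out.append(node.data)
    return ''.join(out)


sample = ('<xml><surface id="s1"/><lb n="1"/>How can I '
          '<mdel>.</mdel>describe<zone type="main"/></xml>')
print(extract(sample))
```

Because pulldom hands over one event at a time, the decision to skip, keep, or expand an element can be made per element name without loading the whole document tree.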
New XML of the 1823 edition, prepared from OCR
<surface lrx="3847" lry="5342"
partOf="#ox-frankenstein_volume_i"
ulx="0" uly="0" folio="21r" shelfmark="MS. Abinger c. 56" base="ox-ms_abinger_c56/ox-ms_abinger_c56-0045.xml"
id="ox-ms_abinger_c56-0045" sID="ox-ms_abinger_c56-0045"/>
<graphic url="http://shelleygodwinarchive.org/images/ox/ms_abinger_c56/ms_abinger_c56-0045.jp2"/>
<zone type="main" sID="c56-0045__main"/>
<lb n="c56-0045__main__17"/>
<del rend="strikethrough" sID="c56-0045__main__d2e9811"/>But how<del eID="c56-0045__main__d2e9811"/> How can I describe
my <lb n="c56-0045__main__18"/> emotion at this catastrophe; or how
<w ana="start"/>deli<lb n="c56-0045__main__19"/>neate<w ana="end"/>
the wretch whom with such <lb n="c56-0045__main__20"/> infinite pains and care I had endeavoured <lb n="c56-0045__main__21"/> to form. His limbs were in proportion <lb n="c56-0045__main__22"/> and I had selected his features <del rend="strikethrough" sID="c56-0045__main__d2e9830"/>h<del eID="c56-0045__main__d2e9830"/> as <lb n="c56-0045__main__23"/>
<mod sID="c56-0045__main__d2e9835"/>
<del rend="strikethrough" sID="c56-0045__main__d2e9837"/>handsome<del eID="c56-0045__main__d2e9837"/>
<mdel>.</mdel>
<anchor xml:id="c56-0045.01"/>
<zone corresp="#c56-0045.01" type="left_margin" sID="c56-0045__left_margin"/>
<lb n="c56-0045__left_margin__1"/>
<add sID="c56-0045__left_margin__d2e9849"/>
<mod sID="c56-0045__left_margin__d2e9851"/>
<del rend="strikethrough" sID="c56-0045__left_margin__d2e9853"/>handsome<del eID="c56-0045__left_margin__d2e9853"/>
<add hand="#pbs" place="superlinear" sID="c56-0045__left_margin__d2e9856"/>beautiful.<add eID="c56-0045__left_margin__d2e9856"/>
<mod eID="c56-0045__left_margin__d2e9851"/>
<add eID="c56-0045__left_margin__d2e9849"/>
<zone eID="c56-0045__left_margin"/>
<mod eID="c56-0045__main__d2e9835"/>
<mod sID="c56-0045__main__d2e9863"/>
<del rend="strikethrough" sID="c56-0045__main__d2e9865"/>Handsome<del eID="c56-0045__main__d2e9865"/>
<add hand="#pbs" place="superlinear" sID="c56-0045__main__d2e9868"/>Beautiful<add eID="c56-0045__main__d2e9868"/>
<mod eID="c56-0045__main__d2e9863"/>; Great God! His <lb n="c56-0045__main__24"/>
Gothenburg model: a staged model for computer-assisted collation, developed at a 2009 workshop of CollateX and Juxta developers.
Tokenization:
Break the text down into the smallest units of comparison (words with punctuation, or character by character):
FV tokenizes words and includes punctuation and tags:
'<del>the', 'frame', 'on', 'whic<del>', 'my', 'man', 'completeed,.'
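A minimal sketch of this kind of tokenization, assuming tokens are split on white space that falls outside of tags so that a tag with attributes stays whole and stays attached to any adjacent word. The function name `tokenize` and the sample string are illustrative, not the project's code.

```python
import re

# One token = a run of tags and/or non-space characters.
# A tag such as <lb n="..."/> may contain spaces but is never split.
TOKEN = re.compile(r'(?:<[^>]*>|[^\s<])+')


def tokenize(text):
    """Return word tokens with their punctuation and any attached tags."""
    return TOKEN.findall(text)


sample = '<del>the frame on whic</del> my man <lb n="c56-0045__main__17"/> completeed,.'
for token in tokenize(sample):
    print(repr(token))
```

Keeping tags inside the tokens means the markup travels through collation with the words it touches, which matters later when the edition files are rebuilt from the collation output.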
Normalization
'&' = 'and'
<p xml:id="novel1_letter4_div4_p2"> = <p/>
Alignment
Identify comparable divergence: what makes text sequences comparable units?
“Chunking” text into comparable passages (chapters/paragraphs that line up with identifiable start and end points). Collation proceeds chunk by chunk.
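The sketch below illustrates chunk-by-chunk collation on normalized tokens. It uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner, which is an assumption made for illustration only; the chunk id `C10` and the helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(token):
    """Reduce a token to the form used for comparison only."""
    token = re.sub(r'<p\s[^<]*?/>', '<p/>', token)  # drop paragraph ids
    return token.replace('&', 'and').lower()


def align(tokens_a, tokens_b):
    """Align two token lists on their normalized forms, reporting originals."""
    matcher = SequenceMatcher(None,
                              [normalize(t) for t in tokens_a],
                              [normalize(t) for t in tokens_b],
                              autojunk=False)
    return [(tag, tokens_a[i1:i2], tokens_b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


def collate_chunks(witness_a, witness_b):
    """Collate only chunks that share an id, one chunk at a time."""
    return {chunk_id: align(witness_a[chunk_id].split(),
                            witness_b[chunk_id].split())
            for chunk_id in witness_a if chunk_id in witness_b}


ms = {'C10': 'the spot & endeavoured'}
f1818 = {'C10': 'the spot, and endeavoured,'}
for row in collate_chunks(ms, f1818)['C10']:
    print(row)
```

Alignment is decided on the normalized tokens, but the original tokens are what get reported, so `&` and `and` line up as equal while `spot` and `spot,` are still recorded as a variant.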
Analysis
Study output, correct, and re-align after machine process, AND refine automated processing
Visualization:
Critical edition interface, graph displays
Legend: MS | 1818 | Thm | 1823 | 1831
Alignments, gaps, and comparative lengths of each collation unit
chapter heading or other structural boundary
Mouse over a black box...
Making a “stand-off” Spine (info + pointers to collation data)
Generating the edition files with collation data marked “inline”
“Spine 2” by Buzz Spector:
polaroid of 33 books aligned at the spines, one per human vertebra
“Heatmap” view, showing variation intensity as blocks with circles color-coded by edition. Selecting a circle on the heatmap view displays the edition and its variants.
The Variorum Viewer and Its Options for Display
The visitor chooses an edition to read and a section aligned with the other editions, in this case the 1818 in section 10. Sections are usually chapter boundaries.
Variant passages are highlighted based on a three-part scale of intensity defined by maximum edit distance of any version from the others at this point. The darker the shade, the greater the divergence from at least one of the other editions. The colored dot beneath a passage indicates which edition(s) hold a variant at this location, following the legend provided above.
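A minimal sketch of how such a three-part scale could be computed, assuming Levenshtein distance over the reading strings. The thresholds (`low=5`, `high=25`) and the shade names are hypothetical values chosen for illustration; they are not the viewer's actual settings.

```python
from itertools import combinations


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def max_edit_distance(readings):
    """Largest distance between any two witnesses at one variant location."""
    return max((levenshtein(a, b)
                for a, b in combinations(readings.values(), 2)), default=0)


def intensity(readings, low=5, high=25):
    """Map the maximum distance onto three shades (thresholds are assumed)."""
    distance = max_edit_distance(readings)
    if distance <= low:
        return 'light'
    if distance <= high:
        return 'medium'
    return 'dark'


readings = {'fMS': 'spot& endeavoured', 'f1818': 'spot, and endeavoured,'}
print(max_edit_distance(readings), intensity(readings))
```

Taking the maximum over all pairs means a single outlying witness is enough to darken a passage, which matches the description of divergence "from at least one of the other editions".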
The presence of a number with a manicule indicates here that two contextual annotations are available (as shown below). These annotations were written by a team of scholars to offer commentary on content in this paragraph.
Selecting a variant passage opens a panel to show how all the editions read at this point. Contextual annotations (signalled by the manicule) would open in the same space as this variant display panel, so the two are not currently displayed together. The visitor may choose which to view.
A heavily revised passage, showing the MS notebook view
Selecting a manicule symbol reveals a contextual annotation on a passage. Such annotations often highlight an especially significant revision that affects our view of the characters, as with the one highlighted here.
Viewing a contextual annotation
<app>
<rdgGrp n="['spot,', 'and', 'endeavoured,']">
<rdg wit="f1818">spot, and endeavoured, </rdg>
<rdg wit="f1823">spot, and endeavoured, </rdg>
<rdg wit="fThomas">spot, and endeavoured, </rdg>
<rdg wit="f1831">spot, and endeavoured, </rdg>
</rdgGrp>
<rdgGrp n="['spotand', 'endeavoured']">
<rdg wit="fMS">spot&amp; endeavoured </rdg>
</rdgGrp>
</app>
In the fMS source:
. . . the spot<add eID="c57-0117__main__d3e21951"/> &amp; endeavoured . . .
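A short sketch of reading such an `<app>` back out with Python's standard `xml.etree.ElementTree`, to see which witnesses share a normalized reading. The function name `readings_by_group` is an illustrative assumption; the sample reuses the structure shown above, with the ampersand escaped so that it parses.

```python
import xml.etree.ElementTree as ET

APP = """<app>
  <rdgGrp n="['spot,', 'and', 'endeavoured,']">
    <rdg wit="f1818">spot, and endeavoured, </rdg>
    <rdg wit="f1823">spot, and endeavoured, </rdg>
    <rdg wit="fThomas">spot, and endeavoured, </rdg>
    <rdg wit="f1831">spot, and endeavoured, </rdg>
  </rdgGrp>
  <rdgGrp n="['spotand', 'endeavoured']">
    <rdg wit="fMS">spot&amp; endeavoured </rdg>
  </rdgGrp>
</app>"""


def readings_by_group(app_xml):
    """Map each rdgGrp's normalized token list to its witnesses and readings."""
    app = ET.fromstring(app_xml)
    return {group.get('n'): {rdg.get('wit'): rdg.text
                             for rdg in group.findall('rdg')}
            for group in app.findall('rdgGrp')}


for normalized, witnesses in readings_by_group(APP).items():
    print(normalized, '->', sorted(witnesses))
```

Listing the groups this way makes a normalization slip such as `spotand` easy to spot: the `n` value shows what the collation compared, while the `<rdg>` text shows what the witness actually reads.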
Strengths
Weaknesses
Solutions
<!-- Should punctuation be ignored? -->
<xsl:param name="tan:ignore-punctuation-differences" as="xs:boolean" select="false()"/>
<xsl:param name="additional-batch-replacements" as="element()*">
<!--ebb: normalizations to batch process for collation. NOTE: We want to do these to preserve some markup
in the output for post-processing to reconstruct the edition files.
Remember, these will be processed in order, so watch out for conflicts. -->
<replace pattern="(&lt;.+?&gt;\s*)&gt;" replacement="$1"
message="normalizing away extra right angle brackets"/>
<replace pattern="&amp;" replacement="and"
message="ampersand batch replacement"/>
<replace pattern="&lt;/?xml&gt;" replacement=""
message="xml tag replacement"/>
<replace pattern="(&lt;p)\s+.+?(/&gt;)" replacement="$1$2"
message="p-tag batch replacement"/>
<replace pattern="(&lt;)(metamark).*?(&gt;).+?\1/\2\3" replacement=""
message="metamark batch replacement"/>
<!--ebb: metamark contains a text node, and we don't want its
contents processed in the collation, so this captures the entire element. -->
<replace pattern="(&lt;/?)m(del).*?(&gt;)"
replacement="$1$2$3" message="mdel-SGA batch replacement"/>
<!--ebb: mdel contains a text node, so this catches both start and end tag.
We want mdel to be processed as <del>...</del>-->
<replace pattern="&lt;/?damage.*?&gt;"
replacement="" message="damage-SGA batch replacement"/>
<!--ebb: damage contains a text node, so this catches both start and end tag. -->
<replace pattern="&lt;/?unclear.*?&gt;" replacement=""
message="unclear-SGA batch replacement"/>
<!--ebb: unclear contains a text node, so this catches both start and end tag. -->
<replace pattern="&lt;/?retrace.*?&gt;" replacement=""
message="retrace-SGA batch replacement"/>
<!--ebb: retrace contains a text node, so this catches both start and end tag. -->
</xsl:param>
import re


def normalize(inputText):
    # Substitutions run from the innermost call (RE_METAMARK) outward to RE_MILESTONE.
    # The RE_* patterns are compiled below, before normalize() is first called.
    return RE_MILESTONE.sub('',
        RE_INCLUDE.sub('',
        RE_AB.sub('',
        RE_HEAD.sub('',
        RE_AMP.sub('and',
        RE_MDEL.sub('',
        RE_SHI.sub('',
        RE_HI.sub('',
        RE_LB.sub('',
        RE_PB.sub('',
        RE_PARA.sub('<p/>',
        RE_sgaP.sub('<p/>',
        RE_LG.sub('<lg/>',
        RE_L.sub('<l/>',
        RE_CIT.sub('',
        RE_QUOTE.sub('',
        RE_OPENQT.sub('"',
        RE_CLOSEQT.sub('"',
        RE_GAP.sub('',
        RE_DELSTART.sub('<del>',
        RE_ADDSTART.sub('<add>',
        RE_METAMARK.sub('', inputText)))))))))))))))))))))).lower()
RE_MARKUP = re.compile(r'<.+?>')
RE_PARA = re.compile(r'<p\s[^<]+?/>')
RE_INCLUDE = re.compile(r'<include[^<]*/>')
RE_MILESTONE = re.compile(r'<milestone[^<]*/>')
RE_HEAD = re.compile(r'<head[^<]*/>')
RE_AB = re.compile(r'<ab[^<]*/>')
RE_AMP = re.compile(r'&')
RE_DELSTART = re.compile(r'<del[^<]*>')
RE_ADDSTART = re.compile(r'<add[^<]*>')
RE_MDEL = re.compile(r'<mdel[^<]*>.+?</mdel>')
RE_SHI = re.compile(r'<shi[^<]*>.+?</shi>')
RE_METAMARK = re.compile(r'<metamark[^<]*>.+?</metamark>')
RE_HI = re.compile(r'<hi\s[^<]*/>')
RE_PB = re.compile(r'<pb[^<]*/>')
RE_LB = re.compile(r'<lb.*?/>')
# 2021-09-06: ebb and djb: On <lb> collation troubles: LOOK FOR DOT MATCHES ALL FLAG
# b/c this is likely spanning multiple lines, and getting split by the tokenizing algorithm.
# 2021-09-10: ebb with mb and jc: trying .*? and DOTALL flag
RE_LG = re.compile(r'<lg[^<]*/>')
RE_L = re.compile(r'<l\s[^<]*/>')
RE_CIT = re.compile(r'<cit\s[^<]*/>')
RE_QUOTE = re.compile(r'<quote\s[^<]*/>')
RE_OPENQT = re.compile(r'“')
RE_CLOSEQT = re.compile(r'”')
RE_GAP = re.compile(r'<gap\s[^<]*/>')
# <milestone unit="tei:p"/>
RE_sgaP = re.compile(r'<milestone\sunit="tei:p"[^<]*/>')
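One way to make the order of these substitutions easier to read is an ordered table of (pattern, replacement) pairs applied in a loop. This is a sketch of an alternative layout, not the project's code: the names `RULES` and `normalize_with_rules` are hypothetical, only a handful of the patterns are reproduced, and the `re.DOTALL` flag on the `<lb>` rule is an assumption following the 2021-09-10 note above.

```python
import re

# Rules are applied top to bottom, so order matters:
# the specific tei:p milestone rule must run before the generic milestone rule.
RULES = [
    (re.compile(r'<metamark[^<]*>.+?</metamark>'), ''),
    (re.compile(r'<del[^<]*>'), '<del>'),
    (re.compile(r'<milestone\sunit="tei:p"[^<]*/>'), '<p/>'),
    (re.compile(r'<p\s[^<]+?/>'), '<p/>'),
    (re.compile(r'<lb.*?/>', re.DOTALL), ''),  # DOTALL: assumed, lets a tag span lines
    (re.compile(r'&'), 'and'),
    (re.compile(r'<milestone[^<]*/>'), ''),
]


def normalize_with_rules(text):
    """Apply each substitution in order, then lowercase the result."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.lower()


sample = ('<milestone unit="tei:p"/>His <lb n="c56-0045__main__24"/>limbs & '
          '<p xml:id="novel1_letter4_div4_p2"/>')
print(normalize_with_rules(sample))
```

A table like this keeps each rule on one line next to its replacement, so a conflict between two rules shows up as a question about their relative position in the list.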