Elisa Beshero-Bondar
Twitter: @epyllia | GitHub: @ebeshero
DH2022 Tokyo: session [LP9-01] Long Presentations 9-01
July 29, 2022 @ 9:30 - 11:00 AM JST
Link to these slides: https://bit.ly/fv-tangles
Our use of the term: A digital edition that investigates change to a work by comparing distinct versions of it.
1818 Edition published anonymously (3 volumes)
1823 Edition printed by MWS's father William Godwin, the first to include her name as the author. (2 volumes)
1831 Edition extensively revised by MWS, bound with Friedrich von Schiller's The Ghost Seer in Bentley's Standard Series of novels (1/2 of a volume)
Thomas Copy made sometime between 1818 and 1822: MWS's marginal comments on a print copy of 1818
Timeline of editions (legend: [print] = print edition, [digital] = digital edition)
1974 [print]: James Rieger, ed., first new edition of 1818 in 141 years: inline collation of "Thomas" with 1818, 1831 variants in endnotes
~mid-1990s [digital]: Stuart Curran and Jack Lynch, PA Electronic Edition (PAEE): collation of 1818 and 1831 in HTML
1996 [print]: Nora Crook, critical edition of 1818, with variants of "Thomas", 1823, and 1831 in endnotes (P&C MWS collected works)
1996 [print]: Charles Robinson, The Frankenstein Notebooks (Garland): print facsimile of 1816 ms drafts
2007 [digital]: Romantic Circles TEI conversion of PAEE; separates the texts of 1818 and 1831; collation via Juxta
2013 [digital]: Shelley-Godwin Archive publishes diplomatic/documentary edition of 1816 ms drafts
2017 [digital]: Frankenstein Variorum Project: assembly/proof-correcting of PAEE files; OCR/proof-correcting 1823; "bridge" TEI edition of S-GA notebook files; automated collation; incorporating "Thomas" copy text
Can we make an edition that conveniently compares the manuscripts to the print publications?
Can we make a comprehensive collation to show changes to the novel over time, from 1816 to 1831?
Which editorial interventions persist from 1816 to 1831?
MWS in the "Thomas" copy: how much of this persists into 1831?
PBS's additions: which/how many of these persist to 1831?
What parts of the novel were most mutable?
Shelley-Godwin Archive’s diplomatic edition of the 1816 Notebooks at http://shelleygodwinarchive.org
1. Make visible and accessible a nonlinear, divergent edition history
2. Introduce textual scholarship to students and fans of Frankenstein, as well as to textual scholars and 19c specialists
Complete the project! . . . This includes:
Gothenburg model: a staged process for computer-assisted collation, developed at a 2009 workshop of collateX and Juxta developers.
Tokenization :
Break the text down into the smallest units of comparison (words with punctuation, or character by character):
FV tokenizes words and includes punctuation and tags:
'<del>the', 'frame', 'on', 'whic<del>', 'my', 'man', 'completeed,.'
Normalization
'&' = 'and'
<p xml:id="novel1_letter4_div4_p2"> = <p/>
Alignment
Identify comparable divergence: what makes text sequences comparable units?
“Chunking” text into comparable passages (chapters/paragraphs that line up with identifiable start and end points). Collation proceeds chunk by chunk.
Analysis
Study output, correct, and re-align after machine process, AND refine automated processing
Visualization:
Critical edition interface, graph displays
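The first two stages above can be sketched in a few lines of Python. This is a minimal illustration, not the project's script: the TOKEN pattern, normalize_token, and witness are hypothetical names, and the rules cover only the two normalization examples listed above. Each token keeps its original form ("t") next to the normalized form used for comparison ("n"), which is the token layout that collateX accepts as JSON input, as far as I understand its documentation.

```python
import json
import re

# Hypothetical tokenizer: a whole tag, or a run of characters with no space or "<".
TOKEN = re.compile(r'<[^>]+>|[^\s<]+')
# A <p> tag with or without attributes (but not <pb>).
P_TAG = re.compile(r'<p(\s[^>]*)?/?>')


def normalize_token(token):
    """Return the form used for comparison (assumed rules, following the examples above)."""
    if P_TAG.fullmatch(token):
        return '<p/>'          # <p xml:id="..."> is compared as <p/>
    if token == '&':
        return 'and'           # '&' = 'and'
    return token.lower()


def witness(siglum, text):
    """Build one witness as a list of {t, n} tokens."""
    tokens = [{'t': t, 'n': normalize_token(t)} for t in TOKEN.findall(text)]
    return {'id': siglum, 'tokens': tokens}


ms = witness('fMS', '<p xml:id="novel1_letter4_div4_p2"> the spot & endeavoured')
print(json.dumps({'witnesses': [ms]}, indent=2))
```

Keeping "t" and "n" side by side means the alignment stage compares normalized forms while the output can still display what each witness actually reads.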
But let's take it one stage at a time. . .
New XML of the 1823 edition, prepared from OCR
...unless we
<surface lrx="3847" lry="5342"
partOf="#ox-frankenstein_volume_i"
ulx="0" uly="0" folio="21r" shelfmark="MS. Abinger c. 56" base="ox-ms_abinger_c56/ox-ms_abinger_c56-0045.xml"
id="ox-ms_abinger_c56-0045" sID="ox-ms_abinger_c56-0045"/>
<graphic url="http://shelleygodwinarchive.org/images/ox/ms_abinger_c56/ms_abinger_c56-0045.jp2"/>
<zone type="main" sID="c56-0045__main"/>
<lb n="c56-0045__main__17"/>
<del rend="strikethrough" sID="c56-0045__main__d2e9811"/>But how<del eID="c56-0045__main__d2e9811"/> How can I describe
my <lb n="c56-0045__main__18"/> emotion at this catastrophe; or how
<w ana="start"/>deli<lb n="c56-0045__main__19"/>neate<w ana="end"/>
the wretch whom with such <lb n="c56-0045__main__20"/> infinite pains and care I had endeavoured <lb n="c56-0045__main__21"/> to form. His limbs were in proportion <lb n="c56-0045__main__22"/> and I had selected his features <del rend="strikethrough" sID="c56-0045__main__d2e9830"/>h<del eID="c56-0045__main__d2e9830"/> as <lb n="c56-0045__main__23"/>
<mod sID="c56-0045__main__d2e9835"/>
<del rend="strikethrough" sID="c56-0045__main__d2e9837"/>handsome<del eID="c56-0045__main__d2e9837"/>
<mdel>.</mdel>
<anchor xml:id="c56-0045.01"/>
<zone corresp="#c56-0045.01" type="left_margin" sID="c56-0045__left_margin"/>
<lb n="c56-0045__left_margin__1"/>
<add sID="c56-0045__left_margin__d2e9849"/>
<mod sID="c56-0045__left_margin__d2e9851"/>
<del rend="strikethrough" sID="c56-0045__left_margin__d2e9853"/>handsome<del eID="c56-0045__left_margin__d2e9853"/>
<add hand="#pbs" place="superlinear" sID="c56-0045__left_margin__d2e9856"/>beautiful.<add eID="c56-0045__left_margin__d2e9856"/>
<mod eID="c56-0045__left_margin__d2e9851"/>
<add eID="c56-0045__left_margin__d2e9849"/>
<zone eID="c56-0045__left_margin"/>
<mod eID="c56-0045__main__d2e9835"/>
<mod sID="c56-0045__main__d2e9863"/>
<del rend="strikethrough" sID="c56-0045__main__d2e9865"/>Handsome<del eID="c56-0045__main__d2e9865"/>
<add hand="#pbs" place="superlinear" sID="c56-0045__main__d2e9868"/>Beautiful<add eID="c56-0045__main__d2e9868"/>
<mod eID="c56-0045__main__d2e9863"/>; Great God! His <lb n="c56-0045__main__24"/>
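In the sample above, elements such as del, add, and mod are flattened into paired empty markers that share an identifier: sID on the start marker and eID on the end marker. A short sketch of how such pairs could be matched up again with a regular expression follows. The SPAN pattern is an assumption written for this illustration, and it finds only non-overlapping pairs, so a pair nested inside another pair would need a second pass.

```python
import re

# Hypothetical pattern: a start marker with sID, the text that follows,
# and the end marker of the same element name carrying the same value in eID.
SPAN = re.compile(
    r'<(?P<name>\w+)\b[^<>]*?\bsID="(?P<id>[^"]+)"[^<>]*/>'
    r'(?P<text>.*?)'
    r'<(?P=name)\b[^<>]*?\beID="(?P=id)"[^<>]*/>',
    re.DOTALL,
)

sample = ('<del rend="strikethrough" sID="c56-0045__main__d2e9811"/>But how'
          '<del eID="c56-0045__main__d2e9811"/> How can I describe')

# Each result is (element name, text between the paired markers).
spans = [(m.group('name'), m.group('text')) for m in SPAN.finditer(sample)]
print(spans)
```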
With witnesses prepared and inspected, we now work on the Python script that tokenizes and normalizes the source files, preparing the witnesses to be compared by collateX
warp: sets tension or looseness of weave:
normalizing the text "thread" to tell us how to "pull" it
weft: moves through the warp threads cross-wise:
establishes moments of alignment across the text threads
<!-- Should punctuation be ignored? -->
<xsl:param name="tan:ignore-punctuation-differences" as="xs:boolean" select="false()"/>
<xsl:param name="additional-batch-replacements" as="element()*">
<!--ebb: normalizations to batch process for collation. NOTE: We want to do these to preserve some markup
in the output for post-processing to reconstruct the edition files.
Remember, these will be processed in order, so watch out for conflicts. -->
<replace pattern="(&lt;.+?>\s*)>" replacement="$1"
message="normalizing away extra right angle brackets"/>
<replace pattern="&amp;" replacement="and"
message="ampersand batch replacement"/>
<replace pattern="&lt;/?xml>" replacement=""
message="xml tag replacement"/>
<replace pattern="(&lt;p)\s+.+?(/>)" replacement="$1$2"
message="p-tag batch replacement"/>
<replace pattern="(&lt;)(metamark).*?(>).+?\1/\2\3" replacement=""
message="metamark batch replacement"/>
<!--ebb: metamark contains a text node, and we don't want its
contents processed in the collation, so this captures the entire element. -->
<replace pattern="(&lt;/?)m(del).*?(>)"
replacement="$1$2$3" message="mdel-SGA batch replacement"/>
<!--ebb: mdel contains a text node, so this catches both start and end tag.
We want mdel to be processed as <del>...</del>-->
<replace pattern="&lt;/?damage.*?>"
replacement="" message="damage-SGA batch replacement"/>
<!--ebb: damage contains a text node, so this catches both start and end tag. -->
<replace pattern="&lt;/?unclear.*?>" replacement=""
message="unclear-SGA batch replacement"/>
<!--ebb: unclear contains a text node, so this catches both start and end tag. -->
<replace pattern="&lt;/?retrace.*?>" replacement=""
message="retrace-SGA batch replacement"/>
<!--ebb: retrace contains a text node, so this catches both start and end tag. -->
See Joel Kalvesmaki's Balisage papers on tan:diff (2021) and tan:collate (2022, next week)
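To make the "processed in order" remark concrete, here is a rough Python equivalent of a few of the batch replacements above. It is only an illustration of ordered find-and-replace: BATCH and batch_replace are hypothetical names, the sample string is invented, and nothing here calls the TAN library.

```python
import re

# (pattern, replacement) pairs, applied top to bottom like the batch above.
BATCH = [
    (r'&', 'and'),                               # ampersand
    (r'(<p)\s+.+?(/>)', r'\1\2'),                # <p xml:id="..."/> becomes <p/>
    (r'(<)(metamark).*?(>).+?\1/\2\3', ''),      # drop metamark together with its text
    (r'(</?)m(del).*?(>)', r'\1\2\3'),           # treat <mdel> as <del>
    (r'</?unclear.*?>', ''),                     # drop the unclear tags, keep their text
]


def batch_replace(text):
    """Apply every replacement in list order."""
    for pattern, replacement in BATCH:
        text = re.sub(pattern, replacement, text)
    return text


sample = ('<p xml:id="p2"/>his features <metamark function="insert">^</metamark>'
          'handsome<mdel>.</mdel> & <unclear>care</unclear>')
print(batch_replace(sample))
```

Because each pattern sees the output of the one before it, a later rule can only match what earlier rules have left in place.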
ignore = ['mod', 'sourceDoc', 'xml', 'comment', 'w', 'anchor',
'include', 'delSpan', 'addSpan', 'handShift', 'damage', 'restore',
'zone', 'surface', 'graphic', 'unclear', 'retrace']
blockEmpty = ['pb', 'p', 'div', 'milestone', 'lg', 'l', 'note',
'cit', 'quote', 'bibl', 'ab', 'head']
inlineEmpty = ['lb', 'gap', 'hi', 'add', 'del']
inlineContent = ['metamark', 'mdel', 'shi']
Prepare the warp: create lists of XML element names for special treatment
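How lists like these might steer a pull parser can be sketched with Python's xml.dom.pulldom, the module used later in these slides. The extract function, the shortened lists, the root wrapper element, and the sample string are assumptions made for this illustration; the project's own script may treat each category differently.

```python
from xml.dom import pulldom

# Shortened versions of the lists above; 'root' is a hypothetical wrapper element.
ignore = ['root', 'mod', 'zone', 'anchor', 'w']
blockEmpty = ['p', 'pb', 'milestone']
inlineEmpty = ['lb', 'add', 'del']
inlineContent = ['mdel']


def extract(xml_string):
    """Serialize only the markup we want the collation to see."""
    output = ''
    doc = pulldom.parseString(xml_string)
    for event, node in doc:
        if event == pulldom.START_ELEMENT:
            name = node.localName
            if name in ignore:
                continue                    # tag dropped; any text inside still flows through
            if name in inlineContent:
                doc.expandNode(node)        # pull in the element's children
                output += node.toxml()      # element and its text kept together
            elif name in blockEmpty or name in inlineEmpty:
                output += node.toxml()      # not expanded, so it serializes as an empty tag
        elif event == pulldom.CHARACTERS:
            output += node.data
    return output


sample = ('<root><zone type="main" sID="z1"/>his features <del sID="d1"/>h'
          '<del eID="d1"/> as <lb n="23"/><mdel>.</mdel></root>')
print(extract(sample))
```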
import re

RE_MARKUP = re.compile(r'<.+?>')
RE_PARA = re.compile(r'<p\s[^<]+?/>')
RE_INCLUDE = re.compile(r'<include[^<]*/>')
RE_MILESTONE = re.compile(r'<milestone[^<]*/>')
RE_HEAD = re.compile(r'<head[^<]*/>')
RE_AB = re.compile(r'<ab[^<]*/>')
RE_AMP = re.compile(r'&')
RE_DELSTART = re.compile(r'<del[^<]*>')
RE_ADDSTART = re.compile(r'<add[^<]*>')
RE_MDEL = re.compile(r'<mdel[^<]*>.+?</mdel>')
RE_SHI = re.compile(r'<shi[^<]*>.+?</shi>')
RE_METAMARK = re.compile(r'<metamark[^<]*>.+?</metamark>')
RE_HI = re.compile(r'<hi\s[^<]*/>')
RE_PB = re.compile(r'<pb[^<]*/>')
RE_LB = re.compile(r'<lb.*?/>')
RE_LG = re.compile(r'<lg[^<]*/>')
RE_L = re.compile(r'<l\s[^<]*/>')
RE_CIT = re.compile(r'<cit\s[^<]*/>')
RE_QUOTE = re.compile(r'<quote\s[^<]*/>')
RE_OPENQT = re.compile(r'“')
RE_CLOSEQT = re.compile(r'”')
RE_GAP = re.compile(r'<gap\s[^<]*/>')
# <milestone unit="tei:p"/>
RE_sgaP = re.compile(r'<milestone\sunit="tei:p"[^<]*/>')
def normalize(inputText):
    # Substitutions are applied in list order, top to bottom.
    # RE_MOD, RE_ADDEND, RE_DELEND, RE_MULTI_RIGHTANGLE, and RE_MULTI_LEFTANGLE
    # are compiled elsewhere in the script (not shown on this slide).
    substitutions = [
        (RE_METAMARK, ''),
        (RE_MOD, ''),
        (RE_ADDEND, '<addend/>'),
        (RE_ADDSTART, '<addstart/>'),
        (RE_DELEND, '<delend/>'),
        (RE_DELSTART, '<delstart/>'),
        (RE_GAP, ''),
        (RE_CLOSEQT, '"'),
        (RE_OPENQT, '"'),
        (RE_QUOTE, ''),
        (RE_CIT, ''),
        (RE_L, '<l/>'),
        (RE_LG, '<lg/>'),
        (RE_sgaP, '<p/>'),       # tei:p milestones become <p/> before the rest are removed
        (RE_MILESTONE, ''),
        (RE_PARA, '<p/>'),
        (RE_PB, ''),
        (RE_LB, ''),
        (RE_HI, ''),
        (RE_SHI, ''),
        (RE_MDEL, ''),
        (RE_AMP, 'and'),
        (RE_HEAD, ''),
        (RE_AB, ''),
        (RE_INCLUDE, ''),
        (RE_MULTI_RIGHTANGLE, '>'),
        (RE_MULTI_LEFTANGLE, '<'),
    ]
    for pattern, replacement in substitutions:
        inputText = pattern.sub(replacement, inputText)
    return inputText.lower()
Find and replace the regex patterns before feeding to collateX
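Order matters whenever two of these patterns can match the same tag, because whichever substitution runs first decides the result. The small check below uses two patterns copied from the slides, RE_MILESTONE and RE_sgaP; the sample string and the variable names are invented for the illustration.

```python
import re

# Both patterns are copied from the list of compiled expressions above.
RE_MILESTONE = re.compile(r'<milestone[^<]*/>')
RE_sgaP = re.compile(r'<milestone\sunit="tei:p"[^<]*/>')

text = 'to form.<milestone unit="tei:p"/>His limbs'

# In a nested call the innermost substitution runs first.
specific_first = RE_MILESTONE.sub('', RE_sgaP.sub('<p/>', text))
general_first = RE_sgaP.sub('<p/>', RE_MILESTONE.sub('', text))

print(specific_first)   # the paragraph boundary survives as <p/>
print(general_first)    # the milestone is deleted before RE_sgaP can see it
```

Running the more specific pattern first keeps the paragraph boundary; running the general one first silently removes it.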
<app>
<rdgGrp n="['spot,', 'and', 'endeavoured,']">
<rdg wit="f1818">spot, and endeavoured, </rdg>
<rdg wit="f1823">spot, and endeavoured, </rdg>
<rdg wit="fThomas">spot, and endeavoured, </rdg>
<rdg wit="f1831">spot, and endeavoured, </rdg>
</rdgGrp>
<rdgGrp n="['spotand', 'endeavoured']">
<rdg wit="fMS">spot& endeavoured </rdg>
</rdgGrp>
</app>
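Each rdgGrp above collects the witnesses whose normalized tokens agree, and its n attribute records those tokens. A minimal sketch of that grouping step, assuming the normalized tokens are already available per witness (reading_groups is a hypothetical helper, not a collateX function):

```python
from collections import defaultdict

# Normalized tokens per witness, as reported in the n attributes above.
normalized = {
    'f1818': ['spot,', 'and', 'endeavoured,'],
    'f1823': ['spot,', 'and', 'endeavoured,'],
    'fThomas': ['spot,', 'and', 'endeavoured,'],
    'f1831': ['spot,', 'and', 'endeavoured,'],
    'fMS': ['spotand', 'endeavoured'],
}


def reading_groups(normalized):
    """Map each distinct normalized token sequence to the witnesses that share it."""
    groups = defaultdict(list)
    for siglum, tokens in normalized.items():
        groups[tuple(tokens)].append(siglum)
    return dict(groups)


groups = reading_groups(normalized)
for tokens, witnesses in groups.items():
    print(list(tokens), witnesses)
```

The fused token 'spotand' puts fMS in a group of its own, which is the symptom the next slides trace back to the source.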
In the fMS source:
. . . the spot<add eID="c57-0117__main__d3e21951"/> & endeavoured . . .
Adding newline characters around markup nodes...
if event == pulldom.START_ELEMENT and node.localName in inlineEmpty:
    output += '\n' + node.toxml() + '\n'
...corrected this output...
<app>
<rdgGrp n="['spot', '<addend/>']">
<rdg wit="fMS">spot
<add eID="c57-0117__main__d3e21951"/> </rdg>
</rdgGrp>
<rdgGrp n="['spot,']">
<rdg wit="f1818">spot, </rdg>
<rdg wit="f1823">spot, </rdg>
<rdg wit="fThomas">spot, </rdg>
<rdg wit="f1831">spot, </rdg>
</rdgGrp>
</app>
<app>
<rdgGrp n="['and']">
<rdg wit="f1818">and </rdg>
<rdg wit="f1823">and </rdg>
<rdg wit="fThomas">and </rdg>
<rdg wit="f1831">and </rdg>
<rdg wit="fMS">& </rdg>
</rdgGrp>
</app>
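The effect of padding markup nodes with newlines, visible in the corrected output above, can be reproduced in a small sketch. The pulldom test and the inlineEmpty list come from the slides; the pad switch, the root wrapper, the RE_ADDEND pattern, and the whitespace tokenizer are assumptions added for the comparison.

```python
import re
from xml.dom import pulldom

inlineEmpty = ['lb', 'gap', 'hi', 'add', 'del']       # from the slides
RE_ADDEND = re.compile(r'<add[^<]*eID[^<]*/>')        # assumed pattern, not shown on the slides


def extract(xml_string, pad):
    """Serialize text and inline markers, optionally padding markers with newlines."""
    output = ''
    for event, node in pulldom.parseString(xml_string):
        if event == pulldom.START_ELEMENT and node.localName in inlineEmpty:
            markup = node.toxml()
            output += ('\n' + markup + '\n') if pad else markup
        elif event == pulldom.CHARACTERS:
            output += node.data
    return output


def tokens(text):
    """Normalize, then split on whitespace (a simplified stand-in for the tokenizer)."""
    return RE_ADDEND.sub('<addend/>', text).replace('&', 'and').split()


source = '<root>the spot<add eID="c57-0117__main__d3e21951"/> &amp; endeavoured</root>'
print(tokens(extract(source, pad=False)))
print(tokens(extract(source, pad=True)))
```

Without padding, the word and the marker that touches it travel as one token; with padding, each becomes its own unit of comparison.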
but created a new problem...
<app>
<rdgGrp n="['for', 'there']">
<rdg wit="f1818">for there </rdg>
<rdg wit="f1823">for there </rdg>
<rdg wit="fThomas">for there </rdg>
<rdg wit="f1831">for there </rdg>
<rdg wit="fMS">for there </rdg>
</rdgGrp>
</app>
<app>
<rdgGrp n="['', '', '<addstart/>']">
<rdg wit="fMS"><lb n="c57-0118__main__4"/>
<lb n="c57-0118__left_margin__1"/> <add hand="#pbs"
sID="c57-0118__left_margin__d3e21996"/> </rdg>
</rdgGrp>
</app>
<app>
<rdgGrp n="['was']">
<rdg wit="f1818">was </rdg>
<rdg wit="f1823">was </rdg>
<rdg wit="fThomas">was </rdg>
<rdg wit="f1831">was </rdg>
<rdg wit="fMS">was </rdg>
</rdgGrp>
</app>
<app>
<rdgGrp n="['<addend/>']">
<rdg wit="fMS"><add eID="c57-0118__left_margin__d3e21996"/> </rdg>
</rdgGrp>
</app>
<app>
<rdgGrp n="['no', 'sign', 'of', 'any']">
<rdg wit="f1818">no sign of any </rdg>
<rdg wit="f1823">no sign of any </rdg>
<rdg wit="fThomas">no sign of any </rdg>
<rdg wit="f1831">no sign of any </rdg>
<rdg wit="fMS">no sign of any </rdg>
</rdgGrp>
</app>
<app>
<rdgGrp n="['violence']">
<rdg wit="fMS">violence </rdg>
</rdgGrp>
<rdgGrp n="['violence,']">
<rdg wit="f1818">violence, </rdg>
<rdg wit="f1823">violence, </rdg>
<rdg wit="fThomas">violence, </rdg>
<rdg wit="f1831">violence, </rdg>
</rdgGrp> </app>
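In the output above, the padded lb elements normalize to empty strings and end up as tokens in a unit of their own. The slides do not say how the project resolved this, so the following is only one possible approach, sketched under that caveat: fold any token whose normalized form is empty into the token before it, so the markup is kept in the original text without becoming a separate point of comparison. The name fold_empty_tokens and the token layout are hypothetical.

```python
def fold_empty_tokens(tokens):
    """Attach tokens with an empty normalized form to the preceding token."""
    folded = []
    for tok in tokens:
        if tok['n'] == '' and folded:
            previous = folded[-1]
            # Keep the markup in the original text, but not as its own unit.
            folded[-1] = {'t': previous['t'] + tok['t'], 'n': previous['n']}
        else:
            folded.append(dict(tok))
    return folded


ms_tokens = [
    {'t': 'there ', 'n': 'there'},
    {'t': '<lb n="c57-0118__main__4"/>', 'n': ''},
    {'t': '<lb n="c57-0118__left_margin__1"/>', 'n': ''},
    {'t': '<add hand="#pbs" sID="c57-0118__left_margin__d3e21996"/>', 'n': '<addstart/>'},
    {'t': 'was ', 'n': 'was'},
]

folded = fold_empty_tokens(ms_tokens)
print([tok['n'] for tok in folded])
```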
Making a “stand-off” Spine (info + pointers to collation data)
Generating the edition files with collation data marked “inline”
“Heatmap” view, showing variation intensity as blocks with circles color-coded by edition. Selecting a circle on the heatmap view displays the edition and its variants.
The Variorum Viewer: ALTER THE DIRECTIONALITY OF READING
The visitor chooses an edition to read and a section aligned with the other editions, in this case the 1818 in section 10. Sections are usually chapter boundaries.
Variant passages are highlighted based on a three-part scale of intensity defined by maximum edit distance of any version from the others at this point. The darker the shade, the greater the divergence from at least one of the other editions. The colored dot beneath a passage indicates which edition(s) hold a variant at this location, following the legend provided above.
The presence of a number with a manicule indicates here that two contextual annotations are available (as shown below). These annotations were written by a team of scholars to offer commentary on content in this paragraph.
Selecting a variant passage opens a panel to show how all the editions read at this point. Contextual annotations (signalled by the manicule) would open in the same space as this variant display panel, so the two are not currently displayed together. The visitor may choose which to view.
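The three-part intensity scale described above rests on the maximum edit distance among the readings at one location. A minimal sketch of that calculation follows, using a plain Levenshtein distance. The thresholds low and high, the shade names, and the function names are hypothetical; the slides do not state the cut-off values the viewer uses.

```python
from itertools import combinations


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def max_distance(readings):
    """Largest pairwise distance among the readings at one location."""
    return max((levenshtein(a, b) for a, b in combinations(readings, 2)), default=0)


def intensity(readings, low=5, high=25):
    """Bucket a location into one of three shades (thresholds are hypothetical)."""
    distance = max_distance(readings)
    if distance <= low:
        return 'light'
    if distance <= high:
        return 'medium'
    return 'dark'


print(intensity(['violence', 'violence,', 'violence,']))
```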
A heavily revised passage, showing the MS notebook view
Legend
MS
1818
Thm
1823
1831
Alignments, gaps, and comparative lengths of each collation unit
chapter heading or other structural boundary
Mouse over a black box...
Strengths
Weaknesses
Solutions