Processing Tangles in the Frankenstein Variorum

Elisa Beshero-Bondar

Twitter: @epyllia | GitHub: @ebeshero

DH2022 Tokyo: session [LP9-01] Long Presentations 9-01
July 29, 2022 @ 9:30 AM - 11:00 JST

Link to these slides: https://bit.ly/fv-tangles

A Digital Variorum

Our use of the term: A digital edition that investigates change to a work by comparing distinct versions of it.

Print Publications of Frankenstein in MWS's lifetime:

1818 Edition published anonymously (3 volumes)
1823 Edition printed by MWS's father William Godwin, the first to include her name as the author. (2 volumes)
1831 Edition extensively revised by MWS, bound with Friedrich von Schiller's The Ghost Seer in Bentley's Standard Series of novels) (1/2 of a volume)

Manuscript versions:

Fair copy MS drafts (~1816) at the Bodleian Library, Oxford: Abinger c56, c57, c58
Thomas Copy made sometime between 1818 and 1822: MWS's marginal comments on a print copy of 1818

James Rieger, ed., first new edition of 1818 in 141

years : inline collation of "Thomas" w/ 1818,

1831 variants in endnotes

Legend:

Stuart Curran and Jack Lynch: PA Electronic Edition (PAEE) , collation of 1818 and 1831: HTML

Nora Crook crit. ed of 1818, variants of "Thomas", 1823, and 1831 in endnotes (P&C MWS collected works)

Romantic Circles TEI conversion of PAEE ; separates the texts of 1818 and 1831; collation via Juxta

1974

~mid-1990s

1996

Charles Robinson, The Frankenstein Notebooks (Garland): print facsimile of 1816 ms drafts

2007

Shelley-Godwin Archive publishes diplomatic/documentary edition of 1816 ms drafts

print edition

digital edition

Legend:

2013

2017

Critical and Documentary Editions Leading to the Frankenstein Variorum Project

Frankenstein Variorum Project :

assembly/proof-correcting of PAEE files; OCR/proof-correcting 1823; "bridge" TEI edition of S-GA notebook files; automated collation; incorporating "Thomas" copy text

Multi-institutional collaboration

Me (Pitt-Greensburg/Penn State Erie, and TEI)
Raffaele Viglianti (MITH: Shelley-Godwin Archive and TEI)
Rikk Mulligan, Jon Klancher and team at Carnegie Mellon University
Angela Brunk (Kansas State U: UX / accessibility)

Can we make an edition that conveniently compares the manuscripts to the print publications?
Can we make a comprehensive collation to show changes to the novel over time, from 1816 to 1831?
- Which editorial interventions persist from 1816 to 1831?
  - MWS in the "Thomas" copy: how much of this persists into 1831?
  - PBS's additions: which/how many of these persist to 1831?
  - What parts of the novel were most mutable?

Motivating Questions

Working with MS versions

Thomas copy marginalia

prepared new XML from 1818 edition, with <add>, <del>, <note> elements

Shelley-Godwin Archive’s diplomatic edition of the 1816 Notebooks at http://shelleygodwinarchive.org

encoded surface-by-surface, line-by-line
required resequencing to include in the collation (margin notes not included at point of insertion but at the end of each page file)

Shelley-Godwin Archive: sample page surface:

Variorum challenge

1. Make visible and accessible a nonlinear, divergent edition history

1816 notebooks to 1818: uneven (gaps in notebooks)
Thomas divergence:
- copy with margin notes was left in Italy before 1823, apparently not consulted later
1823 edits: largely retained in 1831
1831 major revisions:
- alteration of character relationships, added chapter and several lengthened passages

2. Introduce textual scholarship to students, fans of Frankenstein as well as text scholars, 19c specialists:

Recruit next generation of text scholars
Not marginalizing variants in print model of endnotes/footnotes
Tell the story of Frankenstein’s ”hot” or ”cool” alterations inline

Progress Report

One third of the collation is being displayed at https://frankensteinvariorum.github.io/
This (first) portion of the novel gave us the basis for developing the ”spine” data model and the Variorum reader web interface

What's next?

Refining our machine-assisted collation algorithms
UX testing of the navigation and reading interface*
- *with recursive cycles of error-correction (see stage 1)
Documentation of the ”spine” data model for the TEI Guidelines and other projects

Complete the project! . . . This includes:

Frankenstein Variorum: The Collation Process

Gothenburg model : algorithm for computer-assisted collation, developed in 2009 workshop of collateX and Juxta developers.

Tokenization :
- Break down the smallest unit of comparison: (words--with punctuation, or character-by-character):
- FV tokenizes words and includes punctuation and tags:
  '<del>the', 'frame', 'on', 'whic<del>', 'my', 'man', 'completeed,.'
Normalization
- '&' = 'and'
- <p xml:id="novel1_letter4_div4_p2"> = <p/>
Alignment
- Identify comparable divergence: what makes text sequences comparable units?
- “Chunking” text into comparable passages (chapters/paragraphs that line up with identifiable start and end points). Collation proceeds chunk by chunk.
Analysis
- Study output, correct, and re-align after machine process, AND refine automated processing
Visualization:
- Critical edition interface, graph displays

Computer-aided collation: Gothenburg Model

Collation is weaving...

Source documents supplying “threads”
collation software = weaving machine
- needs to be able to find where the “threads” run together,
- and where they diverge (and what constitutes divergence)
- Markup functions as signals to the collation weaving machine

Guiding the machine reading

Markup is not the same in all of the editions
That's okay, because we hide some of it from collation software
Our method:
- Python script instructs on strings of text that the collation algorithm can just skip over
- Marking the markup: Identify markup that is not helpful for comparing the documents

The “base markup” for collation

Print books
- simple structural markup for 1818, 1823, and 1831 eds.
Manuscript versions
- Thomas copy (handwritten notes on 1818 print text): prepped from 1818 files with some simple manuscript coding of handwritten insertions, deletions, and notes
- 1816 MS Notebook: It's complicated...

Perhaps more to the point, I can say. . .

But let's take it one stage at a time. . .

PA Electronic Edition (mid 1990s): 1818 vs 1831

started from base HTML 1.0 files
up-converted to clean, simple XML
- ”on its way” to TEI (structural elements in text)
- prepared for machine-assisted collation (via CollateX): including element tags
- deep hierarchy of novel ”flattened” to milestones: <div type="volume"/>, <p/>, etc.
corrected against photofacsimiles of 1818 and 1831 print publications

Prepared from OCR new XML of 1823 edition

prepared by William Godwin, the first edition bearing the name ”Mary Wollstonecraft Shelley” on the title page
XML syntax matches that of 1818 and 1831 editions

Added new XML: Thomas Copy

Added insertions, deletions, + margin-notes to 1818 edition using <add>, <del>, and <note>
checked against and respond to/update Nora Crook and James Reiger editions

Now... about that little problem...

resequence it*
- *to move margin notes into reading order
flatten the markup
ignore what markup has no counterpart in the other witnesses

...unless we

S-GA: resequenced / compressed for collation

<surface lrx="3847" lry="5342" 
partOf="#ox-frankenstein_volume_i" 
ulx="0" uly="0" folio="21r" shelfmark="MS. Abinger c. 56" base="ox-ms_abinger_c56/ox-ms_abinger_c56-0045.xml" 
id="ox-ms_abinger_c56-0045" sID="ox-ms_abinger_c56-0045"/>
      <graphic url="http://shelleygodwinarchive.org/images/ox/ms_abinger_c56/ms_abinger_c56-0045.jp2"/>
      <zone type="main" sID="c56-0045__main"/> 

<lb n="c56-0045__main__17"/> 
         <del rend="strikethrough" sID="c56-0045__main__d2e9811"/>But how<del eID="c56-0045__main__d2e9811"/> How can I describe
      my <lb n="c56-0045__main__18"/> emotion at this catastrophe; or how 

<w ana="start"/>deli<lb n="c56-0045__main__19"/>neate<w ana="end"/> 

the wretch whom with such <lb n="c56-0045__main__20"/> infinite pains and care I had endeavoured <lb n="c56-0045__main__21"/> to form. His limbs were in proportion <lb n="c56-0045__main__22"/> and I had selected his features <del rend="strikethrough" sID="c56-0045__main__d2e9830"/>h<del eID="c56-0045__main__d2e9830"/> as <lb n="c56-0045__main__23"/> 
         <mod sID="c56-0045__main__d2e9835"/>
            <del rend="strikethrough" sID="c56-0045__main__d2e9837"/>handsome<del eID="c56-0045__main__d2e9837"/>
            <mdel>.</mdel>
            <anchor xml:id="c56-0045.01"/>
            <zone corresp="#c56-0045.01" type="left_margin" sID="c56-0045__left_margin"/> 
               <lb n="c56-0045__left_margin__1"/> 
               <add sID="c56-0045__left_margin__d2e9849"/>
                  <mod sID="c56-0045__left_margin__d2e9851"/>
                     <del rend="strikethrough" sID="c56-0045__left_margin__d2e9853"/>handsome<del eID="c56-0045__left_margin__d2e9853"/>
                     <add hand="#pbs" place="superlinear" sID="c56-0045__left_margin__d2e9856"/>beautiful.<add eID="c56-0045__left_margin__d2e9856"/>
                  <mod eID="c56-0045__left_margin__d2e9851"/>
               <add eID="c56-0045__left_margin__d2e9849"/>
            <zone eID="c56-0045__left_margin"/>
         <mod eID="c56-0045__main__d2e9835"/>
         <mod sID="c56-0045__main__d2e9863"/>
            <del rend="strikethrough" sID="c56-0045__main__d2e9865"/>Handsome<del eID="c56-0045__main__d2e9865"/>
            <add hand="#pbs" place="superlinear" sID="c56-0045__main__d2e9868"/>Beautiful<add eID="c56-0045__main__d2e9868"/>
         <mod eID="c56-0045__main__d2e9863"/>; Great God! His <lb n="c56-0045__main__24"/>

added word boundary markup to indicate whole words spanning lines
resequenced margin zone content: (followed S-GA's pointers to represent semantic reading order for collation)

With witnesses prepared and inspected, we now work on the Python script that tokenizes and normalizes the source files, preparing the witnesses to be compared by collateX

Image credits: https://rootsofprogress.org/learning-the-loom,

https://www.oschaslings.com/blog/warp-weft-weave-structure-understanding-how-your-sling-is-made-wrapping-qualities

warp: sets tension or looseness of weave:
normalizing the text "thread" to tell us how to "pull" it

weft: moves through the warp threads cross-wise:

establishes moments of alignment across the text threads

XSLT or Python/CollateX?

XSLT differentiation driven by XPath string comparison
- So far, cleaner handling of space issues
- Precision/legiblity of normalizing code
- Needs a lot of adaptation to incorporate into production pipeline
Python and collateX
- Bespoke TEI output: customized in our project pipeline
- Maybe only a tokenization problem that we can correct?
- Needs to be rewritten from the ground up for sustainability/sharing the code


                <!-- Should punctuation be ignored? -->
    <xsl:param name="tan:ignore-punctuation-differences" as="xs:boolean" select="false()"/>
    
    <xsl:param name="additional-batch-replacements" as="element()*">
        <!--ebb: normalizations to batch process for collation. NOTE: We want to do these to preserve some markup \\
            in the output for post-processing to reconstruct the edition files. 
            Remember, these will be processed in order, so watch out for conflicts. -->
        <replace pattern="(<.+?>\s*)&gt;" replacement="$1" 
                 message="normalizing away extra right angle brackets"/>
         <replace pattern="&amp;" replacement="and" 
                  message="ampersand batch replacement"/>
        <replace pattern="</?xml>" replacement="" 
                 message="xml tag replacement"/>
        <replace pattern="(<p)\s+.+?(/>)" replacement="$1$2" 
                 message="p-tag batch replacement"/>
        <replace pattern="(<)(metamark).*?(>).+?\1/\2\3" replacement="" 
                 message="metamark batch replacement"/>
      <!--ebb: metamark contains a text node, and we don't want its 
contents processed in the collation, so this captures the entire element. -->
        <replace pattern="(</?)m(del).*?(>)" 
                 replacement="$1$2$3" message="mdel-SGA batch replacement"/> 
      <!--ebb: mdel contains a text node, so this catches both start and end tag.
        We want mdel to be processed as <del>...</del>-->
        <replace pattern="</?damage.*?>" 
                 replacement="" message="damage-SGA batch replacement"/> 
      <!--ebb: damage contains a text node, so this catches both start and end tag. -->
        <replace pattern="</?unclear.*?>" replacement="" 
                 message="unclear-SGA batch replacement"/>
      <!--ebb: unclear contains a text node, so this catches both start and end tag. -->
        <replace pattern="</?retrace.*?>" replacement=""
                 message="retrace-SGA batch replacement"/>
      <!--ebb: retrace contains a text node, so this catches both start and end tag. -->

See Joel Kalvesmaki's Balisage papers on
tan:diff (2021) and tan:collate 2022 (next week)

Normalizing/collating with TAN:Diff XSLT Library

ignore = ['mod', 'sourceDoc', 'xml', 'comment', 'w', 'anchor', 
          'include', 'delSpan', 'addSpan', 'handShift', 'damage', 'restore', 
          'zone', 'surface', 'graphic', 'unclear', 'retrace']

blockEmpty = ['pb', 'p', 'div', 'milestone', 'lg', 'l', 'note', 
              'cit', 'quote', 'bibl', 'ab', 'head']

inlineEmpty = ['lb', 'gap',  'hi', 'add', 'del']

inlineContent = ['metamark', 'mdel', 'shi']

In the Python script:

Prepare the warp: create lists of XML element names for special treatment

Ignored whole elements need to be screened out of the collation entirely.
Other whole elements need to be preserved but normalized.
XML Pulldom library helps us with special handling of XML elements.

How the normalizing looks in Python:
the RE_ variables

RE_MARKUP = re.compile(r'<.+?>')
RE_PARA = re.compile(r'<p\s[^<]+?/>')
RE_INCLUDE = re.compile(r'<include[^<]*/>')
RE_MILESTONE = re.compile(r'<milestone[^<]*/>')
RE_HEAD = re.compile(r'<head[^<]*/>')
RE_AB = re.compile(r'<ab[^<]*/>')
RE_AMP = re.compile(r'&')
RE_DELSTART = re.compile(r'<del[^<]*>')
RE_ADDSTART = re.compile(r'<add[^<]*>')
RE_MDEL = re.compile(r'<mdel[^<]*>.+?</mdel>')
RE_SHI = re.compile(r'<shi[^<]*>.+?</shi>')
RE_METAMARK = re.compile(r'<metamark[^<]*>.+?</metamark>')
RE_HI = re.compile(r'<hi\s[^<]*/>')
RE_PB = re.compile(r'<pb[^<]*/>')
RE_LB = re.compile(r'<lb.*?/>')
RE_LG = re.compile(r'<lg[^<]*/>')
RE_L = re.compile(r'<l\s[^<]*/>')
RE_CIT = re.compile(r'<cit\s[^<]*/>')
RE_QUOTE = re.compile(r'<quote\s[^<]*/>')
RE_OPENQT = re.compile(r'“')
RE_CLOSEQT = re.compile(r'”')
RE_GAP = re.compile(r'<gap\s[^<]*/>')
# &lt;milestone unit="tei:p"/&gt;
RE_sgaP = re.compile(r'<milestone\sunit="tei:p"[^<]*/>')

How the normalizing looks in Python

def normalize(inputText):
    return RE_MULTI_LEFTANGLE.sub('<',\
        RE_MULTI_LEFTANGLE.sub('>', \
        RE_INCLUDE.sub('', \
        RE_AB.sub('', \
        RE_HEAD.sub('', \
        RE_AMP.sub('and', \
        RE_MDEL.sub('', \
        RE_SHI.sub('', \
        RE_HI.sub('', \
        RE_LB.sub('', \
        RE_PB.sub('', \
        RE_PARA.sub('<p/>', \
        RE_sgaP.sub('<p/>', \
        RE_MILESTONE.sub('', \
        RE_LG.sub('<lg/>', \
        RE_L.sub('<l/>', \
        RE_CIT.sub('', \
        RE_QUOTE.sub('', \
        RE_OPENQT.sub('"', \
        RE_CLOSEQT.sub('"', \
        RE_GAP.sub('', \
        RE_DELSTART.sub('<delstart/>', \
        RE_DELEND.sub('<delend/>', \
        RE_ADDSTART.sub('<addstart/>', \
        RE_ADDEND.sub('<addend/>', \
        RE_MOD.sub('', \
        RE_METAMARK.sub('', inputText))))))))))))))))))))))))))).lower()

Find and replace the regex patterns before feeding to collateX

Tokenization error


    <app>
		<rdgGrp n="['spot,', 'and', 'endeavoured,']">
			<rdg wit="f1818">spot, and endeavoured, </rdg>
			<rdg wit="f1823">spot, and endeavoured, </rdg>
			<rdg wit="fThomas">spot, and endeavoured, </rdg>
			<rdg wit="f1831">spot, and endeavoured, </rdg>
		</rdgGrp>
		<rdgGrp n="['spotand', 'endeavoured']">
			<rdg wit="fMS">spot& endeavoured </rdg>
		</rdgGrp>
	</app>

  . . . the spot<add eID="c57-0117__main__d3e21951"/> & endeavoured . . .

In the fMS source:

Repairing the tokenization error


 if event == pulldom.START_ELEMENT and node.localName in inlineEmpty:
            output += '\n' + node.toxml() + '\n'

by adding newline characters around markup nodes

Repairing the tokenization error

corrected this output...

<app>
		<rdgGrp n="['spot', '&lt;addend/&gt;']">
			<rdg wit="fMS">spot 
              &lt;add eID=&quot;c57-0117__main__d3e21951&quot;/&gt; </rdg>
		</rdgGrp>
		<rdgGrp n="['spot,']">
			<rdg wit="f1818">spot, </rdg>
			<rdg wit="f1823">spot, </rdg>
			<rdg wit="fThomas">spot, </rdg>
			<rdg wit="f1831">spot, </rdg>
		</rdgGrp>
	</app>
	<app>
		<rdgGrp n="['and']">
			<rdg wit="f1818">and </rdg>
			<rdg wit="f1823">and </rdg>
			<rdg wit="fThomas">and </rdg>
			<rdg wit="f1831">and </rdg>
			<rdg wit="fMS">&amp; </rdg>
		</rdgGrp>
	</app>

but created a new problem...

<app>
		<rdgGrp n="['for', 'there']">
			<rdg wit="f1818">for there </rdg>
			<rdg wit="f1823">for there </rdg>
			<rdg wit="fThomas">for there </rdg>
			<rdg wit="f1831">for there </rdg>
			<rdg wit="fMS">for there </rdg>
		</rdgGrp>
	</app>
	<app>
		<rdgGrp n="['', '', '&lt;addstart/&gt;']">
			<rdg wit="fMS">&lt;lb n=&quot;c57-0118__main__4&quot;/&gt; 
              &lt;lb n=&quot;c57-0118__left_margin__1&quot;/&gt; &lt;add hand=&quot;#pbs&quot; 
              sID=&quot;c57-0118__left_margin__d3e21996&quot;/&gt; </rdg>
		</rdgGrp>
	</app>
	<app>
		<rdgGrp n="['was']">
			<rdg wit="f1818">was </rdg>
			<rdg wit="f1823">was </rdg>
			<rdg wit="fThomas">was </rdg>
			<rdg wit="f1831">was </rdg>
			<rdg wit="fMS">was </rdg>
		</rdgGrp>
	</app>
	<app>
		<rdgGrp n="['&lt;addend/&gt;']">
			<rdg wit="fMS">&lt;add eID=&quot;c57-0118__left_margin__d3e21996&quot;/&gt; </rdg>
		</rdgGrp>
	</app>
	<app>
		<rdgGrp n="['no', 'sign', 'of', 'any']">
			<rdg wit="f1818">no sign of any </rdg>
			<rdg wit="f1823">no sign of any </rdg>
			<rdg wit="fThomas">no sign of any </rdg>
			<rdg wit="f1831">no sign of any </rdg>
			<rdg wit="fMS">no sign of any </rdg>
		</rdgGrp>
	</app>
	<app>
		<rdgGrp n="['violence']">
			<rdg wit="fMS">violence </rdg>
		</rdgGrp>
		<rdgGrp n="['violence,']">
			<rdg wit="f1818">violence, </rdg>
			<rdg wit="f1823">violence, </rdg>
			<rdg wit="fThomas">violence, </rdg>
			<rdg wit="f1831">violence, </rdg>
		</rdgGrp> </app>

Working with the Output of Collation

Making a “stand-off“ Spine (info + pointers to collation data)
Generating the edition files with collation data marked “inline”

Post-processing pipeline

Output files are copied into the post-collation repo:
https://github.com/FrankensteinVariorum/fv-postCollation
Files in the postColl-workspace/P\d-output/ directories
- become the finalized versions determined to be ready for publishing in the Variorum digital edition
- XSLT pipeline "raises" the distinct edition files with their "hotspot" markers of variation from each other
Before the post-processing pipeline, we need to:
- try to ensure accuracy of collation!
- check and correct by:
  - correcting the Python script feeding the collation to handle repeated kinds of problems
  - correcting by hand the errors that are difficult to handle programmatically

“Spine” data model = standoff use of TEI critical apparatus:
- coordinates data on variance: piece by piece (vertebra) which specific passages line up and where they differ.
- points to specific locations in separate edition files
- includes data on normalization: e.g. normalized tokens used in collation process
- includes processed data, like maximum edit-distance, at each location

”Heatmap” view, showing variation intensity as blocks with circles color-coded by edition. Selecting a circle on the heatmap view displays the edition and its variants.

The Variorum Viewer: ALTER THE DIRECTIONALITY OF READING

The visitor chooses an edition to read and a section aligned with the other editions, in this case the 1818 in section 10. Sections are usually chapter boundaries.

Variant passages are highlighted based on a three-part scale of intensity defined by maximum edit distance of any version from the others at this point. The darker the shade, the greater the divergence from at least one of the other editions. The colored dot beneath a passage indicates which edition(s) hold a variant at this location, following the legend provided above.

The presence of a number with a manicule indicates here that two contextual annotations are available (as shown below). These annotations were written by a team of scholars to offer commentary on content in this paragraph.

Selecting a variant passage opens a panel to show how all the editions read at this point. Contextual annotations (signalled by the manicule) would open in the same space as this variant display panel, so the two are not currently displayed together. The visitor may choose which to view.

A heavily revised passage, showing the MS notebook view

Surveying Frankenstein’s five versions | Source XSLT

Legend

1818

Thm

1823

1831

Alignments, gaps, and comparative lengths of each collation unit

chapter heading or other structural boundary

Mouse over a black box...

Revisiting the Gothenberg Model

Error correction is a cyclical process!

We need to:

Inspect our collation results
Understand and categorize the kinds of errors we see
Revise the Python script OR the collation method
Re-run the collation
Re-inspect the results
Decide when it's okay to hand-correct
Minimize hand-correction as much as possible

FV Back-end to Front-end

Strengths

Data model for production pipeline
Collation spine constructs critical edition files in TEI XML
Interface delivers collation data for passages in each edition

Weaknesses

Error-prone tokenizing problem in the Python feed!
Python script is hard to read!
Need more legible preparation code

Solutions

Rewrite the Python, figure out the tokenization problem
Switch to XSLT?: TAN diff (TAN XSLT function library)
Try both and see which is better!

Our ongoing collation work

Frankenstein Variorum GitHub Organization
Original collation work:
- fv-collation repo: https://github.com/FrankensteinVariorum/fv-collation
  - collatexPrep directory (source files + Python + outputs)
Fine-tuning the collation: Testing repos
- Python-to-collateX: https://github.com/FrankensteinVariorum/collateX-Testing
- TAN XSLT: https://github.com/FrankensteinVariorum/TAN-2021
Website interface (to be updated!) https://frankensteinvariorum.github.io/viewer/

Collation projects take much longer to debug than you ever expected
Correct the input machinery, not the output.
- Reduce post-processing to correct output errors!
- Work on the pre-processing.
Machine-assisted processes need a lot of documentation
- for project sustainability
- for reproducibility of data

Processing Tangles in the Frankenstein Variorum

A Digital Variorum

Print Publications of Frankenstein in MWS's lifetime:

Manuscript versions:

Critical and Documentary Editions Leading to the Frankenstein Variorum Project

Multi-institutional collaboration

Motivating Questions

Working with MS versions

Thomas copy marginalia

Shelley-Godwin Archive: sample page surface:

Variorum challenge

Progress Report

What's next?

Frankenstein Variorum: The Collation Process

Computer-aided collation: Gothenburg Model

Collation is weaving...

Guiding the machine reading

The “base markup” for collation

Perhaps more to the point, I can say. . .

PA Electronic Edition (mid 1990s): 1818 vs 1831

Added new XML: Thomas Copy

Now... about that little problem...

S-GA: resequenced / compressed for collation

XSLT or Python/CollateX?

Normalizing/collating with TAN:Diff XSLT Library

In the Python script:

How the normalizing looks in Python: the RE_ variables

How the normalizing looks in Python

Tokenization error

Repairing the tokenization error

Repairing the tokenization error

Working with the Output of Collation

Post-processing pipeline

Surveying Frankenstein’s five versions | Source XSLT

Revisiting the Gothenberg Model

Error correction is a cyclical process!

We need to:

FV Back-end to Front-end

Our ongoing collation work

How the normalizing looks in Python:
the RE_ variables