Document Modeling with the TEI Critical Apparatus
A Panel for the TEI 2019 Conference in Graz, Austria
Presenters: Hugh Cayless (@hcayless), Elisa Beshero-Bondar (@epyllia), Raffaele Viglianti (@raffazizzi)
Respondent: James Cummings (@jamescummings)
Link to these slides: http://bit.ly/crit-app-panel
What is a Critical Apparatus, really?
Hugh Cayless (@hcayless)
What is a Critical Apparatus?
Latin: apparatus criticus, pl. apparatūs critici
- “Scholarly editions of texts...often record some or all of the known variations among different witnesses to the text.” — TEI Guidelines
- “[the apparatus]...records the work’s textual history over time” —Eggert (2007)
- “Editors are not always people who can be trusted, and critical apparatuses are provided so that readers are not dependent upon them.” —West (1973)
What is a Critical Apparatus?
A critical apparatus is the set of notes explaining an editor’s (re)construction of a text. These notes may contain the readings of witnesses, conjectures not promoted to the text, explanatory notes, alternative spellings or punctuation, parallels from other works, and in general any information that might help a reader understand the background of the presented text.
What is a TEI Critical Apparatus?
A critical apparatus is the set of notes explaining an editor’s (re)construction of a text.
- In TEI, where these notes present alternate possibilities, they are modeled in such a way that they may be substituted for the readings in the default text.
- The <app>, <lem>, <rdg> structure places variants in parallel with the default readings.
- So in TEI, the apparatus is more than just notes: it is an actionable data structure.
One view:
A TEI app. crit. represents a forking and rejoining of the text stream, a run of text for which there are multiple possibilities.
A: “The quick brown fox ju...”
B: “The quick brown mouse jumps over the lazy cat.”
C: “The quick brown cat jumps over the lazy dog.”
We think A and B derive from the archetype via different routes, and C derives from A.
<p>The quick brown <app> <lem wit="#A">fox</lem> <rdg wit="#B">mouse</rdg> <rdg wit="#C">cat</rdg></app> jumps over the lazy <app> <lem wit="#C">dog</lem> <rdg wit="#B">cat</rdg></app>.</p>
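The forking and rejoining can be made concrete with a small Python sketch. It assumes a namespace-free copy of the paragraph above in which every `<app>` is a direct child of `<p>`; the names `segments` and `all_paths` are made up for this illustration. It enumerates every route through the forks, which is the set of texts the structure can express, not the set of texts any witness actually carries.

```python
import itertools
import xml.etree.ElementTree as ET

# Namespace-free copy of the example paragraph (an assumption for brevity).
SAMPLE = (
    '<p>The quick brown <app><lem wit="#A">fox</lem>'
    '<rdg wit="#B">mouse</rdg><rdg wit="#C">cat</rdg></app>'
    ' jumps over the lazy <app><lem wit="#C">dog</lem>'
    '<rdg wit="#B">cat</rdg></app>.</p>'
)

def segments(p):
    """Split <p> into single-choice runs of shared text and multi-choice forks."""
    parts = [[p.text or ""]]
    for app in p:
        parts.append([(reading.text or "") for reading in app])  # the fork
        parts.append([app.tail or ""])                            # the rejoin
    return parts

def all_paths(xml_string):
    """Return every text obtainable by picking one reading per <app>."""
    p = ET.fromstring(xml_string)
    return ["".join(choice) for choice in itertools.product(*segments(p))]

paths = all_paths(SAMPLE)
for text in paths:
    print(text)
```

Three readings at the first fork and two at the second give six routes, including the doubtful one with two cats.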
TEI app. crit. as variant graph
Implications
We might decide that, since the transmission of B and C was independent, you can’t have two cats.
“The quick brown cat jumps over the lazy cat.”
<p>The quick brown <app>
<lem wit="#A">fox</lem>
<rdg wit="#B">mouse</rdg>
<rdg xml:id="C1" wit="#C" exclude="#C2">cat</rdg></app> jumps over the lazy <app>
<lem wit="#C">dog</lem>
<rdg xml:id="C2" wit="#B" exclude="#C1">cat</rdg></app>.</p>
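As a rough illustration of how `@exclude` constrains the possibilities, the following Python sketch enumerates one reading per `<app>` and drops any combination in which a chosen reading excludes another chosen reading. It assumes a namespace-free copy of the markup above with `<app>` elements as direct children of `<p>`; the function name `allowed_texts` is hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared, so xml:id parses without a namespace declaration.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = (
    '<p>The quick brown <app><lem wit="#A">fox</lem>'
    '<rdg wit="#B">mouse</rdg>'
    '<rdg xml:id="C1" wit="#C" exclude="#C2">cat</rdg></app>'
    ' jumps over the lazy <app><lem wit="#C">dog</lem>'
    '<rdg xml:id="C2" wit="#B" exclude="#C1">cat</rdg></app>.</p>'
)

def allowed_texts(xml_string):
    """Return the texts that survive the @exclude constraints."""
    p = ET.fromstring(xml_string)
    apps = list(p)
    texts = []
    for combo in itertools.product(*apps):  # one reading per <app>
        chosen = {r.get(XML_ID) for r in combo if r.get(XML_ID)}
        excluded = {ref.lstrip("#")
                    for r in combo
                    for ref in r.get("exclude", "").split()}
        if chosen & excluded:
            continue  # two mutually exclusive readings were picked together
        text = p.text or ""
        for app, reading in zip(apps, combo):
            text += (reading.text or "") + (app.tail or "")
        texts.append(text)
    return texts

texts = allowed_texts(SAMPLE)
```

Of the six unconstrained combinations, only the two-cat text is filtered out.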
Implications
These aren’t simple, independent variations. There can be interdependencies. Imagine a German family of the tradition with two versions:
“Der schnelle braune Fuchs springt über den faulen Hund.”
“Die schnelle braune Katze springt über die faule Katze.”
If you have “Fuchs”, the first word must be “Der”; if “Katze”, then “Die”. “Die schnelle braune Fuchs...” would be another impossible text.
A TEI app. crit. represents a forking and rejoining of the text stream, a run of text for which there are multiple possibilities. These possibilities may be constrained by their context.
A TEI app. crit. entry is a type of annotation on the text, asserting that a particular source or authority has a different opinion about the text content.
or...
TEI app. crit. as annotation
<p>The quick brown <app>
<lem wit="#A">fox</lem>
<rdg wit="#B">mouse</rdg>
<rdg xml:id="C1" wit="#C" exclude="#C2">cat</rdg></app> jumps over the lazy <app>
<lem wit="#C">dog</lem>
<rdg xml:id="C2" wit="#B" exclude="#C1">cat</rdg></app>.</p>
“A says, and the editor agrees, that the fourth word is ‘fox’. B says that it is ‘mouse’, and C says that it is ‘cat’.”
Note that the apparatus doesn’t have to be inline. It could be standoff and say the same thing.
TEI app. crit. as (standoff) annotation
<p>The quick brown fox jumps over the lazy dog.</p>
...
<listApp>
<app from="#match(//p[1],'fox')">
<lem wit="#A">fox</lem>
<rdg wit="#B">mouse</rdg>
<rdg xml:id="C1" wit="#C" exclude="#C2">cat</rdg>
</app>
<app from="#match(//p[1],'dog')">
<lem wit="#C">dog</lem>
<rdg xml:id="C2" wit="#B" exclude="#C1">cat</rdg>
</app>
</listApp>
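To show that the standoff form stays actionable, here is a minimal Python sketch that substitutes the readings attributed to one witness into the base text. It makes several simplifying assumptions: a hypothetical `<doc>` wrapper, no namespaces, every `<app>` pointing into the same paragraph, and only the two-argument form of the `match()` pointer, which is parsed with a regular expression rather than a real XPointer processor. The name `substitute_readings` is invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical wrapper document holding the base text and the standoff apparatus.
DOC = """<doc>
<p>The quick brown fox jumps over the lazy dog.</p>
<listApp>
  <app from="#match(//p[1],'fox')"><lem wit="#A">fox</lem><rdg wit="#B">mouse</rdg><rdg wit="#C">cat</rdg></app>
  <app from="#match(//p[1],'dog')"><lem wit="#C">dog</lem><rdg wit="#B">cat</rdg></app>
</listApp>
</doc>"""

POINTER = re.compile(r"#match\((?P<path>[^,]+),'(?P<pattern>[^']*)'\)")

def substitute_readings(doc_xml, witness):
    """Swap in every reading attributed to `witness`; leave the rest of the base text."""
    root = ET.fromstring(doc_xml)
    base, edits = None, []
    for app in root.iter("app"):
        m = POINTER.fullmatch(app.get("from"))
        target = root.find("." + m["path"])      # ElementTree wants a relative path
        base = target.text
        hit = re.search(m["pattern"], base)
        for reading in app:
            if witness in reading.get("wit", "").split():
                edits.append((hit.start(), hit.end(), reading.text))
    # Apply from right to left so earlier offsets stay valid.
    for start, end, new in sorted(edits, reverse=True):
        base = base[:start] + new + base[end:]
    return base

print(substitute_readings(DOC, "#B"))
```

Where a witness has no reading at a location (as A does at the second one), the base text is simply left in place, which is one reason the result should not be mistaken for a transcription of that witness.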
What TEI app. crit. is not
- Not a superimposition of two or more complete texts.
- You shouldn’t expect to be able to derive any individual source text from a TEI critical edition.
- Not a tool for comparing versions of a text.
- Not particularly automatable: designed to show a (human) editor’s interpretation of a textual tradition.
All that said, it’s a data structure, and can be repurposed. CollateX uses it as a collation export format, for example.
What it might be—a provocation
If we accept that a TEI critical apparatus can be viewed as a sort of (optionally standoff) assertive annotation, then we might imagine using it to describe things other than textual variation. What about variant markup?
Most annotation formats, including TEI <note> and things like Web Annotation, only allow you to associate the content of the annotation with the thing annotated, not to say something positive about it, like “I think this is a place name”.
I’ll just leave this here...
<div type="textpart" subtype="chapter" n="1" xml:id="c1">
  <p type="textpart" subtype="section" n="1" xml:id="c1s1">
    <seg n="1" xml:id="c1s1p1">Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur.</seg>...
  </p>
</div>
...
<standoff>
  <listApp>
    <app from="#match(//seg[@xml:id='c1s1p1'],'Gallia')">
      <rdg><placeName ref="https://pleiades.stoa.org/places/993" source="#Damon">Gallia</placeName></rdg>
    </app>
  </listApp>
</standoff>
“Damon says that ‘Gallia’ in chapter 1, paragraph 1, segment 1 is a place name referencing Pleiades #993.”
This is (not) Spinal Tap:
Modeling to Prioritize Variance
Elisa Beshero-Bondar (@epyllia)

“Spine 2” by Buzz Spector:
polaroid of 33 books aligned at the spines, one per human vertebra
Spine work of a Stand-off Critical Apparatus
- express a holistic view structured according to variant locations
- serve as “nerve plexus” of data pointers for dynamic coordination of multiple editions
- can be built up from computer-aided collation
- case study (in the following slides) from Frankenstein Variorum project
Variorum - modeling change over time
Inspiration for Frankenstein Variorum: Darwin Online (ed. Barbara Bordalejo), except...
- Frankenstein Variorum only compares five witnesses
- Frankenstein Variorum incorporates two MS witnesses + three print editions
- Frankenstein Variorum integrates by collation earlier digital editions made by others

Gothenburg Model
An algorithm for computer-aided collation, developed in a 2009 workshop of CollateX and Juxta developers.
- Tokenization: break the text down into the smallest units of comparison (words with punctuation, or character by character). FV tokenizes words and includes punctuation.
- Normalization: ('&' = 'and')
- Alignment:
  - Identify comparable divergence: what makes text sequences comparable units?
  - “Chunking” text into comparable passages (chapters/paragraphs that line up with identifiable start and end points). Collation proceeds chunk by chunk.
- Analysis: study output, correct, and re-align after machine process, AND refine automated processing
- Visualization: critical edition apparatus, graph displays
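The first three Gothenburg steps can be sketched in a few lines of Python. This is a toy stand-in, not CollateX: it tokenizes on whitespace, applies the single normalization rule mentioned above ('&' = 'and') plus lowercasing, and uses the standard library's `difflib.SequenceMatcher` as a substitute for a real alignment algorithm. The sample sentences and function names are invented.

```python
from difflib import SequenceMatcher

def tokenize(text):
    """Whitespace tokenization: words keep their attached punctuation."""
    return text.split()

def normalize(token):
    """Comparison form of a token; the original token is kept for display."""
    token = token.lower()
    return "and" if token == "&" else token

def align(a, b):
    """Align two witnesses on normalized tokens, reporting the original tokens."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    norm_a = [normalize(t) for t in tokens_a]
    norm_b = [normalize(t) for t in tokens_b]
    matcher = SequenceMatcher(a=norm_a, b=norm_b, autojunk=False)
    rows = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        rows.append((op, " ".join(tokens_a[i1:i2]), " ".join(tokens_b[j1:j2])))
    return rows

rows = align("The quick brown fox & the lazy dog",
             "The quick brown cat and the lazy dog")
for row in rows:
    print(row)
```

Because alignment runs on the normalized tokens, '&' and 'and' count as agreement while 'fox' and 'cat' surface as the only variant; the analysis and visualization steps would then work from rows like these.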

FV: Tokenizing/normalizing S-GA diplomatic encoding
- required XSLT resequencing of margin zones (follow @corresp values to @xml:ids)
- required Python normalizing algorithm to suppress <line> from collation
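A hedged sketch of what suppressing `<line>` during normalization could look like in Python follows. It is not the project's code: the regular expressions, the decision to also drop `<lb>`, and the sample string are assumptions made for illustration. Each token is returned as a pair so that the original form is preserved alongside the normalized form used for comparison.

```python
import re

# Layout-only tags that should not take part in the comparison (assumed list).
IGNORED = re.compile(r"</?(?:line|lb)\b[^>]*>")
# A token is either a whole tag or a run of non-space, non-tag characters.
TOKEN = re.compile(r"<[^>]+>|[^\s<]+")

def tokens_for_collation(source):
    """Return (original, normalized) token pairs with layout tags removed."""
    pairs = []
    for tok in TOKEN.findall(IGNORED.sub(" ", source)):
        if tok.startswith("<"):
            # Reduce a tag to its name: <del rend="strikethrough"> becomes <del>
            norm = re.sub(r"^<(/?)(\w+)[^>]*?(/?)>$", r"<\1\2\3>", tok)
        else:
            norm = tok.lower()
        pairs.append((tok, norm))
    return pairs

# Invented sample in the style of a diplomatic transcription.
pairs = tokens_for_collation(
    '<line><del rend="strikethrough">handsome</del> beautiful.</line>'
)
print(pairs)
```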
Why collate the markup?
- Markup expresses conditions relevant for comparing texts
- Genetic markup with critical comparison:
- genetic markup is not incomparable with markup of print editions
- genetic markup can answer scholarly research questions at critical scale
- MWS reworking the text: How guilty does Victor Frankenstein appear in 1816, 1818, 1820s after Percy's death, 1831?
- Which passages underwent the most intense, “molten” transformations over time?
- What kind of influence did Percy Shelley have on Frankenstein’s print editions?
Preparing marked-up texts for collation
- Determine comparable markup of text structures across Variorum editions:
  - volume (print editions only), letter, chapter
  - paragraph, poetry line-groups and lines
  - notes
- Markup of manuscript events included in Variorum comparison:
  - deletion, insertion, gap
- Normalizing algorithm:
  - Decide what marks are equivalent
  - ignore but preserve other markup in collation process, also abbreviations, capitalization.
- “Chunking” algorithm (limit possibility of major misalignments):
  - Locate “seams” where all editions align
  - Divide into “chunks” at the seams
  - Prep each edition as 33 collation “chunks”, C01 - C33
  - All files identified as the same chunk are collated together
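The chunking idea can be illustrated with a short Python sketch. It assumes that the seams have already been identified and are given as marker strings that occur in every edition, in order; how the project actually locates seams is not shown here, and the edition contents below are placeholders.

```python
def chunk_at_seams(editions, seams):
    """Split each edition at the shared seams.

    editions: {name: text}; seams: marker strings present in every edition, in order.
    Returns {name: [chunk0, chunk1, ...]} with len(seams) + 1 chunks per edition.
    """
    chunks = {}
    for name, text in editions.items():
        pieces, rest = [], text
        for seam in seams:
            before, found, after = rest.partition(seam)
            if not found:
                raise ValueError(f"seam {seam!r} missing from {name}")
            pieces.append(before)
            rest = seam + after   # the seam opens the next chunk
        pieces.append(rest)
        chunks[name] = pieces
    return chunks

# Placeholder texts; only the seam markers matter for the illustration.
editions = {
    "f1818": "front matter a CHAPTER 1 text a1 CHAPTER 2 text a2",
    "f1831": "front matter b CHAPTER 1 text b1 CHAPTER 2 text b2",
}
chunks = chunk_at_seams(editions, ["CHAPTER 1", "CHAPTER 2"])

# Chunks with the same index are collated together.
for index, group in enumerate(zip(*chunks.values()), start=1):
    print(f"C{index:02d}", group)
```

Collating chunk by chunk means a misalignment can never spread past the next seam.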

- output of computer-aided collation (not TEI, but like it)
- build up variorum edition expressed in app-crit with flattened tags
TEI App-Crit on its way to becoming a Spine
<app xml:id="C10_app44">
  <rdgGrp xml:id="C10_app44_rg1"
      n="['&lt;del&gt;handsome&lt;del&gt;
      &lt;del&gt;handsome&lt;del&gt;beautiful.&lt;del&gt;handsome&lt;del&gt;beautiful;', 'great']">
    <rdg wit="fMS"><lb n="c56-0045__main__23"/>
      <del rend="strikethrough" sID="c56-0045__main__d2e9837"/>
      handsome<del eID="c56-0045__main__d2e9837"/>
      <mdel>.
      </mdel><lb n="c56-0045__left_margin__1"/>
      <del rend="strikethrough" sID="c56-0045__left_margin__d2e9853"/>handsome<del eID="c56-0045__left_margin__d2e9853"/>beautiful.
      <del rend="strikethrough" sID="c56-0045__main__d2e9865"/>
      Handsome<del eID="c56-0045__main__d2e9865"/>
      Beautiful; Great </rdg>
  </rdgGrp>
  <rdgGrp xml:id="C10_app44_rg2" n="['beautiful.', 'beautiful!—great']">
    <rdg wit="f1818">beautiful. Beautiful!—Great </rdg>
    <rdg wit="f1823">beautiful. Beautiful!—Great </rdg>
    <rdg wit="fThomas">beautiful. Beautiful!—Great </rdg>
    <rdg wit="f1831">beautiful. Beautiful!—Great </rdg>
  </rdgGrp>
</app>
Collating with markup: “handsome” / “beautiful” passage processed by CollateX
an ugly but powerful Frankenstein creature of collation!
TEI advantage: Interchange (cf. Syd Bauman, “Interchange vs. Interoperability”):
“Human A” reading code written and documented by “Human B” can understand how to adapt that code without consulting Human B.
TEI Interchangeability :: Collation of Markup
Doing the work of interchange:
- Determine how to follow the “running stream” of semantically readable text to be compared with other editions.
- Map the semantically comparable units in collation algorithm
- Mask the markup that isn't semantically comparable (MS surfaces, zones, lines)
- Decide on how to handle <add> and <del> markup:
  - Do you want your critical apparatus to include deleted material?
  - Or only the “finished” MS? (Mask the <del> elements, and preserve the <add> material)
<milestone unit="tei:p"/>
::
<p>. . . . . . </p>
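The choice between collating deleted material and collating only the “finished” manuscript can be shown with a small Python sketch. It assumes namespace-free markup and treats `<del>` as the only element to mask; the function name `reading_text` and the sample line are invented.

```python
import xml.etree.ElementTree as ET

def reading_text(elem, include_deleted=False):
    """Concatenate an element's text, optionally masking the content of <del>."""
    parts = []
    if elem.tag != "del" or include_deleted:
        parts.append(elem.text or "")
        for child in elem:
            parts.append(reading_text(child, include_deleted))
    # Text following the element belongs to its parent and is always kept.
    parts.append(elem.tail or "")
    return "".join(parts)

# Invented sample line with one deletion and one addition.
line = ET.fromstring(
    "<line>She was <del>handsome </del><add>beautiful</add>.</line>"
)
finished = reading_text(line)
with_deletions = reading_text(line, include_deleted=True)
print(finished)
print(with_deletions)
```

Either stream can be handed to the collation; the point is that the decision is explicit and reversible rather than buried in the source encoding.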
- Method 1: produce edition files from the app-crit with XSLT
  - Plant TEI element (e.g. <seg>) to indicate variant locations, give each an @xml:id
  - Build Spine by generating @target directly accessing <seg> elements
- Method 2: point to pre-existing editions
  - Programmatic search-work to find variant passages (not signalled in the edition markup)
  - Build Spine with XPath and string-range indicators
XPointer Challenge: find the locations expressed in each app in the original editions
- Flatten markup for computer-assisted collation
- Edit the output collation (Gothenburg Model process)
- XSLT Transformation A (pipeline): raise editions with “hotspots”
  - Raise the flattened markup to reconstruct some editions, with marked <seg> elements
  - Deal with overlapping hierarchies (e.g. molten passages cross paragraph boundaries): output editions break into fragments around up-raised markup.
- XSLT Transformation B: construct the standoff spine with pointers:
  - Convert CollateX output critical apparatus to “spine nerve plexus” holding XML pointers
  - These point to the marked hotspots in the editions reconstructed in Pipeline A
  - And point to xml:ids + string-ranges in external editions that were not generated by the process (e.g. FV pointing to Shelley-Godwin Archive)
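The project does its raising in XSLT; purely as an illustration of the idea, the Python sketch below turns flattened start and end markers (`sID` / `eID`) back into ordinary container elements with two regular-expression substitutions. It only works when the marked ranges nest properly. When they overlap, the result is not well-formed and the sketch just reports that, whereas the pipeline described above breaks the output into fragments. The regular expressions and sample strings are assumptions.

```python
import re
import xml.etree.ElementTree as ET

ATTR = r'(?:\s+[\w:]+="[^"]*")'
# <del rend="x" sID="id"/>  ->  <del rend="x">
START = re.compile(r'<(\w+)(' + ATTR + r'*?)\s+sID="[^"]*"(' + ATTR + r'*)\s*/>')
# <del eID="id"/>  ->  </del>
END = re.compile(r'<(\w+)\s+eID="[^"]*"\s*/>')

def raise_markers(flat):
    """Replace paired sID/eID milestones with real start and end tags."""
    raised = START.sub(r"<\1\2\3>", flat)
    raised = END.sub(r"</\1>", raised)
    ET.fromstring(raised)  # raises ParseError when the marked ranges overlap
    return raised

# Invented sample modelled on the flattened <del> markers shown earlier.
flat = ('<rdg wit="fMS"><del rend="strikethrough" sID="d1"/>handsome'
        '<del eID="d1"/> beautiful</rdg>')
raised = raise_markers(flat)
print(raised)
```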
Markup is text, after all!
Summary of Spine-Making:
- “Spine” data model = standoff use of TEI critical apparatus:
  - can include processed data, like maximum edit-distance, at each location
  - can include data on normalization: e.g. normalized tokens used in collation process
  - coordinates data on variance
  - points to specific locations in separate edition files
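As one example of processed data a spine entry might carry, the following Python sketch computes the maximum pairwise edit distance among the readings at a location. It uses a plain Levenshtein distance over characters; whether the project measures distance this way is an assumption, and the function names are made up.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def max_edit_distance(readings):
    """Largest distance between any two readings at one variant location."""
    return max((edit_distance(a, b) for a, b in combinations(readings, 2)),
               default=0)

print(max_edit_distance(["fox", "mouse", "cat"]))
```

A value like this, stored on each `<app>`, gives a quick way to rank locations by how far the witnesses diverge.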


Figure: Comparing five versions of Frankenstein. Alignments, gaps, and comparative lengths of each collation unit. Legend: MS, 1818, Thm, 1823, 1831; chapter heading or other structural boundary.
For more on our document data modeling, see
Beshero-Bondar, Elisa E., and Raffaele Viglianti. “Stand-off Bridges in the Frankenstein Variorum Project: Interchange and Interoperability within TEI Markup Ecosystems.” Balisage Series on Markup Technologies, vol. 21 (2018). https://doi.org/10.4242/BalisageVol21.Beshero-Bondar01.
“Preparing diversely encoded documents for collation challenges us to consider inconsistent and overlapping hierarchies as a tractable matter for computational alignment—where alignment becomes an organizing principle that fractures hierarchies, chunking if not atomizing them at the level of the smallest meaningfully sharable semantic features.”
“We have negotiated interchangeability by cutting across individual text hierarchies to emphasize lateral connections and commonalities—making a new TEI whose hierarchy serves as a stand-off ‘spine’ or ‘switchboard’ permitting comparison and sharing of common data. Our goal of pointing to aligned data required us to locate the interchangeable structural markers in our source documents.”
Publishing a Stand-off Critical Apparatus: Leveraging isomorphic representations across text and music notation
Raff Viglianti (@raffazizzi)


songscapes.org
Stand-off apparatus and the representation of primary sources
<l>alas forsaken I Complaine;</l>
<l>Alas deserted I Complain,</l>
<l>Alas deserted I complain;</l>
BL Add. MS 53723
C 709
Folger L638
Variant
Songscapes stand-off collation
TEI (no XPointer in this case)
<TEI>
<div>
<head>Text Collation</head>
<app>
<rdgGrp>
<rdg wit="#BL_53723">
<ptr target="tei/Ariadne-BL_53723.xml#v1"/>
</rdg>
<rdg wit="#L638">
<ptr target="tei/Ariadne-L638.xml#v1"/>
</rdg>
</rdgGrp>
<rdg wit="#C709">
<ptr target="tei/Ariadne-C709.xml#v1"/>
</rdg>
</app>
</div>
</TEI>
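A consumer of this stand-off collation has to do two things: see which witnesses are grouped as agreeing, and split each pointer into a file and a fragment identifier. The Python sketch below does both for a namespace-free copy of the example above; the function names are hypothetical and the real Songscapes files may differ in detail.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

COLLATION = """<TEI><div><head>Text Collation</head>
<app>
  <rdgGrp>
    <rdg wit="#BL_53723"><ptr target="tei/Ariadne-BL_53723.xml#v1"/></rdg>
    <rdg wit="#L638"><ptr target="tei/Ariadne-L638.xml#v1"/></rdg>
  </rdgGrp>
  <rdg wit="#C709"><ptr target="tei/Ariadne-C709.xml#v1"/></rdg>
</app>
</div></TEI>"""

def targets_by_witness(collation_xml):
    """Map each witness id to the (file, fragment) its <ptr> points at."""
    root = ET.fromstring(collation_xml)
    result = {}
    for rdg in root.iter("rdg"):
        url, fragment = urldefrag(rdg.find("ptr").get("target"))
        result[rdg.get("wit").lstrip("#")] = (url, fragment)
    return result

def agreement_groups(collation_xml):
    """List the witnesses that the apparatus groups together."""
    app = ET.fromstring(collation_xml).find(".//app")
    groups = []
    for child in app:
        rdgs = child.findall("rdg") if child.tag == "rdgGrp" else [child]
        groups.append([r.get("wit").lstrip("#") for r in rdgs])
    return groups

targets = targets_by_witness(COLLATION)
groups = agreement_groups(COLLATION)
print(groups)
```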

Witnesses: BL Add. MS 53723 + Folger L638; C 709
Adapted from: https://github.com/EarlyModernSongscapes/songscapes/blob/master/data/collations/Theseus%2C_O_Theseus%2C_hark!.xml
Songscapes stand-off collation
<TEI>
<div>
<head>Music Collation</head>
<notatedMusic>
<mei:mei> <!-- header -->
<mei:music><mei:body><mei:mdiv><mei:score>
<mei:app>
<mei:rdg source="#M-BL_53723"
target="mei/Ariadne-BL_53723.xml#m-101
mei/Ariadne-BL_53723.xml#m-106"/>
<mei:rdg source="#M-L638"
target="mei/Ariadne-L638.xml#m-101
mei/Ariadne-L638.xml#m-106"/>
</mei:app>
</mei:score></mei:mdiv></mei:body></mei:music>
</mei:mei>
</notatedMusic>
</div>
</TEI>

MEI
Witnesses: BL Add. MS 53723 + Folger L638
Adapted from: https://github.com/EarlyModernSongscapes/songscapes/blob/master/data/collations/Theseus%2C_O_Theseus%2C_hark!.xml
Publishing this kind of model
(including Frankenstein Variorum!)
- Typical TEI to HTML transformation would require transforming pointers too.
- Pointers need to be followed in response to user interaction.
<ptr target="MSC56.xml#string-range(//line[13],0,21)" />
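Following such a pointer at display time could look roughly like the Python sketch below. It is not the project's publishing code: the pointer is parsed with a regular expression rather than an XPointer processor, the XPath is limited to what `xml.etree.ElementTree` supports, the offset is treated as zero-based, and the file content is an in-memory placeholder supplied through a hypothetical `load` callback.

```python
import re
import xml.etree.ElementTree as ET

POINTER = re.compile(
    r"(?P<file>[^#]*)#string-range\((?P<path>.+),(?P<offset>\d+),(?P<length>\d+)\)$"
)

def resolve(pointer, load):
    """Return the substring a string-range pointer selects.

    load(filename) must return the parsed root element of that file.
    """
    m = POINTER.match(pointer)
    root = load(m["file"])
    target = root.find("." + m["path"])        # ElementTree wants a relative path
    text = "".join(target.itertext())
    start = int(m["offset"])
    return text[start:start + int(m["length"])]

# Placeholder file: thirteen <line> elements with invented content.
FILES = {
    "MSC56.xml": "<surface>"
                 + "".join(f"<line>filler {i}</line>" for i in range(1, 13))
                 + "<line>beautiful. Beautiful; Great</line></surface>"
}

def load(name):
    return ET.fromstring(FILES[name])

selected = resolve("MSC56.xml#string-range(//line[13],0,9)", load)
print(selected)
```

In a browser the same steps would run in response to a click: fetch the file, evaluate the path, then slice the text.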

?