Bicentennial Bits and Bytes

The Pittsburgh Digital Frankenstein Project

Rikk Mulligan (@CritRikk) | Elisa Beshero-Bondar (@epyllia) | Matt Lavin (@mjlavin80) | Jon Klancher (@jklancher)

MLA 2018: Saturday Jan. 6 @ 3:30pm; Sheraton Riverside Suite

Link to these slides: http://bit.ly/BicFrankMLA18

A Patchwork Team

  • Elisa Beshero-Bondar, Director, Center for the Digital Text, University of Pittsburgh at Greensburg

  • Jon Klancher, English Department, Carnegie Mellon University

  • Matt Lavin, Director, Digital Media Lab, University of Pittsburgh

  • Rikk Mulligan, University Libraries, Carnegie Mellon University

  • Raff Viglianti, Maryland Institute for Technology in the Humanities (MITH), University of Maryland
  • Scott Weingart, Program Director of Digital Humanities, University Libraries, Carnegie Mellon University

What We Contribute

  • Elisa Beshero-Bondar: Romanticist; Textual Scholar; TEI architecture and collation  

  • Jon Klancher: Romanticist; Book Historian; Annotations

  • Matt Lavin: 19th Century Americanist; Textual Analysis, Stylometry

  • Rikk Mulligan: 20th Century Americanist; Web Coding, Interface Design

  • Raffaele Viglianti: Research Programmer; Shelley-Godwin Archive encoding;
    TEI pointers to S-GA Notebooks
  • Scott Weingart: Early Modernist, History of Science; Textual Analysis, Stylometry

Illustrations by Bernie Wrightson.
Frankenstein. Mary Wollstonecraft Shelley. Marvel Comics 1983.
images from Dark Horse Comics 2008 reprint.

Print publications 

  • 1818 Edition (3 volumes)

  • 1823 Edition (2 volumes)

  • 1831 Edition (1/2 of a volume)

    • (bound with Friedrich von Schiller's The Ghost Seer in Bentley's Standard Series of novels)

known/authorized by MWS


Digital Sources


Critical and Diplomatic Editions Leading to the Pgh Frankenstein Project

Legend: [print] = print edition; [digital] = digital edition

  • 1974: Rieger: inline collation of "Thomas" with 1818; 1831 variants in endnotes [print]

  • ~mid-1990s: Curran and Lynch: PA Electronic Edition (PAEE), collation of 1818 and 1831: HTML [digital]

  • 1996: Crook crit. ed. of 1818; variants of "Thomas", 1823, and 1831 in endnotes (P&C MWS collected works) [print]

  • 1996: C. Robinson, The Frankenstein Notebooks (Garland): print facsimile of 1816 ms drafts [print]

  • 2007: Romantic Circles TEI conversion of PAEE; separates the texts of 1818 and 1831; collation via Juxta [digital]

  • 2013: Shelley-Godwin Archive publishes diplomatic edition of 1816 ms drafts [digital]

  • 2017: Pittsburgh Bicentennial Frankenstein Project begins: assembly/proof-correcting of PAEE files; OCR/proof-correcting 1823; "bridge" TEI edition of S-GA notebook files; automated collation; incorporating "Thomas" copy text [digital]

Evolving Project

Returning to the original texts to produce:

  • Clean Text files for each edition (1818, 1823, 1831)

  • TEI XML files for each edition

  • Comprehensive collations from ms through 1831 to bridge and build on previous critical editions (print and digital)

  • Variorum interface to show changes over time

  • Stylometric analysis

  • Annotations

Variorum project

 

  • Manuscript:  
    (Notebooks: Abinger c56, c57, c58)

  • "Thomas copy" Edition  (1818 edition with hand annotations by Mary Shelley)

  • 1818 Edition (3 volumes)

  • 1823 Edition (2 volumes)

  • 1831 Edition  (1/2 of a volume)


Building a Digital Variorum

Elisa Beshero-Bondar

@epyllia

 

  • Can we make an edition that conveniently compares the manuscripts to the print publications?
     

  • Can we make a comprehensive collation to show changes to the novel over time, from 1816 to 1831?

    • How many versions? (5 and a bit?)

    • Which editorial interventions persist from 1816 to 1831?

      • MWS in the "Thomas" copy: how much of this persists into 1831?

      • PBS's additions: which/how many of these persist to 1831?

      • What parts of the novel were most mutable?  


 

Motivating Questions

Our Project Genealogy:

Critical and Diplomatic Editions Leading to the Pgh Frankenstein Project

The dream of the '90s...

Hypertext / Hypercard books and the PAEE

  • Accessing (reading, writing, editing)  texts in nonlinear ways

  • Multiplying and individualizing  points of access

  • Roughly contemporary with the PAEE + mid '90s scholarly edition efforts
  • What if the female creature survived and had a chance to create her own story with lots of options?
  • Experimental nonlinear navigation...hundreds of hypercards...plot your own course

 

The dream of the '90s:

Frankenstein's inspiration for hypertext experiment

PAEE: Hypertext Collation Experiment

hundreds of small html files, juxtaposed in frames

The dream of the '90s is alive...

(in Pittsburgh)

Digital Collation for a "Variorum" interface

  • select a text from whichever version the reader chooses:
    • 1816 MS | 1818 | "Thomas" | 1823 | 1831
  • compare that text to any other version the reader chooses
  • view the "molten" portions of the novel in context with the stable portions
  • navigate multiple texts in context with one another
  • make the critical apparatus a vantage point:
    see how the novel changed over time without having to find the fine-print endnotes
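
A minimal sketch of how such a reader-driven comparison could be modeled, assuming the collation is stored as a list of units with one reading per version. The witness labels come from the list above; the readings, the `compare` and `mutability` helpers, and the "stable"/"molten" labels are illustrative assumptions, not the project's actual data model.

```python
from typing import Dict, List, Optional, Tuple

# Witness labels from the slide. The readings are invented placeholders,
# not quotations from any edition.
WITNESSES = ["MS", "1818", "Thomas", "1823", "1831"]

# One collation unit maps each witness to its reading (None = unit absent).
Unit = Dict[str, Optional[str]]

units: List[Unit] = [
    {"MS": "alpha", "1818": "alpha", "Thomas": "alpha", "1823": "alpha", "1831": "alpha"},
    {"MS": "beta", "1818": "beta", "Thomas": "beta2", "1823": "beta", "1831": "beta3"},
    {"MS": None, "1818": "gamma", "Thomas": "gamma", "1823": "gamma", "1831": "gamma"},
]


def compare(units: List[Unit], base: str, other: str) -> List[Tuple[int, Optional[str], Optional[str], str]]:
    """Pair the readings of two chosen versions, unit by unit."""
    rows = []
    for i, unit in enumerate(units):
        a, b = unit.get(base), unit.get(other)
        status = "stable" if a == b else "molten"
        rows.append((i, a, b, status))
    return rows


def mutability(units: List[Unit]) -> List[int]:
    """Count distinct readings per unit; 1 means every witness agrees."""
    return [len(set(unit.values())) for unit in units]


for row in compare(units, "1818", "1831"):
    print(row)
print(mutability(units))
```

Keeping every unit in the table, stable or not, is what lets the changed passages be read in context rather than in isolated endnotes.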

The Creature of Collation?

 We make newly formed text "bodies"  from disparately formed source materials.

source: I programmer article on "Frankenstein" malware

  • TEI (Text Encoding Initiative): a community-maintained standard
  • 1987 @ Vassar: draft of Poughkeepsie Principles

    provide a standard format for data interchange in humanities research.

    • Guidelines for the Encoding and Interchange of Machine Readable Texts: first drafted 1990; published on the web by 1999 (P3)
    •  Standards for encoding texts co-evolve with standards for developing human and machine-readable markup languages
      • HTML (w3c)  ||  (early) SGML and XML
  • TEI XML tree structure:
    • meant to store a stable format not subject to  commercial processing requirements
  • possible to publish TEI directly or convert to HTML; PDF; TEX; other document formats. 
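
As a rough illustration of the last point, the sketch below reads a tiny TEI document with Python's standard library and emits HTML paragraphs. The sample document and the `tei_to_html` helper are assumptions made for this example; real TEI-to-HTML conversion is more commonly done with XSLT and handles far more elements.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Placeholder document: a bare skeleton, not a valid or complete TEI file.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><!-- metadata --></fileDesc></teiHeader>
  <text><body>
    <p>First placeholder paragraph.</p>
    <p>Second &amp; last.</p>
  </body></text>
</TEI>"""


def tei_to_html(tei_xml: str) -> str:
    """Convert each TEI <p> in the body to an HTML <p>, dropping inner markup."""
    root = ET.fromstring(tei_xml)
    paragraphs = root.findall("tei:text/tei:body/tei:p", TEI_NS)
    return "\n".join(
        "<p>" + escape("".join(p.itertext())) + "</p>" for p in paragraphs
    )


print(tei_to_html(sample))
```
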
  1. Small pieces are optimal for collation.

  2. There is no single "complete" edition.

  3. Each output (plain text, XML, TEI collation) = viable edition on its own.

  4. Interface invites the user to play: put the pieces together.

From PAEE to Pgh Variorum...

values in common

image source: a friend's Lego set

  • Reconcile multiple kinds of text encoding:

    • old '90s HTML   (1818, 1831)  

    • not-so-plain OCR-generated text (1823)

    • TEI XML for manuscripts: (S-GA diplomatic edition)

Pittsburgh's bridges (1963) 

Source: NewsCastic.com

A Bridge-Building Challenge

  • Construct "Bridge" XML for collation
    • Markup-assisted machine collation (collateX):
    • "flattened" XML hierarchies for even collation units
    • ms metadata markup (e.g. "hands") to ignore in collation, but preserve in the output
    • pointers outward to manuscript editions (S-GA, Morgan Library)
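
The idea of ignoring markup during collation while preserving it in the output can be sketched as follows. Each token keeps its original form (`t`) and a normalized form (`n`) used for comparison, loosely modeled on the token pairs that CollateX's JSON input accepts. The `<hand .../>` milestone names, the sample text, and the normalization rules are hypothetical, not the project's actual bridge markup.

```python
import re

# Invented sample text. The <hand who="..."/> and <hand-end/> milestones are
# hypothetical names standing in for flattened manuscript markup.
bridge = 'The <hand who="pbs"/>pale<hand-end/> student, alone'
plain = "the pale student alone."

# A token is either one whole tag or a run of non-space, non-tag characters.
TOKEN = re.compile(r"<[^>]+>|[^\s<]+")


def tokenize(fragment):
    """Split a fragment into tokens with original ('t') and normalized ('n') forms."""
    tokens = []
    for match in TOKEN.finditer(fragment):
        t = match.group(0)
        if t.startswith("<"):
            n = ""  # markup is invisible to the comparison
        else:
            n = t.lower().strip(".,;:!?\"'")  # assumption: ignore case and edge punctuation
        tokens.append({"t": t, "n": n})
    return tokens


def comparable(tokens):
    """The sequence a collation algorithm would actually compare."""
    return [tok["n"] for tok in tokens if tok["n"]]


print(comparable(tokenize(bridge)) == comparable(tokenize(plain)))
print([tok["t"] for tok in tokenize(bridge)])  # markup still present in the output forms
```

Stripping punctuation here is only one possible normalization; a collation that cares about punctuation changes would keep it in `n`.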

Collation "stitchery"

  • Can be done by hand in TEI
  • Automated: via CollateX

    • Algorithms for locating union and  "delta" points in "streams" of text

    • Inputs in a variety of formats (XML/TEI, plain text, JSON)

  • Output / Visualization options:

    • Text table (above); SVG flow chart; XML

    • JuxtaCommons on the web

    • Develop a custom web interface (via XML output)
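
To show what locating shared and "delta" points in two text streams looks like in practice, here is a small stand-in built on Python's standard `difflib`. It is not CollateX and does not reproduce its algorithms; it handles only two witnesses, and the sample sentences and the `collate_pair` helper are invented for illustration.

```python
from difflib import SequenceMatcher


def collate_pair(a: str, b: str):
    """Align two witnesses word by word; return rows of (tag, reading_a, reading_b)."""
    words_a, words_b = a.split(), b.split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        rows.append((tag, " ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return rows


# Invented witnesses, not text from any edition.
w1 = "the quick brown fox jumps"
w2 = "the quick red fox leaps high"

# Print a simple text table: "equal" rows are shared, the rest are deltas.
for tag, left, right in collate_pair(w1, w2):
    print(f"{tag:8} | {left:12} | {right}")
```
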

image source: S-GA

A running text stream...?

Or an architecture of bridges?

(collateX SVG output)

XML collation:  flagging variants and Percy's hand

Stylometry and Digital Frankenstein

Matthew Lavin

University of Pittsburgh

@mjlavin80

Research Questions

  1. How does Frankenstein change stylistically across different expressions/manifestations?
     

  2. How can those changes be attributed and/or characterized?

    • In whose authorial voice is Frankenstein? Direct analysis of Percy and Mary
       

  3. Do stylistic changes affect how Frankenstein reads in relation to cultural categories like genre, “modernness,” linguistic register generally, and scientific vocabulary? If so, how?

Outline of Exploratory Measures

In notebooks, term counts/relative frequencies of:

  • Mary’s hand initial

  • Mary’s hand strikethrough

  • Percy’s hand suggested vs. adopted

  • Mary’s hand revised (sometimes Mary ver1, ver2, final, etc.)

Across our three print editions:  

  • Term counts/relative frequencies of each text

  • Term frequencies weighted against frequency across all documents (TF-IDF)
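
A minimal sketch of the two print-edition measures, using only the standard library. The tokenizer, the unsmoothed `ln(N / df)` weighting, and the toy documents are assumptions for illustration; the slides do not say which TF-IDF variant or tooling was used.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def relative_freq(text):
    """Each term's share of all tokens in one text."""
    counts = Counter(tokens(text))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def tf_idf(docs):
    """docs maps name -> text; returns name -> {term: tf * ln(N / df)}."""
    n_docs = len(docs)
    rel = {name: relative_freq(text) for name, text in docs.items()}
    df = Counter(term for freqs in rel.values() for term in freqs)
    return {
        name: {term: tf * math.log(n_docs / df[term]) for term, tf in freqs.items()}
        for name, freqs in rel.items()
    }


# Toy documents standing in for three editions.
docs = {"A": "fire fire ice", "B": "fire ice ice", "C": "fire snow"}
weights = tf_idf(docs)
print(weights["A"])
print(weights["C"])
```

With this unsmoothed variant, any term that appears in every document gets a weight of zero, which matters when comparing only three closely related editions.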

Term Counts

Absolute Values of Term Count Differences

across Editions,

1818 to 1823 (left) and

1823 to 1831 (right)
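
The quantity plotted in such a chart can be sketched as below: for each term, the absolute difference between its counts in two editions, ranked from largest to smallest. The tokenizer, the `abs_count_diffs` helper, and the toy strings are assumptions; the actual charts were presumably produced with other tooling.

```python
import re
from collections import Counter


def term_counts(text):
    """Raw counts of lowercased word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def abs_count_diffs(earlier, later, top=10):
    """Terms ranked by |count in earlier - count in later|, largest first."""
    a, b = term_counts(earlier), term_counts(later)
    diffs = {term: abs(a[term] - b[term]) for term in set(a) | set(b)}
    ranked = sorted(diffs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, d) for term, d in ranked[:top] if d > 0]


# Toy strings standing in for two editions.
print(abs_count_diffs("a a a b c", "a b b b b d"))
```
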

Relative

Term Frequencies

 

Absolute Value of Weighted Term Frequency Differences across Editions (tf-idf), 1818 to 1823 (left) and 1823 to 1831 (right)

Collational Alignment

Types of Changes:

 

  • Spelling normalizations

  • Punctuation

  • Word insertions, substitutions, deletions

  • Word to phrase or phrase to word

  • Reordering
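
These categories can be approximated with a rule-based classifier over aligned pairs of readings. The sketch below is a heuristic under stated assumptions: the category labels, the order of the rules, and the similarity threshold used to guess at spelling variants are all illustrative choices, and the examples are invented.

```python
import string
from difflib import SequenceMatcher

_PUNCT = str.maketrans("", "", string.punctuation)


def classify(a: str, b: str, spelling_threshold: float = 0.8) -> str:
    """Label one aligned pair of readings (an empty string means 'absent')."""
    if a == b:
        return "no change"
    if not a:
        return "insertion"
    if not b:
        return "deletion"
    words_a = a.translate(_PUNCT).split()
    words_b = b.translate(_PUNCT).split()
    if words_a == words_b:
        return "punctuation"
    if sorted(words_a) == sorted(words_b):
        return "reordering"
    if len(words_a) != len(words_b) and 1 in (len(words_a), len(words_b)):
        return "word/phrase"
    if len(words_a) == len(words_b) == 1:
        # Heuristic only: very similar single words are guessed to be spelling variants.
        if SequenceMatcher(a=words_a[0], b=words_b[0]).ratio() >= spelling_threshold:
            return "spelling?"
    return "substitution"


for pair in [("colour", "color"), ("and,", "and"), ("", "very"),
             ("I went there", "there I went"), ("monster", "wretched being")]:
    print(pair, "->", classify(*pair))
```

A similarity threshold will misjudge some pairs in both directions, so a real workflow would likely check spelling candidates against a list of known historical variants.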

Shelley-Godwin Notebooks

Image courtesy of shelleygodwinarchive.org

 

Punctuation Matters … but not for all measures

 

Image courtesy of dailywritingtips.com

Dynamics of DH Collaboration

The workflows and analytical paradigms of “machine learning DH” and “scholarly editing DH” are not factory fitted to one another, but they can be adapted to work in tandem. The gains are more valuable than the cost of the retrofit.

Dynamics of DH Collaboration

How can a single, carefully curated edition or set of editions be worked into a “macroanalysis” model where many uncorrected, dirty OCR texts are being compared to one another?

 

What kinds of  questions can we ask with hand-corrected editions that we cannot ask with HTRC corpora?

 

Open Data and Reproducibility

I have argued elsewhere that openness invites open discussion and collaboration. It doesn't guarantee that these things will happen, but closed data practices all but guarantee that they will be difficult or impossible.

 

Next Steps: Questions/Methods

How can we characterize changes by trends established in analysis of each person’s hand?
 

How can we think about changes as moving closer or further away from a genre baseline?
 

How do we build index quantifications of “how modern” or “how scientific” each version of the text is? How do we account for “modern” and “scientific” as rapidly changing ideas?

 

Jon Klancher

@jklancher

Source: Web Annotation Data Model
(w3c Recommendation of 23 Feb 2017)

Annotated Print Editions of Frankenstein

 

1993     Leonard Wolf, ed., The Essential Frankenstein: The Definitive, Annotated Edition of Mary Shelley’s Classic Novel (New York: Plume). (1st edition as The Annotated Frankenstein, 1977)

 

2012     Susan J. Wolfson and Ronald L. Levao, The Annotated Frankenstein (Cambridge, MA: Belknap Press of Harvard University).

 

2017     Leslie S. Klinger, ed., The New Annotated Frankenstein (New York: Liveright/Norton).

 

2017     David G. Guston, ed., Frankenstein: Annotated for Scientists, Engineers, and Creators of All Kinds (Cambridge, MA: MIT Press).


This annotation is the verbatim 1831 altered text.

Wolfson annotation:

Published in 1791, in the wake of the French Revolution (Volney was part of the Revolutionary government), Les Ruines; ou Meditation sur les revolutions des empires appeared in English as Ruins, or Meditations on the Revolutions of Empires, in 1792.

Klinger annotation:

More properly, The Ruins, Or, Meditation on the Revolutions of Empires; and the Laws of Nature, by Constantin-François Chasseboeuf, who took the name Volney, published in 1791 in French. It was translated in 1802 into English. The book is described by Frankenstein scholar Pamela Clemit as a “powerful Enlightenment critique of ancient and modern governments as tyrannical and supported by religious fraud” (“Frankenstein, Matilda, and the Legacies of Godwin and Wollstonecraft,” in The Cambridge Companion to Mary Shelley, ed. Esther Schor [Cambridge: Cambridge University Press, 2003] 35.)

  In light of the date of translation, the book in question must have been the French edition, and Safie and the creature learned French….

 Our Frankenstein Variorum annotation:

Of the books the Creature hears read aloud in the forest, Volney's The Ruins; or, A Survey of the Revolutions of Empires (1792) was the most closely associated with Europe's radical Enlightenment. (It was first published in French as Les Ruines: ou Meditation sur les revolutions des empires in 1791.)  The Creature learns an illuminating critique of imperialism and exploitation from Volney, even as he also absorbs some of the Enlightenment's own prejudices ("slothful Asiatics"). The effect on the Creature is to give him a sense of the social or structural and not only a personal framework for understanding virtue and suffering. On Volney’s role in the novel, see also Ian Balfour, "Allegories of Origins: Frankenstein after the Enlightenment," SEL: Studies in English Literature 1500-1900 56.4 (2016): 777-98.

using the hypothes.is tool for digital annotation with tags

hypothes.is: all tags so far...
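
For readers unfamiliar with the W3C Web Annotation Data Model, the sketch below builds one tagged annotation as a JSON-LD object that follows that model's vocabulary (`TextualBody`, `TextQuoteSelector`, the `tagging` purpose). The source URL, note text, and tag are hypothetical, the `make_annotation` helper is invented here, and the JSON that hypothes.is itself stores may differ in shape.

```python
import json


def make_annotation(source_url, exact, note, tags, prefix="", suffix=""):
    """Build a Web Annotation with one comment body and one body per tag."""
    bodies = [{"type": "TextualBody", "value": note, "purpose": "commenting"}]
    bodies += [{"type": "TextualBody", "value": tag, "purpose": "tagging"} for tag in tags]
    selector = {"type": "TextQuoteSelector", "exact": exact}
    if prefix:
        selector["prefix"] = prefix
    if suffix:
        selector["suffix"] = suffix
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": bodies,
        "target": {"source": source_url, "selector": selector},
    }


annotation = make_annotation(
    source_url="https://example.org/frankenstein/1818",  # hypothetical address
    exact="Ruins of Empires",
    note="Placeholder note on Volney.",
    tags=["Volney"],  # hypothetical tag
)
print(json.dumps(annotation, indent=2))
```

Anchoring by quoted text rather than by page position is what allows the same note to be located in more than one version of the novel.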

Annotations that Tunnel through the Texts

(not only pointing outside)

domestic affection
(Walton - Margaret Saville)

domestic affection
(DeLaceys and Safie)

domestic affection
(Frankenstein family)

travel/expedition: Walton
 

travel/expedition: Victor
 

travel/expedition: Clerval
 

travel/expedition: Creature
 

law / judicial system

(Justine)
 

law / judicial system

Felix DeLacey
 

law / judicial system

Victor/Kirwin
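
One way to picture annotations that "tunnel" through the text: index passages by shared tag, then treat every pair of passages under a tag as a link. The tag and passage labels below are taken from the slides above; the `build_index` and `tunnels` helpers are illustrative assumptions, and a real index would need stable passage identifiers, which are not given here.

```python
from collections import defaultdict
from itertools import combinations

# (tag, passage label) pairs as listed on the slides.
tagged = [
    ("domestic affection", "Walton - Margaret Saville"),
    ("domestic affection", "DeLaceys and Safie"),
    ("domestic affection", "Frankenstein family"),
    ("travel/expedition", "Walton"),
    ("travel/expedition", "Victor"),
    ("travel/expedition", "Clerval"),
    ("travel/expedition", "Creature"),
    ("law / judicial system", "Justine"),
    ("law / judicial system", "Felix DeLacey"),
    ("law / judicial system", "Victor/Kirwin"),
]


def build_index(pairs):
    """Group passage labels under each tag, preserving input order."""
    index = defaultdict(list)
    for tag, passage in pairs:
        index[tag].append(passage)
    return dict(index)


def tunnels(index):
    """Every pair of passages sharing a tag: a link through the text."""
    return {tag: list(combinations(passages, 2)) for tag, passages in index.items()}


index = build_index(tagged)
for tag, links in tunnels(index).items():
    print(tag, len(links), "links")
```
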
 

Annotations in the Variorum Interface

Frankenstein's invitation/challenge:

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
      <fileDesc>
         <!--METADATA -->
      </fileDesc>
  </teiHeader>
  <text>
      <body>
         <!--DOCUMENT-->
      </body>
  </text>
</TEI>
  • Build Digitally:
    • Experiment with human and machine reading
  • Build a Strong, Sustainable Bridge: 
    • update Romantic Circles edition
    • interlinks to Shelley-Godwin Archive Notebooks: point to ms pages
    • Morgan Library "Thomas copy"
  • Centralize the Critical Apparatus
    • a tool for scholars
    • a metanarrative?
    • a remixing of the reading process for all who care about Frankenstein
    • make all the texts available to all the readers

The work continues...

  • Collation
  • Annotation
  • Stylometry
  • Visualization and Variorum Interface