HSS8004: Qualitative Methods and Critical Analysis in the Arts, Humanities and Social Sciences

Option 8: Digital Cultures IV: (Digital) Scholarly Editing

But who am I?

  • Senior Lecturer in Late Medieval Literature and Digital Humanities (SELLL)
  • Especially interested in late medieval drama and digital scholarly editing
  • Elected member (and previous chair) of the Text Encoding Initiative Technical Council

What is this session about?

Its description is:

In this session, the creation of scholarly editions will be introduced through an exploration of how the production of editions in the digital world has changed the requirements for a digital edition to be considered scholarly. The main standard in this area is the Guidelines of the Text Encoding Initiative (TEI): recommendations for encoding digital text from any time period, in any language and writing system. This session will give an overview of the TEI Guidelines and how they might be used to create a scholarly digital edition.

So to do that we will...

  • In part 1: 'requirements for a digital edition to be considered scholarly'
    • think about what makes up a scholarly edition
    • consider the possibilities provided by creating them digitally 
    • look at some requirements/checklists for scholarly digital editions
  • In part 2: 'overview of the TEI Guidelines'
    • look at what markup is and its form (and see some XML)
    • find out what the Text Encoding Initiative is
    • see the kinds of things the TEI Guidelines cover
    • think about what one might want to encode in a digital edition (and why)
  • In part 3: 'how they might be used to create a scholarly digital edition'
    • briefly look at software tools for publishing a scholarly digital edition
    • consider other solutions and approaches

Part 1:

Requirements for a digital edition to be considered scholarly

  • In part 1: 'requirements for a digital edition to be considered scholarly'
    • think about what makes up a scholarly edition
    • consider the possibilities provided by creating them digitally 
    • look at some requirements/checklists for scholarly digital editions

What features are necessary to make something a scholarly edition?

What is a scholarly edition?

"A scholarly edition is the critical representation of historic documents."

(Sahle, 2016) 

  • Representation: a whole spectrum of mediation between image facsimile and textual transcription

  • Critical: the application of scholarly knowledge and reasoning to the process of creating an edition

  • Documents: the non-abstract objects that are the subject of an edition

  • Historic: at such a distance in time that their contents are not completely evident to the present-day reader

MLA: Committee on Scholarly Editions

  • it must account completely and responsibly for the textual landscape it represents;
  • it must fully describe and justify its editorial methods;
  • it should reveal the processes by which it was created and disseminated, and it should include a record of changes and updates made to the edition over time, which otherwise tend to remain invisible in the digital environment;
  • it should reveal the judgment and scholarship, the editorial rationales and processes, on which the edition is based;
  • it should evince a rigorous standard of accuracy and consistency in applying a particular editorial approach, set of theoretical premises, or method;
  • it should demonstrate the appropriate fit among stated methodology, stated goals of the edition (reconstructing authorial intent, reconstructing the social text, etc.), and the nature of the existing textual witnesses;
  • it should contain a detailed textual introduction or editorial policy statement, as distinguished from a critical introduction, that outlines these aspects; and
  • it should include consideration of how the edition can circulate and function as a scholarly resource over time.

What does doing an edition digitally enable us to do that we can't in a print edition?

(Group Exercise)

More on Digital Editions

  • “A digitised edition is not a digital edition”
  • “A digital edition cannot be given in print without significant loss of content and functionality.”
  • “Scholarly digital editions are scholarly editions that are guided by a digital paradigm in their theory, method and practice.”

Sperberg-McQueen 1994

  1. Electronic scholarly editions are worth having. And therefore it is worth thinking about the form they should take.

  2. Electronic scholarly editions should be accessible to the broadest audience possible. They should not require a particular type of computer, or a particular piece of software: unnecessary technical barriers to their use should be avoided.

  3. Electronic scholarly editions should have relatively long lives: at least as long as printed editions. They should not become technically obsolete before they are intellectually obsolete.

  4. Printed scholarly editions have developed their current forms in order to meet both intellectual requirements and to adapt to the characteristics of print publication. Electronic editions must meet the same intellectual needs. There is no reason to abandon traditional intellectual requirements merely because we are using a different medium to publish them.

  5. On the other hand, many conventions or requirements of traditional print editions reflect not the demands of readers or scholarship, but the difficulties of conveying complex information on printed pages without confusing or fatiguing the reader, or the financial exigencies of modern scholarly publishing. Such requirements need not be taken over at all, and must not be taken over thoughtlessly, into electronic editions.

Sperberg-McQueen 1994

  1. Electronic publications can, if suitably encoded and suitably supported by software, present the same text in many forms: as clear text, as diplomatic transcript of one witness or another, as critical reconstruction of an authorial text, with or without critical apparatus of variants, and with or without annotations aimed at the textual scholar, the historian, the literary scholar, the linguist, the graduate student, or the undergraduate. They can provide many more types of index than printed editions typically do. And so electronic editions can, in principle, address a larger audience than single print editions. In this respect, they may face even higher intellectual requirements than print editions, which typically need not attempt to provide annotations for such diverse readers.
  2. Print editions without apparatus, without documentation of editorial principles, and without decent typesetting are not acceptable substitutes for scholarly editions. Electronic editions without apparatus, without documentation of editorial principles, and without decent provision for suitable display are equally unacceptable for serious scholarly work.
  3. As a consequence, we must reject out of hand proposals to create electronic scholarly editions in the style of Project Gutenberg, which objects in principle to the provision of apparatus, and almost never indicates the sources, let alone the principles which have governed the transcription, of its texts.

MLA: CSE - Digital Scholarly Editions

  • it must note its technological choices and be aware of their implications, ideally using technologies appropriate to the goals of the edition (see fit between methods and goals, above), in recognition of the fact that technologies and methods are interrelated in that no technical decisions are innocent of methodological implications and vice versa;
  • it should be created and presented in ways ensuring the greatest chance of longevity—addressing this challenge involves infrastructural, financial, and data representation issues (such as the use of widely accepted, open standards);
  • it should readily respond to the challenge of maintaining the scholarly ability to be referenced in view of the ways that interfaces change over time; and
  • where possible, it should attend to possibilities of sampling, reuse, and remix, supporting approaches to the formation and curation of the edition such as reconstructing and documenting instances of texts and textual change over time, like algorithmic construction and reconstruction (with possible extensibility, including external data); in doing so, it should attempt to balance considerations for intellectual property and labor with the goals of achieving open access and reusability

(See MLA Report)

RIDE: Criteria for Reviewing Scholarly Digital Editions

Detailed recommendations including: 

  1. Opening information about the SDE
    • About the reviewer, the editors, the SDE, general introduction and transparency
  2. Subject and content of the edition
    • Selection, previous and project's achievements, content
  3. Aims and methods
    • Documentation, objectives, mission, method, representation of texts, text criticism, indexing, data modelling
  4. Publication and presentation
    • Technical infrastructure, interfaces, searching, metadata, identification/citation, social, exports, basic data, licensing, etc.
  5. Conclusion
    • Terminology, realisation of aims, contribution to scholarship, particularities, possible improvements

Part 2:

Overview of the TEI Guidelines

http://www.tei-c.org/

  • In part 2: 'overview of the TEI Guidelines'
    • look at what markup is and its form (and see some XML)
    • find out what the Text Encoding Initiative is
    • see the kinds of things the TEI Guidelines cover
    • think about what one might want to encode in a digital edition (and why)

About Markup

Markup is used in many different fields, for many different purposes: storing data, relating information, encoding understanding, preserving metadata.

  • Markup is a way of making our knowledge or understanding about a text explicit
  • Markup strives to make explicit (to a machine) what is implicit (to a person)
  • Markup facilitates re-use of the same material:
    • in different formats
    • in different contexts
    • by different sorts of users

What can you tell about this text? (Assuming you don't know the language.)

What kind of text? What kind of edition?

A History of Digital Markup

Procedural Markup:
     RED INK ON; print "-£1000"; RED INK OFF

Presentational Markup:
     \textcolor{red}{-£1000}

Descriptive Markup:
     <measure unit="pounds" value="-1000">
        My current account is one thousand pounds in debt
     </measure>

Descriptive Markup

  • It is usually more useful to mark up what we think things represent rather than what they look like.
  • Using descriptive markup enables us to make explicit the distinctions we want to make when processing a string of characters
  • It gives us a way of naming, characterising, and annotating textual data in a formalised way and recording this for re-use
  • Presentational markup cares more about fonts and layout than meaning
  • Descriptive markup says what things are, and usually leaves the rendition or processing of them for a separate step
  • Separating the form of something from its content makes its re-use more flexible
  • It also allows easy changes of presentation across a large number of documents
  • Also called 'Encoding' or 'Annotation'
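A minimal Python sketch of that separation, assuming the `<measure>` example from the previous slide (its `unit` and `value` attributes are the slide's illustration, not necessarily TEI's). The descriptive source is written once; two hypothetical renderers, `as_html` and `as_ledger_row`, make the presentational decisions in a separate step.

```python
import xml.etree.ElementTree as ET

# The descriptive source: it says what the thing is, not how it should look.
SOURCE = ('<measure unit="pounds" value="-1000">'
          'My current account is one thousand pounds in debt</measure>')


def money(value: int) -> str:
    """Format an amount in pounds with its sign, e.g. -1000 -> '-£1000'."""
    sign = "-" if value < 0 else ""
    return f"{sign}£{abs(value)}"


def as_html(measure: ET.Element) -> str:
    """One presentation: debts in red (like the 'red ink' examples), others in black."""
    value = int(measure.get("value"))
    colour = "red" if value < 0 else "black"
    return f'<span style="color:{colour}">{money(value)}</span>'


def as_ledger_row(measure: ET.Element) -> str:
    """Another presentation: a comma-separated row for a spreadsheet."""
    return f'{measure.get("unit")},{measure.get("value")}'


measure = ET.fromstring(SOURCE)
print(as_html(measure))        # <span style="color:red">-£1000</span>
print(as_ledger_row(measure))  # pounds,-1000
```

Changing the colour scheme means editing `as_html` once, not every document that contains a `<measure>`.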

Why do we use italic fonts?

Think about the uses for an italic font in any form of printed publication. Why might an author/publisher put some text into italics? What are they signalling about that text?

We can usually tell these types of things apart from context. If we want to use these categories, computers need to be told these things are different.

Some common uses include:

  • titles
  • emphasis
  • foreign phrases
  • technical terms
  • editorial apparatus, captions, cross references
  • quotations, speaker labels in drama
  • speech and thought 

  ... and many more
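To make the point concrete, here is a small Python sketch using a hypothetical sentence in which four different TEI elements (`<title>`, `<emph>`, `<term>`, `<foreign>`) would all be printed in italics. The house style (`ITALIC_TAGS`) and the `to_html` helper are assumptions for illustration. On the page the four look identical; in the markup a program can still pick out, say, only the titles.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sentence: four different reasons for italics, four different elements.
SOURCE = ("<p>She read <title>Hamlet</title> <emph>twice</emph>, noting each "
          "<term>catchword</term> and every <foreign>sententia</foreign>.</p>")

# Elements that this (hypothetical) house style prints in italics.
ITALIC_TAGS = {"title", "emph", "foreign", "term"}


def to_html(elem: ET.Element) -> str:
    """Render elem as HTML, wrapping every element in ITALIC_TAGS in <i>."""
    inner = escape(elem.text or "") + "".join(
        to_html(child) + escape(child.tail or "") for child in elem
    )
    return f"<i>{inner}</i>" if elem.tag in ITALIC_TAGS else inner


root = ET.fromstring(SOURCE)
print(to_html(root))
# Indistinguishable on the page, but not in the markup:
print([t.text for t in root.iter("title")])  # ['Hamlet']
```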

What Should We Mark Up?

About XML Markup

XML is structured data represented as strings of text
XML looks like HTML, except that:

  • XML is extensible
  • XML must be well-formed
  • XML can be validated
  • XML is application-, platform-, and vendor-independent
  • XML empowers the content provider and facilitates data integration and migration
  • It is one of the best plain text long-term preservation formats for textual data that we have

About XML

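<element> Text </element>
   <element> is the "opening tag"; </element> is the "closing tag"

<element attribute="value">
Text or child elements here
</element>
   attribute="value" is an attribute and its value, given in the opening tag

<element attribute="value"/>
   a combined opening-and-closing tag is called an "empty element"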

About XML

<?xml version="1.0" ?>
<root xmlns="http://namespace/">
   <element attribute="value">
      content 
      <childElement type="empty"/>
      content
   </element>
   <!-- comment -->
</root>

<?xml version="1.0" encoding="utf-8" ?>
<div n="1">
   <head>SCENE I. On a ship at sea: a 
   tempestuous noise of thunder and lightning heard.</head>
   <stage>Enter a Master and a Boatswain</stage>
   <sp>
      <speaker>Master</speaker>
      <ab>Boatswain!</ab>
   </sp>
   <sp>
      <speaker>Boatswain</speaker>
      <ab>Here, master: what cheer?</ab>
   </sp>
   <sp>
      <speaker>Master</speaker>
      <ab>Good, speak to the mariners: fall to't, yarely,</ab>
      <ab>or we run ourselves aground: bestir, bestir.</ab>
   </sp>
   <stage>Exit</stage>
</div>
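Because the scene is marked up descriptively, a program can turn it into other forms. The following Python sketch (the `as_script` and `normalise` helpers are hypothetical, written for this example) parses the fragment above with the standard library and prints it as a plain-text play script.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide (XML declaration omitted for brevity).
TEMPEST = """<div n="1">
   <head>SCENE I. On a ship at sea: a
   tempestuous noise of thunder and lightning heard.</head>
   <stage>Enter a Master and a Boatswain</stage>
   <sp>
      <speaker>Master</speaker>
      <ab>Boatswain!</ab>
   </sp>
   <sp>
      <speaker>Boatswain</speaker>
      <ab>Here, master: what cheer?</ab>
   </sp>
   <sp>
      <speaker>Master</speaker>
      <ab>Good, speak to the mariners: fall to't, yarely,</ab>
      <ab>or we run ourselves aground: bestir, bestir.</ab>
   </sp>
   <stage>Exit</stage>
</div>"""


def normalise(text: str) -> str:
    """Collapse runs of whitespace (including line breaks) to single spaces."""
    return " ".join(text.split())


def as_script(div: ET.Element) -> str:
    """Render a <div> of <head>, <stage> and <sp> elements as a plain-text script."""
    lines = []
    for child in div:
        if child.tag == "head":
            lines.append(normalise(child.text))
        elif child.tag == "stage":
            lines.append(f"[{normalise(child.text)}]")
        elif child.tag == "sp":
            speaker = normalise(child.findtext("speaker"))
            speech = " ".join(normalise(ab.text) for ab in child.findall("ab"))
            lines.append(f"{speaker.upper()}: {speech}")
    return "\n".join(lines)


print(as_script(ET.fromstring(TEMPEST)))
```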

More About XML

  • An XML document is encoded as a linear string of characters
  • It usually begins with an XML declaration, such as <?xml version="1.0"?>, which looks like a processing instruction
  • Element occurrences are marked by start-tags and end-tags
  • The characters < and & are special and must always be "escaped", as &lt; and &amp; respectively, when you want them to appear as themselves
  • Comments are delimited by <!-- and -->
  • Attribute name/value pairs are supplied on the start-tag and may be given in any order
  • There are special attributes in the XML namespace like xml:id and xml:lang
  • Attribute values are always quoted
  • Everything is case-sensitive
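A short Python sketch of escaping in practice, using the standard library's `xml.sax.saxutils.escape`; the sample string is made up. Escaped text parses back to the original characters, while the same text left unescaped is rejected by the parser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Profit & loss: income < outgoings"

# escape() replaces & with &amp;, < with &lt; and > with &gt;.
safe = escape(raw)
print(safe)  # Profit &amp; loss: income &lt; outgoings

document = f"<note>{safe}</note>"
# A parser turns the escapes back into the original characters.
assert ET.fromstring(document).text == raw

# Left unescaped, the same text is not well-formed XML.
try:
    ET.fromstring(f"<note>{raw}</note>")
except ET.ParseError as err:
    print("not well-formed:", err)
```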

Being Well-Formed

  • There is a single root node containing the whole of an XML document
  •  Each subtree is properly nested within the root node 
  •  Element/attribute names and values are always case sensitive
  •  Start-tags and end-tags are always mandatory (except there are combined start-and-end tags called 'empty elements' like <pb/> <gap/>)
  • Attribute values are always quoted

A document can also be 'valid', which means it also obeys the additional rules of a vocabulary like the TEI about which elements and attributes exist and where they can go.

XML Test

  •  <seg>some text</seg>
     
  •  <seg> <w>some</w> <hi>text</hi> </seg>
     

  •  <seg> <w>some <hi></w> text</hi> </seg>
     
  •  <seg type="text">some text</seg>
     
  •  <seg type=text>some text</seg>
     
  •  <seg type="text"> some text <seg/>
     
  •  <seg type="text"> some text<gap/> </seg>
     
  •  <seg type="text">some text</Seg>
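Once you have made your own judgements, you can check them with this Python sketch, which simply asks the standard-library parser whether each snippet is well-formed. `is_well_formed` is a hypothetical helper name. Note that this tests well-formedness only; checking validity against the TEI would need a schema and a validating tool.

```python
import xml.etree.ElementTree as ET

# The eight snippets from the slide, in order.
SNIPPETS = [
    '<seg>some text</seg>',
    '<seg> <w>some</w> <hi>text</hi> </seg>',
    '<seg> <w>some <hi></w> text</hi> </seg>',
    '<seg type="text">some text</seg>',
    '<seg type=text>some text</seg>',
    '<seg type="text"> some text <seg/>',
    '<seg type="text"> some text<gap/> </seg>',
    '<seg type="text">some text</Seg>',
]


def is_well_formed(snippet: str) -> bool:
    """True if the parser accepts the snippet as a complete XML document."""
    try:
        ET.fromstring(snippet)
    except ET.ParseError:
        return False
    return True


for snippet in SNIPPETS:
    verdict = "well-formed" if is_well_formed(snippet) else "NOT well-formed"
    print(f"{verdict:16} {snippet}")
```

The failures correspond to the rules above: overlapping elements, an unquoted attribute value, a missing end-tag, and a case mismatch between start- and end-tag.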

The TEI (The Text Encoding Initiative) is:

  • An international consortium of institutions, projects and individual members
  • A community of users and volunteers
  • A freely available, regularly maintained and updated set of recommendations, 'The Guidelines', with definitions, examples, and discussion of over 560 markup distinctions
  • A mechanism for producing customized schemas for validating your project's digital texts
  • A set of free and openly licensed, customizable tools and stylesheets for transformations to many formats (e.g. HTML, Word, PDF, Databases, RDF/LinkedData, Slides, ePub, etc.)
  • A simple consensus-based way of organizing and structuring textual (and other) resources
  • An archival, well-understood format for long-term preservation of digital data and metadata
  • Whatever you make it! It is a community-driven standard

What is happening here?

Simple Editorial Changes

  • The core module provides some phrase-level elements which may be used to record simple editorial interventions.
  • <choice> groups alternative encodings for the same point in a text
    • Abbreviations:
      • <abbr> abbreviated form
      • <expan> expanded form
    • Errors:
      • <sic> apparent error
      • <corr> corrected error
    • Regularization:
      • <orig> original form
      • <reg> regularized form

Abbreviation and Expansion

Mr <expan>William</expan>
<lb />
<expan>Shakespeare</expan>

Mr <choice>
 <abbr>W<am rend="abbr-sup">m</am></abbr>
 <expan>W<ex>illia</ex>m</expan>
</choice>
<lb />
<choice>
 <abbr>Shakes<am rend="abbr-per">p</am>e</abbr>
 <expan>Shakes<ex>pear</ex>e</expan>
</choice>
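Encoding both forms inside `<choice>` means one file can yield more than one reading text. This Python sketch (hypothetical helpers `reading` and `_collect`) wraps the example above in a `<p>` so it has a single root, then keeps only the named child of every `<choice>`. The same function works for `<sic>`/`<corr>` and `<orig>`/`<reg>`, shown with a made-up error.

```python
import xml.etree.ElementTree as ET

# The <choice> example from the slide, wrapped in a <p> so it has a single root.
SOURCE = """<p>Mr <choice>
 <abbr>W<am rend="abbr-sup">m</am></abbr>
 <expan>W<ex>illia</ex>m</expan>
</choice>
<lb />
<choice>
 <abbr>Shakes<am rend="abbr-per">p</am>e</abbr>
 <expan>Shakes<ex>pear</ex>e</expan>
</choice></p>"""


def _collect(elem: ET.Element, keep: str) -> str:
    """Gather the raw text of elem, taking only the `keep` child of each <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            chosen = child.find(keep)  # e.g. "abbr", "expan", "sic", "corr"
            if chosen is not None:
                parts.append(_collect(chosen, keep))
        elif child.tag == "lb":
            parts.append(" ")  # in this plain reading, a line break becomes a space
        else:
            parts.append(_collect(child, keep))
        parts.append(child.tail or "")  # text after a child belongs to its parent
    return "".join(parts)


def reading(elem: ET.Element, keep: str) -> str:
    """Return one reading text with whitespace collapsed."""
    return " ".join(_collect(elem, keep).split())


root = ET.fromstring(SOURCE)
print(reading(root, "abbr"))   # Mr Wm Shakespe
print(reading(root, "expan"))  # Mr William Shakespeare

# Hypothetical error and correction, handled by the same function.
errata = ET.fromstring("<p>a <choice><sic>tempst</sic><corr>tempest</corr></choice> at sea</p>")
print(reading(errata, "corr"))  # a tempest at sea
```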

Multiple Witnesses -- Critical Apparatus

  • <app> an entry in a critical apparatus

  • <lem> (optional) a lemma or base text

  • <rdg> a single reading within a textual variation
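A minimal Python sketch of how an apparatus can be used to reconstruct the text of each witness. The line, the readings and the sigla A, B and C are all hypothetical, and falling back to the `<lem>` when a witness has no reading of its own is an assumption of this sketch, not a TEI rule.

```python
import xml.etree.ElementTree as ET

# Hypothetical apparatus: witness A reads "quick"; witnesses B and C read "swift".
SOURCE = """<l>The <app>
  <lem wit="#A">quick</lem>
  <rdg wit="#B #C">swift</rdg>
</app> brown fox</l>"""


def pick_reading(app: ET.Element, siglum: str) -> str:
    """Choose the <lem> or <rdg> attested by siglum, falling back to the <lem>."""
    target = "#" + siglum
    for candidate in app:
        # @wit holds a whitespace-separated list of pointers to witnesses.
        if candidate.tag in ("lem", "rdg") and target in candidate.get("wit", "").split():
            return witness_text(candidate, siglum)
    lem = app.find("lem")
    return witness_text(lem, siglum) if lem is not None else ""


def witness_text(elem: ET.Element, siglum: str) -> str:
    """Reconstruct the text of elem as it reads in one witness."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "app":
            parts.append(pick_reading(child, siglum))
        else:
            parts.append(witness_text(child, siglum))
        parts.append(child.tail or "")
    return "".join(parts)


line = ET.fromstring(SOURCE)
for siglum in ("A", "B", "C"):
    print(siglum, witness_text(line, siglum))
```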

Examples of What Kinds of Documents People Encode in TEI

The TEI takes a general approach to overall text structure, which means it should be able to cope with texts of any size, language, date, complexity, writing system, or media.

This could be in any form: books, journals, manuscripts, postcards, letters, rolls of papyrus, clay tablets, web pages, gravestones, etc. and contain any type of text.

Punch Magazine: a variety of content forms

Holinshed's Chronicles: columns, marginal notes, woodcuts

First Folio: forme-work, catchwords, decorative initials, etc.

Wilfred Owen: manuscripts, corrections, multiple versions

George Herbert: Graphic text layout, poetry

William Godwin's Diary: diary structure, abbreviated texts

Wilfred Owen: Letters, codewords

Print and Digital Dictionaries: entries, senses, etymologies, quotations, etc.

Epigraphical Texts: partial letters, supplied text, physical description

WW1 Propaganda: font, colour, glyph substitution, image classification and metadata

Various writing systems: Unicode/non-Unicode characters, right-to-left, reversing lines, etc.

What Would You Encode (And Why)?

For the material given make a list of the textual phenomena that you think are important to mark up. 

  • Make a list of textual phenomena and metadata that are important to capture
  • How likely is it that you can mark these up reliably and consistently?
  • Might it be possible to mark some of these up automatically by a script? 
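As a sketch of what semi-automatic markup can look like, this Python example applies one hypothetical rule (any free-standing four-digit number from 1000 to 1999 is a year) and wraps matches in TEI-style `<date when="...">` elements. The sample sentence and the `tag_years` helper are made up. The third match is a false positive, which is why script output still needs human checking before it can be called reliable and consistent.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical rule: a free-standing number from 1000 to 1999 is treated as a year.
YEAR = re.compile(r"\b(1\d{3})\b")


def tag_years(plain_text: str) -> str:
    """Escape plain text, then wrap candidate years in <date when="...">."""
    return YEAR.sub(r'<date when="\1">\1</date>', escape(plain_text))


sample = "Sold in 1623 & again in 1632 for 1000 pence."
marked = tag_years(sample)
print(marked)

# The output is still well-formed once wrapped in a container element...
dates = [d.get("when") for d in ET.fromstring(f"<p>{marked}</p>").iter("date")]
print(dates)  # ['1623', '1632', '1000']  <- ...but '1000' is not a date
```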

Pretend an authoritarian anti-intellectual government has come to power and, through a series of bad decisions, has to slash your project funding by 50%.  What do you do?

  • Do you do half the amount of material in the same depth?
    • Mark up less?
  • Invest in more semi-automatic markup?
  • Something else?

Repeat the exercise.

Part 3:

Creating a scholarly digital edition

  • In part 3: 'how they might be used to create a scholarly digital edition'
    • briefly look at software tools for publishing a scholarly digital edition
    • consider other solutions and approaches

Publishing TEI

  • There are many tools available e.g.:
    • Edition Visualization Technology
    • TEI Boilerplate
    • TEI Critical Apparatus Toolbox
    • TEI-C Stylesheets
    • OxGarage
    • CETEIcean
    • eXist-db 
    • TAPAS project
  • The tools you choose affect which features you can present to readers of your research, and how far you can customise them

Edition Visualization Technology

  • Easy publication for multi-witness critical editions
  • Critical Edition support: rich and expandable critical apparatus, variant heat map, witnesses collation and variant filtering
  • Bookmark: direct reference to the current view of the web application, page and edition level, collated witnesses and selected apparatus entry
  • High level of customization: the editor can customize both the user interface layout and the appearance of the graphical components
  • https://visualizationtechnology.wordpress.com/

TEI Boilerplate

  • TEI Boilerplate gives in-browser conversion of TEI P5 XML using a simple XSL Stylesheet processing instruction
  • It transforms into HTML the elements needed to display images, make links clickable, etc.
  • Works in all major browsers
  • Works well for small, simple, individual web pages
  • Uses standard customisable CSS but also pays attention to CSS in TEI <rendition> elements
  • Viewing the web page source gives access to your TEI
  • http://teiboilerplate.org/

TEI Critical Apparatus Toolbox

  • Based on TEI Boilerplate
  • The toolbox lets you:
    • Check your encoding: offers facilities to display your edition while it is still in the making, and check the consistency of your encoding
    • Display parallel versions: choose the sigla of the witnesses, and the different versions of the text, following each chosen witness, will be displayed in parallel columns.
  • http://ciham-digital.huma-num.fr/teitoolbox/

TEI-C Stylesheets

  • Freely available, generalised XSLT stylesheets
  • Transformations to and/or from around 40 formats such as:
    • BibTeX, COCOA, CSV, DocBook, DocX (MS Word), DTD, EPub, XSL-FO, HTML, JSON, LaTeX, Markdown, NLM, ODT, PDF, RDF, RelaxNG, RNC, Schematron, Slides, TEI Lite, TEI ODD, TEI P4, TEI simplePrint, TCP, Text, Wordpress, XLSX (MS Excel), XSD
  • Customisable through importing and overwriting templates; Stylesheets repository allows for local 'profiles'
  • TEI-C offers services such as OxGarage which enable pipelined conversion to/from many more formats
  • https://github.com/TEIC/Stylesheets

CETEIcean

  • CETEIcean is a JavaScript (ES6) library that enables TEI P5 XML to be displayed in a web browser without first transforming it into standard HTML
  • Instead, it registers the TEI elements with the browser as Custom Elements
  • Because the TEI elements are given prefixed names (e.g. <tei-p>), the HTML it produces is valid and there are no element-name collisions (like HTML <p> vs. TEI <p>)
  • http://github.com/TEIC/CETEIcean

[Diagram: the CETEIcean JavaScript library places the TEI in your edition or web page template as embedded divisions of custom HTML elements]
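CETEIcean itself is JavaScript running in the browser and does considerably more than this; the following Python sketch only illustrates the renaming idea described above, turning each TEI element into a lower-case, `tei-`-prefixed name of the kind HTML custom elements require. `to_custom_elements` is a hypothetical helper, and the sketch assumes input without a namespace.

```python
import xml.etree.ElementTree as ET


def to_custom_elements(tei_xml: str, prefix: str = "tei-") -> str:
    """Rename every element to a prefixed, lower-case custom-element name."""
    root = ET.fromstring(tei_xml)
    for elem in root.iter():
        # HTML custom-element names must be lower case and contain a hyphen.
        elem.tag = prefix + elem.tag.lower()
    return ET.tostring(root, encoding="unicode")


print(to_custom_elements("<p>Letter from <persName>Wilfred Owen</persName></p>"))
# <tei-p>Letter from <tei-persname>Wilfred Owen</tei-persname></tei-p>
```

The TEI `<p>` no longer clashes with HTML's `<p>`, so a stylesheet can target `tei-p` separately.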

eXist-db TEI Publisher

  • The "instant publishing toolbox" based on the eXist-db XML database
  • Provides easy browsing and search of TEI XML documents initially built for TEI simplePrint
  • The default display is a clean and sophisticated page-by-page view
  • Control of element display is by editing the processing model documentation embedded in the TEI ODD (the TEI customisation format)

https://teipublisher.com

TAPAS Project

  • The TAPAS project: TEI Archiving, Publishing, and Access Service hosted by Northeastern University Library's Digital Scholarship Group
  • Anyone with a free account can contribute to projects and collections in TAPAS to archive, publish, discover or share their TEI files
  • Built-in XSLT transformations
  • TEI Members (or paid TAPAS membership) can create collections and projects
  • 1GB of XML file storage for TEI files, TEI ODD Customisations
  • http://tapasproject.org/

Creating TEI

  • How do people create TEI Files?
    • hand encoding
      • as individuals, through a variety of editors, etc.
    • through an interface
      • whether forms, tags-off views in editors, etc.
    • up-conversion
      • of materials created in other formats, such as docx, xlsx, etc.

Up-Conversion
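A minimal up-conversion sketch in Python, assuming a hypothetical plain-text source in which each speech is written as "Name: speech" and any other line is a stage direction. The `upconvert` helper builds TEI-style `<sp>` and `<stage>` elements; ElementTree handles the escaping. The heuristic is fragile (a stage direction containing a colon would be misread as a speech), so real up-conversion usually needs checking and correction afterwards.

```python
import xml.etree.ElementTree as ET

# Hypothetical source format: "Name: speech" per line; anything else is a stage direction.
PLAIN = """Enter a Master and a Boatswain
Master: Boatswain!
Boatswain: Here, master: what cheer?
Exit"""


def upconvert(plain: str) -> ET.Element:
    """Build a TEI-style <div> of <sp> and <stage> elements from plain text."""
    div = ET.Element("div")
    for line in plain.strip().splitlines():
        speaker, colon, speech = line.partition(":")  # split at the FIRST colon only
        if not colon:
            ET.SubElement(div, "stage").text = line.strip()
            continue
        sp = ET.SubElement(div, "sp")
        ET.SubElement(sp, "speaker").text = speaker.strip()
        ET.SubElement(sp, "ab").text = speech.strip()
    return div


print(ET.tostring(upconvert(PLAIN), encoding="unicode"))
```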

Or you could...

  • Roll your own:
    • Using languages like XSLT or XQuery, generate published versions (web, print, etc.)
  • Do not publish:
    • By creating TEI you are creating a textual dataset that you can then query, analyse, or otherwise exploit. There is no one-to-one relationship between creating a TEI file and a single 'Digital Edition' output.
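As a small example of treating TEI as a dataset rather than a page, this Python sketch counts speeches and words per speaker in a short namespaced fragment (lines from the Tempest example). The namespace URI is the standard TEI P5 one; real TEI files use it, so queries must be namespace-aware. Counting words by splitting on whitespace is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# TEI P5 documents use this namespace; ElementTree needs it to find elements.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SOURCE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <sp><speaker>Master</speaker><ab>Boatswain!</ab></sp>
  <sp><speaker>Boatswain</speaker><ab>Here, master: what cheer?</ab></sp>
  <sp><speaker>Master</speaker>
    <ab>Good, speak to the mariners: fall to't, yarely,</ab>
    <ab>or we run ourselves aground: bestir, bestir.</ab></sp>
</div>"""

root = ET.fromstring(SOURCE)

speeches = Counter()
words = Counter()
for sp in root.findall(".//tei:sp", NS):
    speaker = sp.findtext("tei:speaker", namespaces=NS)
    speeches[speaker] += 1
    for ab in sp.findall("tei:ab", NS):
        words[speaker] += len(ab.text.split())  # whitespace-separated tokens

print(speeches)
print(words)
```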

Summary

  • In part 1: 'requirements for a digital edition to be considered scholarly'
    • think about what makes up a scholarly edition
    • consider the possibilities provided by creating them digitally 
    • look at some requirements/checklists for scholarly digital editions
  • In part 2: 'overview of the TEI Guidelines'
    • look at what markup is and its form (and see some XML)
    • find out what the Text Encoding Initiative is
    • see the kinds of things the TEI Guidelines cover
    • think about what one might want to encode in a digital edition (and why)
  • In part 3: 'how they might be used to create a scholarly digital edition'
    • briefly look at software tools for publishing a scholarly digital edition
    • consider other solutions and approaches

Digital Cultures IV: (Digital) Scholarly Editing

By James Cummings
