The TEI Community, Guidelines, and Software

Dr James Cummings

@jamescummings
http://slides.com/jamescummings/community

CC BY

TEI Community

Open Aspects of the TEI Community

TEI Board & Technical Council

  • TEI Board: Overall governance, strategic and financial oversight; 5 members elected in 2-year staggered terms
  • TEI Technical Council: Maintenance of the TEI Guidelines (released twice a year) and associated software and systems; 11 members elected in 2-year staggered terms
  • Appointed officers: Treasurer, Webmaster, Assistant Webmaster

TEI Consortium Membership

One year's free membership for taking part in certain openly advertised members' training events

The TEI Guidelines

The TEI Guidelines

 1 The TEI Infrastructure
 2 The TEI Header
 3 Elements Available in All TEI Documents
 4 Default Text Structure
 5 Characters, Glyphs, and Writing Modes
 6 Verse
 7 Performance Texts
 8 Transcriptions of Speech
 9 Dictionaries
10 Manuscript Description
11 Representation of Primary Sources
12 Critical Apparatus
13 Names, Dates, People, and Places
14 Tables, Formulæ, Graphics and Notated Music
15 Language Corpora
16 Linking, Segmentation, and Alignment
17 Simple Analytic Mechanisms
18 Feature Structures
19 Graphs, Networks, and Trees
20 Non-hierarchical Structures
21 Certainty, Precision, and Responsibility
22 Documentation Elements
23 Using the TEI

Don't forget the appendices:

Appendix A Model Classes
Appendix B Attribute Classes
Appendix C Elements
Appendix D Attributes
Appendix E Datatypes and Other Macros
Appendix F Bibliography
Appendix G Prefatory Notes
Appendix H Colophon

1 The TEI Infrastructure

This chapter describes the infrastructure for the encoding scheme defined by these Guidelines. It introduces the conceptual framework within which the following chapters are to be understood, and the means by which that conceptual framework is implemented.

 

Sections include:

  • TEI Modules
  • Defining a TEI Schema
  • The TEI Class System
  • Macros
  • The TEI Infrastructure Module

2 The TEI Header

This chapter addresses the problems of describing an encoded work so that the text itself, its source, its encoding, and its revisions are all thoroughly documented.

Sections include:

  • Organization of the TEI Header
  • The File Description
  • The Encoding Description
  • The Profile Description
  • The Revision Description
  • Minimal and Recommended Headers
  • Note for Library Cataloguers
  • The TEI Header Module
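
As a rough illustration of the "Minimal and Recommended Headers" section, the sketch below checks a header for the three components the Guidelines require inside the file description: a title statement, a publication statement, and a source description. The sample header text and the helper name missing_file_desc_parts are hypothetical, made up for this example; only the element names come from TEI.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample header; the title and wording are placeholders.
HEADER = """
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt><title>A sample document</title></titleStmt>
    <publicationStmt><p>Unpublished example.</p></publicationStmt>
    <sourceDesc><p>Born digital.</p></sourceDesc>
  </fileDesc>
</teiHeader>
"""

# The mandatory children of fileDesc in a minimal header.
REQUIRED = ("titleStmt", "publicationStmt", "sourceDesc")


def missing_file_desc_parts(header_xml):
    """Return the names of mandatory fileDesc children that are absent."""
    root = ET.fromstring(header_xml.strip())
    file_desc = root.find("tei:fileDesc", NS)
    if file_desc is None:
        return ["fileDesc"]
    return [name for name in REQUIRED
            if file_desc.find(f"tei:{name}", NS) is None]


print(missing_file_desc_parts(HEADER))
```

An empty list means the minimal components are present. A real project would validate against its TEI schema instead; this only shows the shape of a minimal header.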

3 Elements Available in All TEI Documents

This chapter describes elements which may appear in any kind of text and the tags used to mark them in all TEI documents.

Sections include:

  • Paragraphs
  • Treatment of Punctuation
  • Highlighting and Quotation
  • Simple Editorial Changes
  • Names, Numbers, Dates, Abbreviations, and Addresses
  • Simple Links and Cross-References
  • Lists
  • Notes, Annotation, and Indexing
  • Graphics and Other Non-textual Components
  • Reference Systems
  • Bibliographic Citations and References
  • Passages of Verse or Drama
  • Overview of the Core Module

4 Default Text Structure

This chapter describes the default high-level structure for TEI documents. A full TEI document combines metadata describing it, represented by a teiHeader element, with the document itself, represented by a text element.

Sections include:

  • Divisions of the Body
  • Elements Common to All Divisions
  • Grouped and Floating Texts
  • Virtual Divisions
  • Front Matter
  • Title Pages
  • Back Matter
  • Module for Default Text Structure
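
The high-level structure described above (a teiHeader followed by a text, itself divided into front matter, body, and back matter) can be sketched as follows. The document content and the helper name outline are invented for illustration; only the element names are TEI's.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample document; all titles and text are placeholders.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample document</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front><div type="preface"><head>Preface</head><p>Opening words.</p></div></front>
    <body>
      <div type="chapter" n="1">
        <head>Chapter One</head>
        <p>First paragraph.</p>
      </div>
    </body>
    <back><div type="appendix"><head>Appendix</head><p>Notes.</p></div></back>
  </text>
</TEI>"""


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return element.tag.split("}")[-1]


def outline(doc_xml):
    """Return (children of TEI, children of text, headings of body divisions)."""
    root = ET.fromstring(doc_xml)
    text = root.find("tei:text", NS)
    top_level = [local_name(child) for child in root]
    text_parts = [local_name(child) for child in text]
    body_heads = [h.text for h in text.findall("tei:body/tei:div/tei:head", NS)]
    return top_level, text_parts, body_heads


print(outline(DOC))
```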

5 Characters, Glyphs, and Writing Modes

Text encoders sometimes find that the published repertoire of Unicode characters is inadequate to their needs, particularly when working with ancient languages or recording particular variant glyph forms.

Sections include:

  • Is Your Journey Really Necessary?
  • Markup Constructs for Representation of Characters and Glyphs
  • Annotating Characters
  • Adding New Characters
  • How to Use Code Points from the Private Use Area
  • Writing Modes
  • Examples of Different Writing Modes
  • Text Rotation
  • Caveat
  • Formal Definition

6 Verse

This module is intended for use when encoding texts which are entirely or predominantly in verse, and for which the elements for encoding verse structure already provided by the core module are inadequate.

Sections include:

  • Structural Divisions of Verse Texts
  • Components of the Verse Line
  • Rhyme and Metrical Analysis
  • Rhyme
  • Metrical Notation Declaration
  • Encoding Procedures for Other Verse Features
  • Module for Verse

7 Performance Texts

This module is intended for use when encoding printed dramatic texts, screen plays or radio scripts, and written transcriptions of any other form of performance.

Sections include:

  • Front and Back Matter
  • The Body of a Performance Text
  • Other Types of Performance Text
  • Module for Performance Texts

8 Transcriptions of Speech

The module described in this chapter is intended for use with a wide variety of transcribed spoken material.

Sections include:

  • General Considerations and Overview
  • Documenting the Source of Transcribed Speech
  • Elements Unique to Spoken Texts
  • Elements Defined Elsewhere
  • Module for Transcribed Speech

9 Dictionaries

This chapter defines a module for encoding lexical resources of all kinds, in particular human-oriented monolingual and multilingual dictionaries, glossaries, and similar documents.

Sections include:

  • Dictionary Body and Overall Structure
  • The Structure of Dictionary Entries
  • Top-level Constituents of Entries
  • Headword and Pronunciation References
  • Typographic and Lexical Information in Dictionary Data
  • Unstructured Entries
  • The Dictionary Module

10 Manuscript Description

This module defines a special purpose element which can be used to provide detailed descriptive information about handwritten (and other unique) primary sources.

Sections include:

  • Overview
  • The Manuscript Description Element
  • Phrase-level Elements
  • The Manuscript Identifier
  • The Manuscript Heading
  • Intellectual Content
  • Physical Description
  • History
  • Additional Information
  • Manuscript Parts
  • Module for Manuscript Description

11 Representation of Primary Sources

This chapter defines a module intended for use in the representation of primary sources, such as manuscripts or other written materials.

Sections include:

  • Digital Facsimiles
  • Combining Transcription with Facsimile
  • Scope of Transcriptions
  • Advanced Uses of surface and zone
  • Aspects of Layout
  • Headers, Footers, and Similar Matter
  • Changes
  • Other Primary Source Features not Covered in these Guidelines
  • Module for Transcription of Primary Sources

12 Critical Apparatus


This chapter defines a module for use in encoding an apparatus of variants for scholarly editions, which may be used in conjunction with any of the modules defined in these Guidelines.

Sections include:

  • The Apparatus Entry, Readings, and Witnesses
  • Linking the Apparatus to the Text
  • Using Apparatus Elements in Transcriptions
  • Module for Critical Apparatus
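
To make "The Apparatus Entry, Readings, and Witnesses" a little more concrete, here is a minimal sketch of one apparatus entry and a way to list what each witness reads. The verse line, the sigla (El, Hg, La), and the helper name readings_by_witness are hypothetical; the app, lem, and rdg elements and the wit attribute are the TEI ones.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"

# Hypothetical line with one point of variation across three witnesses.
LINE = """<l xmlns="http://www.tei-c.org/ns/1.0">Experience though
  <app>
    <lem wit="#El">noon</lem>
    <rdg wit="#Hg #La">none</rdg>
  </app> auctoritee</l>"""


def readings_by_witness(xml_text):
    """Return one {siglum: reading} dict per apparatus entry, in document order."""
    root = ET.fromstring(xml_text)
    entries = []
    for app in root.iter(f"{{{TEI}}}app"):
        readings = {}
        for reading in app:
            if reading.tag.split("}")[-1] not in ("lem", "rdg"):
                continue
            reading_text = "".join(reading.itertext()).strip()
            # @wit holds one or more space-separated pointers such as "#Hg".
            for pointer in reading.get("wit", "").split():
                readings[pointer.lstrip("#")] = reading_text
        entries.append(readings)
    return entries


print(readings_by_witness(LINE))
```

Tools such as those listed later under Publishing TEI do far more than this (collation views, filtering), but they start from the same kind of encoding.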

13 Names, Dates, People, and Places

This chapter describes a module which may be used for the encoding of names and other phrases descriptive of persons, places, or organizations, in a manner more detailed than that possible using the elements already provided for these purposes in the Core module.

Sections include:

  • Attribute Classes Defined by This Module
  • Names [e.g. personal, place and organisational names]
  • Biographical and Prosopographical Data
  • Module for Names and Dates

14 Tables, Formulæ, Graphics and Notated Music

Many documents, both historical and contemporary, include not only text, but also graphics, artwork, and other images. Since they may frequently be most conveniently encoded and processed using external notations, they are dealt with together.

Sections include:

  • Tables
  • Formulæ and Mathematical Expressions
  • Notated Music in Written Text
  • Specific Elements for Graphic Images
  • Overview of Basic Graphics Concepts
  • Graphic Image Formats
  • Module for Tables, Formulæ, Notated Music, and Graphics

15 Language Corpora

This chapter discusses language corpora. The distinguishing characteristic of any individual corpus is that its components have been selected or structured according to some conscious set of design criteria.

Sections include:

  • Varieties of Composite Text
  • Contextual Information
  • Associating Contextual Information with a Text
  • Linguistic Annotation of Corpora
  • Recommendations for the Encoding of Large Corpora
  • Module for Language Corpora

16 Linking, Segmentation, and Alignment

This chapter discusses a number of ways in which encoders may represent analyses of the structure of a text which are not necessarily linear or hierarchic.

Sections include:

  • Links
  • Pointing Mechanisms
  • Blocks, Segments, and Anchors
  • Correspondence and Alignment
  • Synchronization
  • Identical Elements and Virtual Copies
  • Aggregation
  • Alternation
  • Stand-off Markup
  • Connecting Analytic and Textual Markup
  • Module for Linking, Segmentation, and Alignment

17 Simple Analytic Mechanisms

This chapter describes a module for associating simple analyses and interpretations with text elements. We use the term analysis here to refer to any kind of semantic or syntactic interpretation which an encoder wishes to attach to all or part of a text.

Sections include:

  • Linguistic Segment Categories
  • Global Attributes for Simple Analyses
  • Spans and Interpretations
  • Linguistic Annotation
  • Module for Analysis and Interpretation

18 Feature Structures

A feature structure is a general purpose data structure which identifies and groups together individual features, each of which associates a name with one or more values.

Sections include:

  • Organization of this Chapter
  • Elementary Feature Structures and the Binary Feature Value
  • Other Atomic Feature Values
  • Feature Libraries and Feature-Value Libraries
  • Feature Structures as Complex Feature Values
  • Re-entrant Feature Structures
  • Collections as Complex Feature Values
  • Feature Value Expressions
  • Default Values
  • Linking Text and Analysis
  • Feature System Declaration
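
Since a feature structure associates names with values, it maps naturally onto a dictionary. The sketch below reads a small fs element into a Python dict. The feature names and values are invented for illustration, and the helper names are hypothetical; fs, f, symbol, binary, numeric, and string are TEI elements.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical feature structure describing one word.
FS = """<fs type="word" xmlns="http://www.tei-c.org/ns/1.0">
  <f name="pos"><symbol value="noun"/></f>
  <f name="number"><symbol value="singular"/></f>
  <f name="proper"><binary value="false"/></f>
  <f name="lemma"><string>edition</string></f>
</fs>"""


def convert_value(value):
    """Turn one atomic feature value element into a Python value."""
    kind = value.tag.split("}")[-1]
    if kind == "binary":
        return value.get("value") in ("true", "1")
    if kind == "numeric":
        return float(value.get("value"))
    if kind == "string":
        return value.text or ""
    # symbol, and any other element carrying its value in @value
    return value.get("value")


def fs_to_dict(fs_xml):
    """Map each feature name to its value (or list of values)."""
    fs = ET.fromstring(fs_xml)
    result = {}
    for feature in fs.findall("tei:f", NS):
        values = [convert_value(v) for v in feature]
        result[feature.get("name")] = values[0] if len(values) == 1 else values
    return result


print(fs_to_dict(FS))
```

This handles only atomic values; nested feature structures, collections, and feature libraries would need more work.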

19 Graphs, Networks, and Trees

Graphical representations are widely used for displaying relations among informational units because they help readers to visualize those relations and hence to understand them better. 


Sections include:

  • Graphs and Digraphs
  • Trees
  • Another Tree Notation
  • Representing Textual Transmission
  • Module for Graphs, Networks, and Trees

20 Non-hierarchical Structures


XML employs a strongly hierarchical document model. At various points, these Guidelines discuss problems that arise when using XML to encode textual features that either do not naturally lend themselves to representation in a strictly hierarchical form or conflict with other hierarchies represented in the markup.

Sections include:

  • Multiple Encodings of the Same Information
  • Boundary Marking with Empty Elements
  • Fragmentation and Reconstitution of Virtual Elements
  • Stand-off Markup
  • Non-XML-based Approaches

21 Certainty, Precision, and Responsibility

Encoders of text often find it useful to indicate that some aspects of the encoded text are problematic or uncertain, and to indicate who is responsible for various aspects of the markup of the electronic text.

Sections include:

  • Levels of Certainty
  • Indications of Precision
  • Attribution of Responsibility
  • The Certainty Module

22 Documentation Elements

This chapter describes a module which may be used for the documentation of the XML elements and element classes which make up any markup scheme, in particular that described by the TEI Guidelines, and also for the automatic generation of schemas conforming to that documentation.

Sections include:

  • Phrase Level Documentary Elements
  • Modules and Schemas
  • Specification Elements
  • Common Elements
  • Building a Schema
  • Combining TEI and Non-TEI Modules
  • Linking Schemas to XML Documents
  • Module for Documentation Elements

23 Using the TEI

 

This chapter discusses some technical topics concerning the deployment of the TEI markup scheme documented elsewhere in these Guidelines.

Sections include:

  • Serving TEI files with the TEI Media Type
  • Obtaining the TEI Schemas
  • Personalization and Customization
  • Conformance
  • Implementation of an ODD System

Software

Publishing TEI

  • There are many tools available, e.g.:
    • Edition Visualization Technology
    • TEI Boilerplate
    • TEI Critical Apparatus Toolbox
    • TEI-C Stylesheets
    • OxGarage
    • CETEIcean
    • TAPAS project
    • eXist-db: TEI Publisher 
  • The tools you use may affect the features you can display to those reading your research, and you may have more or less ability to customise them

Edition Visualization Technology

  • Easy publication for multi-witness critical editions
  • Critical Edition support: rich and expandable critical apparatus, variant heat map, witnesses collation and variant filtering
  • Bookmark: direct reference to the current view of the web application, page and edition level, collated witnesses and selected apparatus entry
  • High level of customization: the editor can customize both the user interface layout and the appearance of the graphical components
  • https://visualizationtechnology.wordpress.com/

TEI Boilerplate

  • TEI Boilerplate gives in-browser conversion of TEI P5 XML using a simple XSLT stylesheet processing instruction
  • It transforms to HTML only those elements that need it, e.g. for displaying images or making links clickable
  • Works in all major browsers
  • Works well for small, simple, individual web pages
  • Uses standard customisable CSS but also pays attention to CSS in TEI <rendition> elements
  • Viewing the web page source gives access to your TEI
  • http://teiboilerplate.org/
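
The processing instruction mentioned above is the standard xml-stylesheet instruction placed before the root element. As a minimal sketch, the helper below adds one to a TEI file held as a string. The function name add_stylesheet_pi and the default path teibp.xsl are assumptions for this example; check the TEI Boilerplate documentation for where its stylesheet actually lives in your copy.

```python
import re


def add_stylesheet_pi(tei_xml, href="teibp.xsl"):
    """Insert an xml-stylesheet processing instruction before the root element.

    href is a placeholder path to the XSLT file, not a confirmed location.
    """
    if "<?xml-stylesheet" in tei_xml:
        return tei_xml  # already has one; leave the document untouched
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>'
    # If there is an XML declaration, the instruction must come after it.
    declaration = re.match(r"\s*<\?xml\s[^?]*\?>", tei_xml)
    if declaration:
        end = declaration.end()
        return tei_xml[:end] + "\n" + pi + tei_xml[end:]
    return pi + "\n" + tei_xml


sample = '<?xml version="1.0"?>\n<TEI xmlns="http://www.tei-c.org/ns/1.0"/>'
print(add_stylesheet_pi(sample))
```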

TEI Critical Apparatus Toolbox

  • Based on TEI Boilerplate
  • The toolbox lets you:
    • Check your encoding: offers facilities to display your edition while it is still in the making, and check the consistency of your encoding
    • Display parallel versions: choose the sigla of the witnesses, and the different versions of the text, following each chosen witness, will be displayed in parallel columns.
  • http://ciham-digital.huma-num.fr/teitoolbox/

TEI-C Stylesheets

  • Freely available, generalised XSLT stylesheets
  • Transformations to and/or from around 40 formats such as:
    • BibTeX, COCOA, CSV, DocBook, DocX (MS Word), DTD, EPub, XSL-FO, HTML, JSON, LaTeX, Markdown, NLM, ODT, PDF, RDF, RelaxNG, RNC, Schematron, Slides, TEI Lite, TEI ODD, TEI P4, TEI simplePrint, TCP, Text, Wordpress, XLSX (MS Excel), XSD
  • Customisable through importing and overriding templates; Stylesheets repository allows for local 'profiles'
  • TEI-C offers services such as OxGarage which enable pipelined conversion to/from many more formats
  • https://github.com/TEIC/Stylesheets

CETEIcean

  • CETEIcean is a JavaScript (ES6) library that enables TEI P5 XML documents to be displayed in a web browser without first transforming them to HTML
  • Instead it registers the TEI elements with the browser as Custom Elements
  • Because the elements are treated as HTML, the HTML it produces is valid, and there are no element name collisions (like HTML <p> vs. TEI <p>)
  • http://github.com/TEIC/CETEIcean
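
The point about name collisions can be illustrated without the library itself. The Python sketch below is not CETEIcean's code or API; it only mimics the idea of giving each TEI element a prefixed, hyphenated name of the kind HTML Custom Elements require. The tei- prefix, the data-origname attribute, and the helper name to_custom_elements are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def to_custom_elements(tei_xml, prefix="tei-"):
    """Rename every element to a prefixed, lowercase custom-element-style name."""

    def convert(node):
        local = node.tag.split("}")[-1]  # drop the '{namespace}' part
        out = ET.Element(prefix + local.lower(), dict(node.attrib))
        # Lowercasing loses the original case, so keep the name in an attribute.
        out.set("data-origname", local)
        out.text, out.tail = node.text, node.tail
        for child in node:
            out.append(convert(child))
        return out

    return ET.tostring(convert(ET.fromstring(tei_xml)), encoding="unicode")


sample = '<div xmlns="http://www.tei-c.org/ns/1.0"><p>A TEI paragraph.</p></div>'
print(to_custom_elements(sample))
```

Because the TEI paragraph ends up under a hyphenated name rather than as a plain p, it cannot be confused with an HTML paragraph in the same page.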

Your Edition or Web Page Template

Embedded divisions of custom HTML elements

CETEIcean JavaScript

TAPAS Project

  • The TAPAS project: TEI Archiving, Publishing, and Access Service hosted by Northeastern University Library's Digital Scholarship Group
  • Holders of a free account can contribute to projects and collections in TAPAS to archive, publish, discover, or share their TEI files
  • Built-in XSLT transformations
  • TEI Members (or paid TAPAS membership) can create collections and projects
  • 1 GB of storage for TEI XML files and TEI ODD customisations
  • http://tapasproject.org/

eXist-db's TEI Publisher

  • The "instant publishing toolbox" based on the eXist-db XML database
  • Provides easy browsing and searching of TEI XML documents; initially built for TEI simplePrint
  • The default is a clean and sophisticated page-by-page display
  • Element display is controlled by editing the processing model documentation embedded in the TEI ODD (the TEI customisation format); a visual interface is available for doing so
  • http://showcases.exist-db.org
  • http://teipublisher.com

TEIC GitHub Repositories

  • TEI (The Guidelines)
  • Stylesheets (Transformations to/from TEI)
  • OxGarage (Document format pipeline engine)
  • Oxygen-TEI (TEI framework for popular editor)
  • Roma (Schema customisation)
    • Though a replacement, RomaJS, is in development
  • Jenkins (Continuous integration: jenkins.tei-c.org)
  • CETEIcean (Easy TEI as HTML 5 custom elements)
  • Legacy:
    • Carthage, Pure ODD, TEI Simple, Byzantium, tei-emacs, etc.

jenkins.tei-c.org: Continuous Integration