Freedom to Constrain

Introducing TEI Customisation

Dr James Cummings
 

@jamescummings
http://slides.com/jamescummings/freedom

What is the TEI?

  • An international consortium of institutions, projects and individual members; and a community of users and volunteers
  • A freely available manual: a set of regularly maintained and updated recommendations, 'The Guidelines'
  • Definitions, examples, and discussion of over 540 markup distinctions for textual features, image facsimiles, genetic editing, etc.
  • A mechanism for producing customized schemas for validating your project's digital texts
  • A set of free and openly licensed, customizable tools and stylesheets for transformations to many formats (e.g. HTML, Word, PDF, Databases, RDF/LinkedData, Slides, ePub, etc.)
  • A simple consensus-based way of organizing and structuring textual (and other) resources
  • A format for documenting your interpretation and understanding of a text (and how text functions)
  • Whatever you make it! It is a community-driven standard

But what does it look like?

  • The TEI is a set of prose guidelines that currently recommend using XML as an encoding format:
<element attribute="value">
  Text or child elements here
</element>

<?xml version="1.0"?>
<root xmlns="http://namespace/">
  <element attribute="value">
    content
    <childElement type="empty"/>
    content
  </element>
  <!-- comment -->
</root>

But what does TEI look like?

<TEI>
  <teiHeader>
    <!-- required -->
  </teiHeader>
  <facsimile>
    <!-- optional -->
  </facsimile>
  <sourceDoc>
    <!-- optional -->
  </sourceDoc>
  <text>
    <!-- required if no facsimile or sourceDoc -->
  </text>
</TEI>
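
As a minimal sketch of how this skeleton fills out, the document below supplies the three components that the fileDesc of a teiHeader requires, plus a one-paragraph text. The namespace is the standard TEI one; the title and wording are placeholders, not taken from any real project.

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <!-- placeholder title -->
        <title>An example document</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished example.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital; no prior source.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The text itself goes here.</p>
    </body>
  </text>
</TEI>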

And inside the text?

<text xml:lang="en">
  <front><!-- optional --></front>
  <body>
    <div n="1">
      <head>Section 1</head>
      <p>One or more paragraphs of text</p>
    </div>
    <div n="2">
      <head>Section 2</head>
      <p>One or more paragraphs of text (etc.)</p>
      <div n="2.1" xml:lang="fr">
        <head>Section 2.1</head>
        <p>Un ou plusieurs paragraphes de texte</p>
      </div>
    </div>
    <div n="3" xml:id="myExtraDivision">
      <p>A division with no heading!</p>
    </div>
  </body>
  <back><!-- optional --></back>
</text>

What kind of texts can TEI handle?

About the TEI Guidelines

Customising the TEI

What does the XML look like?

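  • @include = list the elements you want from a module; you never get any new elements
  • @except = list the elements you do not want; you get any new elements added to the module when you regenerate the schema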
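
A sketch of where these attributes sit: both belong to moduleRef inside the schemaSpec of a TEI ODD customisation. The ident myProject and the particular element selections are illustrative assumptions, not a recommended profile.

<schemaSpec ident="myProject" start="TEI">
  <!-- whole modules, taken as they are -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <!-- @include: only these elements, even if the core module grows in later releases -->
  <moduleRef key="core" include="p head hi list item title name date"/>
  <!-- @except: everything in namesdates apart from these two,
       so later additions to the module arrive when the schema is regenerated -->
  <moduleRef key="namesdates" except="climate terrain"/>
</schemaSpec>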

Why Customise?

  • Enforce consistency, whether there is one encoder or many
  • Speed up encoding with fixed value lists and descriptions
  • Generate internationalised, project-specific documentation
  • Record decisions and the project's relationship with the TEI in a machine-processable form
  • Create a local encoding manual with embedded schema specifications
  • Provide long-term archival documentation for your project's outputs
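
Several of these aims meet in a single ODD fragment. As a hedged sketch (the values chapter and appendix and their descriptions are invented for illustration), a closed value list on the type attribute of div gives encoders a fixed set of choices, documents each one in more than one language, and records the decision in machine-processable form:

<elementSpec ident="div" module="textstructure" mode="change">
  <attList>
    <!-- make @type required on div and restrict its values -->
    <attDef ident="type" mode="change" usage="req">
      <valList type="closed" mode="replace">
        <valItem ident="chapter">
          <desc xml:lang="en">A chapter of the work</desc>
          <desc xml:lang="fr">Un chapitre de l'œuvre</desc>
        </valItem>
        <valItem ident="appendix">
          <desc xml:lang="en">Supplementary material after the main text</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>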

Interoperability

Freedom to Constrain

  • If you merely constrain the TEI so that it:
    • is smaller
    • is more precise
    • has specified attributes
    • has project-specific examples
    • has localised documentation
    • then interoperability is less of a problem
  • But there are still issues:
    • different practices in various communities
    • different element choices
    • variation in attribute values

These problems are multiplied if new elements are added
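
To make the point concrete: in an ODD customisation, a new element is added with elementSpec in add mode. This is a sketch only; the element name recipe and the namespace URI are hypothetical. Putting the element in a non-TEI namespace at least signals to other projects that it is not part of the TEI vocabulary.

<elementSpec ident="recipe" ns="http://example.org/ns/myProject" mode="add">
  <desc>A recipe (hypothetical project-specific element)</desc>
  <classes>
    <!-- allowed wherever TEI div-like elements are allowed -->
    <memberOf key="model.divLike"/>
  </classes>
  <content>
    <!-- one or more TEI paragraphs -->
    <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/>
  </content>
</elementSpec>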

[Diagram: Possibilities of the TEI Framework, with regions labelled Project A, Project B and New Elements]

Unmediated Interoperability Fantasy

  • True interoperability only happens because of mediating factors between resources (e.g. crosswalks, normalisation scripts, understanding of the differences)
  • If seamless interoperability happens without these, it is really lowest-common-denominator interchange, where:
    • the initial data structures are trivial, limited or of only structural granularity,
    • the method of interoperation or combined processing is superficial,
    • there has been a loss of intellectual content, or
    • the resulting interoperation is not significant.
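
A normalisation script of the kind mentioned above can be very small. The sketch below is XSLT 1.0 and assumes a hypothetical pair of projects, one of which writes the type attribute of div as chap and the other as chapter. It copies everything unchanged except that one value.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- identity template: copy every node and attribute as it is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- assumed local convention: rewrite type="chap" as type="chapter" -->
  <xsl:template match="tei:div/@type[. = 'chap']">
    <xsl:attribute name="type">chapter</xsl:attribute>
  </xsl:template>

</xsl:stylesheet>

The mediation here is the knowledge that chap and chapter mean the same thing; the stylesheet only records it.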

Is this a problem?

  • Yes: the necessary fragmentation of resources (because they have different needs) means extra work is needed to make them truly work together
  • No: we don't encode texts for the purpose of interoperability but for research analysis (and besides we can always do the extra work)

 

  • In many ways it is an opportunity: to survey the different needs and approaches of encoding projects, to investigate them (programmatically), and to build better schemes
  • Chance to form more sub-communities like EpiDoc for other sorts of encoding, tightly bound to their subject areas and outputs 

Software Development

TEIC GitHub Repositories

  • TEI (The Guidelines)
  • Stylesheets (Transformations to/from TEI)
  • OxGarage (Document format pipeline engine)
  • Oxygen-TEI (TEI framework for popular editor)
  • Roma (Schema customisation)
  • Jenkins (Continuous integration: jenkins.tei-c.org)
  • CETEIcean (Easy TEI as HTML 5 custom elements)
  • Replacement for Roma in planning
  • Legacy:
    • Carthage, Pure ODD, TEI Simple, Byzantium, tei-emacs, etc.

New TEI Processing Model

  • The new TEI processing model lets TEI ODD customisations record processing intentions
  • Software developers can read a TEI ODD customisation file and generate processing streams based on the documented behaviours
  • Testing and early adopters show that this 'build a factory rather than a car' approach significantly reduces code length and complexity
  • The eXist-db native XML database has incorporated this to produce its TEI Publisher eXist-db app
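
A hedged sketch of what such a record looks like: model elements inside an elementSpec pair a behaviour with an optional condition and output styling. The rend value tested here is an assumed project convention, and the styling is illustrative.

<elementSpec ident="hi" mode="change">
  <!-- hi marked as bold: rendered inline, styled bold -->
  <model predicate="@rend = 'bold'" behaviour="inline">
    <outputRendition>font-weight: bold;</outputRendition>
  </model>
  <!-- any other hi: rendered inline, styled italic -->
  <model behaviour="inline">
    <outputRendition>font-style: italic;</outputRendition>
  </model>
</elementSpec>

The customisation states the intention (inline, bold or italic) and leaves it to each processor to decide how to realise it in HTML, PDF or another format.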

Elected Volunteers?

  • Most of the development of the software provided by the TEI Consortium was done by one person over many years:

Sebastian Patrick Quintus Rahtz
13 February 1955 to 15 March 2016

  • The TEI Technical Council has taken this on, in addition to maintaining the TEI Guidelines and schemas, but its members are elected volunteers with two-year terms. The TEI is an open-source style of community, but less so with the software it produces

TEI Community

Aspects of the TEI Community

  • TEI Website: http://www.tei-c.org/
  • TEI Mailing List: TEI-L@LISTSERV.BROWN.EDU
  • TEI Wiki: http://wiki.tei-c.org/
  • GitHub: http://github.com/TEIC/
  • Jenkins: http://jenkins.tei-c.org/
  • Technical Council: http://lists.tei-c.org/pipermail/tei-council/
  • Twitter: @teiconsortium
  • Facebook: http://www.facebook.com/groups/TEIconsortium/
  • TEI SIGs:
    • Computer-Mediated Communication; Correspondence; East Asian/Japanese; Education; Libraries; Manuscripts; Music; Ontologies; Scholarly Publishing; TEI for Linguists; Text and Graphics; Tools

TEI Board & Technical Council

  • TEI Board: overall governance, strategic and financial oversight; 5 members elected for two-year staggered terms
  • TEI Technical Council: maintenance of the TEI Guidelines (released twice a year) and associated software and systems; 11 members elected for two-year staggered terms
  • Appointed officers: Treasurer, Webmaster, Assistant Webmaster

TEI Consortium Membership

One year's free membership for taking part in a member-advertised training event

Rahtz Prize for TEI Ingenuity

  • Dates:
    • 1 April: Nominations Due
    • 1 June: Submissions Due
    • 15 September: Recipient Selected
    • Annual Business Meeting: Recipient Announced
  • Eligibility: Individuals or teams; Membership not required
  • Criteria:
    • innovative development of the TEI Guidelines
    • creation of TEI-aware tools & technologies that further dissemination, adoption, or engaged use of Guidelines
    • expansive & inclusive TEI training & outreach opportunities
    • informed development and cultivation of particular TEI practitioner communities

http://www.tei-c.org/Activities/rahtz.xml

Conclusions

Flexibility vs Fragmentation

  • That the TEI enables incompatible, mutually exclusive, modifications to the vocabulary is a necessary evil
  • The fragmentation this causes by having (directly) incompatible document collections is real
  • But these difficulties can be overcome with good documentation (including proper use of TEI ODD customisation files)
  • On balance it is better to customise (and document it) to create the best textual data we can

Software Maintenance

  • The TEI needs to maintain the software it uses to generate the TEI Guidelines (HTML version)
  • The TEI needs to maintain the tools for transforming meta-schema customisations into schemas and documentation
  • It has moved to GitHub, uses the Jenkins Continuous Integration Server for all its products, and is trying to encourage submissions from the community
  • (But this is difficult for a community of text encoders, not software developers)

Community Involvement

  • The TEI needs to increase and diversify its community involvement
  • Education forms part of this, but it must help users feel that they too can change the TEI! You can, by:
    • Submitting feature requests on GitHub
    • Changing (and exposing) your own customisations
    • Becoming a member and voting
    • Getting involved in Special Interest Groups
    • Writing or maintaining TEI-aware software

Freedom to Constrain

Introducing TEI Customisation

Dr James Cummings
University of Oxford

@jamescummings
http://slides.com/jamescummings/freedom

Thanks to the TEI Community for many of the ideas in this talk!