Freedom to Constrain
Introducing TEI Customisation
Dr James Cummings
@jamescummings
http://slides.com/jamescummings/freedom
What is the TEI?
- An international consortium of institutions, projects and individual members; and a community of users and volunteers
- A freely available manual: a set of regularly maintained and updated recommendations, 'The Guidelines'
- Definitions, examples, and discussion of over 540 markup distinctions for text, image facsimiles, genetic editing, etc.
- A mechanism for producing customized schemas for validating your project's digital texts
- A set of free and openly licensed, customizable tools and stylesheets for transformations to many formats (e.g. HTML, Word, PDF, Databases, RDF/LinkedData, Slides, ePub, etc.)
- A simple consensus-based way of organizing and structuring textual (and other) resources
- A format for documenting your interpretation and understanding of a text (and how text functions)
- Whatever you make it! It is a community-driven standard
But what does it look like?
- The TEI is a set of prose guidelines that currently recommend using XML as an encoding format:
<element attribute="value">
Text or child elements here
</element>
<?xml version="1.0" ?>
<root xmlns="http://namespace/">
<element attribute="value">
content
<childElement type="empty"/>
content
</element>
<!-- comment -->
</root>
But what does TEI look like?
<TEI>
<teiHeader>
<!-- required -->
</teiHeader>
<facsimile>
<!-- optional -->
</facsimile>
<sourceDoc>
<!-- optional -->
</sourceDoc>
<text>
<!-- required if no
facsimile or sourceDoc -->
</text>
</TEI>
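As a rough illustration of the skeleton above, here is a minimal sketch of a complete TEI document with the required parts of the header filled in. The title and the prose inside the header are hypothetical placeholders; the namespace is the standard TEI one.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <!-- titleStmt, publicationStmt and sourceDesc are the required parts of fileDesc -->
      <titleStmt>
        <title>A hypothetical sample document</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished; for illustration only.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital; no prior source.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Some text here.</p>
    </body>
  </text>
</TEI>
```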
And inside the text?
<text xml:lang="en">
<front><!-- optional --></front>
<body>
<div n="1">
<head>Section 1</head>
<p>One or more paragraphs of text</p>
</div>
<div n="2">
<head>Section 2</head>
<p>One or more paragraphs of text (etc.)</p>
<div n="2.1" xml:lang="fr">
<head>Section 2.1</head>
<p>Un ou plusieurs paragraphes de texte</p>
</div>
</div>
<div n="3" xml:id="myExtraDivision">
<p>A division with no heading!</p>
</div>
</body>
<back><!-- optional --></back>
</text>
What kind of texts can TEI handle?
About the TEI Guidelines
Customising the TEI
What does the XML look like?
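- @include = only the elements you list are included, so you never get any new elements added to that module in later TEI releases
- @except = everything except the elements you list is included, so you get new elements if you regenerate the schema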
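The difference between @include and @except can be sketched with a small TEI ODD schema specification. This is only an illustrative fragment: the identifier "myProject" and the particular element selections are assumptions, not taken from the talk.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myProject" start="TEI">
  <!-- The tei module supplies the class and datatype infrastructure -->
  <moduleRef key="tei"/>
  <!-- @include: only these header elements, even if the module grows later -->
  <moduleRef key="header"
             include="teiHeader fileDesc titleStmt publicationStmt sourceDesc"/>
  <!-- @include: a small, fixed set of structural elements -->
  <moduleRef key="textstructure" include="TEI text front body back div"/>
  <!-- @except: everything in core apart from these, so elements added
       to core in a later release appear when the schema is regenerated -->
  <moduleRef key="core" except="mentioned said soCalled"/>
</schemaSpec>
```

Which attribute suits a project depends on whether it wants a frozen vocabulary (@include) or one that tracks new TEI releases (@except).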
Why Customise?
- Enforce consistency between one or more encoders
- Increase speed of encoding with set value lists and descriptions
- Generate internationalised, project-specific, documentation
- Record decisions and relationship with the TEI in a machine-processable form
- Create local encoding manual with embedded schema specifications
- Provide long-term archival documentation for your project's outputs
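As one possible sketch of a "set value list", the ODD fragment below restricts the type attribute of div to a closed list with descriptions. The values "chapter" and "section" and their descriptions are hypothetical; the fragment would sit inside a schemaSpec in a customisation file.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="div" module="textstructure" mode="change">
  <attList>
    <!-- Make @type required and limit it to the project's agreed values -->
    <attDef ident="type" mode="change" usage="req">
      <valList type="closed" mode="add">
        <valItem ident="chapter">
          <desc>A chapter of the work (hypothetical project value)</desc>
        </valItem>
        <valItem ident="section">
          <desc>A section within a chapter (hypothetical project value)</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

An XML editor that reads the generated schema can then offer these values and descriptions to encoders, which is where the gain in speed and consistency comes from.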
Interoperability
Freedom to Constrain
- If you merely constrain the TEI so that it:
- is smaller
- is more precise
- has specified attributes
- has project-specific examples
- has localised documentation
- then interoperability is less of a problem
- But there are still issues:
- different practices in various communities
- different element choices
- variation in attribute values
These problems are multiplied if new elements are added
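To make the "new elements" case concrete, here is a hedged sketch of how a customisation might add an element of its own. The element name recipeStep, its namespace URI and its description are invented for illustration; the class and macro names are existing TEI ones. The fragment would sit inside a schemaSpec.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             ident="recipeStep" ns="http://example.org/ns/myproject" mode="add">
  <desc>A single step in a recipe (hypothetical project-specific element)</desc>
  <classes>
    <!-- Allow it wherever paragraph-like components of a div may appear -->
    <memberOf key="model.divPart"/>
    <!-- Give it the usual global attributes -->
    <memberOf key="att.global"/>
  </classes>
  <content>
    <!-- Same content as a paragraph: text and phrase-level elements -->
    <macroRef key="macro.paraContent"/>
  </content>
</elementSpec>
```

Placing the new element in a non-TEI namespace at least signals to other projects' software that it is not part of the shared vocabulary.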
[Diagram: Possibilities of the TEI Framework; labels: Project A, Project B, New Elements]
Unmediated Interoperability Fantasy
- True interoperability only happens because of mediating factors between resources (e.g. crosswalks, normalisation scripts, understanding of the differences)
- If seamless interoperability happens without these then it is lowest common denominator interchange instead:
- the initial data structures are trivial, limited or of only structural granularity,
- the method of interoperation or combined processing is superficial,
- there has been a loss of intellectual content, or
- the resulting interoperation is not significant.
Is this a problem?
- Yes: the necessary fragmentation of resources (because they have different needs) means extra work is needed to make them truly work together
- No: we don't encode texts for the purpose of interoperability but for research analysis (and besides we can always do the extra work)
- In many ways it is an opportunity: to survey the different needs and approaches of encoding projects, to investigate them (programmatically), and to build better schemes
- Chance to form more sub-communities like EpiDoc for other sorts of encoding, tightly bound to their subject areas and outputs
Software Development
TEIC Github Repositories
- TEI (The Guidelines)
- Stylesheets (Transformations to/from TEI)
- Oxgarage (Document format pipeline engine)
- Oxygen-TEI (TEI framework for the popular oXygen XML Editor)
- Roma (Schema customisation)
- Jenkins (Continuous integration: jenkins.tei-c.org)
- CETEIcean (Easy TEI as HTML 5 custom elements)
- Replacement for Roma in planning
- Legacy:
- Carthage, Pure ODD, TEI Simple, Byzantium, tei-emacs, etc.
New TEI Processing Model
- New TEI processing-model documentation in TEI ODD customisations enables a record of processing intentions
- Software developers can read TEI ODD Customisation file and generate processing streams based on documented behaviours
- Testing and early adopters show that this 'build a factory rather than a car' approach saves significantly in code-length and complexity
- The eXist-db native XML database has incorporated this to produce its TEI Publisher eXist-db app
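A minimal sketch of what such documented processing intentions might look like in an ODD customisation follows. The choice of elements, the predicate and the CSS are illustrative assumptions; the model element and the "inline" and "heading" behaviours come from the TEI processing model. The fragments would sit inside a schemaSpec.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="hi" mode="change">
  <!-- Highlighted text marked as italic: render inline, in italics -->
  <model predicate="@rend='italic'" behaviour="inline">
    <outputRendition>font-style: italic;</outputRendition>
  </model>
  <!-- Fallback for any other hi: render inline with no special styling -->
  <model behaviour="inline"/>
</elementSpec>

<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="head" mode="change">
  <!-- Headings are handed to the implementation's "heading" behaviour -->
  <model behaviour="heading"/>
</elementSpec>
```

The customisation records only the intention (inline, heading, italic); how each behaviour is realised in HTML, PDF or another format is left to the software reading the ODD.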
Elected Volunteers?
- The majority of development of the software provided by the TEI Consortium was done by one person over many years:
Sebastian Patrick Quintus Rahtz (13 February 1955 – 15 March 2016)
- The TEI Technical Council has taken this on, in addition to the maintenance of the TEI Guidelines and Schemas, but its members are elected volunteers with two-year terms. The TEI is an open-source style of community, but less so with the software it produces
TEI Community
Aspects of the TEI Community
- TEI Website: http://www.tei-c.org/
- TEI Mailing List: TEI-L@LISTSERV.BROWN.EDU
- TEI Wiki: http://wiki.tei-c.org/
- Github: http://github.com/TEIC/
- Jenkins: http://jenkins.tei-c.org/
- Technical Council: http://lists.tei-c.org/pipermail/tei-council/
- Twitter: @teiconsortium
- Facebook: http://www.facebook.com/groups/TEIconsortium/
- TEI SIGs:
- Computer-Mediated Communication; Correspondence; East Asian/Japanese; Education; Libraries; Manuscripts; Music; Ontologies; Scholarly Publishing; TEI for Linguists; Text and Graphics; Tools
TEI Board & Technical Council
- TEI Board: Overall governance, strategic and financial oversight; 5 members elected for staggered two-year terms
- TEI Technical Council: Maintenance of TEI Guidelines (releases twice a year) and associated software and systems; 11 members elected for staggered two-year terms
- Appointed officers: Treasurer, Webmaster, Assistant Webmaster
TEI Consortium Membership
One year's free membership for taking part in a member-advertised training event
Rahtz Prize for TEI Ingenuity
- Dates:
- 1 April: Nominations Due
- 1 June: Submissions Due
- 15 September: Recipient Selected
- Annual Business Meeting: Recipient Announced
- Eligibility: Individuals or teams; Membership not required
- Criteria:
- innovative development of the TEI Guidelines
- creation of TEI-aware tools & technologies that further dissemination, adoption, or engaged use of Guidelines
- expansive & inclusive TEI training & outreach opportunities
- informed development and cultivation of particular TEI practitioner communities
http://www.tei-c.org/Activities/rahtz.xml
Conclusions
Flexibility vs Fragmentation
- That the TEI enables incompatible, mutually exclusive, modifications to the vocabulary is a necessary evil
- The fragmentation this causes by having (directly) incompatible document collections is real
- But these difficulties can be overcome with good documentation (including proper use of TEI ODD customisation files)
- On balance it is better to customise (and document it) to create the best textual data we can
Software Maintenance
- The TEI needs to maintain the software it uses to generate the TEI Guidelines (HTML version)
- The TEI needs to maintain the tools for transforming meta-schema customisations into schemas and documentation
- It has moved to GitHub, uses the Jenkins Continuous Integration Server for all its products, and is trying to encourage submissions from the community
- (But this is difficult for a community of text encoders, not software developers)
Community Involvement
- The TEI needs to increase and diversify its community involvement
- Education forms part of this but must help users feel that they too can change the TEI!
- You can, by:
- Making feature requests on GitHub
- Changing (and exposing) your own customisations
- Becoming a member and voting
- Getting involved in Special Interest Groups
- Writing or maintaining TEI-aware software
Freedom to Constrain
Introducing TEI Customisation
Dr James Cummings
University of Oxford
@jamescummings
http://slides.com/jamescummings/freedom
Thanks to the TEI Community for many of the ideas in this talk!
Freedom to constrain: Introducing TEI customisation; A lecture for @UCLDH on 2016-10-12; https://www.ucl.ac.uk/dh/events/archive/cummings