A World of Difference: Myths and misconceptions about the TEI

Dr James Cummings
University of Oxford


(Thanks as always to many members of the TEI Community)

What is the TEI?

  • An international consortium of institutions, projects and individual members; and a community of users and volunteers
  • A freely available manual containing a set of regularly maintained and updated recommendations: 'The Guidelines'
  • Definitions, examples, and discussion of over 560 markup distinctions for textual encoding, image facsimiles, genetic editing, etc.
  • A mechanism for producing customized schemas for validating your project's digital texts
  • A set of free and openly licensed, customizable tools and stylesheets for transformations to many formats (e.g. HTML, Word, PDF, Databases, RDF/LinkedData, Slides, ePub, etc.)
  • A simple consensus-based way of organizing and structuring textual (and other) resources
  • A format for documenting your interpretation and understanding of a text (and how text functions)
  • Whatever you make it! It is a community-driven standard

TEI myths and misconceptions

  • The TEI is too big / complicated
  • The TEI is too simple / general
  • There is no way to change the TEI
  • The TEI is too small (or doesn't have <mySpecialElement>)
  • You have to be a TEI guru to customize the TEI
  • The TEI is XML (and XML is broken or dead)
  • You can't get from TEI to $myPreferredFormat
  • You can't do stand-off markup in XML (or TEI)
  • XML (and TEI) can't handle overlapping hierarchies
  • There are no tools that understand the TEI
  • TEI is only for Anglo/Western works
  • Interoperability is impossible with the TEI
  • The TEI is only for digital editions; I'm doing $otherThing
  • If you do a TEI you must learn other $tech

"The TEI is too big/complicated"

  • The TEI is a modular framework that allows you, a project, or a sub-community to choose precisely which elements are available (cf. EpiDoc)
  • You customise the TEI in a TEI ODD customisation file where you include (and document) the choices you are making
  • This enforces consistency amongst a group of encoders (or just yourself), but also serves as machine processable documentation for long-term preservation
  • Your TEI ODD customisation is then a meta-schema source not only to generate your schema (to validate your documents) but also for your local encoding manual
  • Module references (<moduleRef>) using @include: you only ever get the elements you list
  • Module references (<moduleRef>) using @except: you also get any new elements added to the module when you regenerate the schema
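The difference between @include and @except can be sketched in a few lines of Python. This is only an illustration of the selection logic described above, not how the TEI tools implement it; the function name, the module contents, and the element "shinyNewElement" are all invented for the example.

```python
def elements_from_module_ref(module_elements, include=None, exclude=None):
    """Return the element names a module reference would pull into a schema.

    module_elements: names currently defined by the module (hypothetical here).
    include: names listed in @include (only these are ever selected).
    exclude: names listed in @except (everything else is selected).
    """
    if include is not None and exclude is not None:
        raise ValueError("use either include or exclude, not both")
    if include is not None:
        return sorted(set(module_elements) & set(include))
    if exclude is not None:
        return sorted(set(module_elements) - set(exclude))
    return sorted(module_elements)


# Hypothetical module contents before and after a new release.
old_release = ["name", "p", "q"]
new_release = ["name", "p", "q", "shinyNewElement"]

# With @include the selection stays the same across releases.
print(elements_from_module_ref(old_release, include=["p", "q"]))
print(elements_from_module_ref(new_release, include=["p", "q"]))

# With @except the new element appears when the schema is regenerated.
print(elements_from_module_ref(old_release, exclude=["name"]))
print(elements_from_module_ref(new_release, exclude=["name"]))
```

Which behaviour you want depends on whether the project values a frozen element set or wants to pick up additions automatically.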

"The TEI is too simple/general"

  • The TEI is indeed a general framework for the encoding of digital text
  • Where possible the TEI specifies the general case of textual phenomena to be encoded and then provides more detail through attributes, additional child elements, or linked information stored elsewhere
  • Example: "The <damage> element is too general, I need something like <waterDamage> to say the damage was caused by water!"
    • Answer: <damage agent="water">
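A small sketch of why the general element plus an attribute is enough in practice: the specific case can be picked out again at processing time. The sample sentence and the function name are invented for illustration; only the standard library is used.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample: two kinds of damage recorded with the same general element.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">The ink has
<damage agent="water">run badly</damage> here, and the corner is
<damage agent="fire">charred</damage>.</p>"""


def damaged_text(xml_text, agent):
    """Return the text of every <damage> element with the given @agent."""
    root = ET.fromstring(xml_text)
    return [
        "".join(el.itertext())
        for el in root.iter(f"{{{TEI_NS}}}damage")
        if el.get("agent") == agent
    ]


print(damaged_text(SAMPLE, "water"))  # ['run badly']
```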

But: there are indeed places where the TEI is too 'simple': where other standards are recommended (e.g. SVG, MEI) or it has yet to provide more complicated methods (e.g. <collation>)

  • Although there are web-based tools to create TEI customisations for you, what they create is TEI XML underneath
  • In this case we are changing the 'name' element from the core module
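As a rough idea of what such a change can look like underneath, here is a sketch of an ODD fragment that changes the 'name' element from the core module, held in a Python string and summarised with the standard library. The particular change (making @type required with a closed list of two values) is an assumption chosen for illustration, not the example shown in the talk, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Illustrative ODD fragment: the specific constraint on @type is invented.
ODD_FRAGMENT = """<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
    ident="name" module="core" mode="change">
  <attList>
    <attDef ident="type" mode="change" usage="req">
      <valList type="closed" mode="replace">
        <valItem ident="person"/>
        <valItem ident="place"/>
      </valList>
    </attDef>
  </attList>
</elementSpec>"""


def summarise_change(odd_xml):
    """Report which element is changed and which attribute values are allowed."""
    spec = ET.fromstring(odd_xml)
    attributes = {}
    for att in spec.findall("tei:attList/tei:attDef", NS):
        values = [v.get("ident") for v in att.findall("tei:valList/tei:valItem", NS)]
        attributes[att.get("ident")] = values
    return {
        "element": spec.get("ident"),
        "module": spec.get("module"),
        "mode": spec.get("mode"),
        "attributes": attributes,
    }


print(summarise_change(ODD_FRAGMENT))
```

Because the customisation is itself TEI XML, it can be read, checked, and documented with the same tools as the texts it governs.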

"There is no way to change the TEI"

  • The <constraintSpec> element enables us to provide additional constraints (e.g. in Schematron)
  • The <model> element enables us to record our intended processing model(s)
  • Adding project-specific examples and notes is easy
  • A TEI ODD file is also able to contain as much prose description, examples, etc. as you want outside the schema specification and these sections can be linked together

Changing Attributes

(And you can change the TEI in other ways of course!)

  • The TEI is an open source community-developed standard
  • You can submit bugs/feature-requests at http://github.com/TEIC/TEI/issues/
  • You may get (or give) free support on the TEI-L mailing list
  • You can join Special Interest Groups and lobby for your particular viewpoint with a larger group of people
  • TEI-C outputs are free, but you can also get your projects or institutions to join as a member and vote in elections, get discounts on software, archiving, etc.

"The TEI is too small or doesn't have <mySpecialElement>"

  • The TEI has over 560 elements detailing various textual phenomena; although it does not have <mySpecialElement>, the chances are it can cope with what you need in a more general manner
  • But even if it can't -- unlike most other standards -- you can add new elements, and do so in a manner that fully integrates and documents them (your TEI ODD customisation file)
  • You can also ask the TEI to add <mySpecialElement> and the elected group of volunteers will debate it (on the issue or council mailing list, both openly visible)
  • People's feature requests are usually eventually accepted

"You have to be a TEI guru to customise the TEI" 

"The TEI is XML"

  • The TEI is not XML
  • Although it currently uses XML as a serialization format, previously it was SGML
  • When a better format arises it could move away from XML (but so far in terms of clarity for long-term preservation, expressiveness, validation, integration, and mass adoption, nothing has come close)
  • TEI conformance is governed by the TEI abstract model instantiated in the prose of the TEI Guidelines
  • If the prose and generated schemas differ, it is the prose that should be considered normative
  • We have constraints in the prose that cannot be modelled in any existing schema language 

"(And XML is Broken or Dead)"

  • The death of XML is highly over-forecast by those who fall victim to technology hype cycles and want to push $theirSpecialFormat or technology
  • There are limitations with XML, but often these either don't matter for most projects, are already solved, or are a result of misunderstandings or fear of learning a new technology
  • Preferring a different format doesn't mean you need to denigrate existing formats: This is not, and should not be, a religious war
  • You can use XML, JSON, RDF, LaTeX, DocX, Markdown, and many other formats together (and generate them from your TEI if you wish...)
  • Never believe zealots: your choice of format should be about the appropriate format for rich encoding suitable to your particular circumstances, not about technology fads

"You can't get from TEI to $myPreferredFormat"

  • XML is easily processable with dozens of programming languages
  • The TEI Consortium provides XSLT stylesheets for transformations to/from around 40 other formats
    • Including, for example: bibtex, cocoa, csv, docbook, docx, dtd, epub, html(5), xsl-fo, json, InDesign, latex, markdown, mediawiki, nlm, odd, pdf, rdf, relaxng, slides, txt, wordpress, xlsx, xsd, and many more
  • Tools like OxGarage pipeline together these and other conversions 
  • Rolling your own XSLT, or profiles of the TEIC XSLT, is fairly easy (compared with other academic skills)
  • The important thing is the granularity of the information you have encoded
  • If you need to use another format... then use that format
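To give a feel for how little is needed to roll your own conversion, here is a minimal sketch that turns a tiny TEI document into JSON using only the Python standard library. It is not one of the TEI Consortium stylesheets; the sample document, the function name, and the shape of the JSON record are all invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample document.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
    <titleStmt><title>A sample letter</title></titleStmt>
    <publicationStmt><p>Unpublished example</p></publicationStmt>
    <sourceDesc><p>Invented for illustration</p></sourceDesc>
  </fileDesc></teiHeader>
  <text><body>
    <p>Dear <persName>Alice</persName>, greetings from <placeName>Oxford</placeName>.</p>
  </body></text>
</TEI>"""


def q(name):
    """Qualify a local name with the TEI namespace."""
    return f"{{{TEI_NS}}}{name}"


def tei_to_json(xml_text):
    """Extract the title, paragraphs, and tagged names as a JSON string."""
    root = ET.fromstring(xml_text)
    body = root.find(f".//{q('body')}")
    record = {
        "title": root.find(f".//{q('title')}").text,
        # Collapse whitespace so line breaks in the XML do not leak through.
        "paragraphs": [" ".join("".join(p.itertext()).split()) for p in body.iter(q("p"))],
        "people": [el.text for el in body.iter(q("persName"))],
        "places": [el.text for el in body.iter(q("placeName"))],
    }
    return json.dumps(record, indent=2)


print(tei_to_json(SAMPLE))
```

The JSON can only list people and places because the TEI source marked them up; that is the point about granularity.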

"You can't do stand-off markup in XML (or TEI)"

  • This myth shows a misunderstanding of XML and unfamiliarity with TEI
  • While lots of TEI users favour embedded markup, there are lots of elements in the TEI specifically designed for stand-off markup (cf. <link>, <join>, etc.)
  • A TEI document could be a very flat text and have stand-off markup (using URIs, XPointers, etc.) pointing into it
    • e.g. A critical apparatus can be completely separate from a base text and point into it 
  • There could be more documentation and explanation in the TEI Guidelines about this
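The separate-apparatus case can be sketched as follows: a flat base text whose words carry identifiers, and an apparatus held elsewhere that points into it. The sample texts, the witness siglum, and the function name are invented; for brevity the sketch assumes each <app> covers a single word, so only @from is resolved.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented base text: flat, with an identifier on every word.
BASE_TEXT = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <w xml:id="w1">Whan</w> <w xml:id="w2">that</w> <w xml:id="w3">Aprill</w>
</p>"""

# Invented apparatus, kept apart from the base text and pointing into it.
APPARATUS = """<listApp xmlns="http://www.tei-c.org/ns/1.0">
  <app from="#w3" to="#w3">
    <rdg wit="#B">Aprille</rdg>
  </app>
</listApp>"""


def resolve_apparatus(base_xml, apparatus_xml):
    """Pair each reading with the base-text word it points at.

    Assumes @from and @to name the same word (single-word variants only).
    """
    base = ET.fromstring(base_xml)
    by_id = {el.get(XML_ID): el for el in base.iter() if el.get(XML_ID)}
    results = []
    for app in ET.fromstring(apparatus_xml).iter(TEI + "app"):
        target = by_id[app.get("from").lstrip("#")]
        for rdg in app.iter(TEI + "rdg"):
            results.append((target.text, rdg.get("wit"), rdg.text))
    return results


print(resolve_apparatus(BASE_TEXT, APPARATUS))  # [('Aprill', '#B', 'Aprille')]
```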

But: where this could improve is in the provision of more general tools to do better stand-off markup in encoder-friendly ways

"XML (and TEI) can't handle overlapping hierarchies"

  • The TEI Guidelines have a whole chapter (#20) about how to handle non-hierarchical structures
  • While it is true that TEI users often prefer to privilege the intellectual content over the physical construct, there are ways to mark both of these (e.g. milestones)
  • Revisions to TEI's <app> element enable <lem> and <rdg> to contain paragraphs and divisions, so it isn't limited to phrase-level textual variance
  • Having multiple hierarchies is handled with forms of stand-off or out-of-line markup which are perfectly reasonably done in XML (and TEI)
  • It would be good to have more tools (there are some) specifically for this kind of work though
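As a small illustration of the milestone approach, the sketch below takes text encoded by paragraph (the intellectual structure) and regroups it by page (the physical structure) using empty <pb/> milestones, even though one paragraph crosses a page break. The sample and the function name are invented, and real texts would need more care with nested elements and whitespace.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample: the first paragraph runs across the break between pages 1 and 2.
SAMPLE = (
    '<body xmlns="http://www.tei-c.org/ns/1.0"><pb n="1"/>'
    '<p>This paragraph starts on page one <pb n="2"/>and ends on page two.</p>'
    "<p>A second paragraph.</p></body>"
)


def text_by_page(xml_text):
    """Group running text by the most recent <pb/> milestone."""
    pages = {}
    current = None

    def add(text):
        if text and current is not None:
            pages[current] = pages.get(current, "") + text

    def walk(el):
        nonlocal current
        if el.tag == TEI + "pb":
            current = el.get("n")
            pages.setdefault(current, "")
        add(el.text)
        for child in el:
            walk(child)
            add(child.tail)
        if el.tag == TEI + "p":
            add(" ")  # keep consecutive paragraphs from running together

    walk(ET.fromstring(xml_text))
    return {n: " ".join(text.split()) for n, text in pages.items()}


print(text_by_page(SAMPLE))
```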

"There are no tools that understand the TEI"

(Of course, we'd be happy if there were more! These are just the ones people have listed on the wiki)

"TEI is only for Anglo/Western works"

  • The TEI Guidelines strive to be applicable to encoding any text, of any time period, in any language, in any writing system
  • While the TEI has indubitably arisen from a western context, those involved often strive to broaden the scope of its examples, coverage of textual phenomena, improve its internationalisation and localisation methods, and create a more diverse TEI community
  • The TEI Guidelines have a built-in system for handling non-Unicode characters
  • The section on writing systems is occasionally updated when new developments for handling digital text emerge
  • But... more should be done

"Interoperability is impossible with the TEI"

  • The necessary ability to customise, constrain, extend the standard does pose a challenge for interoperability, but it is certainly possible
  • Usually people interoperate (rather than interchange) through lowest-common-denominator subsets or pre-existing TEI subsets (like TEI Lite or TEI simplePrint)
  • More complex forms of markup interoperability may need some mediating influence (e.g. someone to understand both uses of the TEI)
  • The solution is proper documentation (by which I mean machine-processable TEI ODD customisation files with lots of prose as well).
  • The ability to interchange many documents improves significantly with a common interchange format
  • Customisation can document the differences in a machine processable format so tools can compare different corpora

- @louburnard

"The TEI is only for digital editions; I'm doing $otherThing"

  • The TEI is for many forms of output not just digital editions
  • Moreover, there isn't a one-to-one relationship between a TEI file and 'The Digital Edition' -- if you are using the format to its potential then you can create many aspects of the edition: supplementary files, indices, camera-ready print copy, interactive graphic visualisations of encoded information, etc.
  • TEI is used for many other forms of digital text, such as catalogues of medieval manuscripts, linguistic corpora, etc.
  • But if you want to do $otherThing and there is a good standard for that, then use that! It is about appropriate formats for your use case.

"If you do a TEI you must learn other $tech"

  • When people create digital resources using TEI they often take it on themselves to learn not only TEI, but the technologies to transform and manipulate this
  • This is great for those who can do so, or want to learn, but editors only need to learn those technologies which affect the intellectual content
  • Increasingly, tools like TEI Boilerplate and eXist-db's TEI Publisher, in addition to the TEIC Stylesheets (and many others), give editors more independent control
  • The introduction of TEI Processing Model documentation inside TEI ODD gives tool-makers a way to generate software based on implementation-agnostic instructions that an editor (or editorial assistant) could modify

But: This doesn't mean there couldn't be more simple how-to's or off-the-shelf software

Why do these myths exist?

Possible reasons:

  1. TEI has become mainstream (well, in DH) -- it is not a ragtag group of rebels but the establishment that people want to challenge
  2. Some teaching of the TEI is focused on rules and precepts rather than why something is being encoded
  3. Intensive teaching in workshops means that people often ingest a lot of new concepts and ideas in a short time
  4. Misunderstandings arise because people have only glanced at part of the enormous TEI Guidelines

What can we do about these myths?

Possible solutions:

  1. Foreground TEI as open community, not elitist priesthood
  2. Produce more open training materials focusing more on the why than the how
  3. Encourage longer-term TEI learning by creating more self-tuition materials, building teaching into other longer courses, and encouraging new users to discuss more on TEI-L
  4. Provide more easily digested introductions, surveys of types of encoding, and larger fully-worked examples (from encoding to publication)

More solutions? Do let me know!

A World of Difference: Myths and misconceptions about the TEI; Dr James Cummings; DH 2017; Friday 11 August 2017