Why the TEI may not be as limited as you think

Dr James Cummings


@jamescummings

http://slides.com/jamescummings/teiunlimited

Digital Editions & Presentation

  • Textual editors should beware those talking about digital editions primarily in terms of their presentation and layout for consumption by readers
  • The power of digital editions comes not from their shiny front-ends (which should look different for different sorts of users) but the data model and APIs that power their back-ends
  • These data models and APIs need to fully model the intellectual understanding of the text; it is the data model, together with a documented API, that is the real digital edition, and presentation layers are only views of that edition
  • Digital editors only need to understand those technologies which affect the intellectual content of the edition

What is the TEI?

  • An international consortium of institutions, projects and individual members; and a community of users and volunteers
  • A freely available manual containing a set of regularly maintained and updated recommendations: 'The Guidelines'
  • Definitions, examples, and discussion of over 540 markup distinctions for textual, image facsimile, genetic editing etc.
  • A mechanism for producing customized schemas for validating your project's digital texts
  • A set of free and openly licensed, customizable tools and stylesheets for transformations to many formats (e.g. HTML, Word, PDF, Databases, RDF/LinkedData, Slides, ePub, etc.)
  • A simple consensus-based way of organizing and structuring textual (and other) resources
  • A format for documenting your interpretation and understanding of a text (and how text functions)
  • Whatever you make it! It is a community-driven standard

Some myths about the TEI

  • The TEI is too big (or complicated)
  • There is no way to change the TEI
  • The TEI is too small (or doesn't have <mySpecialElement>)
  • The TEI is XML (and XML is broken or dead)
  • You can't get from TEI to $myPreferredFormat
  • You can't do stand-off markup in XML (or TEI)
  • XML (and TEI) can't handle overlapping hierarchies
  • There are no tools that understand the TEI
  • TEI is only for Anglo/Western works
  • Interoperability is impossible with the TEI
  • The TEI is only for a digital edition
  • If you do a TEI-based edition you must learn other $tech

"The TEI is too big"

  • The TEI is a modular framework that allows you, a project, or a sub-community to choose precisely what elements are available (cf. EpiDoc)
  • You customise the TEI in a TEI ODD customisation file where you include (and document) the choices you are making
  • This enforces consistency amongst a group of encoders (or just yourself), but also serves as machine processable documentation for long-term preservation
  • Your TEI ODD customisation is then a meta-schema source not only to generate your schema (to validate your documents) but also for your local encoding manual
  • Module element references using @include: you only ever get the elements you list
  • Module element references using @except: you get everything except the elements you list, including any new elements when regenerating the schema
  • Although there are web-based tools to create TEI customisations for you, what they create is TEI XML underneath
  • For example, an <elementSpec> in change mode can alter the 'name' element from the core module
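
The customisation choices above can be sketched as a small ODD fragment, read here with Python's standard library. This is an illustrative sketch only: the schema ident "myProject", the lists of included and excluded elements, and the closed value list for @type are all invented for the example, and the helper function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical ODD fragment: the ident, element lists and values are invented.
ODD = """
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myProject" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="core" include="p name title"/>
  <moduleRef key="namesdates" except="geo climate"/>
  <elementSpec ident="name" module="core" mode="change">
    <attList>
      <attDef ident="type" mode="change" usage="req">
        <valList type="closed" mode="replace">
          <valItem ident="person"/>
          <valItem ident="place"/>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
"""

def module_policies(odd_xml):
    """Map each module key to ('include' | 'except' | 'all', [element names])."""
    root = ET.fromstring(odd_xml)
    policies = {}
    for ref in root.iter(TEI + "moduleRef"):
        if "include" in ref.attrib:
            policies[ref.get("key")] = ("include", ref.get("include").split())
        elif "except" in ref.attrib:
            policies[ref.get("key")] = ("except", ref.get("except").split())
        else:
            policies[ref.get("key")] = ("all", [])
    return policies

def changed_elements(odd_xml):
    """List the elements that the customisation modifies."""
    root = ET.fromstring(odd_xml)
    return [spec.get("ident") for spec in root.iter(TEI + "elementSpec")
            if spec.get("mode") == "change"]

if __name__ == "__main__":
    for key, (mode, names) in module_policies(ODD).items():
        print(key, mode, names)
    print("changed:", changed_elements(ODD))
```

With @include the core module can never grow beyond the three listed elements, whereas the @except reference to namesdates would pick up any element added to that module in a later TEI release.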

"There is no way to change the TEI"

  • The <constraintSpec> element enables us to provide additional constraints (e.g. in Schematron)
  • The <model> element enables us to record our intended processing model(s)
  • Adding project-specific examples and notes is easy
  • Your TEI ODD file is also able to contain as much prose description, examples, etc. as you want outside the schema specification
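
As a rough illustration of the points above, the following sketch embeds a hypothetical <elementSpec> containing a Schematron constraint, two processing models and a project note, then lists them with Python's standard library. The rule itself (every name must carry @ref), the constraint ident, the chosen behaviours and the helper names are assumptions made for the example, not recommendations from the Guidelines.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
SCH = "{http://purl.oclc.org/dsdl/schematron}"

# Hypothetical customisation of <name>; the rule and the note are invented.
# The 'tei:' prefix in the rule context assumes it is bound elsewhere in the ODD.
ODD = """
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             ident="name" module="core" mode="change">
  <constraintSpec ident="name-needs-ref" scheme="schematron">
    <constraint>
      <sch:rule context="tei:name">
        <sch:assert test="@ref">Every name must point to an authority record.</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
  <model predicate="@ref" behaviour="link"/>
  <model behaviour="inline"/>
  <remarks>
    <p>In this project every name is linked to our authority file.</p>
  </remarks>
</elementSpec>
"""

def constraints(odd_xml):
    """Return (rule context, assertion test) pairs from the Schematron rules."""
    root = ET.fromstring(odd_xml)
    found = []
    for rule in root.iter(SCH + "rule"):
        for check in rule.iter(SCH + "assert"):
            found.append((rule.get("context"), check.get("test")))
    return found

def models(odd_xml):
    """Return (predicate, behaviour) pairs for each processing model."""
    root = ET.fromstring(odd_xml)
    return [(m.get("predicate"), m.get("behaviour"))
            for m in root.iter(TEI + "model")]

def notes(odd_xml):
    """Return the prose paragraphs recorded in <remarks>."""
    root = ET.fromstring(odd_xml)
    return [p.text for p in root.iter(TEI + "p")]

if __name__ == "__main__":
    print(constraints(ODD))
    print(models(ODD))
    print(notes(ODD))
```

The same file therefore carries the validation rule, the intended processing and the human-readable explanation side by side.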

(And you can change the TEI in other ways of course!)

  • The TEI is an open source community-developed standard
  • You can submit bugs/feature-requests at http://github.com/TEIC/TEI/issues/
  • You may get (or give) free support on the TEI-L mailing list (often on textual editing as much as TEI)
  • You can join Special Interest Groups and lobby for your particular view on critical apparatus (or something else)
  • Although everything it makes is free, you can also get your projects or institutions to join as a member and vote in elections, get discounts on software, archiving, etc.

"The TEI is too small"

  • The TEI has over 540 elements detailing various textual phenomena; although it does not have <mySpecialElement>, the chances are it can cope with what you need in a more general manner
  • But even if it can't -- unlike most other standards -- you can add new elements, and do so in a manner that fully integrates and documents them (in your TEI ODD customisation file)
  • You can also ask the TEI to add <mySpecialElement> and the elected group of volunteers will debate it (on the issue or council mailing list, both openly visible)
  • People's feature requests are usually eventually accepted

"The TEI is XML"

  • The TEI is not XML
  • Although it currently uses XML as a serialization format, previously it was SGML
  • When a better format arises (and so far in terms of clarity for long-term preservation, expressiveness, validation, integration, and mass adoption, nothing has come close), it may move away from XML
  • TEI conformance is governed by the TEI abstract model instantiated in the prose of the TEI Guidelines
  • If the prose and generated schemas differ, it is the prose that is considered normative
  • We have constraints in the prose that cannot be modelled in any existing schema language (hence development of Pure ODD)

"(And XML is Broken or Dead)"

  • The death of XML is highly over-forecast by those who fall victim to technology hype cycles and those who want to push $theirSpecialFormat or technology
  • There are limitations with XML, but usually these either don't matter, are solved, or are a misunderstanding
  • Preferring a different format doesn't mean you need to denigrate existing formats; this is not, and should not be, a religious war
  • You can use XML, JSON, RDF, LaTeX, DocX, Markdown, and many other formats (and generate them from your TEI if you wish)
  • Don't believe zealots: your choice of format should be about the appropriate format for rich encoding suitable to those particular circumstances, not about technology fads (but for critical editing TEI is a very good choice)

"You can't get from TEI to $myPreferredFormat"

  • XML is easily processable with dozens of programming languages
  • The TEI Consortium provides XSLT stylesheets for transformations to/from around 40 other formats
    • Including, for example: bibtex, cocoa, csv, docbook, docx, dtd, epub, html(5), xsl-fo, json, InDesign, latex, markdown, mediawiki, nlm, odd, pdf, rdf, relaxng, slides, txt, wordpress, xlsx, xsd, and many more
  • Tools like OxGarage pipeline together these and other conversions 
  • Rolling your own XSLT, or profiles of the TEIC XSLT, is fairly easy (compared with other academic skills)
  • The important thing is the granularity of information in your encoding: you can only export what you have marked up
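
To give a sense of how little code a simple conversion needs, here is a minimal sketch using only Python's standard library (the TEIC Stylesheets themselves are XSLT; this is not a substitute for them). The TEI fragment, the "#jc" pointer and the function names are invented for the example, and the fragment omits the teiHeader that a real TEI document would have.

```python
import json
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented fragment; a complete TEI document would also have a <teiHeader>.
SOURCE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>Letter from <persName ref="#jc">James</persName> sent from <placeName>Oxford</placeName>.</p>
  </body></text>
</TEI>
"""

def to_plain_text(tei_xml):
    """Flatten each paragraph to a plain string, discarding all markup."""
    root = ET.fromstring(tei_xml)
    return ["".join(p.itertext()) for p in root.iter(TEI + "p")]

def to_json_index(tei_xml):
    """Export the tagged names as JSON; possible only because they are tagged."""
    root = ET.fromstring(tei_xml)
    index = {
        "people": [{"name": e.text, "ref": e.get("ref")}
                   for e in root.iter(TEI + "persName")],
        "places": [e.text for e in root.iter(TEI + "placeName")],
    }
    return json.dumps(index, sort_keys=True)

if __name__ == "__main__":
    print(to_plain_text(SOURCE))
    print(to_json_index(SOURCE))
```

Going from the rich encoding to plain text or JSON is easy; going back from the plain text to an index of people and places is not, which is why the granularity of the source encoding matters more than the target format.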

"You can't do stand-off markup in XML (or TEI)"

  • This myth shows a misunderstanding of XML and unfamiliarity with TEI
  • While lots of TEI users favour embedded markup, there are lots of elements in the TEI specifically designed for stand-off markup (cf. <link>, <join>)
  • Your edition could be a very flat text and you could point into it (using URIs, XPointers, etc.) to provide stand-off markup
  • A critical apparatus can be completely separate from a base text and point into it using many of the URI datatyped attributes 
  • There could be more documentation and explanation in the TEI Guidelines about this, and there are proposals to improve it; more general tools are also needed
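
The idea of a flat base text with a separate apparatus pointing into it can be sketched as follows. The sample words, witness sigla and readings are invented, and for brevity both pieces are held as strings in one script, whereas in practice the apparatus would usually live in its own file and point at something like a file name plus fragment identifier. The resolver is a hypothetical helper that handles only simple fragment pointers, not full XPointer.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Flat base text: every word carries an xml:id and nothing else.
BASE = """
<ab xmlns="http://www.tei-c.org/ns/1.0">
  <w xml:id="w1">The</w> <w xml:id="w2">quick</w>
  <w xml:id="w3">brown</w> <w xml:id="w4">fox</w>
</ab>
"""

# Separate apparatus pointing into the base text with @from and @to.
APPARATUS = """
<listApp xmlns="http://www.tei-c.org/ns/1.0">
  <app from="#w2" to="#w2"><rdg wit="#B">swift</rdg></app>
  <app from="#w3" to="#w4"><rdg wit="#C">red vixen</rdg></app>
</listApp>
"""

def resolve(base_xml, apparatus_xml):
    """Return (base text span, witness, reading) for each apparatus entry."""
    words = list(ET.fromstring(base_xml).iter(TEI + "w"))
    position = {w.get(XML_ID): i for i, w in enumerate(words)}
    entries = []
    for app in ET.fromstring(apparatus_xml).iter(TEI + "app"):
        # Keep only the fragment identifier after '#'.
        start = position[app.get("from").split("#", 1)[1]]
        end = position[app.get("to").split("#", 1)[1]]
        lemma = " ".join(w.text for w in words[start:end + 1])
        for rdg in app.iter(TEI + "rdg"):
            entries.append((lemma, rdg.get("wit"), rdg.text))
    return entries

if __name__ == "__main__":
    for lemma, wit, reading in resolve(BASE, APPARATUS):
        print(f"{lemma}] {reading} {wit}")
```

Because the base text is never touched, several independent apparatus files (or competing analyses) can point at the same words.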

"XML (and TEI) can't handle overlapping hierarchies"

  • The TEI Guidelines have a whole chapter (#20) about how to handle non-hierarchical structures
  • While it is true that TEI users often prefer to privilege the intellectual content over the physical construct, there are ways to mark both of these (e.g. milestones)
  • Revisions to TEI's <app> element enable <lem> and <rdg> to contain paragraphs and divisions, so it isn't limited to phrase-level textual variance
  • Having multiple hierarchies is handled with forms of stand-off or out-of-line markup which are perfectly reasonably done in XML (and TEI)
  • It would be good to have more tools (there are some) specifically for this kind of work though
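
A small sketch of the milestone approach: the paragraph hierarchy is encoded with nested elements, while page boundaries are recorded with empty <pb/> milestones, so both views can be recovered from the same file. The sample text and page numbers are invented, and the two helper functions are illustrative rather than part of any TEI tool.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample: the first paragraph runs across a page break.
SOURCE = """
<body xmlns="http://www.tei-c.org/ns/1.0">
  <pb n="1r"/>
  <p>First paragraph starts on one page <pb n="1v"/>and ends on the next.</p>
  <p>Second paragraph.</p>
</body>
"""

def normalise(text):
    """Collapse runs of whitespace to single spaces."""
    return " ".join(text.split())

def paragraphs(tei_xml):
    """Intellectual view: one string per <p>, ignoring page breaks."""
    root = ET.fromstring(tei_xml)
    return [normalise("".join(p.itertext())) for p in root.iter(TEI + "p")]

def pages(tei_xml):
    """Physical view: text grouped by the most recent <pb/> milestone."""
    root = ET.fromstring(tei_xml)
    result = {}
    current = None

    def add(text):
        if text and current is not None:
            result[current] += text

    def walk(elem):
        nonlocal current
        if elem.tag == TEI + "pb":
            current = elem.get("n")
            result[current] = ""
        add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)  # text following the child belongs to the current page

    walk(root)
    return {n: normalise(text) for n, text in result.items()}

if __name__ == "__main__":
    print(paragraphs(SOURCE))
    print(pages(SOURCE))
```

Neither hierarchy has to be sacrificed: the element nesting privileges the paragraphs, and the milestones carry enough information to rebuild the pages.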

"There are no tools that understand the TEI"

(Of course, we'd be happy if there were more!)

"TEI is only for Anglo/Western works"

"Interoperability is impossible with the TEI"

  • The necessary ability to customise, constrain, and extend the standard does pose a challenge for interoperability, but it is certainly possible
  • Usually people interoperate (rather than interchange) through lowest-common-denominator subsets or pre-existing TEI subsets (like TEI Lite or TEI Simple)
  • More complex forms of markup interoperability may need some mediating influence (e.g. someone to understand both uses of the TEI)
  • The solution is proper documentation (by which I mean machine-processable TEI ODD customisation files with lots of prose as well).
  • The ability to interchange many documents improves significantly with a common interchange format
  • Customisation can document the differences in a machine processable format so tools can compare different corpora

- @louburnard

"TEI is only for a digital edition"

  • The TEI is for many forms of output
  • There isn't a one-to-one relationship between a TEI file and 'The Digital Edition' -- if you are using the format to its potential then you can create many aspects of the edition, including supplementary files and indices
  • From a well-encoded TEI file you can create not only a digital edition, but camera-ready print copy, interactive graphic visualisations of encoded information, and many other formats
  • A single encoded TEI file can be used to produce multiple forms of edition (e.g. for different audiences; or diplomatic, eclectic, etc.)
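
As a minimal illustration of one file feeding several forms of edition, the sketch below renders a diplomatic and a regularised reading text from the same paragraph by choosing between <orig> and <reg> inside <choice>. The sample sentence is invented and the render function is a hypothetical helper, far simpler than a real publishing pipeline.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample encoding both the original and the regularised spellings.
SOURCE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">I saw the '
    "<choice><orig>vniuersitie</orig><reg>university</reg></choice> of "
    "<choice><orig>Oxenford</orig><reg>Oxford</reg></choice>.</p>"
)

def render(elem, keep):
    """Serialise text, keeping only 'orig' or only 'reg' inside each <choice>."""
    skip = TEI + ("reg" if keep == "orig" else "orig")
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != skip:
            parts.append(render(child, keep))
        parts.append(child.tail or "")
    return "".join(parts)

def diplomatic(tei_xml):
    return render(ET.fromstring(tei_xml), "orig")

def regularised(tei_xml):
    return render(ET.fromstring(tei_xml), "reg")

if __name__ == "__main__":
    print(diplomatic(SOURCE))
    print(regularised(SOURCE))
```

The editorial decisions are recorded once in the encoding; which of them a reader sees is a presentation choice made per audience.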

"If you do a TEI-based edition you must learn other $tech"

  • When people create digital editions they often take it upon themselves to learn not only TEI, but also the technologies to transform and manipulate it
  • This is great for those who can do so, or want to learn, but editors only need those technologies which affect the intellectual content
  • Increasingly, tools like TEI Boilerplate and eXist-db's TEI Publisher, in addition to the TEIC Stylesheets, give editors more independent control
  • The recent introduction of TEI Processing Model documentation inside TEI ODD gives tool-makers a way to generate software based on implementation-agnostic instructions that an editor (or editorial assistant) could modify

Conclusions

  • The TEI is as big or small as you want it to be -- the community helps users, projects, disciplines to change it
  • XML, and the TEI, are alive and well
  • You can use stand-off markup in the TEI, and it is one of the recommended ways of handling overlapping hierarchies
  • With good markup you can get to/from almost any format (and many conversions already exist)
  • There are tools that understand XML and TEI, but more generalised ones are always good
  • TEI is used for texts of any language, any time period, and any writing system
  • Interoperability is always a challenge, but easier when you converge on a format
  • The TEI is for many outputs, not just digital editions
  • What editors need to learn is the TEI; whether they need other technologies depends on the project