James Cummings
@jamescummings
http://slides.com/jamescummings/teistructure-core
Thanks as ever to many members of the TEI Community
(How TEI documents are structured)
A TEI document is represented by means of:
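As a minimal sketch (the title and all prose content are invented placeholders), a single document is a <TEI> element holding a <teiHeader> for metadata and a <text> with optional <front>, a <body>, and optional <back>:

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A placeholder title</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished illustrative example.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital; there is no source document.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <!-- front and back are optional; body is where the text itself lives -->
    <front>
      <div type="preface"><p>Placeholder preface.</p></div>
    </front>
    <body>
      <p>Placeholder body text.</p>
    </body>
    <back>
      <div type="appendix"><p>Placeholder appendix.</p></div>
    </back>
  </text>
</TEI>
```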
Each <TEI> element could represent a collection of encoded texts, versions of a text, samples from a language corpus, etc.
A group may contain sub-groups, represented by nested <group> elements.
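A rough illustration of nested groups, assuming an anthology-like composite text (the @n values and contents are invented for this sketch):

```xml
<text>
  <front>
    <div type="preface"><p>Placeholder preface to the whole collection.</p></div>
  </front>
  <group n="collection">
    <text>
      <body><p>First self-contained text.</p></body>
    </text>
    <!-- a sub-group: a group nested inside a group -->
    <group n="sub-collection">
      <text>
        <body><p>Second text.</p></body>
      </text>
      <text>
        <body><p>Third text.</p></body>
      </text>
    </group>
  </group>
</text>
```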
Because cultural conventions differ as to which elements are grouped as front matter and which as back matter, the content models for the front and back elements are identical.
'appendix' an ancillary self-contained section of a work
'glossary' a list of terms associated with definition texts (<list type="gloss">)
'notes' a section where notes are gathered together
'bibliogr' list of bibliographic citations (<listBibl>)
'index' any form of index of the work
'colophon' statement describing the physical production of the work
and many more...
Of course, there are elements like <index> for marking index entries in the body of a text to enable auto-generation of a detailed index
Some features (potentially) apply to everything, therefore members of the attribute class att.global can appear in every TEI element:
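For instance, @xml:id, @n, @xml:lang, @rend, @rendition and @style all belong to att.global. A small sketch with invented values, shown on a paragraph but equally possible on any other element:

```xml
<p xml:id="p001" n="1" xml:lang="en" rend="indent">
  <!-- xml:id: unique identifier; n: a number or label;
       xml:lang: language of the content; rend: how it appears in the source -->
  A placeholder paragraph carrying several global attributes.
</p>
```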
Hierarchical grouping of text sequences into textual divisions and subdivisions by means of nested <div> elements.
What do divisions contain (apart from other divisions)?
Headings, tagged with <head>
Prose, which may be organized as a sequence of paragraphs <p>
Poetry, divided into metrical lines <l>, optionally grouped into stanzas <lg>
Drama, divided into speeches <sp>, containing an optional speaker label <speaker>, followed by a mix of <p> or <l> elements, optionally mixed up with stage directions <stage>
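The division contents listed above might look roughly like this. The @type and @n values and all text are invented for illustration, and a real project would choose its own typology:

```xml
<body>
  <!-- prose: nested divisions, each with a heading -->
  <div type="chapter" n="1">
    <head>Chapter One</head>
    <p>A placeholder paragraph.</p>
    <div type="section" n="1.1">
      <head>A subsection</head>
      <p>Another placeholder paragraph.</p>
    </div>
  </div>
  <!-- poetry: lines grouped into a stanza -->
  <div type="poem">
    <head>A placeholder poem</head>
    <lg type="stanza">
      <l>First placeholder line,</l>
      <l>second placeholder line.</l>
    </lg>
  </div>
  <!-- drama: speeches with speaker labels and stage directions -->
  <div type="scene" n="1">
    <head>Scene 1</head>
    <stage>Enter two speakers.</stage>
    <sp>
      <speaker>First Speaker</speaker>
      <p>A placeholder speech in prose.</p>
    </sp>
    <sp>
      <speaker>Second Speaker</speaker>
      <l>A placeholder speech in verse.</l>
    </sp>
  </div>
</body>
```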
Within the <text> element the logical view is privileged, but the physical view can be encoded as well through 'empty' elements:
<pb /> marks the start of a new page
<cb /> marks the start of a new column
<lb /> marks the start of a new line
<gb /> marks the start of a new gathering
and for other forms of milestone:
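The generic <milestone/> element covers boundaries that the more specific elements do not. A sketch with invented @n and @unit values, showing how these empty elements sit inside the logical structure without interrupting it:

```xml
<div type="chapter">
  <pb n="12"/>
  <p>This paragraph starts on page 12 and runs
    <lb/>over a line break, then continues
    <pb n="13"/>onto page 13 without being split into two paragraphs.</p>
  <!-- a change of section in some reference system not otherwise encoded -->
  <milestone unit="section" n="2"/>
  <p>Text following the milestone.</p>
</div>
```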
(Things lots of documents have)
Identification information
e.g. shelfmark, inventory number, page number, titles…
Divisions and subdivisions
Pictures, diagrams, some kind of graphical information
A number of writing modes or registers
e.g. prose, verse, drama…
With formal structural units
e.g. paragraphs, lists, stanzas, lines, speeches
Containing textual distinctions (sometimes signalled by rendition)
e.g. titles, headings, quotes, names…
Metatextual indications/interventions
e.g. deletions, additions, annotations, revisions…
The TEI core module can cope with these phenomena and more!
A paragraph is a significant organizational unit for all prose texts
Typographic features are often used to distinguish a passage from its surroundings:
<hi> word or phrase which is graphically distinct from the surrounding text
<foreign> word or phrase not written in the same language as the surrounding text
@xml:lang global attribute to specify the language, using a BCP 47 language tag built from ISO codes (e.g. ISO 639-1 'fr' for French)
You may disagree that 'croissant' is a foreign word.
Markup is never neutral.
<emph> words or phrases which are emphasized for linguistic or rhetorical effect
original rendition recorded with: @rend, @rendition and @style
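A short sketch of the three highlighting elements in one invented sentence. The @rend values are free-text labels chosen for this example, not a fixed vocabulary:

```xml
<p>She ordered a <foreign xml:lang="fr">croissant</foreign> and insisted it was
  <emph rend="italic">not</emph> the one advertised in
  <hi rend="smallcaps">large letters</hi> outside.</p>
```

Here <hi> records only that the text looks different, <emph> claims it is emphasized, and <foreign> claims it is in another language; each is a progressively stronger interpretive statement.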
The TEI distinguishes a variety of 'distinct' text enclosed in quotation marks (or indicated by other means):
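The core module offers, among others, <q>, <quote>, <said>, <mentioned> and <soCalled> for this. A sketch using invented sentences, with one possible reading of each:

```xml
<!-- quote: material attributed to a source outside the text -->
<p>As the proverb has it, <quote>a stitch in time saves nine</quote>.</p>
<!-- said: speech or thought -->
<p><said>I will be late,</said> she warned.</p>
<!-- mentioned: a word talked about rather than used -->
<p>The word <mentioned>croissant</mentioned> comes from French.</p>
<!-- soCalled: a term the author distances themselves from -->
<p>Their <soCalled>improvements</soCalled> made matters worse.</p>
<!-- q: set off as quoted, with no finer distinction made -->
<p>The sign read <q>Closed</q>.</p>
```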
You can also show abbreviation markers (<am>) and the letters supplied in an expansion (<ex>)
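A hedged sketch of both levels of detail, using an invented example: <choice> pairs an abbreviation with its expansion, and <am> and <ex> mark exactly which characters signal the abbreviation and which were supplied. Note that <am> and <ex> do not appear in the core element list at the end of this deck, so another module has to be loaded to use them.

```xml
<!-- simple form: abbreviated and expanded readings side by side -->
<p>Ask <choice><abbr>Mr</abbr><expan>Mister</expan></choice> Smith.</p>

<!-- detailed form: the full stop is the abbreviation marker,
     "iste" is the text supplied by the encoder -->
<p>Ask <choice>
    <abbr>Mr<am>.</am></abbr>
    <expan>M<ex>iste</ex>r</expan>
  </choice> Smith.</p>
```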
<add> addition to the text
<del> letter, word or phrase marked as deleted in the text
<unclear> illegible or inaudible passage which cannot be read with confidence
<gap> indicates a point where material is omitted
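These four elements could be combined along the following lines. The sentence and all attribute values are invented for this sketch:

```xml
<p>The meeting is
  <del rend="overstrike">postponed</del>
  <add place="above">cancelled</add>
  until <unclear reason="faded">Thursday</unclear>
  <!-- gap is empty: it records that something is missing, not what -->
  <gap reason="damage" quantity="3" unit="word"/>
  at the usual place.</p>
```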
<name> a proper noun or noun phrase
<rs> a string referring to some person, place, object, etc.
@type attribute specifies the type of the name in more detail
Note: Including the namesdates module gives many more name elements (for personal, place, organisational, and geographic names).
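A brief sketch of the difference, with invented names and @type values: <name> wraps the proper noun itself, while <rs> wraps any phrase that refers to the same kind of entity.

```xml
<p><name type="person">Ada Example</name> moved to
  <name type="place">Exampletown</name>, where
  <rs type="person">the young engineer</rs> joined
  <rs type="organisation">the local works</rs>.</p>
```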
Elements to distinguish postal and electronic addresses
<address> contains a postal address
<email> contains an email address
<addrLine> a non-specific address line
<street> a full street address
<postCode> a postal or zip code
<postBox> a postal box number
<name> can also be used within address
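Put together, an address might be encoded roughly as follows. Every value here is an invented placeholder:

```xml
<address>
  <name>Example Library</name>
  <street>12 Example Street</street>
  <postBox>PO Box 123</postBox>
  <!-- addrLine for any line not covered by a more specific element -->
  <addrLine>Exampletown</addrLine>
  <postCode>EX1 2MP</postCode>
</address>
<p>Enquiries: <email>someone@example.org</email></p>
```

An address can also be given purely as a sequence of <addrLine> elements when no finer analysis is wanted.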
(More attributes added if the namesdates module is loaded)
<ptr> defines a pointer to another location
<ref> defines a reference to another location with an optional linking text
@target taking a URI reference
While <ref> can carry link text (though not every reference is a hyperlink), <ptr/> is an empty element that only points.
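A sketch of the two side by side. The identifiers and the example.org address are assumptions made up for illustration:

```xml
<div xml:id="sec2">
  <head>Second section</head>
  <!-- ref: the link text is part of the document -->
  <p>See <ref target="#sec1">the first section</ref> for background, or the
    <ref target="http://www.example.org/guide">external guide</ref>.</p>
  <!-- ptr: empty, so any link text must be generated in processing -->
  <p>Compare <ptr target="#sec1"/>.</p>
</div>
```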
<bibl> a structured or unstructured bibliographic entry
<author>, <editor>, <title>, <pubPlace>, <publisher>, <date>, etc. for further structuring
<biblStruct> a structured bibliographic entry
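The same citation could be encoded at three levels of structure. The bibliographic details are invented and do not refer to a real publication:

```xml
<listBibl>
  <!-- unstructured: just the citation as text -->
  <bibl>A. N. Author, An Example Monograph (Exampletown: Example Press, 2000).</bibl>

  <!-- loosely structured: components tagged where convenient -->
  <bibl><author>A. N. Author</author>, <title level="m">An Example Monograph</title>
    (<pubPlace>Exampletown</pubPlace>: <publisher>Example Press</publisher>,
    <date when="2000">2000</date>).</bibl>

  <!-- fully structured: fixed order, punctuation left to processing -->
  <biblStruct>
    <monogr>
      <author>A. N. Author</author>
      <title level="m">An Example Monograph</title>
      <imprint>
        <pubPlace>Exampletown</pubPlace>
        <publisher>Example Press</publisher>
        <date when="2000">2000</date>
      </imprint>
    </monogr>
  </biblStruct>
</listBibl>
```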
<lg> a formal unit (e.g. stanza) containing one or more verse lines
<l> contains a single verse line
The verse module extends this with more elements for metrical analysis.
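Using only the core elements, a two-stanza poem might be sketched like this (the lines, @type and @n values are invented placeholders):

```xml
<lg type="poem">
  <lg type="stanza" n="1">
    <l n="1">A first placeholder line,</l>
    <l n="2">a second placeholder line.</l>
  </lg>
  <lg type="stanza" n="2">
    <l n="3">A third placeholder line,</l>
    <l n="4">a fourth placeholder line.</l>
  </lg>
</lg>
```

Because <lg> can nest, the same element serves for the poem, its stanzas, and any smaller grouping such as a couplet.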
The drama module extends this with more elements for dramatic structures like cast lists.
Core Module:
abbr, add, addrLine, address, analytic, author, bibl, biblScope, biblStruct, binaryObject, cb, choice, cit, citedRange, corr, date, del, desc, distinct, divGen, editor, email, emph, expan, foreign, gap, gb, gloss, graphic, head, headItem, headLabel, hi, imprint, index, item, l, label, lb, lg, list, listBibl, measure, measureGrp, media, meeting, mentioned, milestone, monogr, name, note, num, orig, p, pb, postBox, postCode, ptr, pubPlace, publisher, q, quote, ref, reg, relatedItem, resp, respStmt, rs, said, series, sic, soCalled, sp, speaker, stage, street, teiCorpus, term, textLang, time, title, unclear