Dr James Cummings
@jamescummings
http://slides.com/jamescummings/tei-structure-core-mca
Thanks as ever to many members of the TEI Community
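A TEI document is represented by means of a <TEI> element containing a <teiHeader> followed by a <text>; several <TEI> documents can be grouped in a <teiCorpus> element, which has its own <teiHeader>.
A <teiCorpus> could group, for example, a collection of encoded texts, versions of a text, or samples from a language corpus.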
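A minimal sketch of a single TEI document. The title and paragraph text are hypothetical placeholders, and the <fileDesc> shown is only the smallest header the TEI requires (a title statement, a publication statement and a source description):

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A Hypothetical Sample Text</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished example.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born-digital example; no source.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Placeholder paragraph text.</p>
    </body>
  </text>
</TEI>
```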
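Because cultural conventions differ as to which elements are grouped as front matter and which as back matter, the content models for the <front> and <back> elements are identical. Divisions within them can be classified with the @type attribute of <div>, using values such as: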
'appendix' an ancillary self-contained section of a work
'glossary' a list of terms associated with definition texts (<list type="gloss">)
'notes' a section where notes are gathered together
'bibliogr' a list of bibliographic citations (<listBibl>)
'index' any form of index of the work
'colophon' a statement describing the physical production of the work
and many more...
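A minimal sketch, with hypothetical content, of front and back matter arranged around the body. Apart from the assumed 'preface' and 'chapter' values, the @type values come from the list above; the glossary uses a <list type="gloss"> made of <label>/<item> pairs. The fragment omits the TEI namespace declaration:

```xml
<text>
  <front>
    <div type="preface">
      <head>Preface</head>
      <p>Placeholder preface text.</p>
    </div>
  </front>
  <body>
    <div type="chapter">
      <p>Placeholder chapter text.</p>
    </div>
  </body>
  <back>
    <div type="appendix">
      <head>Appendix A</head>
      <p>Placeholder appendix text.</p>
    </div>
    <div type="glossary">
      <head>Glossary</head>
      <list type="gloss">
        <label>quire</label>
        <item>a gathering of folded leaves</item>
      </list>
    </div>
  </back>
</text>
```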
Some features potentially apply to everything, so the attributes of the class att.global (such as @xml:id, @n, @xml:lang, @rend, @rendition and @style) are available on every TEI element.
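A small sketch of global attributes in use; the identifiers and the @rend values are hypothetical project conventions, since @rend takes free text:

```xml
<!-- @type comes from att.typed, not att.global; the other attributes are global -->
<div type="chapter" xml:id="ch1" n="1">
  <head xml:lang="en" rend="center">Chapter One</head>
  <p xml:id="ch1-p1" n="1" rend="indent">Placeholder paragraph text.</p>
</div>
```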
Hierarchical grouping of text sequences into textual divisions and subdivisions by means of nested <div> elements.
What do divisions contain (apart from other divisions)?
Headings, tagged with <head>
Prose, which may be organized as a sequence of paragraphs <p>
Poetry, divided into metrical lines <l>, optionally grouped into stanzas <lg>
Drama, divided into speeches <sp>, each containing an optional speaker label <speaker> followed by <p> or <l> elements, optionally interspersed with stage directions <stage>
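A sketch of a division mixing the kinds of content listed above; all headings, lines and speaker names are hypothetical placeholders, and the @type values are assumed project choices:

```xml
<div type="chapter">
  <head>Chapter One</head>
  <p>A paragraph of prose.</p>
  <div type="poem">
    <head>A Short Poem</head>
    <lg type="stanza">
      <l>First verse line,</l>
      <l>second verse line.</l>
    </lg>
  </div>
  <div type="scene">
    <stage>Enter two speakers.</stage>
    <sp>
      <speaker>First Speaker</speaker>
      <l>A speech in verse,</l>
      <stage>Aside.</stage>
      <l>continued after a stage direction.</l>
    </sp>
    <sp>
      <speaker>Second Speaker</speaker>
      <p>A speech in prose.</p>
    </sp>
  </div>
</div>
```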
Within the <text> element the logical view is privileged, but the physical view can be encoded as well through 'empty' elements:
<pb/> marks the start of a new page
<cb/> marks the start of a new column
<lb/> marks the start of a new line
<gb/> marks the start of a new gathering
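<milestone/> marks a boundary point for any other kind of section not indicated by a structural element (its @unit attribute names the kind of section)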
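A sketch of the physical view encoded inside the logical structure: the empty elements only mark points, so the paragraph stays whole even when it crosses a page. Page numbers, column labels and the 'section' unit are hypothetical:

```xml
<div type="chapter">
  <pb n="12"/>
  <head>Chapter Two</head>
  <p>
    <lb/>A paragraph that begins on one
    <lb/>page and runs on
    <pb n="13"/>
    <lb/>to the next one.</p>
  <milestone unit="section" n="2"/>
  <p>
    <cb n="a"/>
    <lb/>Text in the first column
    <cb n="b"/>
    <lb/>continues in the second.</p>
</div>
```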
A paragraph is a significant organizational unit for all prose texts
<hi> word or phrase which is graphically distinct from the surrounding text
<foreign> word or phrase not written in the same language as the surrounding text
@xml:lang global attribute to specify the language, using a BCP 47 language tag built from ISO standard codes (e.g. ISO 639-1 'en' or 'la')
<emph> words or phrases which are emphasized for linguistic or rhetorical effect
original rendition recorded with: @rend, @rendition and @style
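A sketch of the three elements side by side; the sentence is a hypothetical placeholder, and rend="italic" is an assumed project convention rather than a fixed TEI value:

```xml
<p>The title <hi rend="italic">Hamlet</hi> is printed in italics,
  the phrase <foreign xml:lang="la">sine qua non</foreign> is Latin,
  and <emph>this</emph> word is stressed for rhetorical effect.</p>
```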
The TEI distinguishes a variety of 'distinct' text, often enclosed in quotation marks (or indicated by other means), for example <q>, <quote>, <said>, <mentioned>, <soCalled>, <term> and <gloss>.
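A sketch of several of the core elements for 'distinct' text in one hypothetical paragraph; which element applies depends on why the text is set apart (a term being defined, a word mentioned rather than used, direct speech, a quotation from another work):

```xml
<p>The term <term>quire</term>, glossed as <gloss>a gathering of leaves</gloss>,
  should not be confused with the word <mentioned>choir</mentioned>.
  Some <soCalled>experts</soCalled> disagree.
  <said>Are you coming?</said> she asked, recalling the line
  <quote>All the world's a stage</quote>.</p>
```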
<add> addition to the text
<del> letter, word or phrase marked as deleted in the text
<unclear> illegible or inaudible passage which cannot be read with confidence
<gap> indicates a point where material has been omitted in a transcription
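A sketch of editorial markup in a hypothetical transcription; the @place, @reason, @quantity and @unit values are illustrative choices, not required ones:

```xml
<p>The meeting was moved to
  <del rend="strikethrough">Monday</del>
  <add place="above">Tuesday</add>, in
  <unclear reason="illegible">Oxford</unclear>, with
  <gap reason="illegible" quantity="2" unit="word"/> attending.</p>
```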
<name> a proper noun or noun phrase
<rs> a string referring to some person, place, object, etc.
@type attribute specifies the type of the name in more detail
Note: Including the namesdates module gives many more name elements (for personal, place, organisational, and geographic names).
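A sketch using only the core <name> and <rs> elements; the people and places are hypothetical, and the @type values are assumed project conventions:

```xml
<p><name type="person">Jane Doe</name> moved to
  <name type="place">Oxford</name> in 1850, where
  <rs type="person">the young poet</rs> lodged with
  <rs type="person">her aunt</rs>.</p>
```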
Elements to distinguish postal and electronic addresses
<address> contains a postal address
<email> contains an email address
<addrLine> a non-specific address line
<street> a full street address
<postCode> a postal or zip code
<postBox> a postal box number
<name> can also be used within <address>
(More elements, such as <settlement> and <country>, become available within <address> if the namesdates module is loaded)
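A sketch of a postal and an electronic address inside a paragraph; every name, number and the email address are hypothetical placeholders:

```xml
<p>Send corrections to
  <address>
    <name type="org">Example Text Project</name>
    <addrLine>Department of Hypothetical Studies</addrLine>
    <street>1 Example Street</street>
    <postBox>PO Box 123</postBox>
    <postCode>AB1 2CD</postCode>
  </address>
  or by email to <email>editor@example.org</email>.</p>
```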
<bibl> a structured or unstructured bibliographic entry
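<author>, <editor>, <title>, <pubPlace>, <publisher>, <date>, etc. for further structuring
<biblStruct> a structured bibliographic entry, in which only bibliographic subelements appear, in a prescribed order (e.g. <analytic>, <monogr>, <series>)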
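A sketch contrasting the two approaches for one hypothetical book: <bibl> allows punctuation and subelements in any order, while <biblStruct> wraps subelements in <monogr> and an <imprint>:

```xml
<listBibl>
  <!-- loosely structured: punctuation and subelements mixed freely -->
  <bibl>
    <author>A. N. Author</author>, <title level="m">An Example Book</title>
    (<pubPlace>Oxford</pubPlace>: <publisher>Example Press</publisher>,
    <date when="2001">2001</date>).
  </bibl>
  <!-- fully structured: subelements only, in a fixed order -->
  <biblStruct>
    <monogr>
      <author>A. N. Author</author>
      <title level="m">An Example Book</title>
      <imprint>
        <pubPlace>Oxford</pubPlace>
        <publisher>Example Press</publisher>
        <date when="2001">2001</date>
      </imprint>
    </monogr>
  </biblStruct>
</listBibl>
```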
<lg> a formal unit (e.g. stanza) containing one or more verse lines
<l> contains a single verse line
The verse module extends this with more elements for metrical analysis.
The drama module extends this with more elements for dramatic structures like cast lists.
Elements defined by the core module:
abbr add addrLine address analytic author bibl biblScope biblStruct binaryObject cb choice cit citedRange corr date del desc distinct divGen
editor email emph expan foreign gap gb gloss graphic head headItem
headLabel hi imprint index item l label lb lg list listBibl measure measureGrp
media meeting mentioned milestone monogr name note num orig p pb
postBox postCode ptr pubPlace publisher q quote ref reg relatedItem resp
respStmt rs said series sic soCalled sp speaker stage street teiCorpus term
textLang time title unclear