TEI Metadata
What is Metadata?
- often called "data about data"
- term originally used only with electronic data but its meaning has broadened
- data about the content, context, and structure of information resources
- the catalogue record of the data/text/edition
Some examples:
- Purpose of the data
- Means of creation of the data
- Time and date of creation
- Creator or author of the data
- Where the data was created
- Standards used in creating the data
- Size of the data in useful units
- Related or supplemental data
- Last revision date of the data
- Stage of production of the data
TEI Metadata
- TEI requires some of its metadata to be stored inside the XML document, prefixed to the content.
- This information makes up the TEI header, although some of it can be included inside the <body> or pointed to from outside the document. It is:
- used to store bibliographical information about both the electronic version(s) of the text and any physical, or analogue, source(s)
- basic information is similar to library cataloguing and supports interoperability with other metadata standards
- much like an electronic version of a title page attached to a printed work
The <teiHeader>
The TEI header was designed with two goals in mind:
- needs of bibliographers and librarians trying to document what were called 'electronic books'
- needs of text analysts and digital editors trying to document ‘coding practices’ within digital resources
The result is that discussion of the header tends to be pulled in two directions...
Where can I read about this?
- Chapter 2: The TEI Header
- Chapter 10: Manuscript Description
Librarian's Header
- Conforms to standard bibliographic models
- Easily mapped to METS/EAD/MARC and other library metadata formats
- Based on TEI for Libraries Special Interest Group
- Pressure for more specific constraints
- Prefers structured data over loose prose
Editor's Header
- Polite nod to bibliographic practices
- Supports (potentially) huge range of miscellaneous information
- Different codes of practice in different communities
- Often concerned with editorial principles
- Mixture of tightly controlled data and loose prose
Most headers are somewhere between the two
<teiHeader>
Structure of a <teiHeader>
The TEI header has four main components:
- <fileDesc> (file description) contains a full bibliographic description of the file
- <encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived
- <profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, the languages and sublanguages used, the situation in which it was produced, the participants and their setting
- <revisionDesc> (revision description) summarizes the revision history for a file
Only <fileDesc> is required -- the others are optional!
A Minimal Header
- <fileDesc> containing <titleStmt> (with a <title>), <publicationStmt>, and <sourceDesc> is all that is required
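A minimal header might look like the sketch below. The title and the prose in <publicationStmt> and <sourceDesc> are invented for illustration.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <!-- Title of the electronic file, not of its source -->
      <title>An Invented Work: a digital edition</title>
    </titleStmt>
    <publicationStmt>
      <p>Unpublished working draft.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Born digital: no previous source exists.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```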
Two Levels of Header
<teiHeader>: Required vs Optional Components
/TEI/teiHeader/fileDesc
The <fileDesc> element has some mandatory elements:
- <titleStmt>: provides a title for the resource and any associated statements of responsibility
- <sourceDesc>: documents the sources from which the encoded text derives (if any)
- <publicationStmt>: documents how the encoded text is published or distributed
and some optional ones such as:
- <editionStmt>: yes, digital texts have editions too
- <seriesStmt>: and they also fit into "series"
- <extent>: how many words, gigabytes, volumes, files?
- <notesStmt>: notes of various types
More About <fileDesc>
- <titleStmt>: contains a mandatory <title> which identifies the electronic file (not its source!)
- optionally followed by additional titles, and by ‘statements of responsibility’, as appropriate, using <author>, <editor>, <sponsor>, <funder>, <principal> or the generic <respStmt>
- <publicationStmt>: may contain
- <p> to give prose (e.g. to say the text is unpublished) or
- one or more <publisher>, <distributor>, <authority>, each followed by <pubPlace>, <address>, <availability>, <idno> etc.
/TEI/teiHeader/fileDesc/titleStmt
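A sketch of a <titleStmt> with several statements of responsibility. All names and the title are hypothetical.

```xml
<titleStmt>
  <title>Letters of an Imaginary Author: a digital edition</title>
  <author>Imaginary Author</author>
  <editor>A. N. Editor</editor>
  <funder>An Example Funding Body</funder>
  <!-- respStmt covers roles that have no dedicated element -->
  <respStmt>
    <resp>Transcription and encoding</resp>
    <name>A. N. Encoder</name>
  </respStmt>
</titleStmt>
```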
/TEI/teiHeader/fileDesc/publicationStmt
- Mandatory element
- At least one of <publisher>, <distributor>, or <authority> must be present, unless the entire publication statement is given as prose paragraphs using <p>
- If the creation date differs from the date of publication, the creation date should be given within <profileDesc>, not in the <publicationStmt>
- A formal license may be entered in <licence> included in <availability>
Example <publicationStmt>
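One possible structured <publicationStmt>, given as an illustrative sketch. The publisher, place, date, and identifier are invented; the licence URL is the real Creative Commons Attribution 4.0 address, chosen only as an example of a formal licence.

```xml
<publicationStmt>
  <publisher>Example University Press</publisher>
  <pubPlace>Exampletown</pubPlace>
  <date when="2020">2020</date>
  <idno type="local">EX-0001</idno>
  <availability status="free">
    <licence target="https://creativecommons.org/licenses/by/4.0/">
      Distributed under a Creative Commons Attribution 4.0 International licence.
    </licence>
  </availability>
</publicationStmt>
```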
/TEI/teiHeader/fileDesc/notesStmt
The optional <notesStmt> can contain notes on almost any aspect of the file or its contents:
- These notes can be short statements, or many paragraphs long.
- Where possible, take care to encode such information with more precise elements elsewhere in the TEI header
- For example, text types, such as 'reportage' or 'detective fiction', should be described under <profileDesc>
/TEI/teiHeader/fileDesc/sourceDesc
All electronic works need to document their source, even 'born digital' ones! The <sourceDesc> can have:
- prose description, just a <p>
- <bibl> (bibliographic citation): contains free text and/or any mixture of bibliographic elements such as <author>, <publisher> etc.
- <biblStruct> (structured) contains similar elements but constrained in various ways according to bibliographic standards
- A <listBibl> may be used for lists of such descriptions, e.g. bibliographies
- Specialised elements for spoken texts (<recordingStmt> etc.) and for manuscripts (<msDesc>)
- Authority lists: <listPerson>, <listPlace>, <listOrg> if not storing elsewhere
Example <sourceDesc>
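A sketch of a <sourceDesc> using a loosely structured <bibl>. The work, author, and publication details are invented.

```xml
<sourceDesc>
  <bibl>
    <author>Imaginary Author</author>,
    <title>An Invented Work</title>
    (<pubPlace>Exampletown</pubPlace>:
    <publisher>Example Press</publisher>,
    <date when="1850">1850</date>).
  </bibl>
</sourceDesc>
```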
Or your <sourceDesc> could have one or more <msDesc> elements
/TEI/teiHeader/encodingDesc
<encodingDesc> groups notes about the procedures used when the text was encoded, either summarised in prose or within specific elements such as
- <projectDesc>: goals of the project
- <samplingDecl>: sampling principles
- <editorialDecl>: editorial principles,
- e.g. <correction>, <hyphenation>, <interpretation>, <normalization>, <punctuation>, <quotation>, <segmentation>
- <classDecl>: classification system/s used
- <tagsDecl>: specifics about usage of particular elements
Detailed notes in <encodingDesc> could be used to generate a section of an editorial description.
Example <encodingDesc>
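An illustrative <encodingDesc> mixing a prose project description with more specific editorial declarations. The project and its policies are hypothetical.

```xml
<encodingDesc>
  <projectDesc>
    <p>Texts were encoded to support teaching in a hypothetical workshop.</p>
  </projectDesc>
  <editorialDecl>
    <correction method="silent">
      <p>Obvious printing errors have been corrected without comment.</p>
    </correction>
    <normalization>
      <p>Original spelling has been retained throughout.</p>
    </normalization>
    <hyphenation>
      <p>End-of-line hyphenation has been removed, except where ambiguous.</p>
    </hyphenation>
  </editorialDecl>
</encodingDesc>
```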
/TEI/teiHeader/encodingDesc/classDecl
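A sketch of a <classDecl> defining a small taxonomy of text types. The category identifiers are invented for illustration.

```xml
<classDecl>
  <taxonomy xml:id="genre">
    <category xml:id="genre.reportage">
      <catDesc>Reportage</catDesc>
    </category>
    <category xml:id="genre.detective">
      <catDesc>Detective fiction</catDesc>
    </category>
  </taxonomy>
</classDecl>
```

A text could then point at one of these categories from <textClass> inside <profileDesc>, for instance with <catRef target="#genre.detective"/>.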
/TEI/teiHeader/encodingDesc/tagsDecl
- <tagsDecl> records the element namespace, tag frequency, information about the usage of particular tags not specified elsewhere, and the default rendition of the text in the source.
- <rendition>: structured information about appearance in the source document
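A minimal sketch of a <tagsDecl>, assuming the source's appearance is described in CSS. The counts and the usage note are invented.

```xml
<tagsDecl>
  <!-- Default appearance of <hi> in the source, expressed in CSS -->
  <rendition xml:id="italic" scheme="css" selector="hi">font-style: italic;</rendition>
  <namespace name="http://www.tei-c.org/ns/1.0">
    <tagUsage gi="hi" occurs="28">Used only for italicised words in the source.</tagUsage>
    <tagUsage gi="foreign" occurs="12"/>
  </namespace>
</tagsDecl>
```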
/TEI/teiHeader/profileDesc
The <profileDesc> contains a collection of descriptions, categorised only as ‘non-bibliographic’. Default members of the model.profileDescPart class include:
- <creation>: information about the origination of the intellectual content of the text, e.g. time and place
- <langUsage>: information about languages, registers, writing systems etc used in the text
- <textDesc> and <textClass>: classifications applied to the text by means of a list of specified criteria or by means of a collection of pointers
- <particDesc> and <settingDesc>: information about the ‘participants’, either real or depicted, in the text, and about the settings in which they interact
- <handNotes>: information about the particular style or hand distinguished within a manuscript when not giving full manuscript description
/TEI/teiHeader/profileDesc/creation (& particDesc)
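A sketch of <creation> recording when and where the text was composed. The date and place are invented.

```xml
<profileDesc>
  <creation>
    <date when="1850-03">March 1850</date>
    <rs type="city">Exampletown</rs>
  </creation>
</profileDesc>
```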
/TEI/teiHeader/profileDesc/langUsage
The <langUsage> element is provided to document usage of languages and writing systems in the text. Languages are identified by their ISO codes:
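For instance, a text mostly in English with some Latin might be described as follows. The percentages are invented; the @ident values are language tags built on the ISO 639 codes.

```xml
<langUsage>
  <language ident="en" usage="80">English</language>
  <language ident="la" usage="20">Latin</language>
</langUsage>
```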
/TEI/teiHeader/profileDesc/textDesc
<textDesc> provides a description of a text in terms of its 'Situational parameters', a description of the situation within which the text was produced or experienced.
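A sketch of a <textDesc> for a hypothetical recorded conversation, modelled on the kind of example given in the TEI Guidelines. The descriptive content is invented.

```xml
<textDesc n="Informal domestic conversation">
  <channel mode="s">informal face-to-face conversation</channel>
  <constitution type="single">a single continuously recorded interaction</constitution>
  <derivation type="original"/>
  <domain type="domestic">plans for the coming week</domain>
  <factuality type="mixed">mostly factual, some jokes</factuality>
  <interaction type="complete"/>
  <preparedness type="spontaneous"/>
  <purpose type="entertain" degree="high"/>
  <purpose type="inform" degree="medium"/>
</textDesc>
```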
/TEI/teiHeader/revisionDesc
- Inside <revisionDesc> you find a list of <change> elements, each usually with @when and @who attributes, indicating significant stages in the evolution of a document.
- Conventionally, the most recent change is given first.
- The <change> elements can be grouped in a <listChange> element. Used here, it is about the electronic file; used in <creation>, it is about the stages of textual production.
- Can be maintained manually, or done by means of a version control system (like Subversion or Git)
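A hand-maintained <revisionDesc> might look like this sketch, with the most recent change first. TEI P5 records the date of each change in @when; the dates and the #ANE pointer (to a person assumed to be defined elsewhere in the header) are invented.

```xml
<revisionDesc>
  <change when="2020-06-15" who="#ANE">Corrected typos in the header.</change>
  <change when="2020-05-01" who="#ANE">Created the file.</change>
</revisionDesc>
```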
Manuscript Description
About <msDesc>
The TEI <msDesc> element is intended for several different kinds of applications:
- standalone database of library records (finding aid)
- discursive text collecting many records (catalogue raisonné)
- metadata component within a digital surrogate (electronic edition)
- tool for ‘quantitative codicology’
Manuscript description in the TEI caters for two conflicting desires:
- preserve (or perpetuate) existing descriptive prose
- reliable search, retrieval, and analysis of data
The <msDesc> tries, wherever possible, to enable both of these approaches.
Inside <msDesc>
One or more <p> paragraphs or more structured elements:
- <msIdentifier>: information identifying this manuscript
- <msContents>: a list of the intellectual content of the manuscript
- <physDesc>: groups information concerning all physical aspects of the manuscript
- <history>: provides information on the history of the manuscript, its origin, provenance and acquisition by current holding institution
- <additional>: groups other information about the manuscript (e.g. administrative information relating to its availability, custodial history, surrogates)
- <msPart>: parts of a composite manuscript
- <msFrag>: fragments of a scattered manuscript
msDesc
msDesc/msIdentifier
The <msIdentifier> element follows the traditional three-part specification of a manuscript's location:
- place: <country>, <region>, <settlement>
- repository: <institution>, <repository>
- identifier: <collection>, <idno>, <altIdentifier>
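The three parts can be sketched as below. The place, library, and shelfmarks are all invented.

```xml
<msIdentifier>
  <!-- place -->
  <country>Exampleland</country>
  <settlement>Exampletown</settlement>
  <!-- repository -->
  <repository>Example Library</repository>
  <!-- identifier -->
  <collection>Example Collection</collection>
  <idno>MS Example 1</idno>
  <altIdentifier type="former">
    <idno>Old Cat. 42</idno>
  </altIdentifier>
</msIdentifier>
```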
msDesc/msContents
The <msContents> element contains information about the intellectual content of the manuscript. Multiple <msItem> elements provide a detailed table of contents
Example <msContents>
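A sketch of an <msContents> with two items. Authors, titles, and folio ranges are invented.

```xml
<msContents>
  <msItem n="1">
    <locus from="1r" to="24v">ff. 1r-24v</locus>
    <author>Imaginary Author</author>
    <title>An Invented Treatise</title>
    <textLang mainLang="la">Latin</textLang>
  </msItem>
  <msItem n="2">
    <locus from="25r" to="96v">ff. 25r-96v</locus>
    <title>An Anonymous Invented Chronicle</title>
    <textLang mainLang="en">English</textLang>
  </msItem>
</msContents>
```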
msDesc/physDesc
The <physDesc> element records any information concerning the physicality or materiality of the manuscript.
If using the structured form this might include:
- The physical carrier: <objectDesc>
- What it carries: <handDesc>, <scriptDesc>, <typeDesc>
- Special features: <additions>, <decoDesc>, <musicNotation>
- External things: <bindingDesc>, <sealDesc>, <accMat>
Example <physDesc>
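One possible <physDesc>, mixing prose with the structured sub-elements. The manuscript described is hypothetical.

```xml
<physDesc>
  <objectDesc form="codex">
    <p>Parchment codex of 96 leaves.</p>
  </objectDesc>
  <handDesc hands="2">
    <p>Written in two hands, the second beginning on f. 25r.</p>
  </handDesc>
  <decoDesc>
    <p>Red and blue initials throughout.</p>
  </decoDesc>
  <bindingDesc>
    <p>Nineteenth-century leather binding over wooden boards.</p>
  </bindingDesc>
</physDesc>
```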
msDesc/physDesc/objectDesc
<objectDesc> gives a way to describe the support, foliation, collation, condition, layouts, and more.
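A sketch of a more structured <objectDesc>. All measurements and observations are invented.

```xml
<objectDesc form="codex">
  <supportDesc material="perg">
    <support>Parchment of medium quality.</support>
    <extent>
      ii + 96 leaves
      <dimensions type="leaf" unit="mm">
        <height>250</height>
        <width>170</width>
      </dimensions>
    </extent>
    <foliation>Modern pencil foliation in the upper right corner.</foliation>
    <collation>Twelve quires of eight leaves.</collation>
    <condition>Some water damage to the final quire.</condition>
  </supportDesc>
  <layoutDesc>
    <layout columns="2" writtenLines="30">Two columns of 30 lines.</layout>
  </layoutDesc>
</objectDesc>
```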
msDesc/physDesc/handDesc (& typeDesc & scriptDesc)
msDesc/physDesc/musicNotation (& decoDesc & additions)
msDesc/physDesc/bindingDesc (& sealDesc & accMat)
msDesc/history
<history> groups elements describing the full history of a manuscript or manuscript part.
- <origin>: where it all began
- <provenance>: everything in between
- <acquisition>: how you acquired it
Because <origin> is a member of att.datable, it has all the usual dating attributes. There are also the special-purpose elements <origDate> and <origPlace> to record the manuscript's date and place of origin.
Example <history>
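A sketch of a <history> for a hypothetical manuscript. The places, dates, and owners are invented.

```xml
<history>
  <origin>
    Written in <origPlace>Exampletown</origPlace> in the
    <origDate notBefore="1400" notAfter="1450">first half of the fifteenth century</origDate>.
  </origin>
  <provenance>Owned by an unnamed private collector in the nineteenth century.</provenance>
  <acquisition>Purchased by the Example Library in <date when="1950">1950</date>.</acquisition>
</history>
```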
msDesc/additional
- <additional> groups additional information, combining bibliographic information about a manuscript, or surrogate copies of it, with curatorial or administrative information.
- <adminInfo>: administrative information
- <surrogates>: information about surrogates (e.g. photographs, microfilms, digital images)
- <listBibl>: bibliography of works concerning the manuscript
Example <additional>
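An illustrative <additional>, with invented administrative details and an invented bibliography entry.

```xml
<additional>
  <adminInfo>
    <recordHist>
      <source>Description written as a workshop example.</source>
    </recordHist>
    <availability>
      <p>Available for consultation by appointment.</p>
    </availability>
  </adminInfo>
  <surrogates>
    <p>Digital images of the whole manuscript were made in 2015.</p>
  </surrogates>
  <listBibl>
    <bibl>A. N. Scholar, <title>Notes on an Invented Manuscript</title>, 1990.</bibl>
  </listBibl>
</additional>
```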
msDesc/msPart
- <msPart>: to describe individual parts of a composite manuscript
- <msFrag>: to describe manuscript fragments as part of a virtual whole
TEI header module elements:
abstract appInfo application authority availability biblFull cRefPattern calendar calendarDesc
catDesc catRef category change classCode classDecl correction correspAction correspContext
correspDesc creation distributor edition editionStmt editorialDecl encodingDesc extent fileDesc
funder geoDecl handNote hyphenation idno interpretation keywords langUsage language
licence listChange listPrefixDef namespace normalization notesStmt prefixDef principal
profileDesc projectDesc publicationStmt punctuation quotation refState refsDecl rendition
revisionDesc samplingDecl scriptNote segmentation seriesStmt sourceDesc sponsor stdVals
styleDefDecl tagUsage tagsDecl taxonomy teiHeader textClass titleStmt typeNote xenoData
TEI manuscript description module elements:
accMat acquisition additional additions adminInfo altIdentifier binding bindingDesc catchword
collation collection colophon condition custEvent custodialHist decoDesc decoNote depth dim
dimensions explicit filiation finalRubric foliation handDesc height heraldry history incipit
institution layout layoutDesc locus locusGrp material msContents msDesc msFrag msIdentifier
msItem msItemStruct msName msPart musicNotation objectDesc objectType origDate
origPlace origin physDesc provenance recordHist repository rubric scriptDesc seal sealDesc
secFol signatures source stamp summary support supportDesc surrogates typeDesc
watermark width
TEI Metadata with Manuscript Description
By James Cummings
A workshop presentation of TEI Metadata and Manuscript Description