James Cummings
@jamescummings
http://slides.com/jamescummings/sro-tei
Thanks as ever to many members of the TEI Community
Markup is used in many different fields, for many different purposes: storing data, relating information, encoding understanding, preserving metadata
Procedural Markup:
RED INK ON; print "-£1000"; RED INK OFF
Presentational Markup:
\textcolor{red}{-£1000}
Descriptive Markup:
<measure unit="pounds" value="-1000">
One thousand pounds in debt
</measure>
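Because the descriptive version records what the text means rather than how it looks, software can compute with it and decide on a presentation later. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. The "red for negative amounts" rule and the HTML output are assumptions made for illustration; they are not part of any standard.

```python
import xml.etree.ElementTree as ET

# The descriptive markup example, parsed into an element tree
measure = ET.fromstring(
    '<measure unit="pounds" value="-1000">One thousand pounds in debt</measure>'
)

amount = float(measure.get("value"))  # the meaning is machine-readable
unit = measure.get("unit")

def render_html(elem):
    """Derive presentation from description: negative amounts in red (assumed rule)."""
    colour = "red" if float(elem.get("value")) < 0 else "black"
    return f'<span style="color:{colour}">{elem.text}</span>'

print(amount, unit)          # -1000.0 pounds
print(render_html(measure))  # <span style="color:red">One thousand pounds in debt</span>
```

The procedural and presentational versions above only say "make it red". Here, red is one possible rendering of a recorded fact.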
XML is structured data represented as strings of text
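XML looks like HTML, except that every element must be closed (empty elements close themselves with />), elements must nest properly, and attribute values must be quoted: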
<element> Text </element>
<element attribute="value">
Text or child elements here
</element>
<element attribute="value"/>
<?xml version="1.0" ?>
<root xmlns="http://namespace/">
<element attribute="value">
content
<childElement type="empty"/>
content
</element>
<!-- comment -->
</root>
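A parser reports a document that breaks these rules as not well-formed. It also resolves the default namespace, so every element name becomes `{namespace}localName`. Here is a short sketch using Python's `xml.etree.ElementTree` on the example document. The `ns` prefix in the lookup dictionary is an arbitrary choice for querying and does not come from the document.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" ?>
<root xmlns="http://namespace/">
<element attribute="value">
content
<childElement type="empty"/>
content
</element>
<!-- comment -->
</root>"""

NS = {"ns": "http://namespace/"}  # prefix chosen freely for querying

root = ET.fromstring(DOC)
element = root.find("ns:element", NS)
child = element.find("ns:childElement", NS)

print(root.tag)                  # {http://namespace/}root
print(element.get("attribute"))  # value (unprefixed attributes have no namespace)
print(child.get("type"))         # empty

def is_well_formed(text):
    """True if the string parses as XML; says nothing about validity."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b></b></a>"))  # True
print(is_well_formed("<a><b></a></b>"))  # False: elements overlap
```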
Beyond being well-formed, a document can also be 'valid', which means it obeys additional rules (defined in a schema) about which elements and attributes may appear and where they can go.
<seg> <w>some</w> <hi>text</hi> </seg>
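Validity is checked against a schema. For TEI this is usually a RELAX NG schema. The sketch below swaps that for a hypothetical toy table of allowed children, purely to show the difference between the two checks: both test strings are well-formed, but only the first is valid under the toy rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy content model: which children each element may contain.
# Real TEI validation uses a schema, not a table like this.
ALLOWED_CHILDREN = {
    "seg": {"w", "hi"},
    "hi": {"w"},
    "w": set(),
}

def validate(elem, path=""):
    """Return a list of rule violations in a (necessarily well-formed) tree."""
    here = f"{path}/{elem.tag}"
    allowed = ALLOWED_CHILDREN.get(elem.tag)
    if allowed is None:
        return [f"{here}: unknown element"]
    errors = []
    for child in elem:
        if child.tag not in allowed:
            errors.append(f"{here}: <{child.tag}> not allowed here")
        errors.extend(validate(child, here))
    return errors

print(validate(ET.fromstring("<seg> <w>some</w> <hi>text</hi> </seg>")))  # []
print(validate(ET.fromstring("<w><seg/></w>")))  # ['/w: <seg> not allowed here']
```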
Some features potentially apply to everything, so members of the attribute class att.global (for example @xml:id, @n, @xml:lang and @rend) can appear on every TEI element.
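Because these attributes are global, code can look them up in the same way whatever the element. One detail catches people out: `xml:id` and `xml:lang` belong to the reserved XML namespace, so a parser reports them under that namespace URI. Below is a sketch with Python's `xml.etree.ElementTree`. The sample paragraph is invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML = "{http://www.w3.org/XML/1998/namespace}"  # the reserved xml: prefix

doc = ET.fromstring(
    '<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p1" n="1">'
    '<persName xml:id="marshe" rend="italic">Thomas Marshe</persName> '
    '<foreign xml:lang="la">anno domini</foreign></p>'
)

# Global attributes are read the same way on every element
for el in doc.iter():
    name = el.tag.replace(TEI, "")
    print(name, el.get(XML + "id"), el.get(XML + "lang"), el.get("rend"))

ids = {el.get(XML + "id") for el in doc.iter()} - {None}
print(sorted(ids))  # ['marshe', 'p1']
```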
Hierarchical grouping of text sequences into textual divisions and subdivisions by means of nested <div> elements.
What do divisions contain (apart from other divisions)?
Headings, tagged with <head>
Prose, which may be organized as a sequence of paragraphs <p>
Poetry, divided into metrical lines <l>, optionally grouped into stanzas <lg>
Drama, divided into speeches <sp>, each containing an optional speaker label <speaker>, followed by a mix of <p> or <l> elements, optionally interspersed with stage directions <stage>
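Since the divisions, headings, stanzas and speeches are explicit elements, walking them takes only a few lines of code. The sketch below uses Python's `xml.etree.ElementTree` on an invented, namespace-free fragment. Real TEI files put these elements in the TEI namespace, and the queries would then need it.

```python
import xml.etree.ElementTree as ET

BODY = """<body>
  <div type="poem">
    <head>A Song</head>
    <lg type="stanza">
      <l>First line of verse</l>
      <l>Second line of verse</l>
    </lg>
  </div>
  <div type="scene">
    <head>Scene One</head>
    <sp who="#a">
      <speaker>A.</speaker>
      <p>A line of prose speech.</p>
    </sp>
    <stage>Exit.</stage>
    <sp who="#b">
      <speaker>B.</speaker>
      <l>A line of verse speech.</l>
    </sp>
  </div>
</body>"""

body = ET.fromstring(BODY)
heads = [div.findtext("head") for div in body.iter("div")]
line_counts = [len(lg.findall("l")) for lg in body.iter("lg")]
speakers = [sp.findtext("speaker") for sp in body.iter("sp")]

print(heads)        # ['A Song', 'Scene One']
print(line_counts)  # [2]
print(speakers)     # ['A.', 'B.']
```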
Within the <text> element the logical view is privileged, but the physical view can be encoded as well through 'empty' elements:
<pb /> marks the start of a new page
<cb /> marks the start of a new column
<lb /> marks the start of a new line
<gb /> marks the start of a new gathering
<milestone /> marks other kinds of boundary, named by its @unit attribute
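These empty elements mark points rather than containers, so the physical view is recovered by tracking the most recent marker while reading the text in document order. The sketch below assigns text to pages using `<pb n="..."/>`. The function name and the invented sample are hypothetical, and namespaces are ignored for brevity. It joins pieces with spaces, so it would wrongly split a word broken across a line (TEI can mark this with `@break="no"` on `<lb/>`, which the sketch ignores).

```python
import xml.etree.ElementTree as ET

def text_by_page(xml_string):
    """Map each <pb/> @n value to the whitespace-normalized text that follows it."""
    root = ET.fromstring(xml_string)
    pages = {}
    current = None  # @n of the most recent <pb/>

    def record(text):
        if text:
            pages.setdefault(current, []).append(text)

    def walk(elem):
        nonlocal current
        if elem.tag == "pb":
            current = elem.get("n")
        record(elem.text)
        for child in elem:
            walk(child)
            record(child.tail)  # text after a child, still inside elem

    walk(root)
    return {page: " ".join(" ".join(parts).split()) for page, parts in pages.items()}

pages = text_by_page(
    '<div><pb n="1"/><p>Words on the first <lb/>page</p>'
    '<p>carry on<pb n="2"/>onto the second page</p></div>'
)
print(pages)  # {'1': 'Words on the first page carry on', '2': 'onto the second page'}
```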
A paragraph is a significant organizational unit for all prose texts
Elements that distinguish words or passages from their surroundings:
<hi> word or phrase which is graphically distinct from the surrounding text
<foreign> word or phrase not written in the same language as the surrounding text
@xml:lang global attribute to specify the language, using a BCP 47 language tag built on ISO 639 codes (e.g. la for Latin)
<add> addition to the text
<del> letter, word or phrase marked as deleted in the text
<supplied> marks editorially supplied text
<gap> indicates a point where material is omitted
<unclear> marks text that cannot be transcribed with certainty, containing the best guess at the reading
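Because additions, deletions and editorial interventions are recorded rather than silently applied, one encoding can produce different outputs. The sketch below derives one possible reading text. Its conventions are assumptions, not TEI rules: deletions are dropped, additions kept, supplied text shown in brackets, and gaps shown as `[...]`. A diplomatic view could instead keep the deletions, struck through. The sample line is invented.

```python
import xml.etree.ElementTree as ET

def reading_text(elem):
    """One possible reading text (assumed conventions): drop <del>, keep <add>,
    bracket <supplied>, show <gap/> as [...], keep <unclear>/<foreign> as-is."""
    if elem.tag == "del":
        out = ""
    elif elem.tag == "gap":
        out = "[...]"
    else:
        inner = (elem.text or "") + "".join(reading_text(child) for child in elem)
        out = f"[{inner}]" if elem.tag == "supplied" else inner
    # text following an element (its tail) belongs to the parent, so keep it
    return out + (elem.tail or "")

sample = ET.fromstring(
    "<p>Licensed to <del>Tho</del><add>Thomas</add> Marshe for "
    "<supplied>a</supplied> booke <gap/> <unclear>iiijd</unclear></p>"
)
print(reading_text(sample))  # Licensed to Thomas Marshe for [a] booke [...] iiijd
```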
<persName role="stationer">
<forename>Thomas</forename>
<surname>marshe</surname>
</persName>
<seg type="fee" rend="roman-numerals aligned-right">
<!--processing: iiijd-->
<num type="totalPence" value="4">
<!--orig: iiijd-->
<num type="pence" value="4">
iiij<hi rend="superscript">d</hi>
</num>
</num>
</seg>
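The nested `<num>` elements keep the original roman-numeral reading alongside a normalized `totalPence` value. This is the kind of up-conversion of roman numerals and fees mentioned in the file's revision history. Below is a minimal sketch of such a conversion. `roman_to_int` and `total_pence` are hypothetical helpers. The pounds and shillings parameters assume the usual £sd money of account (12d to the shilling, 20s to the pound), since the example shows only a `pence` value.

```python
import xml.etree.ElementTree as ET

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}

def roman_to_int(numeral):
    """Convert a Roman numeral, treating a final 'j' as 'i' (e.g. 'iiij' -> 4)."""
    s = numeral.strip().lower().replace("j", "i")
    total = 0
    for pos, ch in enumerate(s):
        value = ROMAN[ch]
        # subtractive notation: a smaller numeral before a larger one is subtracted
        if pos + 1 < len(s) and ROMAN[s[pos + 1]] > value:
            total -= value
        else:
            total += value
    return total

def total_pence(pounds=0, shillings=0, pence=0):
    """12d = 1s and 20s = £1, so £1 = 240d."""
    return pounds * 240 + shillings * 12 + pence

FEE = """<seg type="fee" rend="roman-numerals aligned-right">
<num type="totalPence" value="4">
<num type="pence" value="4">
iiij<hi rend="superscript">d</hi>
</num>
</num>
</seg>"""

seg = ET.fromstring(FEE)
# the numeral is the text before <hi>; the superscript "d" is the unit
pence = roman_to_int(seg.find(".//num[@type='pence']").text)
declared = int(seg.find("num[@type='totalPence']").get("value"))
print(pence, declared, total_pence(pence=pence) == declared)  # 4 4 True
```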
<date from="1557-07-19" to="1558-07-09">19 July 1557–9 July 1558.</date>
<date notBefore="1559-07-14" notAfter="1560-07-05">14 July 1559–5 July 1560.</date>
<date when="1560-03-04">
iiij<hi rend="superscript">th</hi> Daye of marche
<note resp="#arber">1560</note>
</date>
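The attributes mean different things. With @from/@to the thing spans the whole period. With @notBefore/@notAfter it happened at some unknown point within it. With @when it is a single date. The sketch below deliberately treats all three as earliest and latest bounds, which is enough to check that a dated entry falls inside a register year. It assumes full YYYY-MM-DD values, although TEI also allows less precise values such as "1560". The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

def date_bounds(elem):
    """Earliest and latest day a <date> can refer to (assumes YYYY-MM-DD values)."""
    when = elem.get("when")
    if when:
        day = date.fromisoformat(when)
        return day, day
    start = elem.get("from") or elem.get("notBefore")
    end = elem.get("to") or elem.get("notAfter")
    return date.fromisoformat(start), date.fromisoformat(end)

def falls_within(inner, outer):
    """True if every possible day of `inner` lies inside `outer`'s bounds."""
    (a, b), (c, d) = date_bounds(inner), date_bounds(outer)
    return c <= a and b <= d

entry = ET.fromstring("""<date when="1560-03-04">
iiij<hi rend="superscript">th</hi> Daye of marche
<note resp="#arber">1560</note>
</date>""")
register_year = ET.fromstring(
    '<date notBefore="1559-07-14" notAfter="1560-07-05">14 July 1559–5 July 1560.</date>'
)

print(date_bounds(entry))
print(falls_within(entry, register_year))  # True
```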
SRO is slightly unusual in embedding a metadata block (using the 'anonymous block' element <ab>) inside every entry.
<ab type="metadata">
<date notBefore="1565-07-22" notAfter="1566-07-22">
22 July 1565–22 July 1566.
</date>
<idno type="RegisterRef">Register A, f.132v</idno>
<idno type="ArberRef">I. 296</idno>
<idno type="RegisterID">?</idno>
<num type="works" value="0"/>
<note type="status" subtype="unknown"/>
</ab>
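Because every entry carries the same block, each entry can be flattened into a row for a database or spreadsheet. Below is a sketch with Python's `xml.etree.ElementTree`. The output field names simply reuse the `@type` values. Reading the literal `?` as "not yet assigned" is an assumption about the project's convention.

```python
import xml.etree.ElementTree as ET

AB = """<ab type="metadata">
<date notBefore="1565-07-22" notAfter="1566-07-22">
22 July 1565–22 July 1566.
</date>
<idno type="RegisterRef">Register A, f.132v</idno>
<idno type="ArberRef">I. 296</idno>
<idno type="RegisterID">?</idno>
<num type="works" value="0"/>
<note type="status" subtype="unknown"/>
</ab>"""

def entry_metadata(ab):
    """Flatten one metadata <ab> into a dict keyed by the @type values."""
    record = {}
    for idno in ab.iter("idno"):
        value = idno.text.strip()
        # assumption: "?" is a placeholder for a value not yet assigned
        record[idno.get("type")] = None if value == "?" else value
    when = ab.find("date")
    record["notBefore"] = when.get("notBefore") if when is not None else None
    record["notAfter"] = when.get("notAfter") if when is not None else None
    works = ab.find("num[@type='works']")
    record["works"] = int(works.get("value")) if works is not None else None
    status = ab.find("note[@type='status']")
    record["status"] = status.get("subtype") if status is not None else None
    return record

metadata = entry_metadata(ET.fromstring(AB))
print(metadata)
```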
In the TEI header, <revisionDesc> records the major stages of creation, modification and revision of the electronic file:
<revisionDesc>
<change when="2017-01-29">
Metadata block created by JC; Arber's corrections made by IG
</change>
<change when="2017-01-22">
Material other than copy entries removed by Ian Gadd
</change>
<change from="2013-06" to="2013-10">Semi-automated changes based
on Bodleian proofreading made to the SRO data after the initial
conversion (and up-conversion of roman numerals, fees, dates,
names, etc.) from abbreviated tei-corset schema by James Cummings
</change>
<change from="2012-12" to="2013-05"> Encoding reviewed, with
suggestions made for improvements, a random sample of names
checked, and spot-proofed by Pip Willcox. December 2012 - May 2013.
</change>
</revisionDesc>
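The listing above runs most recent first, and each change is dated either by @when or by a @from/@to range. Below is a sketch that reads the history back in chronological order. The descriptions are abbreviated and the helper names are hypothetical. It relies on ISO 8601 dates of the forms used here (YYYY-MM and YYYY-MM-DD) sorting chronologically as plain strings.

```python
import xml.etree.ElementTree as ET

# The revisionDesc above, with descriptions abbreviated
REVISIONS = """<revisionDesc>
<change when="2017-01-29">Metadata block created by JC</change>
<change when="2017-01-22">Material other than copy entries removed</change>
<change from="2013-06" to="2013-10">Semi-automated changes</change>
<change from="2012-12" to="2013-05">Encoding reviewed</change>
</revisionDesc>"""

def start_of(change):
    """A change is dated by @when or by the start of a @from/@to range."""
    return change.get("when") or change.get("from")

def oldest_first(revision_desc):
    # ISO 8601 strings in these forms sort chronologically as text
    return sorted(revision_desc.iter("change"), key=start_of)

history = oldest_first(ET.fromstring(REVISIONS))
for change in history:
    span = change.get("when") or f'{change.get("from")} to {change.get("to")}'
    print(span, "|", change.text)
```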