READ | Recognition and Enrichment of Archival Data
University of Leipzig
Konstantin Freybe, M.A.
konstantin.freybe@uni-leipzig.de
Digitising Violin Labels
Transcription, Encoding, Enrichment
Outline
- Facts & Figures
- Mission
- Target User Groups
- READ | Tools
- Approaching Archival Documents
- Enriching Violin Labels
- Encoding Violin Labels
1. Facts & Figures
Duration:
January 1st 2016 - June 30th 2019
Grant:
8,200,000 EUR
received through EU research & innovation programme H2020
Partners:
13 partners coordinated by University of Innsbruck
10 institutions (increasing) as associated partners via memorandum of understanding
2. Mission
- increase accessibility of archival documents
- revolutionise Handwritten Text Recognition (HTR)
- establish Virtual Research Environments (VRE)
3. Target User Groups
- Memory Institutions
  - Archives
  - Libraries
  - Museums
- Humanities Scholars
- Computer Scientists
- public users & volunteers (crowd-sourcing)
4. READ | Tools
- ScanREAD
  - utility app for digitisation of archival documents
- e-Learning-App
  - software intended to help users decipher historical handwriting
- Transkribus
  - text segmentation, automated transcription, writer hand retrieval
  - trainable software which is intended for unsupervised use
  - part web service, part locally installed software with GUI; still in development (current version: beta 0.10.1)
  - registration on the online platform and software installation are required (a proper Java environment is mandatory)
In case Transkribus_beta is not running, let us take a look at the How-to Guide in order to understand the functions provided:
https://transkribus.eu/wiki/images/7/77/How_to_use_TRANSKRIBUS_-_10_steps.pdf
5. Approaching Archival Documents
Musical Instruments Museum Leipzig, Inv.: 926
- Violin Labels are small slips of paper that were and still are used to authenticate musical instruments of the violin family
- they serve as sources of information for the evaluation of violin instruments
- prints and other facsimiles, like those published by Paul de Wit, gave rise to the industrial use of fake labels for profit
- extensive research on these historical documents promises to help distinguish fake from original (labels, and maybe even instruments)
Musical Instruments Museum Leipzig, Inv.: 793
- identifying documents
- digitisation
- images identified: 4356
- signatures identified: 11722
=> apply READ's tools
6. Enriching Violin Labels
- storing Violin Labels digitally requires a machine-readable format such as XML
- XML stands for Extensible Markup Language
- XML's greatest benefits compared to other markup languages:
  - lets you define your own vocabulary
  - provides interoperability (documents can be edited in any text editor)
  - readable by humans & machines
XML documents themselves only store or transport information
XML separates data/information from its representation
- the Text Encoding Initiative (TEI) provides guidelines & semi-standardised modules that enable collaborative research & assure high quality
How can TEI-modules help to enrich Violin Labels?
TEI modules already contain many elements that can be used to describe large numbers of Violin Labels thoroughly.
How could TEI-XML encoding work on Violin Labels?
7. Encoding Violin Labels
- XML is relatively easy to learn
- it follows syntactic rules similar to those of HTML (Hypertext Markup Language):
  - hierarchical document structure
  - well-formed documents
  - documents can be validated
  - elements are written in angle brackets "<" & ">" and consist of an opening tag (<tag>) and a closing tag (</tag>)
- there are lots of online resources that help when learning to encode in XML:
  - W3Schools
  - Tutorialspoint
  - YouTube tutorials (my favourite: Derek Banas)
- well formed documents -
<?xml version="1.0" encoding="UTF-8"?>
<!-- Write comments like this to keep track of your work! They must come after the XML declaration. -->
<body>
  <div>
    <element>Element names are case sensitive in XML! This element is not well formed.</Element>
    <!-- The tagging above produces an error because the names in the opening and closing tags differ. -->
  </div>
  <div>
    <table>I mean furniture, not tabular data.</table>
    <!-- The tagging above is well formed, but its intended meaning conflicts with the HTML vocabulary, where <table> denotes tabular data; read as HTML it would be invalid. -->
  </div>
</body>
correct syntax = well formed
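For comparison, here is a corrected counterpart of the example above, given as a minimal sketch. The element name `furniture` is a hypothetical choice made only to avoid confusion with HTML's `table`; it does not come from any existing vocabulary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The XML declaration, if present, must be the very first thing in the file. -->
<body>
  <div>
    <!-- Opening and closing tag now match exactly, including case. -->
    <element>Element names are case sensitive in XML! This element is well formed.</element>
  </div>
  <div>
    <!-- Hypothetical element name, chosen so that nobody mistakes it for tabular data. -->
    <furniture>I mean furniture, not tabular data.</furniture>
  </div>
</body>
```

A parser accepts this document as well formed. Whether it is also valid can only be decided once it is checked against a DTD or Schema.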
- valid documents -
- an XML document that has been successfully validated against either a Document Type Definition (DTD) or a Schema is both well formed and valid
- valid documents | DTD -
<?xml version="1.0" encoding="UTF-8"?>
<!-- Write comments like this to keep track of your work! -->
<!-- The following is an example of the content of a fictional external DTD file "violinLabel.dtd".
     An external DTD file contains only the declarations, without a surrounding DOCTYPE wrapper:
<!ELEMENT violinLabel (name, term, location, year)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT term (#PCDATA)>
<!ELEMENT location (#PCDATA)>
<!ELEMENT year (#PCDATA)>
-->
<!-- The following is a reference to the external DTD file mentioned above -->
<!DOCTYPE violinLabel SYSTEM "violinLabel.dtd">
<!-- We will discuss the flaws of the following XML document. -->
<violinLabel>
  <name>Sympertus Niggell</name>
  <term>Geigen- und Lauten- Macher in</term>
  <location>Füssen</location>
  <year>1791</year>
</violinLabel>
Musical Instruments Museum Leipzig, Inv.: 926
Flaws:
- does not include "," and other characters
- does not distinguish between printed and handwritten text
- does not describe material properties of the Violin Label
- does not reference existing resources
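Some of these flaws could be reduced even within a DTD. The following is only a sketch under stated assumptions: the attribute names `material` and `ref`, the element `hw` (for handwritten text), the example.org URLs, the placement of the commas and the split of the year into printed and handwritten digits are all hypothetical and serve purely as illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical revision using an internal DTD subset; all new names are illustrative. -->
<!DOCTYPE violinLabel [
  <!-- Mixed content allows punctuation to stand between the elements. -->
  <!ELEMENT violinLabel (#PCDATA | name | term | location | year)*>
  <!-- Assumed attribute for a material property of the label. -->
  <!ATTLIST violinLabel material CDATA #IMPLIED>
  <!ELEMENT name (#PCDATA)>
  <!-- Assumed attribute for pointing to an existing resource. -->
  <!ATTLIST name ref CDATA #IMPLIED>
  <!ELEMENT term (#PCDATA)>
  <!ELEMENT location (#PCDATA)>
  <!ATTLIST location ref CDATA #IMPLIED>
  <!-- hw marks handwritten text inside otherwise printed text. -->
  <!ELEMENT year (#PCDATA | hw)*>
  <!ELEMENT hw (#PCDATA)>
]>
<violinLabel material="paper">
  <name ref="https://example.org/person/niggell">Sympertus Niggell</name>,
  <term>Geigen- und Lauten- Macher in</term>
  <location ref="https://example.org/place/fuessen">Füssen</location>,
  <year>17<hw>91</hw></year>
</violinLabel>
```

Every such addition has to be invented and documented by hand, which is one reason to look at existing vocabularies instead.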
- valid documents | DTD vs Schema -
- major disadvantage of DTD validation:
  limited flexibility of the elements used: DTDs have their own non-XML syntax and offer no data types for element content
Schemas provide a powerful, XML-based alternative to DTDs!
- advantageous because they allow XML documents to carry their own format descriptions
- valid documents | Schema -
<?xml version="1.0" encoding="UTF-8"?>
<!-- Although more flexible, there are still flaws to discuss. -->
<!-- A Schema needs the xs:schema root element and the XML Schema namespace declaration. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="violinLabel">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="term" type="xs:string"/>
        <xs:element name="location" type="xs:string"/>
        <xs:element name="year" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Flaws:
- still does not include "," and other characters
- still does not distinguish between printed and handwritten text
- still does not describe material properties of the Violin Label
- still does not reference existing resources
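To show how a document points to such a Schema, here is a minimal sketch of an instance document. It assumes that the Schema above has been saved under the hypothetical file name "violinLabel.xsd" in the same folder and that it declares no target namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "violinLabel.xsd" is an assumed file name for the Schema shown above. -->
<violinLabel xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="violinLabel.xsd">
  <name>Sympertus Niggell</name>
  <term>Geigen- und Lauten- Macher in</term>
  <location>Füssen</location>
  <year>1791</year>
</violinLabel>
```

A validating parser can use the `xsi:noNamespaceSchemaLocation` hint to find the Schema. One possible refinement would be to give `year` the built-in type `xs:gYear` instead of `xs:string`, so that only year values such as 1791 are accepted.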
- valid documents -
- in scholarly contexts, TEI-XML modules have proven useful
As outlined on the handout "TEI-XML & Violin Labels", I identified the following modules as useful for the description of Violin Labels:
| ID | Name | URL |
|---|---|---|
| tei | TEI Infrastructure | http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ST.html |
| core | Common Core | http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html |
| gaiji | Character and Glyph Documentation | http://www.tei-c.org/release/doc/tei-p5-doc/en/html/WD.html |
| transcr | Transcription of Primary Sources | http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html |
| namesdates | Names, Dates, People, and Places | http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html |
| msdescription | Manuscript Description | http://www.tei-c.org/release/doc/tei-p5-doc/en/html/MS.html |
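How elements from these modules might address the flaws discussed earlier can be sketched as follows. This is an illustration under stated assumptions, not a finished encoding: the punctuation, the split of the year into printed and handwritten digits, the descriptive notes and all example.org URLs are hypothetical. Only the maker's name, the wording, the place, the year, the material (paper) and the inventory number come from the slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: punctuation, the printed/handwritten split of the year
     and all example.org URLs are assumptions, not findings. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Violin Label of Sympertus Niggell (sketch)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished working example.</p>
      </publicationStmt>
      <sourceDesc>
        <!-- msdescription: where the label is kept and what it is made of -->
        <msDesc>
          <msIdentifier>
            <settlement>Leipzig</settlement>
            <repository>Musical Instruments Museum Leipzig</repository>
            <idno>926</idno>
          </msIdentifier>
          <physDesc>
            <objectDesc form="label">
              <supportDesc material="paper">
                <support>Small printed paper label (description assumed).</support>
              </supportDesc>
            </objectDesc>
            <handDesc>
              <handNote xml:id="h1" medium="ink">Handwritten completion of the printed text (assumed).</handNote>
            </handDesc>
          </physDesc>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- core, namesdates, transcr: running text with punctuation, names, places, dates and hands -->
      <p>
        <persName ref="https://example.org/person/niggell">Sympertus Niggell</persName>,
        Geigen- und Lauten- Macher in
        <placeName ref="https://example.org/place/fuessen">Füssen</placeName>,
        <date when="1791">17<add hand="#h1" place="inline">91</add></date>
      </p>
    </body>
  </text>
</TEI>
```

In this sketch the `ref` attributes are where links to existing resources, such as authority files, would go. `msDesc` carries the material description, and `add` with a `hand` attribute separates handwritten from printed text.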
Thank you for your attention!
contact me:
konstantin.freybe@uni-leipzig.de
find this presentation online:
https://slides.com/kfreybe/violinlabels/