READ | Recognition and
Enrichment of Archival Data

University of Leipzig

Konstantin Freybe, M.A.
konstantin.freybe@uni-leipzig.de

Digitising Violin Labels

Transcription, Encoding, Enrichment

Outline

  1. Facts & Figures
  2. Mission
  3. Target User Groups
  4. READ | Tools
  5. Approaching Archival Documents
  6. Enriching Violin Labels
  7. Encoding Violin Labels

1. Facts & Figures

Duration:
January 1st 2016 - June 30th 2019
Grant:
8,200,000 EUR
received through EU research & innovation programme H2020
Partners:
13 partners coordinated by University of Innsbruck
10 institutions (increasing) as associated partners via memorandum of understanding

2. Mission

 

  • increase accessibility of archival documents
  • revolutionise Handwritten Text Recognition (HTR)
  • establish Virtual Research Environments (VRE)

3. Target User Groups

  • Memory Institutions
    • Archives
    • Libraries
    • Museums
  • Humanities Scholars
  • Computer Scientists
  • public users & volunteers (crowd-sourcing)

 

4. READ | Tools

Transkribus

  • text segmentation, automated transcriptions, writer hand retrieval
  • trainable software which is intended for unsupervised use

ScanREAD

  • utility app for digitisation of archival documents

e-Learning-App

  • software intended to help users decipher historical handwriting

Transkribus

  • text segmentation, automated transcriptions, writer hand retrieval
  • trainable software which is intended for unsupervised use
  • Transkribus is part web service, part locally installed software & GUI, and still in development (current version: beta 0.10.1)
  • Registration on the online platform and software installation are required (a proper Java environment is mandatory)

In case Transkribus_beta is not running, let us take a look at the How-to-Guide in order to understand the functions provided:

https://transkribus.eu/wiki/images/7/77/How_to_use_TRANSKRIBUS_-_10_steps.pdf

5. Approaching Archival Documents

Musical Instruments Museum Leipzig, Inv.: 926

- Violin Labels are small slips of paper that were, and still are, used to authenticate musical instruments of the violin family

- they serve as sources of information for the evaluation of violin instruments

- prints and other facsimiles, like those published by Paul de Wit, gave rise to the industrial-scale use of fake labels for profit

- extensive research on these historical documents promises to help distinguish fake from original (labels, and maybe even instruments)

Musical Instruments Museum Leipzig, Inv.: 793

Musical Instruments Museum Leipzig, Inv.: 926

  • identifying documents
  • digitisation
    • images identified: 4356
    • signatures identified: 11722

=> apply READ's tools

6. Enriching Violin Labels

- storing Violin Labels digitally requires machine-readable formats like XML

- XML stands for Extensible Markup Language

- XML's greatest benefits compared to other markup languages:

  • lets you define your own vocabulary
  • provides interoperability (documents can be edited in any text editor)
  • readable by humans & machines

XML documents themselves only store or transport information

XML separates data/information from its representation
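A minimal sketch of what "separating data from its representation" can mean in practice: the same label data is stored once and rendered in two different ways. The element names are the illustrative ones used later in this talk, and the two rendering functions are hypothetical helpers, not part of any READ tool.

```python
import xml.etree.ElementTree as ET

# Label data stored once as XML (illustrative vocabulary, not a standard).
LABEL_XML = """<violinLabel>
    <name>Sympertus Niggell</name>
    <term>Geigen- und Lauten- Macher in</term>
    <location>Füssen</location>
    <year>1791</year>
</violinLabel>"""


def as_plain_text(label):
    """Render the label as one line of running text."""
    return " ".join(child.text.strip() for child in label)


def as_catalogue_entry(label):
    """Render the same data as a short catalogue-style entry."""
    return "{name} ({location}, {year})".format(
        name=label.findtext("name"),
        location=label.findtext("location"),
        year=label.findtext("year"),
    )


label = ET.fromstring(LABEL_XML)
print(as_plain_text(label))
print(as_catalogue_entry(label))
```

The XML itself says nothing about layout; each function decides on its own how the stored information is presented.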

- the Text Encoding Initiative (TEI) provides guidelines & semi-standardised modules to enable collaborative research & ensure high quality

How can TEI modules help to enrich Violin Labels?

TEI modules already contain many elements that can be used to describe large numbers of Violin Labels thoroughly.

How could TEI-XML encoding work on Violin Labels?

7. Encoding Violin Labels

- XML is relatively easy to learn

- its tag syntax resembles HTML (Hypertext Markup Language), but the rules are stricter

- elements are written in angle brackets "<" & ">" and consist of an opening tag (<tag>) and a closing tag (</tag>)

- key concepts: hierarchical document structure, well formed documents, validating the document

- there are lots of online resources that help when learning to encode in XML

YouTube tutorials (my favourite: Derek Banas):

https://www.youtube.com/playlist?list=PLBB413675AFBDC1F4

- well formed documents -

<?xml version="1.0" encoding="UTF-8"?>
<!-- Write comments like this to keep track of your work! Comments must come after the XML declaration. -->

<body>
    <div>
        <element>Element names are case sensitive in XML! This element is not well formed.</Element>
        <!-- The tagging above produces an error because the names in the opening and closing tag differ. -->
    </div>
    <div>
        <table>I mean furniture not tabular data.</table>
        <!-- The tagging above is well formed, but its intended meaning conflicts with the HTML vocabulary, so it would be invalid wherever that vocabulary is enforced. -->
    </div>
</body>

correct syntax = well formed
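Well-formedness can be checked mechanically by any XML parser. The following is a minimal sketch using the parser from Python's standard library; the helper name `is_well_formed` is hypothetical, and the test strings are shortened versions of the slide example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Opening and closing tag differ in case, as on the slide.
broken = "<body><div><element>text</Element></div></body>"
# The same content with matching tag names.
fixed = "<body><div><element>text</element></div></body>"
# Odd use of a familiar name, but syntactically fine.
furniture = "<body><div><table>I mean furniture not tabular data.</table></div></body>"

print(is_well_formed(broken))
print(is_well_formed(fixed))
print(is_well_formed(furniture))
```

Note that the parser only judges syntax: the furniture example passes, because whether an element is used sensibly is a question of validity, not of well-formedness.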

- valid documents -

- an XML document that has been successfully validated against either a Document Type Definition (DTD) or a Schema is both well formed and valid

- valid documents | DTD -

<?xml version="1.0" encoding="UTF-8"?>
<!-- Write comments like this to keep track of your work! -->

<!-- The following is an example of the content of a fictional external DTD file "violinLabel.dtd"
     (an external DTD file contains only the declarations, without a DOCTYPE wrapper):

<!ELEMENT violinLabel (name, term, location, year)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT term (#PCDATA)>
<!ELEMENT location (#PCDATA)>
<!ELEMENT year (#PCDATA)>

-->
<!-- The following is a reference to the external DTD file mentioned above -->

<!DOCTYPE violinLabel SYSTEM "violinLabel.dtd">
<!-- We will discuss the flaws of the following XML document. -->

<violinLabel>
    <name>Sympertus Niggell</name>
    <term>Geigen- und Lauten- Macher in</term>
    <location>Füssen</location>
    <year>1791</year>
</violinLabel>

Musical Instruments Museum Leipzig, Inv.: 926

Flaws:

  • does not include "," and other characters
  • does not distinguish between printed and handwritten text
  • does not describe material properties of the Violin Label
  • does not reference existing resources
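To illustrate the difference between well formed and valid, here is a minimal sketch that imitates the content model of the fictional "violinLabel.dtd" by hand. It is not DTD validation: the parser in Python's standard library does not validate, so the function `matches_content_model` is a hypothetical stand-in for what a validating parser would check.

```python
import xml.etree.ElementTree as ET

# Child elements required by the fictional violinLabel.dtd, in this order.
EXPECTED_CHILDREN = ["name", "term", "location", "year"]


def matches_content_model(xml_text):
    """Imitate the DTD rule: violinLabel (name, term, location, year)."""
    root = ET.fromstring(xml_text)
    if root.tag != "violinLabel":
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) means text only: no child may contain nested elements.
    return all(len(child) == 0 for child in root)


complete = (
    "<violinLabel><name>Sympertus Niggell</name>"
    "<term>Geigen- und Lauten- Macher in</term>"
    "<location>Füssen</location><year>1791</year></violinLabel>"
)
# Well formed, but the <term> element required by the DTD is missing.
missing_term = (
    "<violinLabel><name>Sympertus Niggell</name>"
    "<location>Füssen</location><year>1791</year></violinLabel>"
)

print(matches_content_model(complete))
print(matches_content_model(missing_term))
```

Both strings are well formed, but only the first one follows the rules the DTD lays down.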

- valid documents | DTD vs Schema -

- major disadvantage of DTD validation:

limited flexibility in how elements can be declared (for example, a DTD cannot assign data types to element content)

Schemas provide a powerful and XML-based alternative to DTD!

- advantageous because they allow XML documents to carry a description of their own format

- valid documents | Schema -

<?xml version="1.0" encoding="UTF-8"?>
<!-- Although more flexible, there are still flaws to discuss. -->
<!-- The xs:schema root element declares the "xs" namespace prefix used below. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="violinLabel">

<xs:complexType>
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="term" type="xs:string"/>
    <xs:element name="location" type="xs:string"/>
    <xs:element name="year" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

</xs:element>

</xs:schema>

Flaws:

  • still does not include "," and other characters
  • still does not distinguish between printed and handwritten text
  • still does not describe material properties of the Violin Label
  • still does not reference existing resources
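Because a Schema is itself an XML document, the same parser that reads a label can also read the Schema. A minimal sketch, assuming the schema is wrapped in an `xs:schema` root element that declares the `xs` namespace (added here so that the snippet parses):

```python
import xml.etree.ElementTree as ET

# Namespace of XML Schema, bound to the prefix "xs" for the path below.
NAMESPACES = {"xs": "http://www.w3.org/2001/XMLSchema"}

SCHEMA_XML = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="violinLabel">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="term" type="xs:string"/>
        <xs:element name="location" type="xs:string"/>
        <xs:element name="year" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(SCHEMA_XML)
sequence = schema.find("xs:element/xs:complexType/xs:sequence", NAMESPACES)

# Collect (name, type) pairs of all elements declared inside the sequence.
declared = [(el.get("name"), el.get("type")) for el in sequence]
for name, type_name in declared:
    print(name, type_name)
```

The listing makes visible that every child, including the year, is declared as plain text (xs:string); the script only reads the Schema and does not validate a label against it.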

- valid documents -

- in scientific contexts TEI-XML modules have proven useful

As outlined on the handout "TEI-XML & Violin Labels", I identified the following modules as useful for the description of Violin Labels:

ID             Name                               URL
tei            TEI Infrastructure                 http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ST.html
core           Common Core                        http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html
gaiji          Character and Glyph Documentation  http://www.tei-c.org/release/doc/tei-p5-doc/en/html/WD.html
transcr        Transcription of Primary Sources   http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html
namesdates     Names, Dates, People, and Places   http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html
msdescription  Manuscript Description             http://www.tei-c.org/release/doc/tei-p5-doc/en/html/MS.html
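A minimal sketch of how elements from these modules might be applied to the label of Sympertus Niggell, and how the enriched data can then be queried. It is a fragment, not a complete TEI document, and it rests on assumptions: the position of the comma is hypothetical, and the distinction between printed and handwritten text as well as the material description are left out because they cannot be read off the earlier example.

```python
import xml.etree.ElementTree as ET

# TEI namespace, bound to the prefix "tei" for the queries below.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# <p> and <date> come from the core module, <persName> and <placeName>
# from namesdates. The comma placement is an assumption for illustration.
LABEL_TEI = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <persName>Sympertus Niggell</persName>,
  Geigen- und Lauten- Macher in
  <placeName>Füssen</placeName>
  <date when="1791">1791</date>
</p>"""

label = ET.fromstring(LABEL_TEI)

maker = label.findtext("tei:persName", namespaces=TEI_NS)
place = label.findtext("tei:placeName", namespaces=TEI_NS)
year = label.find("tei:date", TEI_NS).get("when")

# Running text of the label, with whitespace normalised to single spaces.
transcription = " ".join("".join(label.itertext()).split())

print(maker, place, year)
print(transcription)
```

Unlike the earlier vocabulary, the markup sits inside the running text, so punctuation survives and names, places and dates can still be extracted on their own.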

Thank you for your attention!

contact me:
konstantin.freybe@uni-leipzig.de

find this presentation online:
https://slides.com/kfreybe/violinlabels/
