This slide deck has been co-created using generative AI for demonstrative and educational purposes.
Human-Machine Collaboration for
Better Access to Special Collections
by
Annika Rockenberger, PhD
University of Oslo Library
Digital Research Methods in the Humanities,
Social Sciences, Pedagogy, and Theology
Daniel Georg Lindhagen to Christopher Hansteen. St. Petersburg, 1st September 1849
Digitised letter (left) – Record with metadata on Alvin platform, without images (right)
The academic correspondence of the Norwegian Observatory:
Letters to Christopher Hansteen in the Collection of the Museum
of University History (MUV)
ALVIN platform and metadata, scanning and digitisation
Situation: no time or money for expert manual transcription
Using machine learning for text recognition, then correcting the results by hand until they are publishable
AI = Machine Learning
Handwritten Text Recognition (HTR)
Pattern recognition based on lines
Manual creation of Ground Truth (25+ pages)
Community project with materials from archives, libraries, museums, and research projects
Ground Truth: human-created, human-quality checked
AI = Large Language Model (LLM)
Language generation based on probability
trained on Ground Truth documents in the Transkribus collection
combined with HTR
Language/language group specific
Source bias (genre, medium, social group, type of document)
Training bias (good at recognising the known/most probable based on training set)
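"Language generation based on probability" and "training bias" can be illustrated with a toy sketch (my own minimal example, not the actual Transkribus model): a character bigram table that always predicts the continuation it saw most often in training, regardless of what is actually on the page.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    # Count which character follows each character in the "training set".
    follows = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        follows[a][b] += 1
    return follows

def most_probable_next(follows, ch):
    # Return the most frequent successor seen in training --
    # the known/most probable, not necessarily what is written.
    if not follows[ch]:
        return None
    return follows[ch].most_common(1)[0][0]

model = train_bigrams("the theatre then thews the them")
print(most_probable_next(model, "t"))  # 'h' dominates after 't' in this tiny corpus
```

A real LLM does the same thing at vastly larger scale: it favours continuations that were frequent in its training data, which is exactly where source and training bias enter.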
Machine-learning-only HTR model
trained on 15 million words / 3 million lines
running texts from the 17th to the 19th century
mainly archival collections
LLM-enhanced HTR model
trained on 31 million words
multilingual, heterogeneous, balanced training set
historical and modern documents
???
High-quality automatic transcription
Some characters are misread (r/n, a/o, k/h)
High-quality text recognition, but:
too eager to produce probable text!
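The "too eager to produce probable text" failure mode can be mimicked with a toy analogy (my own illustration, not how Transkribus works internally): a fixer that snaps any unfamiliar word to the closest word in its training vocabulary. The vocabulary and words below are invented for the demonstration.

```python
import difflib

# Hypothetical training vocabulary for the sketch.
TRAINED_VOCAB = ["Petersburg", "September", "letter", "observatory"]

def snap_to_vocab(word, vocab=TRAINED_VOCAB):
    # Replace the word with its closest trained neighbour, if any.
    match = difflib.get_close_matches(word, vocab, n=1, cutoff=0.6)
    return match[0] if match else word

print(snap_to_vocab("Septembor"))   # -> "September": a genuine fix here...
print(snap_to_vocab("Petersborg"))  # -> "Petersburg": but a real name spelled
                                    #    "Petersborg" would be silently rewritten
print(snap_to_vocab("Lindhagen"))   # -> unchanged: no close match in vocabulary
```

The second case is the danger: the output is fluent and believable, but the model has overwritten what the letter actually says with what its training data makes probable.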
Excitement
Attention
Irritation (Believable Bullshit)
Frustration
Disenchantment
Created concise "regesta" (short summaries) capturing the essential content of each letter.
In the field of artificial intelligence, a hallucination, or artificial hallucination (also called bullshitting, confabulation, or delusion), is a response generated by an AI that presents false or misleading information as fact. (Wikipedia)
One example of many: HTR models with built-in LLMs tend to confabulate
ALVIN platform enhances digital cultural heritage accessibility.