Building a Fichero: New Tools, Old Documents, and Machine Learning Workflows with an Endangered Afro-Colombian Archive

— Andrew Paul Janco, Kelly López Roldán, Daniel Tubb

Key Ideas

artificial intelligence + social intelligence

Machine learning must support and expand on investments in people.

 

scale, don't replace

Machine learning should augment and scale human archival and research work

situated knowledge

Build situated knowledge and empower people to avoid extractive practices

2022

 photograph and digitize the archive

Semillero

cataloguing and interpretation phase

2023

members of the project team visited the Muntu Bantu Center in Quibdó

Fichero

a complementary approach to cataloguing which uses a machine learning workflow

Semillero

as data

Weasel: A small and easy workflow system

https://github.com/explosion/weasel

prepare

transcribe

process

publish

Fichero
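
The four stages named on the slides (prepare, transcribe, process, publish) can be read as a chain in which each step hands its output to the next. The sketch below is only an illustration of that shape in plain Python. It is not the project's code and does not use Weasel's API; in Weasel the stages would typically be declared as commands in a project.yml file. The `Page` class, the stage bodies, and the file path are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """One digitized page moving through the workflow (hypothetical structure)."""
    image_path: str
    text: str = ""
    metadata: dict = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def prepare(page: Page) -> Page:
    # Placeholder: a real step might crop, deskew, or rename image files.
    page.log.append("prepare")
    return page


def transcribe(page: Page) -> Page:
    # Placeholder: a real step would call a handwritten text recognition engine.
    page.text = f"[transcription of {page.image_path}]"
    page.log.append("transcribe")
    return page


def process(page: Page) -> Page:
    # Placeholder: a real step might extract names, dates, or case types.
    page.metadata["characters"] = len(page.text)
    page.log.append("process")
    return page


def publish(page: Page) -> Page:
    # Placeholder: a real step would write a catalogue record.
    page.metadata["published"] = True
    page.log.append("publish")
    return page


# Stage order follows the slides: prepare, transcribe, process, publish.
STAGES: List[Callable[[Page], Page]] = [prepare, transcribe, process, publish]


def run_workflow(image_path: str) -> Page:
    """Run every stage in order on a single page and return the result."""
    page = Page(image_path=image_path)
    for stage in STAGES:
        page = stage(page)
    return page


if __name__ == "__main__":
    result = run_workflow("case_001/page_01.jpg")  # hypothetical path
    print(result.log)
```

Keeping each stage as a separate function with the same input and output type means a stage can be rerun or swapped out (for example, a different transcription model) without touching the others.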

Future Directions: LLMs for HTR post-correction

Future Directions: Ensemble of Kraken Models

Hand-Type Classifier

Thank you!

By Andrew Janco

This paper describes outcomes and challenges in human-scale document processing. We discuss a workflow that begins with document preservation, moves through text recognition, and ends with a catalogue that demonstrates the capabilities of LLMs for research and archival work, while remaining attuned to the vision of research partners. Until 2022, the Istmina Circuit Court archive, with documents from the 1870s to 1930s, was rotting, disorganized, and stored in garbage bags. Yet this archive is a crucial source of Afro-Colombian history in an often-marginalized region of the Chocó in Colombia. In 2023, seven young people from Istmina and Quibdó worked with the Muntú Bantú Foundation, a community center focused on Afro-diasporic memory. With researchers from various universities, they were able to digitize the archive, which is available online at the British Library. The project was successful, and the digitization has since enabled new workflows for cataloguing and interpreting the archive. This paper explores these workflows. Throughout, we are interested in a key problem of equity in knowledge production: how can new tools be used to the benefit of local knowledge-producers? Our paper focuses on the work of cataloguing archival materials, a first step in enabling local researchers (and others) to make meaning. Project interns catalogued 330 case files and wrote a book of micro-history. Yet, t
