Building a Fichero: New Tools, Old Documents, and Machine Learning Workflows with an Endangered Afro-Colombian Archive
By Andrew Janco
This paper describes outcomes and challenges in human-scale document processing. We discuss a workflow that begins with document preservation, moves through text recognition, and ends with a catalogue that demonstrates the capabilities of large language models (LLMs) for research and archival work while remaining attuned to the vision of our research partners. Until 2022, the Istmina Circuit Court archive, which holds documents from the 1870s to the 1930s, was rotting, disorganized, and stored in garbage bags. Yet this archive is a crucial source of Afro-Colombian history in an often-marginalized region of the Chocó, Colombia. In 2023, seven young people from Istmina and Quibdó worked with the Muntú Bantú Foundation, a community center focused on Afro-diasporic memory. Together with researchers from several universities, they digitized the archive, which is now available online through the British Library. Beyond the success of that project, the digitized collection has opened the way to new workflows for cataloguing and interpreting the archive, and this paper explores them. Throughout, we are concerned with a key problem of equity in knowledge production: how can new tools be used to benefit local knowledge producers? We focus on the cataloguing of archival materials, a first step in enabling local researchers (and others) to make meaning from them. Project interns catalogued 330 Case Files and wrote a book of micro-history. Yet, t