Daniel Varela, Daniel Tubb, Kelly López Roldán, Ann Farnsworth-Alvear, Yuri Romaña Rivas, María Fernanda Parra Rodríguez, Sergio Mosquera, Andrew Janco, Cynthia Heider, Brie Gettleson, Eliecer Angulo Castro, Angélica Aqualimpia Copete, Laura Caicedo, Milagros Gonzalez, Javier Hurtado Ibargüen, Ernestina Lemos Rentería, Yusleyda Perea Cuesta, Jhon Leison Rivas Rodríguez, Nallely Taborda Castañeda, Yeison Vente
Photographs: Yeison Vente, Daniel Varela, María Fernanda Parra Ramírez. Persons shown lower left, clockwise from left: Jhon Leison Rivas, Yusleyda Perea Cuesta, Ernestina Lemos Rentería (2022)
Circuit Court of Istmina, Chocó
(Project with Daniel Varela (University of Michigan) and Fundación Muntú Bantú)
https://eap.bl.uk/project/EAP1477
Location of the Chocó
Map credit: Instituto Geográfico Agustín Codazzi (IGAC)
Pacific-coast rainforest region with a long history of mining and impoverishment and a majority Afro-Colombian population. Persons shown: Javier Hurtado, Angélica Aqualimpia, Yusleyda Perea Cuesta (2022). Photos: Ann Farnsworth-Alvear
Colombia
The archive was photographed and digitized by Angélica Aqualimpia, Yeison Vente, and Daniel Varela.
Pictured: Yeison Vente of @formatonegro
Human cataloging and the writing of youth-authored essays were done in collaboration with Fundación Muntú Bantú and Kelly López Roldán (University of Pennsylvania).
Transcribes and auto-catalogues historical archives using vision large language models (VLMs), running locally or in the cloud; a collaboration with anthropologist Daniel Tubb, University of New Brunswick
Text
model | accuracy |
---|---|
HTR-Araucania_XIX | 90.4% |
Fmb-best | 91.8% |
Sergio diary (single handwriting style) | 97.5% |
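The accuracy figures above are presumably character-level. A minimal sketch of how such a number can be computed, assuming accuracy is defined as 1 - CER (character error rate, i.e. edit distance divided by reference length). The function names are illustrative, and the tool that produced the table may normalize text or clamp values differently.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)


def char_accuracy(ref: str, hyp: str) -> float:
    # Assumed definition: accuracy = 1 - CER (negative if CER exceeds 1).
    return 1.0 - cer(ref, hyp)
```

For example, dropping the final "l" of "diligencia judicial" (19 characters) costs one edit, giving a CER of about 0.053 and an accuracy of about 94.7%.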
"Se trata de una diligencia judicial para constatar la realidad y magnitud de los" ("This is a judicial proceeding to verify the reality and extent of the")
Chat-format interleaved vision-language (VL) training data
{
  "messages": [
    {
      "content": "<image>extract text",
      "role": "user"
    },
    {
      "content": "plotar dichas minas en la forma que lo",
      "role": "assistant"
    }
  ],
  "images": [
    "fmb_images/055e42a3-a5c9-40f8-9bf6-3cd5e82dcc1c.png"
  ]
}
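The record above appears to follow the multimodal "sharegpt"-style layout used by fine-tuning toolkits such as LLaMA-Factory: the `<image>` placeholder in the user turn marks where the line image listed under `"images"` goes, and the assistant turn holds the ground-truth transcription. A sketch that builds such records from (image path, transcription) pairs, assuming those pairs have already been exported from eScriptorium; `make_record` and `dataset_json` are hypothetical names.

```python
import json


def make_record(image_path: str, text: str, prompt: str = "extract text") -> dict:
    """One training example: the user turn carries the image, the assistant turn the transcription."""
    return {
        "messages": [
            {"content": f"<image>{prompt}", "role": "user"},
            {"content": text, "role": "assistant"},
        ],
        "images": [image_path],
    }


def dataset_json(pairs) -> str:
    """Serialise (image_path, text) pairs as a JSON array, skipping empty transcriptions."""
    records = [make_record(img, txt) for img, txt in pairs if txt.strip()]
    # ensure_ascii=False keeps accented Spanish characters readable in the file
    return json.dumps(records, ensure_ascii=False, indent=2)


# Usage (writes a file):
# from pathlib import Path
# Path("train.json").write_text(dataset_json(pairs), encoding="utf-8")
```

Skipping blank lines avoids teaching the model to answer an image with an empty string.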
Fine-tuning "small" VLMs on line-level text-image pairs from eScriptorium
model | params | CER min | CER q1 | CER median (q2) | CER q3 | CER max |
---|---|---|---|---|---|---|
Qwen2VL | 2B | 0.0 | 0.076 | 0.168 | 0.478 | 3.979 |
FMB-2b | 2B | 0.0 | 0.0426 | 0.086 | 0.211 | 1.062 |
Qwen2VL | 7B | 0.0 | 0.059 | 0.105 | 0.303 | 3.609 |
FMB | 7B | 0.0 | 0.058 | 0.087 | 0.353 | 0.923 |
Qwen-VL-Plus | 72B | 0.0 | 0.057 | 0.180 | 0.417 | 1.125 |
GPT-4o | 200B+ | 0.0 | 0.029 | 0.0883 | 0.408 | 1.125 |
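Each row summarizes the distribution of per-line CER over the test set. A sketch of that five-number summary, assuming the per-line CER values have already been computed; the quartile method behind the table is not stated, so `method="inclusive"` is an assumption. Values above 1.0 (such as 3.979) are possible because CER divides edit distance by reference length, so an output much longer than the reference, for instance a hallucinated continuation, exceeds 100%.

```python
import statistics


def five_number_summary(values):
    """min, q1, median (q2), q3, max of per-line CER values (needs at least two values)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def table_row(model: str, params: str, cers) -> str:
    """Format one row in the same pipe-table layout as above (hypothetical helper)."""
    stats = five_number_summary(cers)
    return " | ".join([model, params] + [f"{v:.3f}" for v in stats]) + " |"
```

Reporting quartiles rather than a single mean keeps a few catastrophic lines from hiding how well the typical line is read.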
38 test images
feature | CNNs | "small" VLMs (7B) | "big" VLMs (70B+) |
---|---|---|---|
fine-tuning | easy to fine-tune | fine-tuning | usually none |
segmentation | requires line segmentation | work well with chunks | work with full pages |
hardware | no GPU (Apple M1) | 1 × A10G GPU (training: 4 × L40S) | many GPUs or API |
pros/cons | character errors | fewer hallucinations | hallucinations |
capability | HTR only | NER, case-level summarization | collection-level metadata |
Baseline segmenter (kraken.blla)
Chunking
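Small VLMs "work well with chunks" rather than single lines or whole pages. One plausible way to build chunks from the segmenter's output: take a bounding box for each text line (for instance derived from the boundary polygons that kraken.blla returns), sort the boxes top to bottom, and merge every `lines_per_chunk` consecutive lines into one crop region. `chunk_lines` and the fixed chunk size are hypothetical; the project's actual chunking rule is not described here.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


def chunk_lines(line_boxes: List[Box], lines_per_chunk: int = 3) -> List[Box]:
    """Group consecutive text-line boxes into chunks and return each chunk's enclosing box."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be >= 1")
    # Approximate reading order: by top edge, then by left edge.
    ordered = sorted(line_boxes, key=lambda b: (b[1], b[0]))
    chunks = []
    for start in range(0, len(ordered), lines_per_chunk):
        group = ordered[start:start + lines_per_chunk]
        chunks.append((
            min(b[0] for b in group),
            min(b[1] for b in group),
            max(b[2] for b in group),
            max(b[3] for b in group),
        ))
    return chunks
```

The returned tuples are in the (left, upper, right, lower) order that PIL's `Image.crop` expects, so each chunk can be cropped from the page image and sent to the model. Sorting by the top edge assumes a single-column page.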
Processing for readability
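The slides do not spell out what "processing for readability" involves. As one hypothetical example, per-line transcriptions can be joined into running text, re-joining words split by a hyphen at a line break and collapsing stray whitespace. The heuristics below are assumptions; scribes may also split words without any hyphen, which this sketch does not detect.

```python
import re


def join_lines(lines):
    """Join per-line HTR output into readable running text.

    Assumed heuristics: a line ending in '-' whose next line starts with a
    lowercase letter is a word split across the break; every other line
    break becomes a single space.
    """
    text = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop empty lines from the segmenter
        if text.endswith("-") and line[0].islower():
            text = text[:-1] + line  # re-join the hyphenated word
        elif text:
            text += " " + line
        else:
            text = line
    return re.sub(r"\s+", " ", text)  # collapse repeated whitespace
```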
Output formats: Word image-text pairs
1930: Luis Enrique Bernal v. Nicanor Córdoba and Juan Francisco Moreno
Daniel Tubb, fichero collaborator
Output formats: Case-summaries with entities in context
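"Entities in context" can be illustrated as keyword-in-context: every mention of an entity is returned with a window of the surrounding transcription. A sketch assuming the entity strings (people, places, mines, and so on) have already been extracted, for instance by a VLM's NER pass; it uses plain exact-string matching, not an NER model, and `entities_in_context`, `EntityMention`, and `window` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class EntityMention:
    entity: str
    start: int     # character offset of the mention in the text
    end: int
    context: str   # the mention plus `window` characters on each side


def entities_in_context(text: str, entities: List[str], window: int = 30) -> List[EntityMention]:
    """Find every exact occurrence of each entity and keep surrounding text."""
    mentions = []
    for entity in entities:
        for m in re.finditer(re.escape(entity), text):
            lo = max(m.start() - window, 0)
            hi = min(m.end() + window, len(text))
            mentions.append(EntityMention(entity, m.start(), m.end(), text[lo:hi]))
    # Present mentions in document order, as a reader of the case would meet them.
    return sorted(mentions, key=lambda mention: mention.start)
```

Exact matching misses spelling variants common in historical documents, so a real pipeline would likely need normalization or fuzzy matching on top of this.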
2023 workshop