AI Librarians
Quick introduction to AI in libraries and information science.
Fine-tuning
An example from archives in Colombia: using "small" VLMs for text extraction and document understanding.
Inference
Lessons learned from processing all of Princeton's senior theses and PhD dissertations.
Fine-tuning "small" VLMs on line-level text-image pairs from eScriptorium
[Chart: comparison of a CNN baseline and VLMs at 2B and 7B parameters, base vs. fine-tuned (FT)]
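Each training record pairs a cropped line image with its ground-truth transcription. A minimal sketch for assembling records in the messages format shown below (the `pairs` input and output path are assumptions; in practice they would come from an eScriptorium export):

```python
import json
from pathlib import Path

def make_record(image_path: str, transcription: str) -> dict:
    """Build one line-level training record in the messages format
    shown on the slide (user prompt + assistant transcription)."""
    return {
        "messages": [
            {"content": "<image>extract text", "role": "user"},
            {"content": transcription, "role": "assistant"},
        ],
        "images": [image_path],
    }

def build_dataset(pairs, out_path="train.json"):
    # pairs: iterable of (image_path, transcription) tuples,
    # e.g. parsed from an eScriptorium ALTO/text export (assumption)
    records = [make_record(img, txt) for img, txt in pairs]
    Path(out_path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return records
```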
{
  "messages": [
    {
      "content": "<image>extract text",
      "role": "user"
    },
    {
      "content": "plotar dichas minas en la forma que lo",
      "role": "assistant"
    }
  ],
  "images": [
    "fmb_images/055e42a3-a5c9-40f8-9bf6-3cd5e82dcc1c.png"
  ]
},

Metadata context
Identifier: EAP699/23/1
Title: Photographs
Place: Moldova, Europe

Zoom / select tool for image search
multimodal-RAG
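Behind the zoom/select tool, multimodal RAG typically embeds image regions and text queries into a shared vector space and retrieves by similarity. A minimal sketch of the retrieval step (the vectors here are placeholders; in practice they would come from a CLIP-style encoder):

```python
import numpy as np

def cosine_top_k(query: np.ndarray, index: np.ndarray, k: int = 3):
    """Return the indices of the k index vectors most similar to query,
    plus all similarity scores. `index` is an (n, d) matrix of
    precomputed image-region embeddings (assumed, not from the slide)."""
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    sims = m @ q                     # cosine similarity per row
    return np.argsort(-sims)[:k], sims
```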
18,608 Senior theses and PhD Dissertations, 2013-2025
Metadata from DataSpace
Markdown from nanonets/Nanonets-OCR-s (Qwen2.5-VL)
4.4 GB, 1.1 billion tokens, 2 million pages
31 days, 531 HPC jobs, 4 × 4 hours at a time
Average CPU utilization 99.5%, GPU utilization 86%
Adapted from uv-scripts/ocr by davanstrien
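The 4-hour chunks could be submitted as a scheduler array job; a sketch for SLURM (partition, resource values, and script name are assumptions, not from the slides):

```shell
#!/bin/bash
#SBATCH --job-name=ocr-theses
#SBATCH --array=0-530          # 531 jobs, one shard of the PDF list each
#SBATCH --time=04:00:00        # 4-hour allocations
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8

# Each task OCRs its own shard (ocr_shard.py is a hypothetical driver).
python ocr_shard.py --shard "$SLURM_ARRAY_TASK_ID" --num-shards 531
```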
import pymupdf
from tqdm import tqdm

doc = pymupdf.open(pdf)
pdf_images = []
for i, page in tqdm(enumerate(doc)):  # iterate through the pages
    pix = page.get_pixmap(dpi=100)    # render the page at 100 dpi
    img = pix.pil_image()
    pdf_images.append({
        "image": img,
        "page": i + 1,
    })

pdf_text = ""
for batch in tqdm(image_batches, desc=f"Processing {pdf.stem}"):
    batch_messages = [make_ocr_message(page["image"]) for page in batch]
    # Process with vLLM
    outputs = llm.chat(batch_messages, sampling_params)
    # Extract markdown from outputs
    for output in outputs:
        markdown_text = output.outputs[0].text.strip()
        pdf_text += markdown_text + "\n\n"
md_file.write_text(pdf_text, encoding="utf-8")
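The `make_ocr_message` helper is not shown on the slide; a minimal sketch of what it might look like for vLLM's OpenAI-style chat interface (the prompt wording and PNG encoding are assumptions):

```python
import base64
import io

def make_ocr_message(image):
    """Wrap a PIL image as one OpenAI-style multimodal conversation
    for vLLM's llm.chat(). Prompt text is an assumption, not the
    slide's actual prompt."""
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    return [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
            {"type": "text", "text": "Convert this page to markdown."},
        ],
    }]
```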
LLM Collective Nov 12, 2025, 4:00 pm – 5:00 pm, Commons Library Classroom
slides.com/andrewjanco/vlms-in-stacks