Digital Scholarship & AI
Andrew Janco, PhD
🦜
Machine Learning & Research Practices
- Annotation
- Document Segmentation
- Text Extraction (OCR/HTR)
- Visual Document Understanding
- Summarization / Metadata
- Classification
Consider how you, as a researcher, identify relevant information in your materials. How can we teach a machine to do the same?
https://journaliststudio.google.com/pinpoint
- Regular Expressions and pattern matching
- Named Entity Recognition
- Named Entity Linking
- Text categorization
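As a small illustration of the first item in the list above, here is a minimal sketch of regular-expression pattern matching in Python. The sample sentence, the `find_dates` helper, and the date pattern are all hypothetical and chosen only for demonstration; real archival materials would likely need a more forgiving pattern.

```python
import re

# Hypothetical sample text; not taken from any real collection.
TEXT = "Letter dated 12 March 1921, received in Kyiv on 3 April 1921."

MONTHS = (
    "January|February|March|April|May|June|"
    "July|August|September|October|November|December"
)

# Matches dates written as "<day> <Month> <year>", e.g. "12 March 1921".
DATE_PATTERN = re.compile(rf"\b(\d{{1,2}}) ({MONTHS}) (\d{{4}})\b")


def find_dates(text):
    """Return a (day, month, year) tuple for every date-like match."""
    return [(int(d), m, int(y)) for d, m, y in DATE_PATTERN.findall(text)]


print(find_dates(TEXT))
```

Rules like this are precise but brittle: they only find what the pattern anticipates, which is one reason to move on to statistical approaches such as named entity recognition.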
Making sense of semantic similarity
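One common way to quantify semantic similarity is cosine similarity between vector representations of texts. The sketch below assumes each document has already been turned into a vector by some embedding model; the three-dimensional vectors and their names are made up for illustration, and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings" (assumption: produced by some embedding model).
letter = [0.9, 0.1, 0.0]
memo = [0.8, 0.2, 0.1]
recipe = [0.0, 0.1, 0.9]

print(cosine_similarity(letter, memo))    # close to 1: similar direction
print(cosine_similarity(letter, recipe))  # close to 0: little in common
```

Scores near 1 mean the vectors point in nearly the same direction; scores near 0 mean they share little.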
Hi Folks,
I’ve just been playing with an HTR model Andy sent me.
On these three random images ... I’m getting near-perfect results compared to our hand-corrected text. (I’d guess 99.9%).
In short, Andy: can we run [the model] on our computer? How much RAM do we need? Can we automate it...?
Large Language Model Operations (LLMOps)
Paid cloud endpoint (OpenAI, Replicate)
Local desktop workstation
Partner laptop
Large Language Model Operations (LLMOps)
Started with a 72B-parameter model
Local desktop workstation: test with a 7B-parameter, 4-bit quantized model (11 GB GPU RAM)
Partner laptop: MacBook with an M-series chip
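To make the "how much RAM do we need" question concrete, here is a rough back-of-the-envelope sketch. It estimates memory for the model weights only, as parameter count times bits per weight; activations, context cache, and runtime overhead add more on top, so treat the numbers as lower bounds. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params_billion, bits_per_weight):
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width, and
    nothing else (activations, cache, overhead) is counted.
    """
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for params, bits in [(72, 16), (72, 4), (7, 16), (7, 4)]:
    gb = weight_memory_gb(params, bits)
    print(f"{params}B parameters at {bits}-bit: ~{gb:.1f} GB of weights")
```

Under this estimate, a 7B-parameter model at 4 bits needs about 3.5 GB for weights, which leaves headroom within the 11 GB of GPU RAM noted above, while a 72B-parameter model at 16 bits needs about 144 GB for weights alone.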
Thank you!