Digital Scholarship & AI

Andrew Janco, PhD

Outline

Finding paths to research outcomes using AI

  • What is AI?

  • When is it the right tool?

  • How to use it effectively?

     

🦜

Machine Learning & Research Practices

  • Annotation
  • Document Segmentation
  • Text Extraction (OCR/HTR)
  • Visual Document Understanding
  • Summarization / Metadata
  • Classification

 

Consider how you, as a researcher, identify relevant information in your materials. How can we teach a machine to do the same?

https://journaliststudio.google.com/pinpoint

  • Regular Expressions and pattern matching
  • Named Entity Recognition
  • Named Entity Linking
  • Text categorization
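
As a minimal sketch of the first item in the list above, the example below uses a regular expression to pull date-like strings out of a sentence. The sample text, the pattern, and the function name `find_dates` are illustrative assumptions, not material from the talk; real archival sources would need a more forgiving pattern.

```python
import re

# Hypothetical sample text; not taken from any real collection.
TEXT = "Letter dated 12 March 1921, received 3 April 1921 in Kyiv."

# Matches "<day> <English month name> <four-digit year>".
DATE_PATTERN = re.compile(
    r"\b(\d{1,2}) (January|February|March|April|May|June|July|August|"
    r"September|October|November|December) (\d{4})\b"
)


def find_dates(text):
    """Return a (day, month, year) tuple for every date-like match."""
    return [(int(d), m, int(y)) for d, m, y in DATE_PATTERN.findall(text)]


print(find_dates(TEXT))
```

Pattern matching of this kind works well when the target has a predictable surface form. Named entity recognition and linking cover cases where the form varies too much for a hand-written pattern.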

Making sense of semantic similarity
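
One common way to make semantic similarity concrete is cosine similarity between embedding vectors. The sketch below assumes that approach. The three-dimensional vectors are made up for illustration and stand in for embeddings that a real model would produce; `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; report 0
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for model-produced embeddings.
TOY_EMBEDDINGS = {
    "letter": [0.9, 0.1, 0.0],
    "correspondence": [0.8, 0.2, 0.1],
    "harvest": [0.0, 0.2, 0.9],
}


def most_similar(word, embeddings):
    """Return the other key whose vector is closest to `word`'s vector."""
    query = embeddings[word]
    scores = {
        other: cosine_similarity(query, vec)
        for other, vec in embeddings.items()
        if other != word
    }
    return max(scores, key=scores.get)


print(most_similar("letter", TOY_EMBEDDINGS))
```

With these toy values, "letter" lands closer to "correspondence" than to "harvest", which is the behavior one would hope for from real embeddings.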

Hi Folks,

I’ve just been playing with an HTR model Andy sent me.

On these three random images ... I’m getting near-perfect results compared to our hand-corrected text. (I’d guess 99.9%).

In short, Andy: can we run [the model] on our computer? How much RAM do we need? Can we automate it...?
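
An accuracy estimate like the one in the message above can be checked against hand-corrected text. The sketch below assumes the usual character error rate (CER) approach: Levenshtein edit distance divided by the length of the reference. The message does not say how its figure was arrived at, so this is one possible method, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep on a match
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference, hypothesis):
    """Return 1 - CER, clamped at 0, with `reference` as ground truth."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    cer = edit_distance(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)


# One wrong character out of four in the model output.
print(character_accuracy("abcd", "abxd"))
```

Running this over a handful of corrected pages gives a measured figure to put next to the guess.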


Large Language Model Operations (LLMOps)

Paid cloud endpoint (OpenAI, Replicate)

Local desktop workstation

Partner laptop

Large Language Model Operations (LLMOps)

Started with a 72B-parameter model

Local desktop workstation: test with a 7B-parameter, 4-bit quantized model (11 GB GPU RAM)

Partner laptop: MacBook with an Apple M-series chip
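
The move from a 72B model to a quantized 7B model can be motivated with rough arithmetic. This sketch estimates the memory needed for model weights alone, assuming every parameter is stored at the given bit width. It ignores activations, the key-value cache, and framework overhead, so real usage is higher. The function name is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Weights only; runtime overhead is not included in these figures.
for label, n_params, bits in [
    ("72B at 16-bit", 72e9, 16),
    ("72B at 4-bit", 72e9, 4),
    ("7B at 4-bit", 7e9, 4),
]:
    print(f"{label}: about {weight_memory_gb(n_params, bits):.1f} GB")
```

By this estimate, the 72B weights are about 144 GB at 16-bit and about 36 GB at 4-bit, while the 7B weights at 4-bit are about 3.5 GB, small enough to leave headroom on a single consumer GPU.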

Intro to AI

  • What is "AI"? How does it work?
  • When is it the right tool? For what tasks?
  • How can we use it effectively to create useful research data?
  • Follow good practices of documentation and responsible operations.

Thank you!

DiScho

By Andrew Janco
