https://slides.com/d/2Ouudnw/live

https://slides.com/didaskalos/ai-humanities-inkcode

AI & Humanities Data

Allen J. Romano

allenjromano@gmail.com

https://www.linkedin.com/in/allenjromano/

AI?

De-hyping the Noise

AI = Generative AI?

(above) "Pelican on a bike" by gpt-4o (we'll come back to this....)

🤔 ...

built in (sort of) to Google products;

end-to-end: v0, Lovable, and many others
see also: chaos coding with https://yoheinakajima.com/blog/

AI-assisted coding tools

Nov. 2022

Ridiculous virality

Not Hype

Not Hype

  • Multimodal (voice, image, video) models and architectures
  • Agentic systems
  • Integration into other systems
  • Robots

 

  • Data transformations: structured to unstructured, language to language, code to language, etc.
  • Code generation

A Bet on Scaling

Emergent Abilities in Large Language Models: A Survey, https://arxiv.org/html/2503.05788v1

The sharp increase disappears (bottom) when a linear metric is used

1. AI: What is it good for?

0.

Model access options

claude.ai

chatgpt.com

gemini.google.com

pi.ai

ollama (local, small models)

llm (CLI): https://github.com/simonw/llm

1.

First instincts

Based on what you have seen and been thinking about this week with your project, what is the first thing that you would consider doing with these tools?

2.

It is strongly recommended that you use a disposable or non-primary Google account when signing up for or engaging with these services.

Setup

Chat is not the default interface for AI

Chat is not the data type of AI

1. Get Data

2. Clean Data

3. Explore Data

4. Model | Analyze | Visualize Data

5. Communicate about the Data

Label data

Enrich data

Form research question

https://www.datascience-pm.com/data-science-life-cycle/

Good software turns best practices into tools

🤔 ...

See also: F. Moretti, Graphs, Maps, Trees: Abstract Models for a Literary History: https://www.versobooks.com/products/1939-graphs-maps-trees

Unstructured: Free-form text, images, audio, video

 

Semi-structured: JSON: {"key": "value", "list": [a, b, c]} | XML/TEI: <stuff>blah</stuff> | Graphs | Trees | Maps/dictionaries

 

Structured: Tabular: CSV, Parquet, SQL tables | Simple lists: [a, b, c] | Fixed-format records

 

 

Some Varieties of Data
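
A minimal illustration, with invented data, of one record at each of these levels of structure:

```python
# Invented example: the same information at three levels of structure.
unstructured = "Sappho, a poet from Lesbos, flourished around 600 BCE."

semi_structured = {          # JSON-like: nested keys, flexible schema
    "name": "Sappho",
    "roles": ["poet"],
    "origin": {"place": "Lesbos"},
    "floruit": "c. 600 BCE",
}

structured = [               # tabular: fixed columns, one row per record
    ("name", "role", "place", "floruit"),
    ("Sappho", "poet", "Lesbos", "c. 600 BCE"),
]
```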

AI Data Types: Tokens | Vector Embeddings | Models

Tokenization

Sequences of pieces

Vary it

Different languages, characters, anything at all

2.

Assess

What do you observe?

3.

Exercise: Tokenization
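
In code, a minimal sketch of the same exercise, assuming OpenAI's tiktoken library (any tokenizer with an encode/decode API behaves similarly; the example strings are arbitrary):

```python
# Minimal sketch: see how one tokenizer splits varied inputs into pieces.
# Assumes: pip install tiktoken.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4-era models

for text in ["antidisestablishmentarianism", "μῆνιν ἄειδε θεά", "12345"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]  # non-English text often splits
    print(f"{text!r}: {len(ids)} tokens -> {pieces}")  # into byte-level pieces
```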

Vector Embeddings

Go to embeddings projector

https://projector.tensorflow.org/

1.

Click!

Explore the data. Anything surprising? What is this showing?

2.

Analysis?

How does this kind of data compare to other forms of data you have worked with?

3.

Embeddings Visualization

Era | Models | Typical uses | Tools | Notes
Pre-2010s | Vector space models (LSA, LDA) | Topic modeling, semantic search | Gensim, scikit-learn | Sparse, interpretable vectors based on word/document co-occurrence
2013–2014 | Word2Vec (Mikolov et al.), GloVe | Semantic similarity, analogies | Gensim, spaCy | Dense, static word vectors; each word has a single vector regardless of context
2015–2017 | FastText (Facebook) | Morphologically rich languages | fastText, Gensim | Adds subword information (character n-grams) for out-of-vocabulary words and generalization
2018–2019 | ELMo, BERT (contextual embeddings) | Context-aware similarity, QA | Hugging Face Transformers | Dynamic vectors: the same word gets different embeddings in different contexts
2019+ | Sentence Transformers (SBERT) | Semantic search, clustering, retrieval | Sentence Transformers (SBERT) | Builds on Hugging Face Transformers, adding pooling layers and fine-tuning for sentence-level meaning
2021+ | CLIP, ALIGN, multimodal embeddings | Cross-modal retrieval, image-text alignment | OpenAI CLIP, LLaVA, Hugging Face | Joint embedding spaces for text and images; enables text-to-image search and vice versa
2022+ | Domain-specific / instruction-tuned embeddings | Specialized DH tasks, RAG, QA | SPECTER, E5, Ada, etc. | Embeddings fine-tuned for particular tasks (e.g. academic papers, legal documents, retrieval-augmented generation)
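
A minimal sketch of sentence embeddings in code, assuming the sentence-transformers library; the model choice and sentences are arbitrary examples:

```python
# Minimal sketch: dense sentence embeddings and cosine similarity.
# Assumes: pip install sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose model

sentences = [
    "The wrath of Achilles brought countless woes.",
    "The hero's anger caused endless suffering.",
    "Preheat the oven to 200 degrees.",
]
embeddings = model.encode(sentences)  # one dense vector per sentence

# Semantically close sentences get higher cosine similarity.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high: paraphrases
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated
```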

Run the notebook

(it will take a few minutes)

2.

Observations?

What do you notice? What patterns do you see?

 

Optional: Experiment with other embedding models

https://huggingface.co/spaces/mteb/leaderboard

3.

Models as data

  • Model "knowledge" is a compressed abstraction of its training data
  • Sparse data is often under-represented or represented through denser data (e.g. low-resource languages accessed through alignment with English)
  • Adaptive and contextual compression in modern LLMs (vs. traditional data compression)

Models as Data Compression

Model-Centric ML

  • Improve performance by adjusting architecture or parameters
  • Make the best model
  • Most work today is ostensibly model-centric

Data-Centric ML

  • Improve performance by getting better | cleaner | optimal data
  • Make the best dataset
  • Data-centric approaches generally produce larger overall performance gains

Do data types shape research questions?

Impact of Format Restrictions on Large Language Models: https://arxiv.org/pdf/2408.02442

Choose a task relevant to your data or project

 

1.

Vary the output structure

"Output as JSON" vs. "Output as xml" vs. no instructions vs. more specific and detailed instructions

2.

Differences? No Change?

 

3.

Exercise: Structured Outputs
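
If you are working against an API rather than a chat window, here is a minimal sketch of the comparison using the OpenAI Python client; the model name and task are illustrative assumptions:

```python
# Minimal sketch: the same task with and without a format constraint.
# Assumes: pip install openai, OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()
task = "List three themes in Beowulf with a one-line note on each."

free_form = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": task}],
)

json_mode = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # constrain output to valid JSON
    messages=[{"role": "user", "content": task + " Respond in JSON."}],
)

print(free_form.choices[0].message.content)
print(json_mode.choices[0].message.content)
```

As the paper linked above reports, format constraints can change not just the shape but the quality of the answer, so compare content as well as validity.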

DH as discovery within Latent Space

Traversing AI "circuits"

Universal NLP?

see also: Universal Natural Language Processing:  https://arxiv.org/abs/2010.02584

Choose a text, image, or object related to your dataset

 

1.

Run classic NLP tasks

Ask the LLM to perform standard NLP tasks on it, e.g. named entity recognition, part-of-speech tagging, lemmatization, translation.

2.

Results?

 

3.

Exercise: NLP tasks with LLMs
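
A minimal sketch of one such task, named entity recognition by prompting; the model name, prompt wording, and example text are assumptions:

```python
# Minimal sketch: NER via an LLM prompt instead of a task-specific
# pipeline. Assumes the OpenAI client; the text is invented.
from openai import OpenAI

client = OpenAI()
text = "Cicero denounced Catiline before the Senate in 63 BCE."

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": (
        "Extract all named entities from the text below. "
        'Return a JSON list of {"entity": ..., "type": ...} objects.\n\n'
        + text
    )}],
)
print(resp.choices[0].message.content)
```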

AI Evaluation

Beyond Prompt and Pray

Benchmarks

Holistic Evaluation

Pelican Evaluation

OpenAI gpt-3.5

Prompt:

Generate an SVG of a pelican riding a bicycle

OpenAI gpt-4o-mini

Prompt:

Generate an SVG of a pelican riding a bicycle

OpenAI gpt-4o

OpenAI o1 ("reasoning" model)

Google Gemini 1.5 Flash

Google Gemini 1.5 Pro

Google Gemini exp-1206

Claude Haiku (smallest Anthropic model)

Claude Sonnet (medium Anthropic model)

Claude Opus (largest Anthropic model)

Pull up any model or LLM of your choosing

ChatGPT, the OpenAI API, Ollama locally

1.

Create an eval (or set of related evals)

Make it hard. Make it clever. Make it not about pelicans.

2.

Discuss and Iterate

What is your eval actually assessing? What good is it?

3.

Exercise: Create an AI evaluation
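
A minimal sketch of an eval harness, assuming the OpenAI client; the test cases and the lenient substring scoring are placeholders for your own domain questions and criteria:

```python
# Minimal sketch: run prompts with known answers and score the outputs.
from openai import OpenAI

client = OpenAI()
cases = [
    ("Who wrote the Aeneid? Answer with the name only.", "Virgil"),
    ("In which city was the Parthenon built? Name only.", "Athens"),
]

correct = 0
for prompt, expected in cases:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content.strip()
    correct += int(expected.lower() in answer.lower())  # lenient match
print(f"{correct}/{len(cases)} correct")
```

Both questions here are certainly in every model's training data; as the next slide notes, effective evaluation needs items the model has not seen.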

Evaluation is a HARD problem

  • Domain-specific
  • Benchmarks can be "juiced"
    • for example, outputs that are more flattering or verbose often do better on "intelligence" benchmarks; but are they actually "better"?
    • Sycophancy: models tend to tell users what they want to hear, which inflates preference-based scores
  • Fabrication (aka "hallucination"), where the model invents things, may be impossible to detect reliably
  • Most existing evaluations were made for pre-GPT-level models and systems; they are also likely included in the training data of all current LLMs
  • Effective evaluation requires a test set that the model has not seen before

Towards Understanding Sycophancy in Language Models: https://arxiv.org/pdf/2310.13548

Opportunity for Humanists?

 

Diverse, domain-specific evaluations and benchmarks as a direct and actionable way to impact the development of AI

Based on the task from before, how would you judge that the output was correct?

 

1.

LLM as Judge

Use an LLM to evaluate the output of another LLM

2.

How good is your judge?

How would you improve the ability of the second LLM to judge the output of the first?

3.

Multipurpose evaluator:
LLM as Judge
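
A minimal sketch of the judge pattern, assuming the OpenAI client; the rubric, model names, and task are all illustrative:

```python
# Minimal sketch: one model answers, a second model grades the answer.
from openai import OpenAI

client = OpenAI()

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

task = "Summarize Thucydides' account of the plague of Athens in three sentences."
answer = ask("gpt-4o-mini", task)

rubric = (
    "You are grading another model's answer.\n"
    f"Task: {task}\nAnswer: {answer}\n"
    "Score factual accuracy and completeness, each 1-5, "
    "with a one-sentence justification per score."
)
print(ask("gpt-4o", rubric))  # a stronger model as the judge
```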

In-Context Learning

aka few-shot prompting

Give the LLM a task

Think of a task that you want the LLM to do. Ask the LLM in the most minimal form possible.

e.g. LLM-as-judge tasks, NLP tasks

1.

Context

Add examples of the task you are asking for, done well and done poorly.

2.

Try to improve performance

Does it make a difference if you use obvious examples or unobvious examples? What kinds of examples? How do you get examples?

3.

Exercise: Basic In-Context Learning
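
A minimal sketch of few-shot prompting as a message list; the classification labels and inscriptions are invented for illustration:

```python
# Minimal sketch: the in-prompt examples are the only "training" the
# model gets. Assumes the OpenAI client; data and labels are invented.
from openai import OpenAI

client = OpenAI()
messages = [
    {"role": "system",
     "content": "Classify each inscription as FUNERARY or DEDICATORY."},
    # Worked examples supplied in the prompt itself:
    {"role": "user", "content": "Here lies Lucius, dearest son."},
    {"role": "assistant", "content": "FUNERARY"},
    {"role": "user",
     "content": "To Jupiter Best and Greatest, Marcus fulfilled his vow."},
    {"role": "assistant", "content": "DEDICATORY"},
    # The actual query:
    {"role": "user", "content": "Sacred to the shades; she lived 23 years."},
]
resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(resp.choices[0].message.content)  # expected: FUNERARY
```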

Synthetic Data and Simulation

Any LLM

 

1.

Define a simulation:

e.g. "You are an 18th-century resident of San Miniato"

"Simulate a conversation in the fifth century BCE"

 

2.

Can you make it better based on your expert knowledge?

a. What criteria would you evaluate in the output?

b. Can you build an LLM judge to assess the output?

 

3.

Exercise: Simulation
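
A minimal sketch of step 1 as a system prompt, assuming the OpenAI client; the persona wording and question are examples, not a validated historical method:

```python
# Minimal sketch: a persona simulation via system prompt.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": (
            "You are an 18th-century resident of San Miniato. "
            "Stay in period: no anachronistic knowledge or vocabulary."
        )},
        {"role": "user", "content": "What news in the town this week?"},
    ],
)
print(resp.choices[0].message.content)
```

An LLM judge (previous exercise) could then score the outputs for anachronisms against your expert criteria.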

AI Systems > AI models

From LLMs to Agentic Systems

A2A, MCP

so-called "reasoning" models (aka "lots more tokens" models)

models that don't follow rewards

AI as an object of study

Provocations from the Humanities for Generative AI Research:

https://arxiv.org/html/2502.19190v1

Everyone is a developer?

On-demand software