Practical Necromancy for Beginners

 

 

An archaeology of ghosts

Shawn Graham

   

Department of History, Carleton U

https://scholar.social/@electricarchaeo

follow along at:

https://bit.ly/sg-necromancy

image via user 'photos_frompasttofuture', Unsplash

  • Who is 'I'?
  • On what basis do they have the right to make this call?
  • How do they know NetLogo? And Roman history? What data?
  • This ghost-in-the-machine assumes the driver's seat, and there is no appeal.
  • You can't teach how to 'use' such a thing: you must teach how to subvert it.

So I wanted to build a simulation...

My first excavation involved a vampire...


Colton Sturgeon, Unsplash

You're now going to get 10 weeks of class compressed into 40 minutes. Buckle Up.

(by the way you can see the class at hist4805.netlify.app)

Łukasz Łada, Unsplash

Nuggan

Clacks

Where do the shadows come from?

The Department of Postmortem Communications

Lesson 1. Correct Use of Shovel

Practical Necromancy begins with networks

Networks can represent the past

Networks are present in archaeological materials

Networks can be a substrate for further simulation

Networks can problem solve
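
To make those claims concrete, here is a toy sketch (every site and find is invented) of representing archaeological material as a network and then asking it a question:

```python
# Toy sketch: sites linked when they share a find type; all data is invented.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Site A", "Site B", {"shared": "terra sigillata"}),
    ("Site B", "Site C", {"shared": "amphora type 4"}),
    ("Site C", "Site D", {"shared": "amphora type 4"}),
    ("Site A", "Site C", {"shared": "coin hoard"}),
])

# A crude first question: which site brokers the flow of material?
print(nx.betweenness_centrality(G))
```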

 

Neural Networks

a potted history

Dash Khatami via Unsplash

Good ol' Wikipedia! https://en.wikipedia.org/wiki/Neural_network and https://en.wikipedia.org/wiki/Perceptron

https://en.wikipedia.org/wiki/Perceptron; Charles Wightman adjusting the machine

Workable ways of (re)training neural networks in a reasonable amount of time, with a reasonable amount of data*

*terms and conditions apply

LLMs are a subset of machine learning.

Machine learning - learning of patterns - is enormously useful in archaeology.

...and what used to be difficult is now something you can do in a webpage...

Data Repositories

Data Cemeteries

(and don't get me started about 'sharing' data via a PDF)

Part One

We become ghosts through decomposition

datasets & ghosts

DH: analyzes an image to say 'this is a red-figure kantharos'

GenAI says: here are the conventions that align with the idea of a kantharos, rendered as pixel data

Generative AI is digital humanities run in reverse... AI becomes a system for producing approximations of human media that align with all the data swept together to describe that media.

Sam Barber via Unsplash

h/t to Steven Johnson for a piece connecting Henry Molaison (patient H.M.) & context length

The cost of training

 

  • GPT-3: up to 2,048 tokens (~1,500 words)
  • Mistral 7B: up to 8,192 tokens (~6,100 words)
  • GPT-4o: from 60K to 128K tokens in some configurations
  • Claude 3.5: up to 100K tokens
  • Llama 3.1: up to 128K tokens
  • Gemini 1.5 Pro: up to 1M tokens

https://datanorth.ai/blog/context-length

"The computational cost scales quadratically with the length of the context window, meaning a model with a context length of 4096 tokens requires 64 times more computational resources than a model with a context length of 1024 tokens."

...and we're not even talking here about the cost(s) of siphoning all the content in the world to get the necessary data...

...the materiality of digital archaeology...

ChatGPT & The Eliza Effect

Isi Parente, Unsplash

Part Two

The TESCREAL Bundle

link

  • Techno-solutionism
  • Eschatological beliefs
  • "AGI will save us"
  • Religious fervor

sounds like religion to me.

Gods
Interns

Cogs

Part Three

see Drew Breunig: The 3 AI Use Cases

Two Kinds of Necromancy

Impractical Necromancy

  • Large Language Models
  • Chatbots
  • Reverse DH polarity
  • Easy but destructive
  • Gods, Interns

Practical Necromancy

  • Small, targeted models
  • Curated datasets
  • Ethical sourcing
  • Specific questions
  • Sweep behaviour space
  • Cogs and Widgets

No gods, no interns. Only cogs.

(which would mean way less money/power for techbro oligarchs/klept)

God - Intern - Cog

Three Notebooks

necromancy for good instead of evil

yes, you've got homework

Homework!

We're going to resurrect Flinders Petrie

We'll take what Petrie wrote and fill in the gaps with an LLM... just like those scientists at Jurassic Park filled in the gaps in dino DNA with amphibian DNA.

 

And that worked out ok.

Right? ... Right...?

(we're using GPT-2, a 'completion' model that can have additional layers of training added to it; by playing with it, we dispel some of the Eliza effect)

You'd think this would be difficult to do.  Nope.
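
How little does it take? A minimal sketch with Hugging Face transformers, assuming Petrie's collected writings sit in a plain-text file petrie.txt (the file name, hyperparameters, and prompt are illustrative, not the course notebook):

```python
# Minimal sketch: give GPT-2 an extra layer of training on a plain-text corpus.
# petrie.txt, the hyperparameters, and the prompt are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

corpus = load_dataset("text", data_files={"train": "petrie.txt"})["train"]
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-petrie",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Summon the revenant: complete a prompt in the fine-tuned voice.
inputs = tokenizer("The excavation of the temple revealed",
                   return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```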

(and now there's a small industry of academic papers on 'digital necromancy' meaning all this)

now we're going to surface some ghosts in the training data

notebook for images-as-infographics

 

what 'attractors' for 'archaeological excavation' do we see?

Explore here.

 

We're using a slight modification to Salvaggio's method, How To Read An AI Image.
Ghosts in Text Generation?

  1. generate text with the same prompt and the same settings; the only thing that changes is the initial random noise
  2. use text analysis to surface the 'attractors' that pull the text one way or another (see the sketch below)
  3. simulate conversations between models and examine the discourses
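
A minimal sketch of steps 1 and 2, using GPT-2 again (the prompt, sampling settings, and crude word count are illustrative):

```python
# Same prompt, same settings; only the seed (the initial noise) changes.
# Then a crude frequency pass to surface the recurring 'attractors'.
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The archaeologists opened the tomb and", return_tensors="pt")

samples = []
for seed in range(20):
    set_seed(seed)  # the only thing that varies between runs
    out = model.generate(**inputs, max_new_tokens=50, do_sample=True,
                         temperature=0.8, top_p=0.95)
    samples.append(tokenizer.decode(out[0], skip_special_tokens=True))

# Words that recur across independent samples hint at what pulls the text.
counts = Counter(w.lower().strip('.,;:"\'') for s in samples for w in s.split())
print(counts.most_common(25))
```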

Wrapping Up

same principle - different scale

https://www.youtube.com/watch?v=bxXdGBSDCHQ

DIY run-of-stream hydroelectric rig

https://en.wikipedia.org/wiki/File:Hoover_dam_from_air.jpg

Teaching Practical Necromancy

  1. Is a method for dealing with the ghosts of data
  2. Gives permission to experiment
  3. Embraces productive mistakes
  4. Resists hype

Digital Archaeology Requires Enchantment

  • Reflection
  • Playfulness
  • Attention to impact
  • Attention to the uncanny and its intrusions
  • Which will allow us to focus on our digital golems: archaeology has much to say about this current moment

https://archaeo.social/@mrundkvist/112447233936751700

Martin Rundkvist
mrundkvist@archaeo.social

Back when #BigData was the fashionable buzz word, I repeatedly had to explain to enthusiasts that archaeological data are not just Big, they are Confused and Patchy and Hairy.

 

I can't really see how the current generative algorithms could make me obsolete or even speed up much of the work I do. Because I'm in this really niche activity with no commercial potential that demands constant engagement with wildly non-standardised data as well as creative writing about them.

He's right! All this is slow archaeology; this tech expands limited capabilities when you know which ghosts are haunting it. But that takes a lot of work to figure out.

BONUS Small Things, Made with Love

curo, curare, curavi, curatum (Latin: 'to care for', the root of 'curation')

An example: A Field Notes to Knowledge Graph Pipeline

  1. sketched the desired workflow
  2. sketched the desired interface
  3. used a small model to translate the sketch into appropriate HTML (see the sketch after this list)
  4. used a small model to encode the individual desired functions
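
For step 3, a minimal sketch of what 'small model translates the sketch into HTML' might look like, assuming a local model served by Ollama (the endpoint is Ollama's standard one; the model name and prompt are my invention):

```python
# Minimal sketch: ask a small local model to turn an interface sketch into HTML.
# Assumes Ollama is running locally; model name and prompt are illustrative.
import requests

prompt = (
    "Turn this interface sketch into a single self-contained HTML page:\n"
    "- a text box for a field note\n"
    "- a button labelled 'Extract entities'\n"
    "- a results pane for the extracted subject-predicate-object triples\n"
)

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5-coder:7b", "prompt": prompt, "stream": False},
    timeout=300,
)

with open("notes-interface.html", "w") as f:
    f.write(resp.json()["response"])
```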

Personal Image Search

https://github.com/shawngraham/personal-image-search-engine

 

LLMs are no good for specific information retrieval; the ghosts push towards the mean, the average, so you get plausible text, not correct text.

This same property does, however, permit searching by 'vibe' or 'similarity'.
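
A minimal sketch of the 'vibe search' idea using CLIP embeddings via sentence-transformers (the folder, file type, and query are assumptions; see the repo above for the real thing):

```python
# Minimal sketch: 'vibe' search over a folder of images with CLIP embeddings.
# Not necessarily how the linked repo works; folder/model/query are assumptions.
from pathlib import Path
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

paths = sorted(Path("photos").glob("*.jpg"))
img_embs = model.encode([Image.open(p) for p in paths], convert_to_tensor=True)

query = model.encode("an excavation trench at golden hour", convert_to_tensor=True)
scores = util.cos_sim(query, img_embs)[0]

# Top hits: the images whose embeddings sit nearest the query's 'vibe'.
for score, p in sorted(zip(scores.tolist(), paths), reverse=True)[:5]:
    print(f"{score:.3f}  {p.name}")
```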

A word of thanks to my students

 

This term, we have spent several weeks dispelling hype, opening the hood, poking at the models, interrogating the ghosts, discussing the harms, and trying to figure out what good any of these things are. I am indebted to their goodwill and good humour.

 

Thanks, gang.

And Thank You

 

You are welcome to take, use, re-use, critique, expand, tear apart, re-build, improve, or sneer at any and all code of mine that I've shared today.
