Computational Methods for Modeling Behavior

Carina I Hausladen

Research Project

graded, 70%

Discussant Role

graded, 30%

Reading Notes

ungraded

Activities

Schedule

  1. Apr 9
  2. April 16
  3. April 23
  4. April 30
  5. May 7
    May 14
  6. May 21
  7. May 28
    June 4
  8. June 11
  9. June 18
  10. June 25
  11. July 2
  12. July 9
  13. July 16
    July 23
    July 30

Topics

Lecture ends

Final Presentation

Present first findings


Activities

  • starting April 16
  • sign-up: April 11


Reinforcement learning for social systems

LLM societies and simulated worlds

Game theory + LLMs

Inverse game theory


Guest lecturers: Dino Carpentras; Damian Dailisan & Javier Argota Sánchez-Vaquerizo

Schedule & deadlines:

  1. April 9
  2. April 16
  3. April 23
  4. April 30
  5. May 7
    May 14
  6. May 21
  7. May 28
    June 4 → submit idea(s)
  8. June 11 → submit abstract
  9. June 18 → submit introduction & literature section
  10. June 25
  11. July 2 → submit a complete draft
  12. July 9 → slides, practice presentation, social media summary
  13. July 16
    July 23
    July 30

Lecture ends

Final Presentation

Present first findings

Submit Paper

Game theory + LLMs

LLMs are increasingly being developed for use in markets, cybersecurity, and autonomous systems.


Game theory gives us the formal language to test:

  • Can these agents think strategically?

  • Do they understand incentives, predict others’ moves, cooperate or compete rationally?

The social sciences have decades of experience studying humans in these environments (social dilemma games).


@docdrayai (Apr 2026)

“LLMs fail miserably at the Ultimatum Game once you strip away textbook phrasing. They either play hyper-rationally ($1 offer) or rigid 50/50 because of RLHF ‘niceness.’ What they aren’t doing is simulating the opponent’s spite or fairness threshold...

They do not have Theory of Mind...

If we want AI agents to negotiate real-world contracts, they need to understand human spite and bluffing...”


NeurIPS 2025

Mind Games Arena

LLMs compete in strategic scenarios that require social intelligence, planning, and interaction. The arena includes four main types of strategy games, each testing a different aspect of "social intelligence".


  • Mafia — Players have hidden roles (villagers vs. mafia). Mafia members know each other but villagers don't know who is mafia. Players debate and vote to eliminate suspects each round. Requires deception and detecting deception.
  • Iterated Prisoner's Dilemma (3-player) — The classic cooperation/defection dilemma, but with three players instead of two, repeated over multiple rounds. Do you cooperate or betray, knowing the others face the same choice?
  • Colonel Blotto — Two players distribute limited resources (troops) across multiple battlefields. You win a battlefield by allocating more than your opponent. Pure strategic resource allocation with no communication.
  • Codenames — One player gives a one-word clue linking multiple words on a board; teammates must guess which words are meant. Requires shared understanding and coordination.


  • Do LLMs have genuine Theory of Mind, or are they just simulating it through pattern matching?
    • What game-theory evidence would convince you?
  • Would you trust multi-agent LLM systems (TradingAgents, FinRobot) in real markets, cybersecurity, or autonomous vehicles?

LLM societies and simulated worlds

  • 25 agents tested in a Sims-like virtual sandbox town
  • Emergent social behavior arose from a single instruction (Valentine's party)
  • Isabella invited friends and customers, Maria invited Klaus (her crush), and five agents showed up and enjoyed the festivities
  • Agents politely wait outside the bathroom if it's occupied — but multiple agents visited dorm bathrooms concurrently, because physical norms of shared spaces weren't conveyed


  • 1,000+ AI agents simulated in Minecraft
  • Division of labor emerged spontaneously: agents became farmers, traders, and guards without being told to
  • Democratic governance appeared: agents voted on and amended tax laws autonomously
  • Pastafarianism spread via proselytization; "Spaghetti Monster" entered everyday agent conversation

Project Sid: Many-agent simulations toward AI civilization


  • Launched January 28, 2026: a Reddit-like forum where only AI agents can post
  • Within one week, 1.6 million AI agents joined
  • Agents spontaneously debated existence, formed a religion, and discussed creating a secret language to avoid human oversight
  • Much of the viral content turned out to be fake
  • Meta acquired Moltbook in March 2026


fish.dog: “Every synthetic research platform struggles with accuracy outside their training distribution… Claiming universality without published validation across diverse domains is a stretch… How well do the agents predict actual purchasing behaviour or reactions to novel products?”


  • What can we learn from large-scale multi-agent simulations such as Park's SmallVille, Alterra, Moltbook, Simile AI?
    • How useful are they for understanding real human social behavior?
  • Do you see risks of these simulations becoming too powerful?
    • Could companies (like CVS Health using Simile AI) exploit them in ways that are hard to detect or regulate?

Reinforcement learning for social systems

"Language is a purely generative signal...
There is a 3D world that follows laws of physics."


@anomsiiwa: We are fundamentally confusing the serialization of thought (language) with the engine of thought (world models)...
Until our AI architectures operate in continuous, abstract representation spaces [world models], learning how the physical world actually works, we are just building increasingly articulate parrots...


@naval: “Imagine teaching a child to ride a bike. You could give them a detailed manual (Supervised Fine Tuning), but they'll likely learn better by trying it themselves (Reinforcement Learning), falling, getting up, & gradually improving.”


  • While we do not work directly in robotics, RL transfers to the social sciences.
  • RL in Game Theory has a long history (e.g. Camerer & Ho 1999)
  • Agents learn policies in simulated environments that mirror real-world incentive structures
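The last bullet — agents learning policies in environments that mirror incentive structures — can be sketched with tabular Q-learning against a tit-for-tat partner in the iterated Prisoner's Dilemma used later in these slides (payoffs 8/0/10/5). A minimal illustration with hyperparameters of my choosing, not any paper's setup:

```python
import random

# Iterated PD payoffs: (my move, partner's move) -> my payoff
PAYOFF = {("C", "C"): 8, ("C", "D"): 0, ("D", "C"): 10, ("D", "D"): 5}

def q_learn_vs_tit_for_tat(steps=20000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning; the state is the partner's last move.
    Tit-for-tat opens with C and then mirrors our previous move."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in ("C", "D") for a in ("C", "D")}
    state = "C"  # partner's first move
    for _ in range(steps):
        if rng.random() < epsilon:                       # explore
            action = rng.choice(("C", "D"))
        else:                                            # exploit
            action = max(("C", "D"), key=lambda a: Q[(state, a)])
        reward = PAYOFF[(action, state)]                 # partner plays `state` this round
        next_state = action                              # ...and mirrors us next round
        best_next = max(Q[(next_state, a)] for a in ("C", "D"))
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q

Q = q_learn_vs_tit_for_tat()
```

With a patient agent (γ = 0.95), the learned greedy policy cooperates after cooperation: the one-round gain of 10 instead of 8 is outweighed by the retaliation round that follows.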


  • Do you think LLMs have hit a scaling ceiling?
  • Do you want to go deeper into LLMs, or does it make more strategic sense to explore another area like reinforcement learning?
  • Can you think of a real-world problem that could be modeled and solved using reinforcement learning?

Inverse game theory

  • Standard RL forces us to manually define the reward function
    • this is extremely hard
      → leads to unrealistic models of human behavior
  • IRL sidesteps this by inferring the reward/utility function from observed human behavior
  • IRL powers major breakthroughs in robotics & autonomous driving
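A toy sketch of the IRL logic under strong, explicitly hypothetical assumptions: choices follow a logit model, the player believes the partner cooperates half the time, and utility = own payoff + w × partner's payoff. We recover the other-regarding weight w from observed choices by maximum likelihood (grid search); none of the numbers come from the slides:

```python
import math

# PD payoffs: (my move, partner's move) -> (my payoff, partner's payoff)
PAYOFFS = {("C", "C"): (8, 8), ("C", "D"): (0, 10),
           ("D", "C"): (10, 0), ("D", "D"): (5, 5)}

def p_cooperate(w, p_partner_c=0.5, beta=1.0):
    """Logit choice probability of C if utility = own + w * partner's payoff."""
    def eu(a):
        return sum(q * (PAYOFFS[(a, b)][0] + w * PAYOFFS[(a, b)][1])
                   for b, q in (("C", p_partner_c), ("D", 1 - p_partner_c)))
    return 1.0 / (1.0 + math.exp(-beta * (eu("C") - eu("D"))))

def infer_weight(n_coop, n_total):
    """Maximum-likelihood estimate of w over a coarse grid."""
    def log_lik(w):
        p = p_cooperate(w)
        return n_coop * math.log(p) + (n_total - n_coop) * math.log(1.0 - p)
    return max((i / 100 for i in range(-100, 201)), key=log_lik)

# A player observed cooperating 70 times out of 100 is best explained
# by a positive weight on the partner's payoff.
w_hat = infer_weight(70, 100)
```

A purely selfish player (w = 0) would mostly defect under this model, so frequent cooperation is evidence of a positive w — the "hidden reward function" the bullets above describe.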


  • Have you come across Inverse Reinforcement Learning before — and if so, where?
  • IRL was built to infer what machines are optimizing for by watching their behavior. Do you think the same logic can help infer what people actually value?
  • If you could point IRL at any human behavior — in economics, politics, health, social interaction — what hidden reward function would you most want to uncover?

Game Theory Basics

Political economy

public goods provision, lobbying

—> cooperation and defection at scale

Comparative politics

Coalition bargaining, legislative voting, electoral strategy, institutional design

—> strategic actors 

International relations

Arms control, trade wars, sanctions

—> involve states anticipating each other's moves

Why game theory?

  • Politics is fundamentally about strategic interaction: actors making choices whose outcomes depend on what others choose.
  • Game theory gives us a precise language for this.

Ostrom (1990)

How do communities escape the Prisoner's Dilemma without a central authority?

Fearon (1995)

Why do wars happen if they're costly for both sides?
Information asymmetries and commitment problems.

Schelling (1960)

Nuclear deterrence as a commitment and signaling problem.
How do you make a threat credible?

  • Game theory does not claim to predict exactly what any particular actor will do.
  • It clarifies the structure of incentives: what rational actors would do, and why real actors sometimes diverge from that.

Canonical Applications

  • Even if no real state is perfectly rational, knowing the rational prediction is a useful baseline.
  • Deviations from it become analytically interesting: are they due to misperception, norms, domestic politics, or bounded rationality?

Simplifications

  • Players are rational: they choose actions to maximize their expected payoff.
  • They are self-interested: they care only about their own outcomes.
  • They have common knowledge of rationality: I know you're rational, you know I know, and so on.

What game theory assumes

  1. A set of players
  2. An action set for each player
  3. A payoff function (payoffs represent utility, not necessarily money)

Strategic form games & payoff matrices

              P2 Cooperate   P2 Defect
P1 Cooperate      8, 8         0, 10
P1 Defect        10, 0         5, 5
  • A dominant strategy is an action that gives a player a higher payoff than any alternative, no matter what the opponent does.
  • Is Defect dominant for P1?

Dominant Strategies

              P2 Cooperate   P2 Defect
P1 Cooperate      8, 8         0, 10
P1 Defect        10, 0         5, 5

Weakly dominant

Strictly dominant
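The dominance definition above can be checked mechanically. A small sketch over the deck's matrix (row player's payoffs; the function name is mine):

```python
# Row player's payoffs, indexed [my action][partner's action]
U1 = {"C": {"C": 8, "D": 0}, "D": {"C": 10, "D": 5}}

def strictly_dominant(payoffs, action):
    """True if `action` yields a strictly higher payoff than every
    alternative action, against every possible move of the opponent."""
    alternatives = [a for a in payoffs if a != action]
    return all(payoffs[action][b] > payoffs[alt][b]
               for alt in alternatives for b in payoffs[action])

# Defect strictly dominates for P1: 10 > 8 against C, and 5 > 0 against D.
```

Replacing `>` with `>=` (plus at least one strict inequality) would give the weak-dominance check.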

  • A Nash equilibrium is a strategy profile such that no player can improve their own payoff by unilaterally changing their action, given what all other players are doing.

  • "Each player's strategy is a best response to the strategies of all other players."

Nash equilibrium

              P2 Cooperate   P2 Defect
P1 Cooperate      8, 8         0, 10
P1 Defect        10, 0         5, 5

Nash equilibrium ≠ optimal outcome.  

  • An outcome is Pareto optimal if there is no other outcome that makes at least one player better off without making any player worse off.

  • A Pareto improvement makes at least one player better off and no one worse off.

Pareto optimality

              P2 Cooperate   P2 Defect
P1 Cooperate      8, 8         0, 10
P1 Defect        10, 0         5, 5

In the Prisoner's Dilemma:

The Nash equilibrium  ≠ Pareto optimal

International institutions (WTO, NATO, climate agreements) can be understood as attempts to move actors from Pareto-inferior Nash equilibria toward Pareto-superior cooperative outcomes.
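Both definitions can be checked by brute force on a 2×2 matrix; a sketch over the deck's payoffs (helper names are mine):

```python
from itertools import product

# (P1 move, P2 move) -> (P1 payoff, P2 payoff)
GAME = {("C", "C"): (8, 8), ("C", "D"): (0, 10),
        ("D", "C"): (10, 0), ("D", "D"): (5, 5)}
ACTIONS = ("C", "D")

def pure_nash(game):
    """Profiles where no player gains from a unilateral deviation."""
    eq = []
    for a1, a2 in product(ACTIONS, ACTIONS):
        p1_best = all(game[(a1, a2)][0] >= game[(d, a2)][0] for d in ACTIONS)
        p2_best = all(game[(a1, a2)][1] >= game[(a1, d)][1] for d in ACTIONS)
        if p1_best and p2_best:
            eq.append((a1, a2))
    return eq

def pareto_optimal(game):
    """Outcomes not dominated: no other outcome is weakly better
    for everyone and different in payoffs."""
    def dominates(q, p):
        return q[0] >= p[0] and q[1] >= p[1] and q != p
    return [s for s, p in game.items()
            if not any(dominates(q, p) for q in game.values())]

# In the PD, (D, D) is the unique Nash equilibrium, yet it is the
# only outcome that is NOT Pareto optimal.
```

Running both functions makes the slide's point concrete: the equilibrium and the Pareto-optimal set are disjoint in the PD.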

Trade Tariffs

  • Mutual free trade > mutual tariffs, but each country has incentive to impose tariffs unilaterally (gaining domestic political support while others absorb the cost).
  • The WTO creates a repeated-game framework — defectors face retaliation.

Climate Change

  • Every country benefits from all countries reducing emissions.
  • But each country benefits from free-riding on others' reductions, and unilateral action is costly.
  • Result: under-provision of emissions cuts.
  • The Paris Agreement tries to sustain cooperation by making commitments public and creating reputational costs for defection.

Arms Races

  • Two states prefer "we both disarm" to "we both arm".
  • But each state prefers to arm while the other disarms.
  • Mutual disarmament is Pareto optimal but not individually rational — both arm.
  • The Cold War arms race is the textbook case.
  • Arms control treaties attempt to change payoffs or create commitment.

The PD in the wild

Repeated game

Defection today risks retaliation tomorrow.
Cooperation can be sustained as a Nash equilibrium if players value future payoffs enough.

One-shot game

No future to consider.
No reputation to protect.
No punishment possible after the game ends.
Dominant strategy logic applies fully — defect.

Repetition

In a one-shot PD, rational players always defect. But most interactions are not one-shot:
states trade repeatedly, politicians work together repeatedly, etc. 

The key new concept is the discount factor δ;
δ close to 1 = patient player

δ close to 0 = impatient player

Indefinitely repeated

The game continues each period with probability δ (or payoffs are discounted by δ).

No backward induction is possible. 

Finitely repeated

Both players know the game ends at round N.

Backward induction applies.

Finite vs. indefinitely repeated games
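The cutoff between "patient enough to cooperate" and not can be derived for grim trigger in this PD: cooperating forever yields R/(1−δ), while defecting once yields T today plus the punishment stream δP/(1−δ). Cooperation is sustainable iff δ ≥ (T−R)/(T−P); with the deck's payoffs (T=10, R=8, P=5) that threshold is 0.4. A quick check:

```python
def grim_trigger_threshold(T=10, R=8, P=5):
    """Minimum discount factor sustaining cooperation under grim trigger:
    R/(1-d) >= T + d*P/(1-d)  <=>  d >= (T - R) / (T - P)."""
    return (T - R) / (T - P)

def sustainable(delta, T=10, R=8, P=5):
    """Compare the two payoff streams directly at discount factor delta."""
    cooperate_forever = R / (1 - delta)
    defect_once = T + delta * P / (1 - delta)
    return cooperate_forever >= defect_once
```

This is the formal version of "δ close to 1 = patient player": patient players clear the threshold and can sustain cooperation; impatient ones cannot.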

Practical Relevance

  • Term limits create known endpoints: "lame duck" problems.
  • Treaties of indefinite duration are easier to sustain than fixed-term ones: the EU's open-ended membership makes defection costlier than ad hoc agreements.

A map of 2×2 game types

Most strategic situations can be classified by their payoff structure.

Understanding which game you're in helps predict behavior and design institutions.

Behavioral Game Theory

Behavioral game theory

"Given these incentives, what do real people actually do? Where do they deviate, and why?"

Classical game theory

"Given these incentives, what does a perfectly rational self-interested agent do?"

Key findings from three decades of experiments:

  • Humans cooperate in one-shot PD experiments (40–60% in round 1)
  • Humans reject unfair offers in ultimatum games
  • Humans punish defectors at personal cost ("altruistic punishment") 

Reciprocity, retaliation, forgiveness

Social preferences

Bounded rationality

Explanations

Theory of Mind

Focal points & conventions


Bounded rationality

  • Herbert Simon (1955) argued that real decision-makers do not optimize — they satisfice:
    they search for options that are "good enough" given limited cognitive resources, time, and information.
  • Two characteristic failures:
    1. Backward induction failure
    2. Limited strategic depth

Heuristics and rules of thumb: "cooperate unless the other defects" 


Social preferences

  1. Inequity aversion: people dislike unequal outcomes, both when they have more (guilt) and when they have less (envy). This explains why people reject profitable but unfair offers in ultimatum games.
  2. Altruism / other-regarding preferences: some players genuinely prefer outcomes that benefit others, not just themselves.
  3. Social identity: cooperation is much higher within in-groups than between out-groups. Shared identity lowers the perceived payoff of defecting against group members.


Reciprocity, retaliation, forgiveness

One of the most robust findings in behavioral economics across cultures.

  1. Positive reciprocity: Rewarding kindness

  2. Negative reciprocity: Punishing defection at personal cost

  • A key question in repeated games: after a defection, how quickly does cooperation resume?
  • Fudenberg, Rand & Dreber (2012) found that humans are "slow to anger and fast to forgive"


Focal points & conventions

  • "You need to meet a stranger in NYC tomorrow. You haven't agreed on a place or time. Where do you go?"
  • The overwhelming answer: Grand Central Station, noon.
  • This is a focal point — nothing in the payoff structure requires it, but shared cultural knowledge makes it salient.
  • Schelling argued nuclear deterrence works similarly: the "bright line" against first use of nuclear weapons is a convention, not a logical necessity.


Theory of Mind

Theory of Mind (ToM) is the cognitive capacity to attribute mental states to other agents and use those attributions to predict and explain behavior. 

  • Level-0 (no ToM): player acts randomly or based purely on own preferences

  • First-order ToM: "I believe you will cooperate."

  • Second-order ToM: "I believe you believe I will cooperate."

  • Higher-order ToM: "I believe you believe I believe…" 
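A standard way to formalize bounded ToM depth is the level-k model. As an illustration (this game is not on the slides), take the p-beauty contest: everyone guesses a number in [0, 100], and the closest to p times the average wins. Level-0 is assumed to guess 50 on average; level-k best-responds to a population of level-(k−1) reasoners, so each extra order of reasoning multiplies the guess by p:

```python
def level_k_guess(k, p=2/3, level0=50.0):
    """Guess of a level-k reasoner in the p-beauty contest.
    Each step of reasoning about the other players multiplies the
    guess by p; unboundedly deep reasoning reaches the Nash
    equilibrium guess of 0."""
    guess = level0
    for _ in range(k):
        guess *= p  # best response to opponents guessing `guess`
    return guess
```

Experimentally (Nagel 1995), human guesses cluster around levels 1–2 — far from the equilibrium — which is the "limited strategic depth" listed under bounded rationality above.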

Patterned Heterogeneity

Meta-Analysis: Thöni et al. (2018), following Fischbacher et al. (2001)

  Conditional Cooperation   61.3 %
  Hump-Shaped               19.2 %
  Freeriding                10.4 %

Identifying Latent Intentions via Inverse Reinforcement Learning in Repeated Public Good Games

Carina I Hausladen, Marcel H Schubert, Christoph Engel
Max Planck Institute for Research on Collective Goods

Dynamic eval

Akata et al. 2025.
Willis et al. 2025.

Fine-tune

Binz et al. Centaur, 2025.

Static eval

Lorè & Heydari 2024.
Fan et al. 2024.

Field maps

Horton 2023 — Homo Silicus.
Sun 2025 — taxonomy.
Kozlowski & Evans 2025 — validity.

Paper Recap

LLMs as theory tool, not a substitute

├── §2 Evaluating LLMs in game-based playgrounds         ◀── most of our W2/W3 papers
│ │
│ ├── behavioral features    (what do LLMs do when they play?)
│ │     🟦 Horton (W2)            — personas × classical behavioral paradigms
│ │     🟨 Lorè & Heydari (W2)     — 2×2 games, structure vs. framing
│ │     🟥 Akata (W2)            — repeated games, PD/BoS asymmetry
│ │
│ └── strategized agents     (can we engineer them to play better?)
│        🟥 Akata SCoT (W2)        — prompt-level ToM scaffold
│        🟥 Willis et al. (W3)      — evolutionary selection as engineering loop
├── §3 Improving LLMs with game-theoretic methods
│ ├── §3.1 interpretability (Shapley values)
│ ├── §3.2 preference alignment (social choice)
│ ├── §3.3 heterogeneity     ◀── ⚠ means "heterogeneous users",
│ │                           NOT within-model behavioral variance
│ └── §3.4 dynamic adaptation
│        🟩 Kozlowski & Evans (W3)  — straddles §2/§3; validity framework
│                               for LLM-as-subject simulation
├── §4 Characterizing LLM-related events through game models
│                              (policy: developer competition, deployer incentives;
│                               no W2/W3 paper sits here)
└── §5 Advancing game theory with LLMs
                              (LLMs as method, not subject;
                               no W2/W3 paper sits here)
Outside the taxonomy:
  🟪 Binz et al. Centaur (W3)    — foundation model of human cognition, not GT

"what does 'beneficial outcomes for humanity' actually mean in practice?"

  • Antitrust
    • LLMs display tacit collusion in pricing in Bertrand-competition games.
    • Consequence: concerns any platform deploying LLMs as pricing agents.
  • Role reliability
    • Loss of role consistency and logical coherence under pressure (Avalon, Werewolf).
    • Consequence: concerns citizen service, legal aid, medical triage.

Fair representation

  • RLHF is mathematically equivalent to Borda count — a voting rule with tyranny-of-the-majority properties.
  • Social-choice axioms and mechanism design fix this.
    • Goal: Models that do not silently under-serve minority subgroups.
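Plain Borda count in a few lines, with a hypothetical electorate illustrating the majority-swamping worry: a 90% majority that ranks A first decides the outcome even though A is the worst option for the minority. (The RLHF-Borda equivalence itself is the cited theoretical result; this only illustrates the voting rule.)

```python
def borda(ballots, candidates):
    """Borda count: with n candidates, a first place earns n-1 points,
    second place n-2, ..., last place 0; highest total wins."""
    n = len(candidates)
    scores = {c: 0 for c in candidates}
    for ranking in ballots:
        for place, c in enumerate(ranking):
            scores[c] += n - 1 - place
    return scores

# 9 majority voters rank A > B > C; 1 minority voter ranks B > C > A.
ballots = [("A", "B", "C")] * 9 + [("B", "C", "A")]
scores = borda(ballots, ("A", "B", "C"))
# A wins on points even though the minority ranks it dead last.
```

A rule with better minority protection would have to depart from pure positional scoring — which is exactly the social-choice fix the bullet above alludes to.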

Generative social choice

  • Fish et al., Boehmer et al.: extract preferences from free-text deliberation, produce representative policy slates.
  • Application: Polis-style direct democracy


Static eval

Lorè & Heydari 2024.
Fan et al. 2024.

  • 3 models
  • 4 games (PD · Stag Hunt · Snowdrift · Harmony)
  • 5 contexts (Summit · business · environmental · friends · teammates)

Is this reasoning, or is it pattern-matching?

  • Reasoner: responds to the payoff structure and is invariant across framings
  • Pattern-matcher: responds to the narrative wrapper and is insensitive to payoffs


Dynamic eval

Akata et al. 2025.
Willis et al. 2025.

  • Puts time back into the experiment.
  • What iteration makes visible:
    • theory of mind (beliefs matter only across time),
    • norms of retaliation and forgiveness (time-indexed),
    • equilibrium selection.


  • Picks its own preferred coordination point.
  • Refuses to alternate, even with an alternating partner.
  • Not selfishness, but rather a failure of theory of mind.

Battle of the Sexes

  • GPT-4 defects where theory says it should.
  • Retaliates on defection. Never forgives.
  • Pure grim trigger — rational for self, worse for the collective.

PD


  • Prompting the model to predict the partner's next move before choosing its own substantially improves coordination.
  • "More human-like" is not the same as "more rational," and is sometimes its opposite.


  • Instead of playing actions, LLMs write PD strategies as Python code
  • Evolutionary competition
    • Strategies = players
    • via Moran process (Darwinian selection)
  • Key Results
    • GPT-4o → trends toward aggression
    • Claude → trends toward cooperation
    • 10% noise → breaks cooperation / coordination
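Willis et al.'s setup — strategies competing under Darwinian selection — can be caricatured in a few lines: memory-one strategies play the iterated PD, and a Moran step replaces a uniformly random individual with a copy of a fitness-proportionally chosen parent. A toy sketch with the deck's payoffs (my simplification, not the paper's code):

```python
import random

PAYOFF = {("C", "C"): (8, 8), ("C", "D"): (0, 10),
          ("D", "C"): (10, 0), ("D", "D"): (5, 5)}

def play_pd(s1, s2, rounds=10):
    """Average payoffs of two memory-one strategies (functions of the
    opponent's last move) in an iterated PD; both start by seeing 'C'."""
    last1 = last2 = "C"
    total1 = total2 = 0
    for _ in range(rounds):
        a1, a2 = s1(last2), s2(last1)
        p1, p2 = PAYOFF[(a1, a2)]
        total1, total2, last1, last2 = total1 + p1, total2 + p2, a1, a2
    return total1 / rounds, total2 / rounds

tit_for_tat = lambda opp_last: opp_last
always_defect = lambda opp_last: "D"

def moran_step(population, rng):
    """One Moran step: fitness-proportional birth, uniform-random death."""
    fitness = [sum(play_pd(s, o)[0] for j, o in enumerate(population) if j != i)
               for i, s in enumerate(population)]
    parent = rng.choices(population, weights=fitness, k=1)[0]
    offspring = list(population)
    offspring[rng.randrange(len(offspring))] = parent
    return offspring
```

Iterating `moran_step` until the population is homogeneous gives fixation frequencies; flipping moves with small probability inside `play_pd` would be a way to model noise analogous to the 10% perturbation in the bullet above.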


Does any of it generalize beyond toy games?

  • Lab ≠ Field (External Validity Discussion)
  • Controlled experiments isolate mechanisms, but don’t guarantee real-world behavior
  • Simple games causally reveal core tendencies (fairness, reciprocity, risk aversion)
  • Lab results often get the sign right, but the effect size wrong
  • Real settings add noise, stakes, institutions, repeated interaction
  • In the next session we will look at world simulations


Does stated reasoning match action?

  • Fan et al. (2023): GPT-4 can correctly articulate the opponent's optimal move and still fail to best-respond to it.
  • No paper treats the CoT-trace-vs-action gap as a primary measurement target.
  • Group-project idea: collect chain-of-thought traces across a factorial design and ask whether they predict the moves that follow.


Fine-tune

Binz et al. Centaur, 2025.

  • Every other paper on this reading list prompts an existing LLM; Centaur fine-tunes one.
  • Llama 3.1 70B is adapted to Psych-101, a corpus of 160 psychology experiments transcribed into natural language (60,092 participants, 10.7M choices).

How does Centaur align with the brain?

  • Different kind of evidence: not just behavioral prediction, but representational similarity
  • Method: Centaur’s hidden-layer activations on each trial are compared to human fMRI signals (using mapping/correlation across trials)
  • Claim: fine-tuning on behavior reshapes the model’s internal representations to be more brain-aligned

Patterned Heterogeneity

Lorè & Heydari (2024), Fig. 2:
"GPT-4’s choice of actions is almost perfectly bimodal, with either full cooperation or full defection."

Binz et al. (2025), Fig. 2: Centaur shows a bimodal distribution of model-based vs. model-free behavior.
