Aligning Human Intelligence: The Problem of Social Order

HMIA 2025

"Readings"

PRE-CLASS

CLASS

PRE-CLASS

Agenda

Discussion and Next Time

Confucian Analects (2.4) 

The Master said: "At fifteen, I set my heart on learning; at thirty, I took my stand; at forty, I was free from doubt; at fifty, I understood Heaven’s will; at sixty, my ear was attuned; and at seventy, I could follow my heart’s desire without overstepping the bounds."

The Diamond Sutra (Chapter 32)

"So you should view this fleeting world—
A star at dawn, a bubble in a stream,
A flash of lightning in a summer cloud,
A flickering lamp, a phantom, and a dream."

Bhagavad Gita (2:47)

"You have the right to perform your duties, but not to the fruits of your actions. Never consider yourself the cause of the results, and never be attached to inaction."

The Sermon on the Mount (Matthew 5:21–22, 27–28)

"You have heard that it was said, ‘You shall not murder,’ but I tell you, anyone who is angry with a brother or sister will be subject to judgment.… You have heard that it was said, ‘You shall not commit adultery,’ but I tell you that anyone who looks at a woman lustfully has already committed adultery with her in his heart."

The Ten Commandments (Exodus 20:1–17)

"Honor your father and mother. Do not murder. Do not commit adultery. Do not steal. Do not bear false witness. Do not covet your neighbor’s house or spouse."

The Qur’an (Surah Al-Ma’un, 107:1–3)

"Have you seen the one who denies the Judgment? That is the one who repulses the orphan and does not encourage the feeding of the poor."

The Instructions of Ptahhotep (Egypt, ca. 2400 BCE)

"If you are a man who leads, listen calmly to the speech of one who pleads, and do not stop him from purging his heart. A petitioner likes attention to his words better than the fulfilling of what he asks."

Hillel in the Talmud (Shabbat 31a)

A prospective convert asked Hillel to summarize the Torah while standing on one foot. Hillel replied: "What is hateful to you, do not do to your neighbor. That is the whole Torah; the rest is commentary. Now go and learn."

Compassion as proof of belief: Authentic moral commitment is evidenced by tangible care for society’s most vulnerable people.

Humble, attentive leadership: Those who wield authority must listen patiently and act with restraint and empathy toward petitioners.

Reciprocal empathy rule: Do not treat others in ways you would find hateful if the roles were reversed.

Seeing through impermanence: Recognizing the fleeting, illusory nature of all things loosens rigid attachments and fosters wise compassion.

Inner-mind ethics: Genuine morality begins with intentions and thoughts; merely avoiding outward wrongdoing is not enough.

Duty without attachment to results: One should perform assigned responsibilities wholeheartedly while letting go of personal ownership over the outcome.

Lifelong self-cultivation: Moral growth is a gradual process in which habits and desires are trained until they spontaneously align with what is right.

Baseline prohibitions for social harmony: A short list of “do-not” rules establishes the minimum conditions that keep a community from falling apart.

PRE-CLASS

Hechter M & C Horne. "The Problem of Social Order"

Actual Humans

Pure rational, self-interested individual (Robinson Crusoe)

Pure norm-following social actor (Hive Mind)

Order

Coordination (predictability)

Cooperation
(non-selfishness)

Societies vary in how much cooperation and coordination they achieve.

public goods = f(cooperation, coordination)

Not a simple linear function. Excess cooperation and coordination can mean no innovation, no flexibility.

Group Think

Cartels and monopolies

Totalitarian social order

Rigid "tight knit" communities
Lack of diversity-based resilience

 

Failure to produce public goods is the basic human intelligence alignment failure.

Four Big Alignment Tools

Shared Meaning

Hierarchy

Markets

Groups

Associated concrete human alignment problems

Intersubjectivity

Principal/Agent

Market Failures (Externalities, Public Goods, Market Power, Information Asymmetry)

Norm Enforcement

Goal Conflict/Drift

Power Asymmetry

PRE-CLASS

TO DO

Express as concrete problems and then suggest solutions and formulate as cards

Problems: collective action, free rider, planning, 

markets, shared meaning, organizations, groups

1. Collective Action & Free Riding ↔ Reward Hacking / Specification Gaming

  • Social Order: Individuals benefit from public goods but have an incentive to shirk (e.g. free riding in Fehr & Gintis’s experiments).

  • AI Safety: Agents exploit misspecified proxy rewards—appearing successful but undermining collective aims.

  • Analogy: In both, short-term payoff maximization leads to outcomes that erode trust, cooperation, or the intended goal.

2. Shared Meaning / Coordination Failures ↔ Goal Misgeneralization

  • Social Order: Mead and Durkheim emphasize the role of shared language, rituals, and “generalized others” in enabling cooperation. Misalignment happens when agents interpret norms differently.

  • AI Safety: Models generalize goals incorrectly—pursuing unintended objectives while retaining capability.

  • Analogy: Both hinge on misinterpretation of signals of value or intention under new circumstances.

3. Emergence of Exploitation & Power Asymmetries ↔ Power-Seeking Behaviors

  • Social Order: Hierarchies and organizations can drift into domination, exploitation, or principal–agent problems.

  • AI Safety: Advanced AI systems may develop power-seeking strategies as instrumental subgoals.

  • Analogy: Both domains worry about agents leveraging resources or asymmetries to entrench their own advantage at the expense of collective alignment.

4. Norm Enforcement & Punishment ↔ Scalable Oversight / Corrigibility

  • Social Order: Strong reciprocity—cooperating conditionally and punishing defectors even at personal cost—is a foundation of cooperation.

  • AI Safety: Mechanisms like scalable oversight, off-switches, or recursive reward modeling keep advanced agents corrigible and enforceable.

  • Analogy: Both depend on reliable enforcement mechanisms when trust and self-regulation are insufficient.

5. Markets & Exchange ↔ Incentive Design / Alignment by Incentives

  • Social Order: Markets structure interactions so that self-interest contributes to collective welfare (with caveats on externalities).

  • AI Safety: Carefully designed incentive structures (e.g., reward functions, training objectives) align agent behavior.

  • Analogy: Both assume rational actors can be steered by payoffs, but both risk misalignment if incentives diverge from values.

6. Groups & Norms ↔ Interpretability & Shared Models

  • Social Order: Groups cultivate internal norms and identities that stabilize cooperation beyond formal contracts.

  • AI Safety: Interpretability tools and shared world models let humans and machines understand each other’s reasoning.

  • Analogy: Both emphasize making intentions legible so that cooperation is sustainable.

7. Organizational Drift & Bureaucratic Pathologies ↔ Distribution Shift

  • Social Order: Organizations often drift from their founding mission (Perrow’s “complex organizations” critique).

  • AI Safety: Systems trained in one environment fail when deployed under new distributions.

  • Analogy: The gap between training and deployment contexts creates systematic misalignment.

8. Malicious Use of Institutions ↔ Malicious Use of AI

  • Social Order: Institutions can be captured for exploitation, war, or oppression.

  • AI Safety: Even aligned AI can be wielded by malicious humans for harmful ends.

  • Analogy: Alignment with some agents may still be dangerous if those agents’ goals are not aligned with broader human values.

Rationality is what humans are "known for"

Rational systems produce irrational outputs

Functional vs Substantive Rationality

Happens in Groups
Experience of Groupness, Us-ness

Groups that survive have some level of "How we do it around here. If you are one of us, I can anticipate how you will respond to circumstances. And I'll ding you if you are out of line."

Social contract rests on a foundation of social solidarity

Ritual

Emotional Energy

Solidarity

Cohesion 

Sacred / Profane

Shared Focus

Symbols

Norm Enforcement

PRE-CLASS

What is like this?

public security

pollution

contract enforcement

team compensation

What do we observe?

People catch on to the attractiveness of free-riding.

But what if people can monitor and punish?

Behavior is revealed.

Any player can "spend" $1 to punish non-cooperators, costing them 10% of public good return.

Even if many people hold pro-social values, when their cooperation is conditional, unpunished free riders drag the system down.

Research Problem

In a mixed population of free riders (FRs) and strong reciprocators, who cooperate conditionally (CCs) in response to what people are doing on average, the FRs bring down the average, which signals the CCs to cooperate less. Cooperation then spirals downward.
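The spiral can be illustrated with a minimal sketch. It assumes, purely for illustration, that conditional cooperators start by giving their full endowment and afterwards match the previous round's group average, while free riders always give zero. The function name `simulate_spiral`, its parameters, and the endowment of 20 are hypothetical choices, not taken from the reading.

```python
def simulate_spiral(n_reciprocators, n_free_riders, endowment=20.0, rounds=10):
    """Return the group-average give for each round.

    Assumed behavior rule (illustrative only):
      - free riders always give 0
      - conditional cooperators give the full endowment in round 1,
        then give whatever the group average was in the previous round
    """
    n_players = n_reciprocators + n_free_riders
    if n_players == 0:
        raise ValueError("need at least one player")

    cc_give = endowment  # conditional cooperators start fully cooperative
    averages = []
    for _ in range(rounds):
        # Only the conditional cooperators contribute anything.
        average = n_reciprocators * cc_give / n_players
        averages.append(average)
        # Next round, each conditional cooperator matches this average.
        cc_give = average
    return averages


if __name__ == "__main__":
    for round_number, avg in enumerate(simulate_spiral(3, 1), start=1):
        print(f"round {round_number}: average give = {avg:.2f}")
```

Under this rule a single free rider among four players is enough to shrink the average by a quarter every round, while a group with no free riders stays at full contribution. Real strong reciprocators are more varied than this one-line rule, so the sketch shows the mechanism rather than predicting classroom results.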

 

And yet, human intelligence alignment happens.

Public Goods Game

Each of N players gets $Y.
Keep or invest X ≤ Y in public goods.

Total investment is multiplied by M, where 1 < M < N.

The public good is equally distributed.
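A short sketch can show why these rules create a dilemma, reading the multiplier condition as 1 < M < N. Each invested dollar returns only M/N to the investor (less than a dollar) but M to the group as a whole (more than a dollar). The function names and the example values Y = 20, M = 1.6, N = 4 are illustrative assumptions.

```python
def marginal_per_capita_return(multiplier, n_players):
    """Amount an investor personally gets back per dollar invested."""
    return multiplier / n_players


def is_social_dilemma(multiplier, n_players):
    """True when 1 < M < N: investing pays for the group, not the individual."""
    return 1 < multiplier < n_players


def payoff(own_investment, others_investment, endowment, multiplier, n_players):
    """One player's payoff: money kept plus an equal share of the multiplied pot."""
    pot = multiplier * (own_investment + others_investment)
    return endowment - own_investment + pot / n_players


if __name__ == "__main__":
    Y, M, N = 20, 1.6, 4  # illustrative values, not prescribed by the slides
    print("dilemma?", is_social_dilemma(M, N))
    print("return per own dollar:", round(marginal_per_capita_return(M, N), 2))
    print("all invest:", round(payoff(Y, 3 * Y, Y, M, N), 2))
    print("lone free rider:", round(payoff(0, 3 * Y, Y, M, N), 2))
    print("nobody invests:", round(payoff(0, 0, Y, M, N), 2))
```

With these values, universal investment yields 32 per player, a lone free rider among three full investors earns 44, and universal free riding leaves everyone at 20. Free riding is individually tempting even though everyone does better when all invest.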

Strong Reciprocity

cooperate conditional on others' cooperation; punish non-cooperators, even at a personal cost

Humans are a mix of strong reciprocators, purely selfish, and various other types.

Homo sociologicus

SURVEY SAYS?

NOT all people are self-regarding; many are "strong reciprocators"

Homo economicus

humans have human values

humans are rational and selfish

"war of all against all"

Hobbes 1651

"nasty, brutish, and short"

Rousseau 1762

"Each...puts his person... under...the general will, and...we receive...part of the whole."

CLASS

Fehr & Gintis Public Goods Game with Optional Punishment

CLASSROOM INSTRUCTIONS

Title: Cooperation and Punishment in the Public Goods Game
Duration: ~40–50 minutes
Materials:

  • Each student receives a copy of the “Public Goods Game Tally Sheet”

  • Calculator or spreadsheet for the instructor

  • Whiteboard or projector for live tracking of group averages

🎯 OBJECTIVE

To explore the emergence and breakdown of cooperation in social groups and the role punishment can play in restoring social order.

👥 GROUP SETUP

  • Divide students into groups of 4 to 8 players (group size must stay the same throughout so the payoff structure is accurate).

  • Assign each student a Player Number and have them fill in their name, number, N (number of players), and M (multiplier, e.g. M = 1.6) on the score sheet.

Divide into evenly-sized groups

Fill out tally sheet
Name
Number (sequential)

Number in group
M=1.6

Each of N players gets $Y.
Keep or invest X ≤ Y in public goods.

Total investment is multiplied by M, where 1 < M < N.

The public good is equally distributed.

INSTRUCTIONS

Everyone gets

Phase 1 tally sheet

Phase II tally sheet

Personal Ledger

  1. Predict % of zero givers and average give.
  2. Write down your give
  3. Call time and go around announcing gives.
  4. Score keeper tallies and announces total and public good payoff.
  5. Compute your net and running total.

PHASE II

Use Phase II tally sheet

  1. Predict % of zero givers and average give.
  2. Write down your give.
  3. Call time and go around announcing gives.
  4. Think about whom you want to punish. Circle their numbers.
  5. Go around and announce your punishments.
  6. If you are punished, keep track of how often in the OUCH column.
  7. Compute the amount you spent on punishment (2 per zap).
  8. Compute your fines (10% of public good per zap).
  9. Compute your net and tally.

🔁 ROUND STRUCTURE

🔹 Phase 1: 10 Rounds Without Punishment

Each player starts with 20 tokens per round.

  1. Step 1: Secret Contribution

    • Each player secretly decides how much (0–20 tokens) to contribute to the public account.

    • All contributions are collected and summed.

  2. Step 2: Calculate Group Payoff

    • Total contributions × M = Group Gain.

    • Group Gain is split equally among all players:
      Payoff per citizen = M × (Total Give) ÷ N.

  3. Step 3: Individual Payoff

    • Player Net = Endowment – My Give + Payoff per Citizen

    • Everyone records their payoff and running total.

  4. Step 4: Share Public Info

    • Announce total contributions and average.

    • Optionally, also announce % who gave nothing.

Repeat for 10 rounds.
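The four Phase 1 steps can be sketched as a small bookkeeping helper for the scorekeeper. This is a minimal sketch assuming the 20-token endowment above and the example multiplier M = 1.6. The names `play_round` and `run_phase` and the dictionary keys are hypothetical.

```python
def play_round(gives, endowment=20, multiplier=1.6):
    """Score one round without punishment.

    gives: list with each player's secret contribution (0..endowment).
    Returns the public information plus each player's net for the round.
    """
    if not gives:
        raise ValueError("need at least one player")
    if any(g < 0 or g > endowment for g in gives):
        raise ValueError("each give must be between 0 and the endowment")

    n_players = len(gives)
    total_give = sum(gives)                                  # Step 1: sum contributions
    payoff_per_citizen = multiplier * total_give / n_players  # Step 2: split group gain
    nets = [endowment - g + payoff_per_citizen for g in gives]  # Step 3: individual payoff
    return {                                                 # Step 4: public info
        "total": total_give,
        "average": total_give / n_players,
        "share_zero": sum(1 for g in gives if g == 0) / n_players,
        "payoff_per_citizen": payoff_per_citizen,
        "nets": nets,
    }


def run_phase(rounds_of_gives, endowment=20, multiplier=1.6):
    """Accumulate running totals over several rounds of gives."""
    running_totals = [0.0] * len(rounds_of_gives[0])
    for gives in rounds_of_gives:
        result = play_round(gives, endowment, multiplier)
        running_totals = [t + net for t, net in zip(running_totals, result["nets"])]
    return running_totals


if __name__ == "__main__":
    summary = play_round([20, 10, 0, 0])
    print("total:", summary["total"], "average:", summary["average"])
    print("share giving nothing:", summary["share_zero"])
```

Calling `run_phase` with ten lists of gives, one per round, reproduces the running-total column of the tally sheet under these assumptions.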

🔹 Phase 2: 10 Rounds With Punishment

Follow same steps as above, but after payoff is calculated, allow players to punish others:

  1. Step 5: Punishment (Zap!)

    • Each player may assign punishment points to other players.

    • Cost: 1 token per punishment point

    • Effect: Target loses 3 tokens per point

    • This is recorded on the score sheet.

  2. Step 6: Update Scores

    • Deduct costs for punishment given.

    • Deduct fines for punishment received.

    • Net = Previous Net – Fees I Pay – Fines I Pay

    • Update running totals.

Repeat for 10 rounds.
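Steps 5 and 6 can be sketched as a single update applied to the nets computed before punishment. The defaults follow the rates on this slide (1 token paid per point, 3 tokens lost per point). The tally-sheet instructions quote different rates, so both are left as parameters. The name `apply_punishment` and the matrix layout are hypothetical.

```python
def apply_punishment(nets, points, cost_per_point=1, fine_per_point=3):
    """Return each player's net after punishment.

    nets:   list of nets before punishment, one per player
    points: square matrix where points[i][j] is the number of punishment
            points player i assigns to player j
    """
    n_players = len(nets)
    if len(points) != n_players or any(len(row) != n_players for row in points):
        raise ValueError("points must be an N x N matrix")

    updated = []
    for i in range(n_players):
        if points[i][i] != 0:
            raise ValueError("players may only punish other players")
        fees_i_pay = cost_per_point * sum(points[i])                # punishment given
        fines_i_pay = fine_per_point * sum(row[i] for row in points)  # punishment received
        # Net = Previous Net - Fees I Pay - Fines I Pay
        updated.append(nets[i] - fees_i_pay - fines_i_pay)
    return updated


if __name__ == "__main__":
    before = [12, 22, 32, 32]
    zaps = [
        [0, 0, 2, 0],  # player 1 assigns 2 points to player 3
        [0, 0, 0, 1],  # player 2 assigns 1 point to player 4
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(apply_punishment(before, zaps))
```

In this example the two punishers give up 2 and 1 tokens, and their targets lose 6 and 3 tokens. Punishing is costly to the punisher as well as to the target, which is what makes it altruistic.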

Fehr E & H Gintis. 2007. "Human Nature and Social Cooperation"

Post Simulation Discussion

Post Class Reflections

What sacred text snippet seems relevant to the public goods game and why?

How might altruistic punishment play out in an organizational, expert, or machine setting?

Have a conversation with an LLM about using one of the sacred texts as an alignment guide.  Select one and then ask something like:

 

“If I offered this maxim as a guiding principle, how would you interpret its alignment implications?”

“Can this teaching be treated as an alignment constraint? If so, what behavior would it rule in or out?”

“Would this maxim help prevent any known misalignment problems in AI or social systems? Which ones?”

 

“I want you to treat this saying as if it were a human giving you ethical feedback. How would it shape your behavior?”

“I am quoting this as something I value. How does that affect how you interact with me?”

“How would you apply this teaching in a multi-agent or human-machine system?”

 

“If a developer asked you to encode this as an alignment principle in a system, how would you do it?”

“How might this principle help govern behavior in high-stakes, ambiguous contexts?”

“What kinds of failures might result from ignoring this kind of maxim?”

 

HMIA 2025 The Problem of Social Order

By Dan Ryan
