Hierarchy

HMIA 2025

Class Title

"Readings"

Video: x [3m21s]

Activity: TBD

PRE-CLASS

CLASS

TRIVIAL COOPERATION: shared goals and information - pick a goal and execute

Hobbes' observation: scarcity + similar agents → competition; life is brutish & short

Hobbes' fix: cede sovereignty to a boss with credible enforcement. Command → order

Command Failure Mode (preferences)
Agents retain autonomy → effort substitution & selective obedience

Command Failure Mode (information)
Orders are incomplete & ambiguous; environments shift.

Principals and Agents

From commands to contracts. Alignment by design: selection, monitoring, and incentives to align the agent's autonomy with the principal's goals.

Agent as RL learner.
Naked RL is a clean micro-model: the agent updates a policy to maximize rewards.
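A minimal sketch of the "agent updates a policy to maximize rewards" idea, written as an epsilon-greedy bandit. Everything here is an illustrative assumption, not part of the course material: the names `train_bandit` and `noisy_reward`, the three actions, their mean rewards, and the exploration rate.

```python
import random

def train_bandit(reward_fn, n_actions, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner: track each action's mean reward, mostly pick the best."""
    rng = random.Random(seed)
    q = [0.0] * n_actions       # running estimate of mean reward per action
    counts = [0] * n_actions    # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)                    # explore
        else:
            a = max(range(n_actions), key=lambda i: q[i])   # exploit current estimate
        r = reward_fn(a, rng)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]                      # incremental mean update
    return q, counts

def noisy_reward(a, rng):
    # Hypothetical environment: action 2 pays best on average.
    means = [0.2, 0.5, 0.8]
    return means[a] + rng.gauss(0, 0.1)

q, counts = train_bandit(noisy_reward, n_actions=3)
greedy_action = max(range(3), key=lambda i: q[i])
print("estimates:", [round(x, 2) for x in q], "greedy action:", greedy_action)
```

The learner never sees what the principal values; it only sees the reward signal. That is the point of the micro-model: whatever the reward function pays for is what the policy converges on.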

Goodhart risk: when m(·) omits what drives V(·), maximizing T(m(a)) can reduce U_S. Gaming, reward hacking, short-termism.
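A toy numeric sketch of the Goodhart risk. The three actions, their numbers, and the linear piece rate T(m) = rate × m are all made-up assumptions for illustration, and I_A and K_S are set to zero. The agent best-responds to T(m(a)) − C_A(a), and the principal receives V(a) − T(m(a)).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    value: float     # V(a): true value to the principal
    measured: float  # m(a): what the metric records
    cost: float      # C_A(a): agent's effort cost

# Hypothetical action set: the metric rewards gaming more than careful work.
ACTIONS = [
    Action("careful work", value=10.0, measured=5.0, cost=4.0),
    Action("game the metric", value=2.0, measured=9.0, cost=3.0),
    Action("shirk", value=0.0, measured=0.0, cost=0.0),
]

def transfer(m, rate):
    return rate * m  # T(m): assumed linear piece rate

def agent_utility(a, rate):
    return transfer(a.measured, rate) - a.cost    # U_A with I_A = 0

def principal_utility(a, rate):
    return a.value - transfer(a.measured, rate)   # U_S with K_S = 0

def best_response(rate):
    return max(ACTIONS, key=lambda a: agent_utility(a, rate))

for rate in (0.0, 0.5, 1.0):
    a = best_response(rate)
    print(f"rate={rate}: agent picks '{a.name}', U_S={principal_utility(a, rate)}")
# rate=0.0: agent picks 'shirk', U_S=0.0
# rate=0.5: agent picks 'game the metric', U_S=-2.5
# rate=1.0: agent picks 'game the metric', U_S=-7.0
```

In this toy setup, strengthening the incentive never induces careful work; it only makes gaming more attractive and more expensive for the principal.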

Requires governance and guardrails: lagged, hard-to-game proxies, HITL overrides, team rewards, culture. Together, the "alignment stack".

Incentives are transfers on signals.

 

 

 

\begin{aligned}
&\text{Let measurements be } m(a_A); \text{ incentives } T(m(a_A)) \\
&\text{Agent utility: } U_A(a) = T(m(a)) - C_A(a) + I_A(a) \\
&\text{Principal utility: } U_S(a) = V(a) - T(m(a)) - K_S(a)
\end{aligned}

As soon as behavior is driven by T(m(a)), the problem is no longer obedience; it is measurement.

But T(m(a)) is always a lossy compression of what matters.
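A worked two-task sketch of the "lossy compression" point. The functional forms are assumptions chosen for easy algebra, not taken from the slides: the action is a = (a_1, a_2) with a_1, a_2 ≥ 0, only a_1 is measured, pay is a linear rate b, and I_A = K_S = 0.

```latex
\begin{aligned}
&\text{Assume: } V(a) = a_1 + a_2,\quad m(a) = a_1,\quad T(m) = b\,m,\quad
  C_A(a) = \tfrac{1}{2}\left(a_1^2 + a_2^2\right) \\
&\text{Agent: } \max_a\; U_A(a) = b\,a_1 - \tfrac{1}{2}\left(a_1^2 + a_2^2\right)
  \;\Rightarrow\; a_1^* = b,\quad a_2^* = 0 \\
&\text{Principal: } U_S(a^*) = a_1^* + a_2^* - b\,a_1^* = b - b^2
  \;\Rightarrow\; b^* = \tfrac{1}{2},\quad U_S = \tfrac{1}{4} \\
&\text{Benchmark: } \max_a\; V(a) - C_A(a)
  \;\Rightarrow\; a_1 = a_2 = 1,\quad V - C_A = 1
\end{aligned}
```

Under these assumptions no choice of b induces any effort on the unmeasured task a_2, so the value it would have contributed is lost whatever the incentive strength.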

Resources

Author. YYYY. "Linked Title" (info)

NEXT: Markets