Safe AI-Enabled Systems through Collective Intelligence
<WORKSHOP>
<DATE>
Rafael Kaufmann, Thomas Kopinski,
Justice Sefas, Michael Walters
Topics ahead
AI safety → "Everything safety"
AI safety is a piece of the Metacrisis:
Everything is connected (climate, food, security, biodiversity)
× Billions of highly capable intelligent agents
× Coordination failure ("Moloch")
= Total risk for humanity and the biosphere
Schmachtenberger's hypothesized attractors:
- Chaotic breakdown
- Oppressive dystopian control
Before this:
Today's AI Risks: Misspecification + coordination failure = compounded catastrophic risk
We'll have exponentially more of this:
Perverse instantiation of AI systems ranges from benign to catastrophic:
- Game-playing agents find bizarre exploit techniques
- Social media algorithms optimized into dopamine machines
- Out-of-distribution (OOD) issues can surface social bias etc. in production
AI in the wild
- Hyperfast algorithmic trading can run amok (2010 Flash Crash)
Active Inference proposes a model of intelligence at all scales, from microbes to macro agents.
Entities continuously accumulate evidence for a generative model of their sensed world ("self-evidencing"); this plugs Bayes directly into an entity's operation.
Arguments from control theory (cf. the good regulator theorem) also posit that physical systems contain structures homomorphic to their outer environment.
The result is AI that “scales up” the way nature does: by aggregating individual intelligences and their locally contextualized knowledge bases, within and across ecosystems, into “nested intelligences”—rather than by merely adding more data, parameters, or layers to a machine learning architecture.
- K. Friston et al. (2024)
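A toy illustration of the "self-evidencing" loop described above (a hypothetical two-state example of our own, not taken from the cited work):

```python
import numpy as np

# Minimal sketch of self-evidencing: an agent keeps a categorical belief over
# two hidden world states and updates it with Bayes' rule after every observation.
likelihood = np.array([[0.8, 0.2],   # p(observation | state): rows = observations,
                       [0.2, 0.8]])  # columns = hidden states
belief = np.array([0.5, 0.5])        # prior over hidden states

def update(belief, obs):
    """One round of evidence accumulation: posterior ∝ likelihood × prior."""
    posterior = likelihood[obs] * belief
    return posterior / posterior.sum()

for obs in [0, 0, 1, 0]:             # a stream of sensed observations
    belief = update(belief, obs)
    print(f"obs={obs}, belief={belief.round(3)}")
```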
Intelligence from first principles
Nirosha J. Murugan et al. (2020)
The aneural organism Physarum polycephalum behaving in response to the physics of its environment
Under ActInf, shared narratives (world models) and communication are mutually beneficial for updating generative models.
The future amalgamation of humans and artificial agents stands to produce a higher order collective intelligence.
But we want to maintain our values and safety...
Bayesianism and ActInf seem like a sound approach to value learning over our world models. Simulation will help us gatekeep and evaluate proposed actions.
Human-AI Collective
Risk Exposure as Free Energy
"Free energy of the future" (Millidge et al., 2021), based on variational free energy: a lower bound on expected model evidence; maximize reward while minimizing exploration.
In a fully observable setting this simplifies to a KL divergence to the preference prior.
Cf. conditional expected shortfall (finance), KL control (control theory).
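A sketch of that simplification in standard Active Inference notation ($\pi$ a policy, $s_\tau$ future states, $\tilde p$ the preference prior; these symbols are ours, not necessarily the slide's):

```latex
% Assuming full observability (outcomes coincide with states), the
% free energy of the future for a policy reduces to the risk term alone:
G(\pi)
  = \mathbb{E}_{q(s_\tau \mid \pi)}\!\left[ \ln q(s_\tau \mid \pi) - \ln \tilde p(s_\tau) \right]
  = D_{\mathrm{KL}}\!\left[\, q(s_\tau \mid \pi) \;\middle\|\; \tilde p(s_\tau) \,\right]
```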
Towards "Everything Safety"
Case study: managing the risk of AI-accelerated overharvesting and ecosystem collapse
Overfishing Case study
Overfishing: architecture at a glance
Population evolution
Cost | Revenue | Profit
Loss
- Bounded from above (satisficing)
- Equivocal about all profit-taking scenarios
- Futures discounted via w (one concrete realization of these properties is sketched below)
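One way to realize the three listed properties, as a sketch in assumed notation ($P_t$ is profit at time $t$, $P^\ast$ a satisficing target, $w \in (0,1)$ the discount factor; the original slide may use a different form):

```latex
% Satisficing reward: bounded from above by P^* and flat (equivocal) across
% all profit-taking scenarios with P_t \ge P^*; futures discounted via w.
r_t = \min(P_t,\, P^\ast),
\qquad
L = -\sum_{t} w^{\,t}\, r_t .
```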
Preference Prior
Stakeholder-accepted probability of loss L*
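In symbols (our notation, since the slide states this only in words), the preference prior encodes a stakeholder-accepted bound $\varepsilon$ on the probability of exceeding the loss level $L^\ast$:

```latex
\Pr\!\left( L \ge L^\ast \right) \;\le\; \varepsilon
```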
Risk
Risk Exposure as Free Energy
"Free energy of the future" (Millidge et al., 2021): a lower bound on expected model evidence; maximize reward while minimizing exploration.
In a fully observable setting this simplifies to a KL divergence to the preference prior (see the formula above).
Cf. conditional expected shortfall (finance), KL control (control theory).
- Sample world parameters, e.g. E, r, ...
- Run a world simulation of fishing, agent policy selection, etc. for each sample
- At each episode, the agent MC-samples trajectories of fish population changes, profits, costs, ...
- Cumulative risk is measured along trajectories and evaluated against the preference prior
- Inform (or not) the policy (a sketch of this loop follows below)
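A minimal Python sketch of this loop for a toy logistic fishery; the dynamics, parameter ranges, cost constants, and harvest policy below are illustrative assumptions rather than the study's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trajectory(E, r, K, pop0, harvest_rate, horizon=50):
    """Roll out one toy logistic-fishery trajectory under a fixed harvest policy."""
    pop, profits = pop0, []
    for _ in range(horizon):
        growth = r * pop * (1 - pop / K) * rng.normal(1.0, 0.2)  # stochastic growth
        catch = harvest_rate * E * pop                            # harvest scales with effort E
        pop = max(pop + growth - catch, 0.0)
        profits.append(catch - 0.05 * E)                          # revenue minus a nominal cost
    return np.array(profits)

def cumulative_risk_exposure(harvest_rate, loss_threshold, w=0.97,
                             n_worlds=100, n_traj=20):
    """Estimate the probability that discounted cumulative loss exceeds L*."""
    losses = []
    for _ in range(n_worlds):                                     # sample world parameters E, r, ...
        E, r = rng.uniform(0.5, 1.5), rng.uniform(0.2, 0.6)
        for _ in range(n_traj):                                   # MC-sample trajectories per world
            profits = simulate_trajectory(E, r, K=1.0, pop0=0.8, harvest_rate=harvest_rate)
            discount = w ** np.arange(len(profits))
            losses.append(-(discount * profits).sum())            # loss = negative discounted profit
    return float(np.mean(np.array(losses) > loss_threshold))

# Gate policies: accept only harvest rates whose cumulative risk exposure stays
# within the stakeholder-accepted probability epsilon from the preference prior.
epsilon, loss_threshold = 0.05, 0.0
for harvest_rate in (0.1, 0.3, 0.6):
    cre = cumulative_risk_exposure(harvest_rate, loss_threshold)
    verdict = "accepted" if cre <= epsilon else "rejected"
    print(f"harvest_rate={harvest_rate}: CRE={cre:.2f} -> {verdict}")
```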
Overfishing Case study
Myopic (single-season) profit-maximizing agents deplete population (and profits)
Less time-discounting = higher perceived risk, earlier
Overfishing Case study
Constrained policy for various ε
Overfishing Case study
Cumulative risk exposure (CRE) responds to evolving stakeholder preferences for far-sightedness
Safely decommissioning with AI
- Background: Germany is progressing its energy transition plans, with one main goal being the decommissioning of all nuclear power plants
- This is a complex process with many challenges, at the forefront:
  - Radioactive waste management
  - Dismantling of building parts
  - Recycling of materials
  - Knowledge management
- To ensure correct procedures and reduce the risk of failure, AI components are being incrementally introduced into these processes
- AI tools have already been successfully deployed in, e.g.:
  - Robotics: automatic scanning of elements within the building for transportation, via computer vision
  - Knowledge management: teaching / onboarding new staff with AR/VR and NLP
  - Human resources: utilizing NLP and LLMs for person-to-job fitting
Nuclear Decommissioning Process in Germany's Energy Transition
- How can we adapt this system to nuclear decommissioning?
- Classification as Major Change
- Preparation of documents for the Major Change under §9 AtG
- Application to the State Ministry
- Review by Expert
- Change approval by the State Ministry
- Supervised execution of the change by the RPO (Radiation Protection Officer)
- Notification of the change to the State Ministry and the Expert
- Confirmation from the Expert about proper execution of the change to the State Ministry and the Radiation Protection Officer
Nothing happens fast and each action has multiple stages of review and approval
Coordination is required across many separate agents and groups, forming a complex—often non-linear—web of dependency and communication
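A toy sketch of how such a staged approval workflow could be tracked as explicit states (the stage names follow the steps listed above; the class and field names are illustrative, not an existing system):

```python
from dataclasses import dataclass, field

# Stages of the major-change approval workflow under §9 AtG, as listed above.
STAGES = [
    "Classification as Major Change",
    "Preparation of Documents (§9 AtG)",
    "Application to State Ministry",
    "Review by Expert",
    "Change Approval by State Ministry",
    "Supervised Execution of Change by RPO",
    "Notification of Change to State Ministry and Expert",
    "Confirmation of Proper Execution by Expert",
]

@dataclass
class ChangeProcess:
    """Tracks one change request through the staged review-and-approval pipeline."""
    name: str
    stage_index: int = 0
    log: list = field(default_factory=list)

    @property
    def stage(self) -> str:
        return STAGES[self.stage_index]

    def advance(self, evidence: str) -> None:
        """Move to the next stage, recording the evidence (document, approval, sign-off)."""
        self.log.append((self.stage, evidence))
        if self.stage_index < len(STAGES) - 1:
            self.stage_index += 1

# Example: a knowledge base could hold many such processes and let LLM tools
# flag any that sit too long in one stage or lack the required evidence.
proc = ChangeProcess("Dismantling of reactor component X")
proc.advance("classification memo filed")
proc.advance("document package prepared")
print(proc.stage)  # -> "Application to State Ministry"
```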
- How can we adapt this system to nuclear decommissioning?
No single person can oversee and consider all processes and applications
Leverage language models and other AI tools to parse collective information, recommend actions, and flag issues
Better informed decisions reduce accidents, costs, and harm
- How can we adapt this system to nuclear decommissioning?
Knowledge Base
Optimization Engine / Safety Harness
- Regulators, human experts, etc. query the Knowledge Base for up-to-date statuses, procedures, and more
- LLM-mediated: autonomous agents take or advise actions, and interface with humans while maintaining a synchronized connection to the network
- As new data is introduced into the Knowledge Base, generative world models are updated
- Simulations on these models are carried out to compute risk metrics and optimal decision actions
- Automated LLM tools digest an array of documents beyond human capability (active processes, safety and regulatory requirements, tech specs, ...), tracking statuses, updating records, and flagging potential issues
- Logs feed back to update the world models and the Knowledge Base
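A minimal sketch of the update loop implied by this architecture; the function names, stub logic, and risk threshold are placeholders for illustration, not an existing API:

```python
# Sketch of the Knowledge Base -> world model -> simulation -> advice loop.
# All functions below are illustrative stubs standing in for real components.

def ingest_document(knowledge_base: dict, doc_id: str, text: str) -> None:
    """LLM tools digest a new document and update statuses in the Knowledge Base."""
    knowledge_base[doc_id] = {"text": text, "status": "parsed"}

def update_world_model(world_model: dict, knowledge_base: dict) -> dict:
    """Refresh the generative world model from the latest Knowledge Base contents."""
    world_model["n_documents"] = len(knowledge_base)
    return world_model

def simulate_risk(world_model: dict, action: str) -> float:
    """Run simulations on the world model to score an action's risk (stub values)."""
    return 0.02 if action == "proceed_with_dismantling" else 0.2

def advise(action: str, risk: float, epsilon: float = 0.05) -> str:
    """Safety harness: recommend the action only if risk stays within the accepted bound."""
    return "recommend" if risk <= epsilon else "flag_for_human_review"

knowledge_base, world_model = {}, {}
ingest_document(knowledge_base, "techspec-001", "Updated technical specification ...")
world_model = update_world_model(world_model, knowledge_base)
risk = simulate_risk(world_model, "proceed_with_dismantling")
print(advise("proceed_with_dismantling", risk))
```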
The Gaia Network
A WWW of world/decision models
Goal: Help agents (people, AIs, organizations) with:
- Making sense of a complex world
- Grounding decisions, dialogues, and negotiations
How: Decentralized, crowdsourced, model-based prediction and assessment (a toy sketch of such a network follows below)
Where: Applications in:
- Safe AI for automation in the physical world (today's focus)
- Climate policy and investment
- Participatory planning for cities
- Sustainability and risk mitigation in supply chains
- Etc
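A toy sketch of the "WWW of world/decision models" idea: nodes own models of particular variables, publish predictions, and query one another to ground decisions. The node names, variables, and values are illustrative only, not part of any existing Gaia Network implementation:

```python
# Toy decentralized network of world/decision models.
class ModelNode:
    def __init__(self, name: str, variable: str, predict):
        self.name, self.variable, self.predict = name, variable, predict
        self.peers: list["ModelNode"] = []

    def link(self, other: "ModelNode") -> None:
        """Connect this node to a peer whose predictions it may query."""
        self.peers.append(other)

    def query(self, variable: str):
        """Ask linked peers for a prediction of a variable this node does not model."""
        for peer in self.peers:
            if peer.variable == variable:
                return peer.predict()
        return None

fishery = ModelNode("fishery", "fish_population", predict=lambda: 0.7)   # fraction of carrying capacity
climate = ModelNode("climate", "sea_temperature", predict=lambda: 14.2)  # degrees C
policy  = ModelNode("policy",  "quota_decision",  predict=lambda: "reduce")

policy.link(fishery)
policy.link(climate)
print(policy.query("fish_population"), policy.query("sea_temperature"))
```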
Let's discuss!
- Can you think of other systems and applications that would benefit from CRE/ActInf and these tools?
- Consider any issues, improvements, pitfalls. What might need to happen to evolve this for collective intelligence (CI), e.g. room for Theory of Mind (ToM)?
- How might (or might not) this fit into the long view of AI safety and alignment, e.g. ASI?
- Did the Panthers deserve the cup?
Some of our contributors (so far)!