Martin Biehl (Cross Labs)
Nathaniel Virgo (Earth-Life Science Institute)
Made possible via funding by:
Agents? Non-agents? What is the difference?
Physics: describes a system's dynamics without reference to beliefs or goals.
Daniel Dennett: two stances (ignoring the design stance):
the physical stance: predict behaviour from physical dynamics,
the intentional stance: predict behaviour by ascribing beliefs, desires, and rationality.
Our approach:
Given a stochastic Moore machine (without any environment):
Question: when is it justified to call it an agent?
We propose: it is justified to call it an "agent" if, together with the machine, we can find a consistent interpretation of it as a solution to a POMDP.
Because this provides well-defined notions of beliefs, a goal, and optimal (rational) action.
Stochastic machine: a Markov kernel.
Write: \(f: \mathcal X \to P\mathcal Y\), with \(f(y|x)\) the probability of output \(y\) on input \(x\).
So an agent is a pair of a system and an interpretation.
Dennett: any interpretation is fine; only predictive performance matters.
We: only consistent interpretations are fine.
Possibly many interesting pairs of system class and interpretation exist.
Here we choose: stochastic Moore machines, interpreted as solutions to POMDPs.
Stochastic Moore machine: memory states \(\mathcal M\), inputs \(\mathcal I\), outputs (actions), a stochastic update kernel \(\mu: \mathcal I \times \mathcal M \to P\mathcal M\), and an output kernel that depends on the memory state only.
Notation: \(\mu(m'|i,m)\) is the probability of moving to memory state \(m'\) on input \(i\) in state \(m\).
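A minimal executable sketch of such a machine, assuming finite sets and representing kernels as dictionaries; all names and example kernels are illustrative, not from the talk:

```python
import random

# Minimal sketch of a stochastic Moore machine over finite sets.
# Kernels are dicts mapping to probability dicts; names are illustrative.
class StochasticMooreMachine:
    def __init__(self, mu, out, m0):
        self.mu = mu    # update kernel: mu[(i, m)] = {m_next: probability}
        self.out = out  # output kernel: out[m] = {action: probability}
        self.m = m0     # current memory state

    def step(self, i):
        """Consume input i, update memory stochastically, emit an action."""
        dist = self.mu[(i, self.m)]
        self.m = random.choices(list(dist), weights=list(dist.values()))[0]
        act = self.out[self.m]
        return random.choices(list(act), weights=list(act.values()))[0]

# Example: a 2-state machine that noisily remembers its last input bit.
mu = {(i, m): {i: 0.9, 1 - i: 0.1} for i in (0, 1) for m in (0, 1)}
out = {m: {m: 1.0} for m in (0, 1)}  # Moore: output depends only on state
machine = StochasticMooreMachine(mu, out, m0=0)
print([machine.step(i) for i in (1, 1, 0, 1)])
```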
Given a stochastic Moore machine (without any environment):
Question: when is it justified to call it an agent?
Proposal: justified to call it an "agent" if we can find a consistent interpretation of it as a solution to a POMDP.
Formal answer: a consistent interpretation as a solution to a POMDP.
This consists of:
a POMDP and its solution (the optimal policy on beliefs),
an interpretation function,
satisfying:
a consistency equation (Bayesian updating),
an optimal action equation.
A partially observable Markov decision process (POMDP) consists of:
a hidden state space \(\mathcal H\),
an action space \(\mathcal A\) and a sensor space \(\mathcal S\),
a transition kernel (hidden state dynamics given actions),
a sensor kernel (observations given hidden states),
a reward function.
Solution consists of: the optimal policy on beliefs, i.e. on probability distributions \(P\mathcal H\).
The standard solution of a POMDP (well known) induces a Moore machine:
given a POMDP with hidden state space \(\mathcal H\), take beliefs \(P\mathcal H\) as memory states, Bayesian belief updating as the update, and the optimal policy as the output.
This machine has beliefs, a goal, and acts optimally \(\Rightarrow\) can call it an agent!
But most machines don't have states that are probability distributions \(\Rightarrow\) can they be agents too?
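A minimal numeric sketch of this belief-state construction, assuming a made-up two-state POMDP; a one-step greedy rule stands in for a true optimal policy:

```python
# Belief-state Moore machine induced by a tiny POMDP (all kernels made up).
T = {0: [[0.9, 0.1], [0.1, 0.9]],    # T[a][h][h2]: dynamics given action a
     1: [[0.5, 0.5], [0.5, 0.5]]}
O = [[0.8, 0.2], [0.3, 0.7]]          # O[h][s]: observation likelihood
R = {0: [1.0, 0.0], 1: [0.0, 1.0]}    # R[a][h]: reward

def bayes_update(belief, s):
    """Condition the belief on observation s (consistency equation side)."""
    post = [belief[h] * O[h][s] for h in range(2)]
    z = sum(post)
    return [p / z for p in post]

def predict(belief, a):
    """Push the belief through the transition kernel for action a."""
    return [sum(belief[h] * T[a][h][h2] for h in range(2)) for h2 in range(2)]

def greedy_action(belief):
    """Myopic stand-in for the optimal action equation."""
    return max(R, key=lambda a: sum(belief[h] * R[a][h] for h in range(2)))

belief = [0.5, 0.5]                  # machine state = probability distribution
for s in (0, 0, 1):                  # a stream of observations
    belief = bayes_update(belief, s)
    a = greedy_action(belief)
    belief = predict(belief, a)
    print(s, a, [round(b, 3) for b in belief])
```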
Interpretation function: \(\psi: \mathcal M \to P\mathcal H\), mapping each memory state to a belief over hidden states:
belief \(\psi(h|m)\) before update,
belief \((\psi\circ \mu)(h|i,m)\) after update.
Consistency equation: the belief after the machine's update, \((\psi\circ \mu)(h|i,m)\), equals the Bayesian update of the belief \(\psi(h|m)\) given input \(i\).
Optimal action equation: the machine's output in memory state \(m\) is the optimal action for the belief \(\psi(\cdot|m)\).
Can show: a machine with such an interpretation reproduces the behaviour of the belief-state solution of the POMDP.
Reason: the consistency equation forces \(\psi\) to track Bayesian filtering, and the optimal action equation forces optimal outputs.
So an interpretation of a Moore machine as a solution to a POMDP provides: beliefs, a goal, and optimal actions.
Justified to call a Moore machine with such an interpretation an agent!
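As a sketch of what checking the consistency equation could look like for a finite machine; everything here is a toy assumption, and only the conditioning step (no hidden-state dynamics) is checked:

```python
# Check (psi ∘ mu)(.|i,m) against the Bayesian posterior of psi(.|m) given
# input i. Toy example; a full check would also push beliefs through the
# hidden-state dynamics.
O = [[0.8, 0.2], [0.3, 0.7]]          # O[h][i]: likelihood of input i in h

def interpreted_update(m, i, mu, psi):
    """Belief assigned to the machine's next memory state."""
    return [sum(mu[(i, m)].get(m2, 0.0) * psi[m2][h] for m2 in psi)
            for h in range(2)]

def bayes_posterior(m, i, psi):
    """Posterior of the belief psi(.|m) after conditioning on input i."""
    post = [psi[m][h] * O[h][i] for h in range(2)]
    z = sum(post)
    return [p / z for p in post]

def is_consistent(mu, psi, tol=1e-9):
    return all(abs(x - y) < tol
               for (i, m) in mu
               for x, y in zip(interpreted_update(m, i, mu, psi),
                               bayes_posterior(m, i, psi)))

# A machine whose states are interpreted as delta beliefs and which never
# changes state: delta beliefs are fixed points of conditioning, so the
# consistency equation holds exactly.
psi = {0: [1.0, 0.0], 1: [0.0, 1.0]}
mu = {(i, m): {m: 1.0} for i in (0, 1) for m in (0, 1)}
print(is_consistent(mu, psi))         # True
```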
The Moore machine here plays a role analogous to the dynamics of the autonomous states in the free energy principle (FEP).
Why are Moore machines interesting?
If we have two, we can combine them, and chain them to get processes (see the sketch below).
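A minimal sketch of chaining, using deterministic machines for brevity; the names `Moore` and `chain` are illustrative assumptions:

```python
# Chaining Moore machines: feeding one machine's output into another
# yields again a Moore machine whose state is the pair of states.
class Moore:
    def __init__(self, update, output, state):
        self.update, self.output, self.state = update, output, state

    def step(self, i):
        self.state = self.update(self.state, i)
        return self.output(self.state)

def chain(a, b):
    """Feed a's output into b; the composite is again a Moore machine."""
    def update(state, i):
        sa, sb = state
        sa2 = a.update(sa, i)
        sb2 = b.update(sb, a.output(sa2))
        return (sa2, sb2)
    return Moore(update, lambda s: b.output(s[1]), (a.state, b.state))

# Example: a running-parity machine chained into a latch.
parity = Moore(lambda s, i: (s + i) % 2, lambda s: s, 0)
latch = Moore(lambda s, i: 1 if i == 1 else s, lambda s: s, 0)
process = chain(parity, latch)
print([process.step(i) for i in (1, 0, 0, 1, 0)])
```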
So what is a consistent interpretation, in terms of beliefs and goals, for a Moore machine?
But: there is more than one conceivable answer.
One possible answer: a consistent interpretation as a solution to a POMDP.
This consists of:
a POMDP, i.e. hidden states, dynamics, sensor kernel, and reward,
an interpretation function \(\psi: \mathcal M \to P\mathcal H\),
satisfying:
a consistency equation,
an optimal action equation.
[Diagram: Moore machine, POMDP transition kernel, and interpretation map, related by the consistency and optimal action equations.]
Lucky observation: the belief-state machine induced by a POMDP solution satisfies the consistency and optimal action equations trivially (take \(\psi\) to be the identity on beliefs).
Then: we immediately have a first family of consistent interpretations.
Argumentation: the belief-state machine has beliefs, a goal, and acts optimally, so it is an agent!
A machine consistently interpreted as such a solution behaves identically, so it is also an agent!
Given a POMDP with hidden state space \(\mathcal H\), construct the Moore machine whose memory states are the beliefs \(P\mathcal H\), whose update is Bayesian belief updating, and whose output is the optimal action for the current belief.
Justified to call this a rational agent with beliefs and a goal!
Yay! We got some agents! But we can find more: some machines whose states are not probability distributions may also admit consistent interpretations.
Main statement: given a Moore machine, a consistent interpretation as a solution to a POMDP is given by a POMDP and an interpretation function \(\psi: \mathcal M \to P\mathcal H\) such that the consistency equation and the optimal action equation hold.
Recall Bayes' rule (for any random variables \(A,B\)):
\[p(b\,|\,a) = \frac{p(a\,|\,b)}{p(a)} \;p(b)\]
Bayesian inference: compute the posterior over hypotheses from a prior and a likelihood.
Bayesian belief updating: apply Bayes' rule repeatedly as observations arrive.
Bayesian belief updating with conjugate priors: the posterior stays in the prior's parametric family, so updating reduces to updating a few parameters.
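A minimal sketch, assuming the standard Beta-Bernoulli conjugate pair for coin flips:

```python
# Conjugate updating: Beta(alpha, beta) prior + Bernoulli likelihood gives
# a Beta posterior, so belief updating is just incrementing two counts.
def update(alpha, beta, x):
    """Posterior parameters after observing one flip x in {0, 1}."""
    return alpha + x, beta + (1 - x)

alpha, beta = 1.0, 1.0           # uniform prior over the coin bias
for x in (1, 1, 0, 1):
    alpha, beta = update(alpha, beta, x)
print(alpha, beta)               # Beta(4, 2): posterior mean 4/6
```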
Then such conjugate updates could serve as memory dynamics.
Consider designing an artificial agent where you know the environment's dynamics and the goal.
Then you want to find the agent's remaining, unknown kernels.
We show how: we can (often, but not always) formally represent this in a Bayesian network with known and unknown kernels.
To find the unknown kernels \(p_U:=\{p_u: u \in U\}\), use planning as inference [*].
[*] Matthew Botvinick and Marc Toussaint. Planning as inference. Trends in cognitive sciences, 16(10):485–488, 2012.
Two options: treat the unknown kernels as fixed parameters, or as random variables.
Then the result of planning as inference will return the unknown kernels \(p_U\).
Two situations: either the environment kernels are completely known, or some of them are uncertain.
Explicitly reflect either situation in the structure of the Bayesian network.
Then, if we have found the agent that solves the problem:
How do we fix the memory dynamics \(p_M(m_t|s_t,m_{t-1})\) such that they have a consistent Bayesian interpretation w.r.t. the chosen model?
Example in the 2-armed bandit: the arms' payoff probabilities are not known in advance.
Then we can make the uncertainty explicit: place a prior over the payoff parameters and treat them as hidden variables.
Then, by construction, the memory dynamics that update this prior have a consistent Bayesian interpretation w.r.t. the chosen model.
Note: planning in such an uncertain model automatically includes planning to learn.
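A sketch connecting this to the main statement, with Thompson-sampling action choice as a stand-in for the optimal policy (an assumption, not the talk's construction): the memory state is four counts, not a probability distribution, and an interpretation map `psi` sends counts to Beta beliefs:

```python
import random

# 2-armed bandit: memory = success/failure counts per arm. The map psi
# interprets counts as Beta beliefs over each arm's payoff probability,
# which gives the count dynamics a Bayesian interpretation by construction.
def psi(mem):
    """Interpretation: counts -> Beta(alpha, beta) parameters per arm."""
    return [(1 + mem[a][0], 1 + mem[a][1]) for a in (0, 1)]

def act(mem):
    """Thompson sampling: sample a payoff rate per arm, pick the best."""
    samples = [random.betavariate(a, b) for a, b in psi(mem)]
    return max((0, 1), key=lambda arm: samples[arm])

def update(mem, arm, reward):
    """Memory dynamics: increment the arm's success or failure count."""
    mem[arm][0 if reward else 1] += 1

true_p = [0.3, 0.7]              # hidden payoff probabilities
mem = [[0, 0], [0, 0]]
for _ in range(200):
    arm = act(mem)
    update(mem, arm, random.random() < true_p[arm])
print(psi(mem))                  # beliefs should concentrate near true_p
```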
Underlying perspective:
\(\Rightarrow\) if we formulate agent design problems as planning problems, they become inference problems
design as planning \(\to\) planning as inference
\(\Rightarrow\) design as inference?
Assume the goal is encoded as a binary random variable \(x\), observed to take the value \(x=1\) (goal achieved).
Then finding goal-achieving behaviour becomes inference conditioned on \(x=1\).
Formalize as POMDP: environment dynamics, sensor values, actions, and the goal variable.
Terminology: this is known as planning as inference [*].
What is it good for? Automatically finding a probabilistic policy to achieve a goal.
What do you need to use it? A Bayesian network model of the agent-environment loop and a goal expressed as an observed variable.
Combination: Bayesian network + observed goal variable \(\Rightarrow\) can use maximum likelihood to solve planning!
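A toy instance of this, with all numbers made up: a one-step problem, a Bernoulli(\(\phi\)) policy over two actions, and a binary goal variable \(x\); maximizing \(p_\phi(x=1)\) recovers the better action:

```python
# Toy planning as inference: maximize the likelihood of the goal variable.
p_goal = {0: 0.2, 1: 0.8}        # made-up p(x=1 | action); action 1 better

def likelihood(phi):
    """p_phi(x=1), marginalizing over the Bernoulli(phi) action choice."""
    return (1 - phi) * p_goal[0] + phi * p_goal[1]

# The likelihood is linear in phi, so a coarse grid search finds the corner:
phi_star = max((i / 100 for i in range(101)), key=likelihood)
print(phi_star, likelihood(phi_star))   # phi* = 1.0: always take action 1
```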
Given: observations \(\bar x\) and a parameterized model \(p_\phi\).
Find the parameter \(\phi^*\) that maximizes the likelihood of the observations:
\[\phi^*=\arg\max_\phi p_\phi(\bar x)\]
Example: maximum likelihood inference of a coin's bias.
Then:
\[\phi^*=\arg\max_\phi p_\phi(\bar x)=\frac{c_{\text{heads}}(\bar x)}{c_{\text{heads}}(\bar x)+c_{\text{tails}}(\bar x)}\]
Note that for maximum likelihood inference standard algorithms exist (e.g. expectation maximization).
So we can use: Bayesian network + goal \(\Rightarrow\) policies.
Practical side of original framework:
Bayesian network + goal \(\Rightarrow\) policies.
Multiple, possibly competing goals
Coordination and communication from an information theoretic perspective
Dynamic scalability of multi-agent systems
Dynamically changing goals that depend on knowledge acquired through observations
Example multi-agent setups:
Two agents interacting with same environment
Two agents with same goal
Two agents with different goals
Example non-cooperative game: matching pennies.
In the space of joint distributions \(p(a_1,a_2)\): the two players' goal manifolds are disjoint, while the agent manifold consists of the independent distributions \(p(a_1,a_2)=p(a_1)p(a_2)\).
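A quick illustrative sketch (code and payoffs assumed, not from the talk) of why the goals compete on the agent manifold: alternating pure best responses cycle instead of settling; the mixed equilibrium sits at \(p=q=1/2\):

```python
# Matching pennies: player 1 wins on a match, player 2 on a mismatch.
# On the manifold of independent strategies, pure best responses cycle.
def payoff1(p, q):
    """Expected payoff to player 1 when P(a1=1)=p and P(a2=1)=q."""
    match = p * q + (1 - p) * (1 - q)
    return 2 * match - 1         # +1 on a match, -1 otherwise

p, q = 0.9, 0.1
for t in range(6):               # alternate pure best responses
    p = 1.0 if payoff1(1, q) > payoff1(0, q) else 0.0  # player 1 maximizes
    q = 1.0 if payoff1(p, 1) < payoff1(p, 0) else 0.0  # player 2 minimizes
    print(t, p, q)
```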
[Diagram: EM dynamics in the space of joint distributions.]
Planning to learn / uncertain MDP: bandit example.
[Diagram: condition on the goal variable, \(x=1\).]
But for adding and removing agents, a dynamically scalable formulation is probably needed.
Thank you for your attention!