Martin Biehl and Nathaniel Virgo
In a nutshell:
To find unknown kernels \(p_A:=\{p_a: a \in A\}\)
Practical side of original framework:
[Diagrams: Bayesian network, goal, policies]
Use planning as inference for agent design:
Note: Bayesian network structure of
Controllable kernel structure can be used to make internal structure of designed agent explicit!
Consider designing an artificial agent for a POMDP, i.e. you know
Then find
via planning as inference.
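A minimal sketch of the idea on a toy problem. It assumes that planning as inference here means choosing the unknown (policy) kernel so that the probability of the goal event is as large as possible; the slides do not spell out the objective, so this reading is an assumption. The one-step network, the numbers, and all names (`p_s`, `p_goal`, `goal_probability`) are hypothetical, and brute-force enumeration stands in for an actual inference algorithm.

```python
from itertools import product

# Hypothetical one-step problem: sensor value s in {0, 1}, action a in {0, 1}.
p_s = {0: 0.7, 1: 0.3}  # fixed kernel p(s)

# Fixed kernel p(g = 1 | s, a): the goal is likely when the action matches s.
p_goal = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.8}


def goal_probability(policy):
    """Return p(g = 1) under a deterministic policy given as a dict s -> a."""
    return sum(p_s[s] * p_goal[(s, policy[s])] for s in p_s)


# The unknown kernel is the policy p(a | s). Enumerate all deterministic
# candidates and keep the one with the highest goal probability.
policies = [dict(zip(p_s, actions)) for actions in product([0, 1], repeat=len(p_s))]
best_policy = max(policies, key=goal_probability)
best_value = goal_probability(best_policy)
```

Enumeration only works for tiny problems like this one; it is used here to keep the objective visible, not as a practical method.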
Get three kernels:
But in practice (on a robot) time passes between \(S_0\) and \(A_2\):
Action may be produced by three kernels:
In practice time passes between \(S_0\) and \(A_2\):
So also need to find
Also want constant memory and action kernels
In the following, "agent" usually means
Two situations:
Explicitly reflect either by
Then
Consider an agent that solves a problem in an uncertain environment.
Alternatively:
Recall Bayes rule (for any random variables \(A,B\)):
\[p(b\,|\,a) = \frac{p(a\,|\,b)}{p(a)} \;p(b)\]
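A small numerical check of the rule above, with \(B\) a hidden state and \(A\) an observation. The state names and probabilities are made up for illustration.

```python
# Hypothetical prior p(b) over a hidden state.
p_b = {"rain": 0.2, "dry": 0.8}

# Hypothetical likelihood p(a | b) of an observation a given the state b.
p_a_given_b = {
    "rain": {"wet": 0.9, "not_wet": 0.1},
    "dry": {"wet": 0.1, "not_wet": 0.9},
}


def posterior(a):
    """Return p(b | a) = p(a | b) / p(a) * p(b) for every b."""
    p_a = sum(p_a_given_b[b][a] * p_b[b] for b in p_b)  # marginal p(a)
    return {b: p_a_given_b[b][a] / p_a * p_b[b] for b in p_b}


post = posterior("wet")
```

With these numbers, \(p(\text{wet}) = 0.26\), so the posterior probability of rain is \(0.18 / 0.26\).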
Bayesian inference:
Bayesian belief updating:
Bayesian belief updating with conjugate priors:
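A minimal sketch of conjugate updating, using the Beta-Bernoulli pair as an example (the slides do not say which conjugate family is meant, so this choice is an assumption). The belief is stored as two numbers and each observation only increments one of them.

```python
def update_beta(alpha, beta, observation):
    """Return the Beta parameters after one Bernoulli observation (0 or 1)."""
    return alpha + observation, beta + (1 - observation)


alpha, beta = 1, 1  # uniform prior Beta(1, 1)
for x in [1, 0, 1, 1]:  # hypothetical observation sequence
    alpha, beta = update_beta(alpha, beta, x)

posterior_mean = alpha / (alpha + beta)  # mean of Beta(alpha, beta)
```

Because the posterior stays in the same family as the prior, the belief state has a fixed size regardless of how many observations arrive.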
One way:
Then by construction
Note:
Consider an agent that solves a problem in an uncertain environment.
Note:
Example multi agent setups:
Two agents interacting with same environment
Two agents with same goal
Two agents with different goals
Example non-cooperative game: matching pennies
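A short sketch of why the two goals conflict, assuming the standard payoffs: player 1 receives +1 when the two coins match and -1 otherwise, and player 2 receives the negative of that. The function names are hypothetical.

```python
def expected_payoff_1(p, q):
    """Expected payoff of player 1 when the players pick heads with prob. p and q."""
    match = p * q + (1 - p) * (1 - q)  # probability that the coins match
    return match - (1 - match)


def has_profitable_deviation(a1, a2):
    """Check whether either player gains by switching in the pure profile (a1, a2)."""
    u1 = expected_payoff_1(a1, a2)
    player_1_gains = expected_payoff_1(1 - a1, a2) > u1
    player_2_gains = -expected_payoff_1(a1, 1 - a2) > -u1  # zero-sum payoff
    return player_1_gains or player_2_gains


# In every pure profile one of the players wants to switch.
no_pure_equilibrium = all(
    has_profitable_deviation(a1, a2) for a1 in (0, 1) for a2 in (0, 1)
)

# If player 2 mixes uniformly, player 1 is indifferent between all strategies.
mixed_is_indifferent = all(
    abs(expected_payoff_1(p, 0.5)) < 1e-12 for p in (0.0, 0.3, 1.0)
)
```

No pair of deterministic actions satisfies both players at once; only the uniformly mixed strategies leave neither player with an incentive to change.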
[Diagram: joint probability distributions \(p(a_1,a_2)\); disjoint goal manifolds; agent manifold \(p(a_1,a_2)=p(a_1)p(a_2)\); EM]
Training \(n=1\) agents
Running \(m=2\) agents
2. Agent number can change from \(n\) to \(m\) depending on events at runtime
Can't be combined in one Bayesian network!
Can be combined in this probabilistic program!
2. Planning to learn / uncertain MDP, bandit example.
if x=1
But probably needed for adding and removing agents
Thank you for your attention!