How Julia Makes New Decision-Making AI Possible

Zachary Sunberg, PhD

Assistant Professor

University of Colorado Boulder

Sequential Decision Making under Uncertainty

Waymo Image By Dllu - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=64517567

Markov Model

  • \(\mathcal{S}\) - State space
  • \(T:\mathcal{S}\times\mathcal{S} \to \mathbb{R}\) - Transition probability distributions

Markov Decision Process (MDP)

  • \(\mathcal{S}\) - State space
  • \(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
  • \(\mathcal{A}\) - Action space
  • \(R:\mathcal{S}\times \mathcal{A} \to \mathbb{R}\) - Reward

Solving MDPs - The Value Function

Value = expected sum of future rewards:

$$\underset{\pi:\, \mathcal{S}\to\mathcal{A}}{\mathop{\text{maximize}}} \, V^\pi(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, \pi(s_t)) \bigm| s_0 = s \right]$$

This objective involves all future time. The optimal value function satisfies a fixed-point equation that involves only \(t\) and \(t+1\):

$$V^*(s) = \underset{a\in\mathcal{A}}{\max} \left\{R(s, a) + \gamma E\Big[V^*\left(s_{t+1}\right) \mid s_t=s, a_t=a\Big]\right\}$$

with the corresponding action-value function

$$Q(s,a) = R(s, a) + \gamma E\Big[V^* (s_{t+1}) \mid s_t = s, a_t=a\Big]$$
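The fixed-point form above leads directly to value iteration. Below is a minimal sketch for a small discrete MDP, assuming hypothetical dense arrays `T[s, a, sp]` (transition probabilities) and `R[s, a]` (rewards); it is illustrative, not code from the talk.

# Minimal value iteration sketch; T[s, a, sp] and R[s, a] are hypothetical arrays
function value_iteration(T, R; γ=0.95, iters=500)
    nS, nA = size(R)
    V = zeros(nS)
    for _ in 1:iters
        # Bellman backup: Q(s, a) = R(s, a) + γ Σ_sp T(s, a, sp) V(sp)
        Q = [R[s, a] + γ * sum(T[s, a, sp] * V[sp] for sp in 1:nS) for s in 1:nS, a in 1:nA]
        V = vec(maximum(Q, dims=2))   # V(s) = max_a Q(s, a)
    end
    return V
end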

Online Decision Process Tree Approaches

The tree is expanded forward in time, and \(Q(s, a)\) at each node is estimated from its children:

$$Q(s,a) = R(s, a) + \gamma E\Big[V^* (s_{t+1}) \mid s_t = s, a_t=a\Big]$$

\[V(s) = \max_a Q(s,a)\]
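Read recursively, this suggests a depth-limited, sampled estimate of \(Q(s, a)\). The sketch below is a sparse-sampling-style illustration rather than the full Monte Carlo tree search used in practice; `step(m, s, a)` and `actions(m)` are hypothetical helpers.

# Sparse-sampling-style Q estimate; `step` and `actions` are hypothetical helpers
function estimate_q(m, s, a, depth; n=20, γ=0.95)
    depth == 0 && return 0.0
    total = 0.0
    for _ in 1:n
        sp, r = step(m, s, a)                                                    # sample one child
        vsp = maximum(estimate_q(m, sp, ap, depth - 1; n=n, γ=γ) for ap in actions(m))
        total += r + γ * vsp                                                     # Bellman backup through the child
    end
    return total / n
end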

Partially Observable Markov Decision Process (POMDP)

  • \(\mathcal{S}\) - State space
  • \(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
  • \(\mathcal{A}\) - Action space
  • \(R:\mathcal{S}\times \mathcal{A} \to \mathbb{R}\) - Reward
  • \(\mathcal{O}\) - Observation space
  • \(Z:\mathcal{S} \times \mathcal{A}\times \mathcal{S} \times \mathcal{O} \to \mathbb{R}\) - Observation probability distribution

POMDP Sense-Plan-Act Loop

[Diagram: Environment → \(o\) → Belief Updater → \(b\) → Planner → \(a\) → Environment. In the driving example, the observation is the physical state \((x, y, v)\), the belief assigns Aggressive: 63%, Normal: 34%, Timid: 3%, and the chosen action is "Turn Left".]
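In code, the loop is just three calls repeated; the sketch below uses hypothetical `plan`, `environment_step`, and `update_belief` helpers to mirror the diagram.

# Sense-plan-act loop sketch; all three helpers are hypothetical
function run_loop(b; steps=100)
    for t in 1:steps
        a = plan(b)                   # Planner: belief -> action
        o = environment_step(a)       # Environment: action -> observation
        b = update_belief(b, a, o)    # Belief Updater: (b, a, o) -> new belief
    end
    return b
end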

[Timeline: 2011, 2013, 2014]

Also ~2011: Improving TCAS

ACAS X

POMDP Models + Optimization = Specification

POMDPs.jl - An interface for defining and solving MDPs and POMDPs in Julia

[Egorov, Sunberg, et al., 2017]
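As an example of what the interface enables, here is a sketch of a small tiger-style POMDP written with the QuickPOMDPs.jl convenience constructor. The keyword names and the distribution helpers (here imported from POMDPTools) are recalled from the package documentation and may vary by version, so treat this as illustrative rather than the example from the talk.

# A tiger-style POMDP sketch (illustrative; API details may differ by package version)
using QuickPOMDPs: QuickPOMDP
using POMDPTools: Deterministic, Uniform, SparseCat

m = QuickPOMDP(
    states       = [:left, :right],                     # where the tiger is
    actions      = [:listen, :open_left, :open_right],
    observations = [:hear_left, :hear_right],
    discount     = 0.95,
    initialstate = Uniform([:left, :right]),

    transition = function (s, a)
        a == :listen ? Deterministic(s) : Uniform([:left, :right])   # reset after opening
    end,

    observation = function (a, sp)
        if a == :listen
            sp == :left ? SparseCat([:hear_left, :hear_right], [0.85, 0.15]) :
                          SparseCat([:hear_left, :hear_right], [0.15, 0.85])
        else
            Uniform([:hear_left, :hear_right])                        # uninformative
        end
    end,

    reward = function (s, a)
        a == :listen ? -1.0 : (a == :open_left) == (s == :left) ? -100.0 : 10.0
    end
)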

Challenges for POMDP Software

  1. POMDPs are computationally difficult.

Julia - Speed

Celeste Project: 1.54 petaflops

Challenges for POMDP Software

  1. POMDPs are computationally difficult.
  2. There is a huge variety of (see the sketch after this list):
    • Problems
      • Continuous/Discrete
      • Fully/Partially Observable
      • Generative/Explicit
      • Simple/Complex
    • Solvers
      • Online/Offline
      • Alpha Vector/Graph/Tree
      • Exact/Approximate
    • Domain-specific heuristics
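One interface has to span all of these combinations. As an illustration of the payoff, the same model can be handed to very different solvers; this sketch assumes the QMDP.jl and POMCPOW.jl solver packages and reuses the `m` defined in the earlier QuickPOMDPs sketch.

using POMDPs: solve, action
using QMDP: QMDPSolver            # offline, alpha-vector (assumed solver package)
using POMCPOW: POMCPOWSolver      # online, tree search (assumed solver package)

offline_policy = solve(QMDPSolver(), m)      # precomputes a policy before execution
online_planner = solve(POMCPOWSolver(), m)   # plans from the current belief at decision time

Both return policy objects that answer `action(policy, b)` for a belief `b`, so downstream code does not depend on which solver produced them.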

Problems may be specified with Explicit probability distributions or as a Black Box simulator ("Generative" in POMDP lit.) that maps \((s, a)\) to a sampled \((s', o, r)\).
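In POMDPs.jl the black-box route is, as I recall the interface, to implement `gen`, which returns a sampled next state, observation, and reward as a NamedTuple; the problem type and dynamics below are purely illustrative.

# Hypothetical 1-D black-box problem defined only through its generative model
using POMDPs
using Random: AbstractRNG

struct MyPOMDP <: POMDP{Float64, Float64, Float64} end

function POMDPs.gen(m::MyPOMDP, s, a, rng::AbstractRNG)
    sp = s + a + 0.1 * randn(rng)    # sampled next state
    o  = sp + 0.5 * randn(rng)       # noisy observation of the next state
    r  = -abs(sp)                    # reward for staying near the origin
    return (sp=sp, o=o, r=r)
end

Solvers and simulators then draw samples with calls like `@gen(:sp, :o, :r)(m, s, a, rng)`, the same macro that appears in the Continuous Action Spaces snippet later on.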

Previous C++ framework: APPL

"At the moment, the three packages are independent. Maybe one day they will be merged in a single coherent framework."

[Egorov, Sunberg, et al., 2017]

  • A POMDP is an MDP on the belief space, but belief updates are expensive
  • POMCP* uses simulations of histories instead of full belief updates
  • Each belief is implicitly represented by a collection of unweighted particles (see the sketch below)

[Ross, 2008] [Silver, 2010]

*(Partially Observable Monte Carlo Planning)
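A minimal sketch of the particle idea, with a belief node's particles stored as a plain vector of sampled states and the generative model doing the work; this shows only one simulated step, not the full POMCP tree search.

# One simulated step from a particle-represented belief node (illustrative only)
using POMDPs: @gen
using Random

function simulate_from_belief(m, particles::Vector, a, rng)
    s = rand(rng, particles)                       # draw an unweighted particle
    sp, o, r = @gen(:sp, :o, :r)(m, s, a, rng)     # black-box generative step
    return sp, o, r                                # sp joins the particles of the child node reached by (a, o)
end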

[Figures: search trees for POMCP, POMCP-DPW, and POMCPOW]

General Sum Differential Games

Joint Dynamics:
$$\dot{x} = f(t, x, u_1, \ldots, u_N)$$

Cost for player \(i\):
$$J_i = \int_0^T{g_i(t, x, u_1, \ldots, u_N) dt}$$

Strategy of player \(i\):
$$u_i(t) = \gamma_i(t, x)$$
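As an illustrative instantiation (not from the slides), consider two players navigating to their own goals while avoiding each other; \(g_i\) and \(\beta\) are hypothetical parameters.

$$\dot{x} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \qquad J_i = \int_0^T \left( \|x_i - g_i\|^2 + \beta\, e^{-\|x_1 - x_2\|^2} \right) dt, \qquad u_i(t) = \gamma_i(t, x)$$

Here the state \(x = (x_1, x_2)\) stacks both players' positions, and because each player has its own goal term, the costs are not opposites of one another: the game is general-sum rather than zero-sum.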

Continuous Action Spaces

using Zygote: pullback   # assuming Zygote.jl provides the pullback used here

# Reverse-mode differentiation through the generative model; `back` gives gradients w.r.t. s and a
(sp, r), back = pullback((s, a) -> @gen(:sp, :r)(m, s, a, rng), s, a)

Acknowledgements

The content of my research reflects my opinions and conclusions, and is not necessarily endorsed by my funding organizations.

Thank You!


POMDP Sense-Plan-Act Loop

[Diagram: Environment → \(o\) → Belief Updater → \(b\) → Policy → \(a\) → Environment]
\[b_t(s) = P\left(s_t = s \mid a_1, o_1 \ldots a_{t-1}, o_{t-1}\right)\]
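For reference, the Belief Updater can compute this recursively with a Bayes filter; in the \(T\) and \(Z\) notation defined earlier, one standard discrete form after taking action \(a\) and receiving observation \(o\) is

$$b'(s') \propto \sum_{s \in \mathcal{S}} Z(s, a, s', o)\, T(s, a, s')\, b(s)$$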

Laser Tag POMDP

Types of Uncertainty

  • Aleatory
  • Model (epistemic, static)
  • State (epistemic, dynamic)
