MW 2:45-4pm
255 Olin Hall
Outline:
Participation point: PollEV.com/sarahdean011
Infinite Horizon Discounted MDP
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, r, P, \gamma\}\)
Finite Horizon MDP
\(\mathcal M = \{\mathcal{S}, \mathcal{A}, r, P, H, \mu_0\}\)
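A concrete sketch of these two tuples as tabular containers (the field names and array shapes are illustrative assumptions, not lecture notation):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DiscountedMDP:
    """Infinite-horizon discounted MDP (S, A, r, P, gamma)."""
    n_states: int     # |S|, states indexed 0..n_states-1
    n_actions: int    # |A|, actions indexed 0..n_actions-1
    r: np.ndarray     # rewards, shape (n_states, n_actions)
    P: np.ndarray     # transitions, shape (n_states, n_actions, n_states)
    gamma: float      # discount factor in [0, 1)

@dataclass
class FiniteHorizonMDP:
    """Finite-horizon MDP (S, A, r, P, H, mu_0)."""
    n_states: int
    n_actions: int
    r: np.ndarray     # shape (n_states, n_actions)
    P: np.ndarray     # shape (n_states, n_actions, n_states)
    H: int            # horizon length
    mu0: np.ndarray   # initial state distribution, shape (n_states,)
```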
ex - Pac-Man as MDP
Optimal Control Problem
ex - UAV as OCP
examples:
Policy results in a trajectory \(\tau = (s_0, a_0, s_1, a_1, ... )\)
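A minimal sketch of sampling such a trajectory from a tabular MDP under a deterministic policy (the helper name `rollout` and the array conventions are assumptions):

```python
import numpy as np

def rollout(P, pi, s0, T, seed=0):
    """Sample a length-T trajectory tau = (s_0, a_0, s_1, a_1, ...) under pi.
    P: transition tensor of shape (S, A, S); pi: length-S array, pi[s] = action."""
    rng = np.random.default_rng(seed)
    traj, s = [], s0
    for _ in range(T):
        a = pi[s]                                # a_t = pi(s_t)
        traj.append((s, a))
        s = rng.choice(P.shape[2], p=P[s, a])    # s_{t+1} ~ P(. | s_t, a_t)
    return traj
```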
(Figure: three sample trajectories \((s_0, a_0, s_1, a_1, s_2, a_2, \dots)\) drawn from the MDP.)
Cumulative discounted reward along a trajectory:
$$r(s_0, a_0) + \gamma\, r(s_1, a_1) + \gamma^2\, r(s_2, a_2) + \dots = \sum_{t=0}^{\infty} \gamma^t\, r(s_t, a_t)$$
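The same sum over a finite trajectory prefix, as a small sketch (truncating at \(T\) steps only approximates the infinite-horizon sum):

```python
def discounted_return(rewards, gamma):
    """Compute sum_t gamma^t * r_t over a finite prefix of the reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# e.g. constant reward 1 with gamma = 0.9: 1 + 0.9 + 0.81 = 2.71
print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))
```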
Example: \(\pi(s) = \) stay and \(\mu_0\) is each state with probability \(1/2\).
(Figure: two-state transition diagram; state \(0\) self-loops with probability \(1\); state \(1\) stays with probability \(p_1\) and moves to state \(0\) with probability \(1-p_1\).)
$$P_\pi = \begin{bmatrix}1& 0\\ 1-p_1 & p_1\end{bmatrix}$$
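A quick numpy check of how the state distribution evolves under this chain, \(\mu_{t+1}^\top = \mu_t^\top P_\pi\); the value \(p_1 = 0.9\) is chosen arbitrarily for illustration:

```python
import numpy as np

p1 = 0.9                             # illustrative value, not from the lecture
P_pi = np.array([[1.0,      0.0],
                 [1.0 - p1, p1 ]])   # row s gives P(s' | s) under pi(s) = stay
mu = np.array([0.5, 0.5])            # mu_0: each state with probability 1/2

for t in range(5):
    print(t, mu)
    mu = mu @ P_pi                   # mu_{t+1}^T = mu_t^T P_pi
```

Running this shows probability mass draining from state \(1\) into the absorbing state \(0\).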
Food for thought:
examples:
Recursive Bellman Expectation Equation:
$$V^\pi(s) = \mathbb{E}_{a \sim \pi(s)}\Big[ r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\big[ V^\pi(s') \big] \Big]$$
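As a sketch of how this recursion is used in practice, iterating the Bellman expectation operator on a tabular MDP evaluates a fixed policy (function name and array shapes are assumptions):

```python
import numpy as np

def policy_evaluation(r, P, pi, gamma, tol=1e-8):
    """Evaluate V^pi by iterating the Bellman expectation operator,
    which is a gamma-contraction, so the iteration converges geometrically.
    r: (S, A) rewards; P: (S, A, S) transitions; pi: length-S action indices."""
    S = r.shape[0]
    r_pi = r[np.arange(S), pi]           # r(s, pi(s)) for each state s
    P_pi = P[np.arange(S), pi]           # P(s' | s, pi(s)), shape (S, S)
    V = np.zeros(S)
    while True:
        V_new = r_pi + gamma * P_pi @ V  # apply the Bellman expectation operator
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```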
Recall: Icy navigation (PSet 2, lecture example)
Recall: Verifying optimality in Icy Street example
Food for thought: For a fixed point contraction by \(\gamma\), how many iterations are necessary to guarantee \(\epsilon\) error?
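One way to bound it (a sketch): contraction by \(\gamma\) gives \(\|V_k - V^\star\| \le \gamma^k \|V_0 - V^\star\|\), so \(\epsilon\) error is guaranteed once
$$\gamma^k \|V_0 - V^\star\| \le \epsilon \quad\iff\quad k \ge \frac{\log\big(\|V_0 - V^\star\|/\epsilon\big)}{\log(1/\gamma)} = O\!\left(\frac{\log(1/\epsilon)}{1-\gamma}\right),$$
using \(\log(1/\gamma) \ge 1 - \gamma\).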
ex - UAV
Finite Horizon LQR: Application of Dynamic Programming
Basis for approximation-based algorithms (local linearization and iLQR)
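A minimal sketch of that dynamic program, the backward Riccati recursion, assuming linear dynamics \(s_{t+1} = A s_t + B a_t\) and quadratic cost \(\sum_t \big(s_t^\top Q s_t + a_t^\top R a_t\big)\); the function name and conventions are illustrative:

```python
import numpy as np

def lqr_gains(A, B, Q, R, H):
    """Backward Riccati recursion for finite-horizon LQR.
    Minimizes sum_t (s_t' Q s_t + a_t' R a_t) s.t. s_{t+1} = A s_t + B a_t.
    Returns gains K_0..K_{H-1} with optimal policy a_t = K_t @ s_t."""
    P = Q.copy()                         # cost-to-go matrix at the horizon, P_H = Q
    gains = []
    for _ in range(H):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A + B @ K)    # P_t in terms of P_{t+1}
        gains.append(K)
    return gains[::-1]                   # reorder so gains[t] = K_t
```

Local linearization and iLQR reuse exactly this recursion on a linear-quadratic approximation of a nonlinear problem.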