Prof. Sarah Dean
MW 2:45-4pm
255 Olin Hall
1. Recap: Bandits & MBRL
2. MBRL with Exploration
3. UCB Value Iteration
4. UCB-VI Analysis
A simplified setting for studying exploration
Example: machine make and model affect rewards, so the context is \(x=(\bullet, \bullet, \bullet, \bullet, \bullet, \bullet, \bullet, \bullet)\)
UCB-type Algorithms
Algorithm:
Analysis: \(\widehat \pi\) vs. \(\pi^*\)
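As a refresher on the UCB-type algorithms recalled in this recap, here is a minimal sketch for a multi-armed bandit. The function name `ucb_bandit`, the Bernoulli reward model, the arm means, and the bonus \(\sqrt{2\log T / N_a}\) are illustrative assumptions, not taken from the slides.

```python
import math
import random

def ucb_bandit(means, T, seed=0):
    """UCB sketch on a Bernoulli bandit; `means` are hypothetical arm means."""
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K   # N_a: number of pulls of each arm
    sums = [0.0] * K   # total reward collected from each arm
    for t in range(T):
        if t < K:
            a = t  # pull every arm once so the counts are nonzero
        else:
            # empirical mean plus an exploration bonus that shrinks with N_a
            ucb = [sums[k] / counts[k] + math.sqrt(2 * math.log(T) / counts[k])
                   for k in range(K)]
            a = max(range(K), key=lambda k: ucb[k])
        r = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += r
    return counts

counts = ucb_bandit([0.2, 0.5, 0.8], T=2000)
print(counts)
```

The bonus is large for rarely pulled arms, so every arm keeps being tried until its upper confidence bound falls below that of the best arm.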
[Figure: chain of states \(0, 1, 2, \dots, H-1\) with transitions labeled by action \(1\) or \(\neq 1\)]
UCB-VI
[Figure: chain of states \(0, 1, 2, \dots, H-1\) with transitions labeled by action \(1\) or \(\neq 1\)]
DP
[Figure: chain of states \(0, 1, 2, \dots, H-1\) with transitions labeled by action \(1\) or \(\neq 1\)]
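The interplay of UCB-VI and DP on a chain like the one pictured can be sketched as follows. Everything specific here is an assumption made for illustration: the dynamics (action \(1\) advances along the chain, any other action resets to state \(0\)), the reward (1 in the last state, 0 elsewhere, treated as known), the bonus \(c/\sqrt{N(s,a)}\), counts shared across time steps, and all function names. It is a simplified sketch, not the exact algorithm or constants from the lecture.

```python
import math

def chain_step(s, a, H):
    """Assumed dynamics: action 1 advances, any other action resets to state 0."""
    return min(s + 1, H - 1) if a == 1 else 0

def chain_reward(s, a, H):
    """Assumed reward: 1 in the last state of the chain, 0 elsewhere."""
    return 1.0 if s == H - 1 else 0.0

def ucb_vi(H=4, A=2, K=300, c=1.0):
    S = H
    n_sa = [[0] * A for _ in range(S)]                        # N(s, a)
    n_sas = [[[0] * S for _ in range(A)] for _ in range(S)]   # N(s, a, s')
    returns = []
    for _ in range(K):
        # DP: backward induction on the estimated model plus bonus
        V = [0.0] * S  # value after the final step is zero
        policy = [[0] * S for _ in range(H)]
        for h in reversed(range(H)):
            V_new = [0.0] * S
            for s in range(S):
                best_q, best_a = -1.0, 0
                for a in range(A):
                    n = n_sa[s][a]
                    if n == 0:
                        q = float(H - h)  # unvisited pair: maximally optimistic
                    else:
                        exp_v = sum(n_sas[s][a][s2] / n * V[s2] for s2 in range(S))
                        bonus = c / math.sqrt(n)
                        q = min(float(H - h), chain_reward(s, a, H) + bonus + exp_v)
                    if q > best_q:
                        best_q, best_a = q, a
                V_new[s] = best_q
                policy[h][s] = best_a
            V = V_new
        # Roll out the greedy optimistic policy from state 0 and update counts
        s, total = 0, 0.0
        for h in range(H):
            a = policy[h][s]
            s2 = chain_step(s, a, H)
            total += chain_reward(s, a, H)
            n_sa[s][a] += 1
            n_sas[s][a][s2] += 1
            s = s2
        returns.append(total)
    return returns

returns = ucb_vi()
print("fraction of episodes reaching the end of the chain:", sum(returns) / len(returns))
```

In this sketch, reaching the rewarding state requires choosing action \(1\) at every step, which undirected random exploration rarely does on a long chain. The bonus keeps the value of under-visited state-action pairs high, so the DP step steers the policy toward them.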
[Figure: chain with transitions labeled by action \(1\) or \(\neq 1\)]