Prof. Sarah Dean, assistant professor in CS at Cornell
MW 2:55-4:10pm
255 Olin Hall
1. Recap: Bandits
2. MBRL and Exploration
3. UCB Value Iteration
4. UCB-VI Analysis
A simplified setting for studying exploration
Example: a machine's make and model affect its rewards, so the context \(x=(\cdot, \dots, \cdot)\) encodes these attributes.
UCB-type Algorithms
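As a refresher, here is a minimal sketch of a UCB-type algorithm for a multi-armed bandit: pull each arm once, then always pull the arm with the highest optimistic index (empirical mean plus a confidence width). The bonus \(\sqrt{\alpha \log T / N_a}\), the function name `ucb`, and the `pull` callback are illustrative assumptions, not the lecture's exact notation.

```python
import math
import random

def ucb(pull, num_arms, T, alpha=2.0):
    """UCB-style bandit algorithm (sketch).

    pull(a) -> reward in [0, 1] for arm a (hypothetical callback).
    alpha scales the exploration bonus (assumed value).
    """
    counts = [0] * num_arms    # N_a: number of pulls of each arm
    means = [0.0] * num_arms   # empirical mean reward of each arm
    for t in range(T):
        if t < num_arms:
            a = t  # pull every arm once so each count is positive
        else:
            # optimism: empirical mean + confidence width
            a = max(range(num_arms),
                    key=lambda i: means[i] + math.sqrt(alpha * math.log(T) / counts[i]))
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # running average update
    return counts, means

# Usage: three Bernoulli arms with hypothetical success probabilities.
rng = random.Random(0)
true_means = [0.2, 0.5, 0.8]
counts, means = ucb(lambda a: 1.0 if rng.random() < true_means[a] else 0.0, 3, 5000)
print(counts)  # most pulls should go to arm 2
```

The confidence width shrinks as an arm is pulled more, so under-explored arms keep getting tried until their estimates are precise enough to rule them out.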
2. MBRL and Exploration
[Figure: chain MDP with states \(0, 1, 2, \dots, H-1\); transitions labeled by action \(1\) and by actions \(\neq 1\).]
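To see why exploration is hard in a chain like this, the sketch below computes the probability that a uniformly random policy reaches the last state \(H-1\) within one episode. It assumes (hypothetically; the figure only shows the action labels) that action \(1\) moves \(s \to s+1\) and every action \(\neq 1\) returns the agent to state \(0\), with reward available only at the end of the chain.

```python
def prob_reach_last_state(H, num_actions):
    """Probability that a uniformly random policy is in state H-1 at step H-1.

    Assumed chain: states 0, 1, ..., H-1; action 1 advances s -> s+1,
    every action != 1 resets the agent to state 0 (hypothetical dynamics).
    """
    p_adv = 1.0 / num_actions          # chance a random action equals 1
    dist = [1.0] + [0.0] * (H - 1)     # start in state 0 at step h = 0
    for _ in range(H - 1):             # H-1 transitions before the final step
        new = [0.0] * H
        for s in range(H - 1):
            new[s + 1] += dist[s] * p_adv        # action 1: advance
            new[0] += dist[s] * (1.0 - p_adv)    # action != 1: reset
        dist = new
    return dist[H - 1]

for H, A in [(5, 2), (10, 2), (10, 4)]:
    p = prob_reach_last_state(H, A)
    # episodes are independent, so the expected wait for a first success is 1/p
    print(f"H={H}, A={A}: P(reach)={p:.2e}, expected episodes until first success={1 / p:.0f}")
```

Under these assumptions the probability is \((1/A)^{H-1}\), so random exploration needs exponentially many episodes in \(H\) before it ever observes the reward.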
3. UCB Value Iteration
UCB-VI
[Figure: chain MDP with states \(0, 1, 2, \dots, H-1\); transitions labeled by action \(1\) and by actions \(\neq 1\).]
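One way to write the UCB-VI update in episode \(k\): estimate transitions from visit counts \(N^k\), add an exploration bonus \(b^k\), and run DP backward from \(\hat V^k_H = 0\). This sketch assumes a known reward \(r(s,a)\) and a Hoeffding-style bonus; the constant \(c\) and the log term are assumptions, and the lecture's exact bonus may differ.

```latex
\hat P^k(s' \mid s,a) = \frac{N^k(s,a,s')}{N^k(s,a)}, \qquad
b^k(s,a) = c\,H\sqrt{\frac{\log(SAH/\delta)}{N^k(s,a)}}

\hat V^k_H(s) = 0, \qquad
\hat Q^k_h(s,a) = \min\Big\{H,\; r(s,a) + b^k(s,a) + \sum_{s'} \hat P^k(s' \mid s,a)\,\hat V^k_{h+1}(s')\Big\}

\hat V^k_h(s) = \max_a \hat Q^k_h(s,a), \qquad
\pi^k_h(s) = \arg\max_a \hat Q^k_h(s,a)
```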
DP (dynamic programming)
[Figure: chain MDP with states \(0, 1, 2, \dots, H-1\); transitions labeled by action \(1\) and by actions \(\neq 1\).]
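The optimistic DP step can be sketched in NumPy for a tabular episodic MDP. Assumptions: the reward array `r` is known, the bonus has the Hoeffding-style form \(cH\sqrt{\log(SAH/\delta)/N(s,a)}\) with hypothetical constants `c` and `delta`, values are capped at \(H\), and unvisited \((s,a)\) pairs get the maximal value \(H\). The name `ucbvi_plan` is illustrative.

```python
import numpy as np

def ucbvi_plan(counts, r, H, c=1.0, delta=0.1):
    """Optimistic backward DP on the empirical model (UCB-VI style sketch).

    counts[s, a, s2]: visit counts N(s, a, s2); r[s, a]: known reward in [0, 1].
    c, delta: hypothetical bonus constants. Returns greedy policy pi[h, s]
    and optimistic values V[h, s] for h = 0..H (with V[H] = 0).
    """
    S, A, _ = counts.shape
    n_sa = counts.sum(axis=2)                       # N(s, a)
    n_safe = np.maximum(n_sa, 1)                    # avoid division by zero
    P_hat = counts / n_safe[:, :, None]             # empirical transition estimates
    bonus = c * H * np.sqrt(np.log(S * A * H / delta) / n_safe)
    V = np.zeros((H + 1, S))                        # V[H] = 0
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        Q = r + bonus + P_hat @ V[h + 1]            # shape (S, A)
        Q = np.where(n_sa == 0, H, Q)               # unvisited pairs: fully optimistic
        Q = np.minimum(Q, H)                        # values never exceed H
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return pi, V

# Usage on a hypothetical 3-state chain: action 1 advances, action 0 resets,
# reward 1 in state 2; counts describe the true model, and c=0 turns off the bonus.
H, S, A = 3, 3, 2
r = np.zeros((S, A))
r[2, :] = 1.0
counts = np.zeros((S, A, S))
for s in range(S):
    counts[s, 0, 0] = 10
    counts[s, 1, min(s + 1, S - 1)] = 10
pi, V = ucbvi_plan(counts, r, H, c=0.0)
print(V[0, 0], pi[0, 0])  # prints 1.0 1
```

In a full UCB-VI loop, each episode would recompute this plan from the updated counts, act with `pi`, and increment `counts` along the visited trajectory. The bonus, not randomness, drives the agent toward rarely tried state-action pairs.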
4. UCB-VI Analysis
[Figure: chain MDP with transitions labeled by action \(1\) and by actions \(\neq 1\).]
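A standard first step in analyzing UCB-type methods is optimism. Assuming a fixed initial state \(s_0\) and \(K\) episodes with executed policies \(\pi^k\), if the optimistic values upper bound the optimal value, regret is controlled by the gap between optimistic and true values of the executed policies:

```latex
\mathrm{Regret}(K) = \sum_{k=1}^{K} \Big( V_0^\star(s_0) - V_0^{\pi^k}(s_0) \Big)
\le \sum_{k=1}^{K} \Big( \hat V_0^k(s_0) - V_0^{\pi^k}(s_0) \Big)
\quad \text{whenever } \hat V_0^k(s_0) \ge V_0^\star(s_0) \text{ for all } k.
```

The remaining work is to bound each term on the right, i.e. how far the optimistic value of \(\pi^k\) can exceed its true value.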