lecturer: Pavel Temirchev
PI / VI (policy iteration / value iteration):
- Any MDP
- Known MDP
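PI and VI need a known MDP, i.e. the transition probabilities and rewards are given. Below is a minimal sketch of both for a finite, discounted MDP. It assumes the MDP is stored as NumPy arrays with P[a, s, s2] = Pr(s2 | s, a) and R[s, a] = expected reward. This layout, the discount gamma = 0.9 and the two-state toy MDP are hypothetical choices for illustration, not taken from the lecture.

```python
import numpy as np

# Assumed layout: P[a, s, s2] = Pr(s2 | s, a), R[s, a] = expected reward.

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality operator until the values stop changing."""
    V = np.zeros(P.shape[1])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_{s2} P[a, s, s2] * V[s2]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new


def policy_iteration(P, R, gamma=0.9):
    """Alternate exact policy evaluation and greedy policy improvement."""
    n_states = P.shape[1]
    states = np.arange(n_states)
    pi = np.zeros(n_states, dtype=int)
    while True:
        # Evaluation: solve the linear system (I - gamma * P_pi) V = R_pi.
        P_pi = P[pi, states]  # row s is P[pi[s], s, :]
        R_pi = R[states, pi]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Improvement: act greedily with respect to V.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        new_pi = Q.argmax(axis=1)
        if np.array_equal(new_pi, pi):
            return V, pi
        pi = new_pi


# Hypothetical toy MDP: action 0 stays put, action 1 switches state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [1.0, 0.0]],  # action 1
])
R = np.array([
    [1.0, 0.0],  # state 0: staying pays 1, switching pays 0
    [2.0, 0.0],  # state 1: staying pays 2, switching pays 0
])

V_vi, pi_vi = value_iteration(P, R)
V_pi, pi_pi = policy_iteration(P, R)
print(V_vi, pi_vi)
print(V_pi, pi_pi)
```

Both methods should agree on this toy MDP. The optimal policy switches from state 0 and then stays in state 1, with values close to [18, 20]. VI stops once a sweep changes the values by less than tol. PI stops when the greedy policy no longer changes, which happens after finitely many steps in a finite MDP.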
Bandits:
- Simple 1-step MDP
- Unknown MDP
- Worst-case regret
- Bayesian regret
- Class of environments
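To make the two regret notions concrete, here is a minimal Monte Carlo sketch. The pseudo-regret after T steps is R_T = T * mu_star - sum_t mu_{a_t}, where mu_star is the best arm's mean. Bayesian regret averages R_T over environments drawn from a prior on the class. Worst-case regret takes the maximum over the class. The sketch makes several assumptions that are not from the lecture. The environment class is Bernoulli bandits. The prior is uniform over arm means. The learner is Thompson sampling, which is only one possible algorithm. The "worst case" is a max over a small, hand-picked list of environments, so it is only a lower bound on the true worst-case regret.

```python
import numpy as np


def thompson_sampling(means, T, rng):
    """Beta-Bernoulli Thompson sampling on arms with success probabilities `means`.

    Returns the pseudo-regret sum_t (mu_star - mu_{a_t}) accumulated over T steps.
    """
    k = len(means)
    successes = np.ones(k)  # Beta(1, 1) (uniform) prior on every arm
    failures = np.ones(k)
    best = means.max()
    regret = 0.0
    for _ in range(T):
        theta = rng.beta(successes, failures)  # one posterior sample per arm
        a = int(np.argmax(theta))
        reward = float(rng.random() < means[a])  # Bernoulli(means[a]) reward
        successes[a] += reward
        failures[a] += 1.0 - reward
        regret += best - means[a]  # gap of the chosen arm
    return regret


def bayesian_regret(k, T, n_envs, seed=0):
    """Monte Carlo estimate of E_{mu ~ prior}[regret], prior = Uniform[0, 1]^k."""
    rng = np.random.default_rng(seed)
    regrets = [thompson_sampling(rng.random(k), T, rng) for _ in range(n_envs)]
    return float(np.mean(regrets))


def worst_case_regret(envs, T, n_runs, seed=0):
    """Max over a finite list of environments of the estimated expected regret.

    This is only a lower bound on the sup over the whole environment class.
    """
    rng = np.random.default_rng(seed)
    return max(
        float(np.mean([thompson_sampling(np.asarray(mu), T, rng) for _ in range(n_runs)]))
        for mu in envs
    )


print(bayesian_regret(k=2, T=300, n_envs=50))
print(worst_case_regret([[0.5, 0.4], [0.9, 0.1], [0.5, 0.5]], T=300, n_runs=20))
```

The contrast mirrors the outline. The Bayesian number depends on the chosen prior over the class of environments. The worst-case number depends on which environments are included in the maximum. Environments with small gaps, such as [0.5, 0.4], tend to be the hard ones for a fixed horizon.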