Bandit Algorithms

 

lecturer: Pavel Temirchev

PI/VI vs. Bandits

PI / VI (Policy Iteration / Value Iteration):

- Work for any MDP

- Require the MDP to be known (transition and reward model)

Bandits:

- A simple, 1-step MDP

- The MDP is unknown: rewards must be learned by interacting with it (a minimal sketch follows below)
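To make the bandit interaction loop concrete, here is a minimal sketch under assumptions of my own: Bernoulli arms and an epsilon-greedy policy, chosen purely for illustration since the slides do not specify an algorithm. Names such as true_means and epsilon are illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Assumption: three Bernoulli arms; their means are unknown to the agent.
true_means = np.array([0.2, 0.5, 0.7])
n_arms = len(true_means)
T = 10_000          # horizon
epsilon = 0.1       # exploration probability (illustrative choice)

counts = np.zeros(n_arms)      # number of pulls per arm
estimates = np.zeros(n_arms)   # empirical mean reward per arm
total_reward = 0.0

for t in range(T):
    # Explore uniformly with probability epsilon, otherwise exploit the best estimate so far.
    if rng.random() < epsilon:
        arm = int(rng.integers(n_arms))
    else:
        arm = int(np.argmax(estimates))

    reward = float(rng.random() < true_means[arm])             # Bernoulli reward draw
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update
    total_reward += reward

# Regret relative to always pulling the best arm (computable only with hindsight).
regret = T * true_means.max() - total_reward
print(f"total reward: {total_reward:.0f}, regret: {regret:.0f}")

The loop highlights the contrast with PI/VI: nothing about the environment is given in advance, so the agent must trade off exploring arms to estimate their rewards against exploiting the current best estimate.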

Regret

- Worst-case (minimax) regret

- Bayesian regret

- Both are defined with respect to a class of environments (see the definitions below)
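For reference, a standard way to write these down (a sketch: the notation, with policy \pi, environment \nu with arm means \mu_a(\nu), environment class \mathcal{E}, and prior Q, is assumed rather than taken from the slides):

% Cumulative regret of policy \pi in environment \nu over horizon T,
% where A_t is the arm pulled at round t:
\[
  R_T(\pi, \nu) = T \max_a \mu_a(\nu) - \mathbb{E}\left[ \sum_{t=1}^{T} \mu_{A_t}(\nu) \right]
\]

% Worst-case regret: the supremum over a class of environments \mathcal{E}.
\[
  R_T^{\mathrm{worst}}(\pi) = \sup_{\nu \in \mathcal{E}} R_T(\pi, \nu)
\]

% Bayesian regret: the average under a prior Q on the same class.
\[
  \mathrm{BR}_T(\pi) = \mathbb{E}_{\nu \sim Q}\!\left[ R_T(\pi, \nu) \right]
\]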
