http://web.stanford.edu/class/cs234/index.html
Reinforcement Learning
What is the main difference between RL and other learning approaches?
By Yamaguchi先生 at the English language Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=57295504
Multi-Armed Bandit Problem
Exploration vs exploitation?
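One common way to make the exploration vs exploitation trade-off concrete is an epsilon-greedy agent. The sketch below is illustrative only and is not taken from the slides: the function name `epsilon_greedy_bandit`, the Bernoulli reward model, and the default `epsilon=0.1` are all assumptions chosen for the example.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit (hypothetical setup).

    true_means[i] is the assumed probability that arm i pays a reward of 1.
    Returns the per-arm value estimates, pull counts, and total reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    est, cnt, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
    print("estimates:", est)
    print("pull counts:", cnt)
    print("total reward:", total)
```

With `epsilon=0` the agent never explores and can lock onto a poor arm; with `epsilon=1` it never exploits what it has learned. Values in between trade one against the other.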
Markov Decision Process
Why MDP and POMDP?
Source: Wikipedia
Grid World
Discrete vs Continuous states
https://mpatacchiola.github.io/blog/
Bellman Equation
https://dnddnjs.gitbooks.io/
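For a deterministic environment, the Bellman optimality equation can be written as V(s) = max over a of [ r(s, a) + gamma * V(s') ], where s' is the state reached from s by action a. A minimal value-iteration sketch on a small grid world shows how repeatedly applying this backup yields the state values. The grid size, the reward of -1 per step, the goal at the top-left corner, and the name `value_iteration` are assumptions made for this example, not details from the slides.

```python
def value_iteration(n=4, gamma=1.0, tol=1e-9):
    """Value iteration on an n x n deterministic grid world (hypothetical setup).

    Assumptions: the goal is the terminal state (0, 0) with value 0, every
    move costs -1, and a move that would leave the grid keeps the agent in place.
    """
    goal = (0, 0)
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    states = [(r, c) for r in range(n) for c in range(n)]
    V = {s: 0.0 for s in states}

    def step(s, a):
        # Deterministic transition; bumping into a wall leaves the state unchanged.
        r, c = s[0] + a[0], s[1] + a[1]
        return (r, c) if 0 <= r < n and 0 <= c < n else s

    while True:
        delta = 0.0
        for s in states:
            if s == goal:
                continue  # terminal state keeps value 0
            # Bellman optimality backup: best one-step reward plus discounted value.
            best = max(-1.0 + gamma * V[step(s, a)] for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < tol:
            return V


if __name__ == "__main__":
    V = value_iteration()
    for r in range(4):
        print([V[(r, c)] for c in range(4)])
```

Under these assumptions each state's value converges to the negative of its Manhattan distance to the goal, which makes the result easy to check by hand.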
Credit Assignment?
Algorithms
Multi-Agent RL
Temporal Credit Assignment?