Agent POV
An agent follows a policy
Policy: a strategy for picking an action given the current state
The optimal policy maximizes the sum of future rewards
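A minimal sketch of these two ideas, assuming a deterministic policy stored as a state-to-action dict and a discount factor gamma (the slides only say "sum of future rewards"; the discount and the toy state/action names are illustrative assumptions):

```python
# Hypothetical deterministic policy: each state maps to the action the agent picks.
policy = {"start": "right", "middle": "right"}

def discounted_return(rewards, gamma=0.9):
    """Sum of future rewards, the reward k steps ahead weighted by gamma**k."""
    g = 0.0
    # Work backwards using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With gamma = 1 this is the plain sum of rewards; with gamma < 1, rewards further in the future count for less, e.g. `discounted_return([1.0, 1.0, 1.0], gamma=0.5)` gives 1 + 0.5 + 0.25 = 1.75.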
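Value function: the expected sum of rewards collected from state s when following the policy
Computable exactly if the environment's transition probabilities and rewards are given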
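When the model is known (assumed here to mean transition probabilities and rewards), the value of a fixed policy can be computed by iterative policy evaluation: repeatedly apply the Bellman expectation backup until the values stop changing. A minimal sketch; the model format `P[s][a] = [(prob, next_state, reward, done), ...]` and the two-state example are hypothetical:

```python
def evaluate_policy(P, policy, gamma=0.9, tol=1e-10):
    """Iterative policy evaluation with in-place sweeps:
    V(s) <- sum over outcomes of p * (r + gamma * V(s')), with V(terminal) = 0."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            a = policy[s]
            v = sum(p * (r + (0.0 if done else gamma * V[s2]))
                    for p, s2, r, done in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:  # values have (numerically) stopped changing
            return V

# Hypothetical model: from A you always reach B; from B you either finish
# with reward 1 (prob 0.5) or fall back to A with reward 0 (prob 0.5).
P = {
    "A": {"go": [(1.0, "B", 0.0, False)]},
    "B": {"go": [(0.5, None, 1.0, True), (0.5, "A", 0.0, False)]},
}
V = evaluate_policy(P, {"A": "go", "B": "go"})
```

Solving the two Bellman equations by hand gives V(B) = 0.5 / 0.595 ≈ 0.840 and V(A) = 0.9 · V(B) ≈ 0.756, which the iteration converges to.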
SARSA
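SARSA is on-policy TD control: it updates Q(s, a) toward r + gamma · Q(s', a'), where a' is the action the agent actually takes next (hence State, Action, Reward, State, Action). A minimal sketch of the update, assuming a tabular Q stored in a dict keyed by (state, action); the state and action names are hypothetical:

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, done=False):
    """Move Q(s, a) toward r + gamma * Q(s', a'), where a' was chosen
    by the same (e.g. epsilon-greedy) policy the agent is following."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)          # unseen (state, action) pairs start at 0
Q[("s1", "right")] = 2.0
sarsa_update(Q, "s0", "right", r=1.0, s_next="s1", a_next="right", alpha=0.5)
```

Here Q(s0, right) moves from 0 to 0.5 · (1 + 0.9 · 2) = 1.4.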
TD-LAMBDA
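TD(λ) spreads each one-step TD error back over recently visited states using eligibility traces, so credit reaches earlier states without waiting for many episodes. A minimal sketch of the backward view with accumulating traces for state-value prediction, assuming an episode is given as a list of `(s, r, s_next, done)` transitions generated by the policy being evaluated:

```python
from collections import defaultdict

def td_lambda_episode(episode, V, alpha=0.1, gamma=0.9, lam=0.8):
    """Backward-view TD(lambda) over one episode with accumulating traces."""
    e = defaultdict(float)  # eligibility trace per visited state
    for s, r, s_next, done in episode:
        target = r if done else r + gamma * V[s_next]
        delta = target - V[s]          # one-step TD error
        e[s] += 1.0                    # bump the trace of the current state
        for x in list(e):
            V[x] += alpha * delta * e[x]   # credit recently visited states
            e[x] *= gamma * lam            # traces fade geometrically
    return V

episode = [("A", 0.0, "B", False), ("B", 1.0, None, True)]
V = td_lambda_episode(episode, defaultdict(float), alpha=0.5, lam=0.8)
V0 = td_lambda_episode(episode, defaultdict(float), alpha=0.5, lam=0.0)
```

With λ = 0.8 the final reward already updates A (V(A) = 0.36, V(B) = 0.5) after one episode; with λ = 0 this reduces to TD(0) and only B is updated.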
Q-LEARNING
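Q-learning differs from SARSA only in the bootstrap target: it uses the best next action, max over a' of Q(s', a'), regardless of which action the agent will actually take, which makes it off-policy. A minimal sketch, again assuming a tabular Q keyed by (state, action) with hypothetical names:

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, done=False):
    """Move Q(s, a) toward r + gamma * max_b Q(s', b)."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)
Q[("s1", "left")] = 3.0
Q[("s1", "right")] = 2.0
q_learning_update(Q, "s0", "right", 1.0, "s1", ["left", "right"], alpha=0.5)
```

The target uses Q(s1, left) = 3 because it is the maximum, so Q(s0, right) becomes 0.5 · (1 + 0.9 · 3) = 1.85, even if the agent goes right next.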
EPSILON GREEDY
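Epsilon-greedy balances exploration and exploitation with one parameter: with probability epsilon take a uniformly random action, otherwise take the action with the highest current Q value. A minimal sketch, assuming Q is a dict keyed by (state, action) where missing entries count as 0:

```python
import random

def epsilon_greedy(Q, s, actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise exploit argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q.get((s, a), 0.0) for a in actions)
    # Break ties at random so equally valued actions all get tried.
    return rng.choice([a for a in actions if Q.get((s, a), 0.0) == best])

Q = {("s", "left"): 1.0, ("s", "right"): 2.0}
greedy_choice = epsilon_greedy(Q, "s", ["left", "right"], epsilon=0.0)
```

With epsilon = 0 the choice is always "right"; with epsilon = 1 every action is equally likely. A common practice is to decay epsilon over time so the agent explores early and exploits later.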
EFFICIENT EXPLORATION
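The slides do not say which exploration method is meant here. One well-known alternative to random exploration is optimism under uncertainty, shown below with UCB1 on a multi-armed bandit as an illustrative sketch, not necessarily the author's method. Each arm is scored by its mean reward plus a bonus that shrinks as the arm is tried more often, so exploration is directed at actions whose value is still uncertain:

```python
import math

def ucb1_select(counts, values, t, c=2.0):
    """Pick the arm maximizing mean + sqrt(c * ln t / n); c = 2 is classic UCB1.
    Arms never tried are selected first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [values[i] + math.sqrt(c * math.log(t) / counts[i])
              for i in range(len(counts))]
    return max(range(len(counts)), key=lambda i: scores[i])

def ucb1_update(counts, values, arm, reward):
    """Incrementally update the pull count and mean reward of the chosen arm."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]
```

Unlike epsilon-greedy, a rarely tried arm can win even with a lower mean: with counts [10, 1], means [0.5, 0.4] and t = 11, arm 1 is chosen because its bonus is much larger.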
By Ruben Fiszel