Deep Reinforcement Learning and Efficient Exploration

Reinforcement Learning

Agent POV

An agent follows policy $\pi$

Policy: strategy to pick action from current state

The optimal policy $\pi^*$ maximizes the sum of future rewards

$V^\pi(s)$ returns sum of rewards from state $s$ wrt policy $\pi$

$\pi^*$ computable if given $V^{\pi^*}(s)$
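
To make the last point concrete, here is a minimal sketch of how a policy can be read off from a value function when a model of the environment is available. Everything in it is an illustrative assumption rather than material from the talk: the four-state chain, the discount factor `GAMMA`, and the helper names `step`, `value_iteration` and `greedy_policy`.

```python
# Toy deterministic chain (hypothetical example): states 0..3, state 3 is terminal.
GAMMA = 0.9            # assumed discount factor
N_STATES = 4
ACTIONS = (-1, +1)     # move left or move right
TERMINAL = 3


def step(state, action):
    """Assumed known model: return (next_state, reward) for a move."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def lookahead(state, action, values):
    """One-step lookahead: immediate reward plus discounted value of the successor."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(n_sweeps=100):
    """Approximate the optimal value function with repeated Bellman optimality backups."""
    values = [0.0] * N_STATES
    for _ in range(n_sweeps):
        for s in range(N_STATES):
            if s != TERMINAL:
                values[s] = max(lookahead(s, a, values) for a in ACTIONS)
    return values


def greedy_policy(values):
    """In each non-terminal state, pick the action with the best one-step lookahead."""
    return {
        s: max(ACTIONS, key=lambda a: lookahead(s, a, values))
        for s in range(N_STATES)
        if s != TERMINAL
    }


v_star = value_iteration()
pi_star = greedy_policy(v_star)
```

Note that the greedy step needs the model (`step`) to look one move ahead. Methods that learn action values instead of state values avoid that requirement.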

SARSA

TD-LAMBDA

Q-LEARNING
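
As a rough illustration, the tabular Q-learning update can be sketched as below. The learning rate `ALPHA`, the discount `GAMMA`, the dictionary-based table and the function name `q_learning_update` are assumptions chosen for the example, not details taken from the talk.

```python
from collections import defaultdict

ALPHA = 0.5   # assumed learning rate
GAMMA = 0.9   # assumed discount factor


def q_learning_update(q, state, action, reward, next_state, actions, done):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped target."""
    # Off-policy target: use the best estimated action in the next state,
    # regardless of which action the agent will actually take there.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)   # unseen (state, action) pairs start at 0
actions = ("left", "right")

# Two hand-written transitions, purely for illustration.
first = q_learning_update(q_table, 0, "right", 1.0, 1, actions, done=True)
second = q_learning_update(q_table, 1, "left", 0.0, 0, actions, done=False)
```

SARSA differs only in the target: it bootstraps from the action actually taken in the next state instead of the maximum over actions.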

EPSILON GREEDY
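
A minimal sketch of epsilon-greedy action selection, assuming the action-value estimates for the current state are held in a plain dictionary. The function name `epsilon_greedy` and its parameters are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Choose an action from q_values, a dict mapping action -> estimated value.

    With probability epsilon a uniformly random action is returned (explore);
    otherwise the action with the highest current estimate is returned (exploit).
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


estimates = {"left": 0.1, "right": 0.9}
greedy_action = epsilon_greedy(estimates, epsilon=0.0)   # never explores
random_action = epsilon_greedy(estimates, epsilon=1.0)   # always explores
```

The random branch ignores everything the agent has learned so far, which is the main reason this scheme explores inefficiently.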

EFFICIENT EXPLORATION

By Ruben Fiszel