Deep Reinforcement Learning and Efficient Exploration

Reinforcement Learning

Agent POV

An agent follows a policy $\pi$

Policy: strategy to pick an action from the current state

The optimal policy $\pi^*$ maximizes the sum of future rewards

$V^\pi(s)$ returns the sum of rewards from state $s$ w.r.t. policy $\pi$

$\pi^*$ is computable if given $V^{\pi^*}(s)$
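
To make the link between a value function and a policy concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and not taken from the slides: the 4-state chain, the reward of 1.0 on reaching the last state, the discount factor `GAMMA`, and the helper names `step`, `evaluate_policy`, and `greedy_policy`. It evaluates a fixed policy to get its value function, then reads a policy back out of that value function by one-step lookahead.

```python
import numpy as np

# Hypothetical 4-state chain: states 0..3, state 3 is terminal.
# Actions: 0 = left, 1 = right. Reward 1.0 on entering state 3, else 0.0.
N_STATES, N_ACTIONS, TERMINAL = 4, 2, 3
GAMMA = 0.9  # discount factor (assumed; the slides only say "sum of future rewards")


def step(s, a):
    """Deterministic transition: return (next_state, reward)."""
    s_next = max(s - 1, 0) if a == 0 else min(s + 1, TERMINAL)
    reward = 1.0 if s_next == TERMINAL else 0.0
    return s_next, reward


def evaluate_policy(policy, sweeps=100):
    """Iteratively compute V^pi(s) for a deterministic policy (one action per state)."""
    V = np.zeros(N_STATES)
    for _ in range(sweeps):
        for s in range(TERMINAL):  # the terminal state keeps value 0
            s_next, r = step(s, policy[s])
            V[s] = r + GAMMA * V[s_next]
    return V


def greedy_policy(V):
    """In each state, pick the action with the best one-step lookahead under V."""
    policy = np.zeros(N_STATES, dtype=int)
    for s in range(TERMINAL):
        lookahead = []
        for a in range(N_ACTIONS):
            s_next, r = step(s, a)
            lookahead.append(r + GAMMA * V[s_next])
        policy[s] = int(np.argmax(lookahead))
    return policy


always_right = np.array([1, 1, 1, 0])   # action at the terminal state is unused
V_right = evaluate_policy(always_right)  # value of each state under this policy
pi_from_V = greedy_policy(V_right)       # policy recovered from the value function
```

In this toy chain, moving right is optimal, so the greedy policy read from `V_right` picks "right" in every non-terminal state. Note that the lookahead uses the transition function `step`, which is why a state-value function alone is enough here only when the dynamics are known.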

Neural Network

 

Deep Reinforcement Learning

Policy iteration

Update rule

SARSA

TD-LAMBDA

Q-LEARNING
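
As a rough illustration of how two of the update rules named above differ, the sketch below applies the standard tabular SARSA and Q-learning updates to the same transition. The learning rate `ALPHA`, discount `GAMMA`, the tiny Q-table, and the function names are illustrative assumptions, not values from the slides. Terminal-state handling and TD(lambda) eligibility traces are left out to keep it short.

```python
import numpy as np

ALPHA = 0.5  # learning rate (illustrative value)
GAMMA = 0.9  # discount factor (illustrative value)


def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: bootstrap from the action the agent actually takes next."""
    td_target = r + GAMMA * Q[s_next, a_next]
    Q[s, a] += ALPHA * (td_target - Q[s, a])


def q_learning_update(Q, s, a, r, s_next):
    """Off-policy: bootstrap from the greedy action in the next state."""
    td_target = r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (td_target - Q[s, a])


# Same transition (s=0, a=1, r=1.0, s_next=1), two different targets.
Q_sarsa = np.array([[0.0, 0.0], [1.0, 2.0]])
Q_qlearn = Q_sarsa.copy()

sarsa_update(Q_sarsa, s=0, a=1, r=1.0, s_next=1, a_next=0)
q_learning_update(Q_qlearn, s=0, a=1, r=1.0, s_next=1)
```

The only difference is the bootstrap term: SARSA uses `Q[s_next, a_next]` for the action that was actually chosen, while Q-learning uses the maximum over actions in `s_next`, regardless of what the agent does next.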

 

Exploration strategies

EPSILON GREEDY
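
A minimal sketch of epsilon-greedy action selection, assuming a vector of estimated action values for the current state. The function name `epsilon_greedy`, the example values, and the seed are hypothetical and chosen only for illustration.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.5, 0.2])  # made-up action-value estimates

# epsilon = 0.0 never explores; epsilon = 1.0 always picks at random.
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
random_actions = [epsilon_greedy(q_values, epsilon=1.0, rng=rng) for _ in range(100)]
```

Exploration here is undirected: the random branch ignores how often an action has been tried or how uncertain its estimate is.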

EFFICIENT EXPLORATION

 

Results

Conclusion

 
