PyRL : Reinforcement Learning 

Antonin RAFFIN - Imad EL HANAFI

Understanding code 

Tools

SublimeText

Git

Libraries :

Matplotlib

Tkinter

http://Gitlab.ensta.fr

Steps 

Step 1 :

 

Stupid agents : to understand the code

V-value agent : incremental and batch

Q-value agent : incremental and batch
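
Sketch (not the PyRL code): a minimal illustration of the difference between a batch and an incremental value estimate, assuming the value of a (state, action) pair is simply the mean of the rewards observed for it. The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def batch_q(samples):
    """Batch estimate: store everything, then average rewards per (state, action)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for state, action, reward in samples:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def incremental_q(samples):
    """Incremental estimate: update a running mean after every single sample."""
    q, counts = defaultdict(float), defaultdict(int)
    for state, action, reward in samples:
        key = (state, action)
        counts[key] += 1
        # Move the estimate towards the new reward by a 1/n step.
        q[key] += (reward - q[key]) / counts[key]
    return dict(q)


# Hypothetical experience: (state, action, reward) triples.
samples = [(0, "right", 1.0), (0, "right", 0.0), (0, "left", -1.0), (0, "right", 2.0)]
q_batch = batch_q(samples)
q_incr = incremental_q(samples)
```

Both versions end with the same estimates; the incremental one does not need to keep the samples in memory.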

 

 

Step 2 :

2D environment  

 

Step 3 : 

Temporal-difference (TD) agent
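
Sketch (not the PyRL code): the slides do not say which TD variant the agent uses, so this assumes the simplest one, a tabular TD(0) update of state values. The corridor environment, `alpha` and `gamma` are illustrative assumptions.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one TD(0) update to values[state] and return the TD error."""
    # The target bootstraps on the next state's value unless the episode ended.
    target = reward if terminal else reward + gamma * values[next_state]
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error


# Hypothetical 3-state corridor: 0 -> 1 -> 2 (goal, reward 1 on arrival).
values = [0.0, 0.0, 0.0]
for _ in range(200):
    td0_update(values, 0, 0.0, 1)
    td0_update(values, 1, 1.0, 2, terminal=True)
```

After enough episodes the value of state 1 approaches the goal reward, and the value of state 0 approaches that reward discounted once by `gamma`.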

 

 

Step 4 :

 

GUI environments

Simple Tetris

 

Difficulties and solutions

  • How to represent an environment ? Matrix or list ...
  • Understanding the equations
  • How to choose good rewards ? On walls, at the environment limits ...
  • Which decay rate for different environments ?
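
Sketch (not the PyRL code): one possible answer to the representation and reward questions above, assuming the 2D environment is stored as a matrix (list of lists) and that bumping into a wall or the grid limit costs a small penalty. The cell codes, the reward values and the `step` function are hypothetical.

```python
FREE, WALL, GOAL = 0, 1, 2

# Matrix representation: one row per line of the grid.
GRID = [
    [FREE, FREE, WALL, FREE],
    [FREE, WALL, FREE, FREE],
    [FREE, FREE, FREE, GOAL],
]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(grid, position, action):
    """Return (new_position, reward, done) for one move on the grid."""
    row, col = position
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    inside = 0 <= new_row < len(grid) and 0 <= new_col < len(grid[0])
    if not inside or grid[new_row][new_col] == WALL:
        # Blocked by a wall or the environment limit: stay in place, pay a penalty.
        return position, -1.0, False
    if grid[new_row][new_col] == GOAL:
        return (new_row, new_col), 10.0, True
    # Ordinary move: small step cost so that shorter paths are preferred.
    return (new_row, new_col), -0.1, False
```

With this layout, moving a wall only means changing one cell of the matrix, and the reward scheme stays in a single function where it is easy to tune.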

Detailed results and demonstrations :

2D with walls :

Tetris

Learning curves : 

 

Q-value : 2D - 20 cells

TD : 2D - 20 cells 

TD : 2D - static walls

TD : 2D - moving walls

Learning curves on simple Tetris :

 

TD : Tetris - 100 actions - 3 rows

TD : Tetris - 100 actions - 5 rows

TD : Tetris - 100 actions - bad rewards

TD : Tetris - 1000 actions - 5 rows

Conclusion

 

4-week project

Discovering  
