PyRL: Reinforcement Learning with Python

Antonin RAFFIN - Imad EL HANAFI

Understanding code 

Tools

SublimeText

Git

Libraries

Matplotlib

Tkinter

http://gitlab.ensta.fr

Steps 

Step 1: Improving agents

Stupid agent : To understand the code

V value agent : Incremental and Batch

Q value agent : Incremental and Batch
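A minimal sketch of one common reading of "incremental" versus "batch" value estimation, not necessarily the project's actual code. The batch version averages all stored rewards at once; the incremental version updates a running mean after each reward. The names `batch_mean` and `IncrementalMean` are hypothetical.

```python
def batch_mean(rewards):
    """Batch estimate: average all stored rewards at once."""
    return sum(rewards) / len(rewards)


class IncrementalMean:
    """Incremental estimate: update a running mean one reward at a time."""

    def __init__(self):
        self.value = 0.0
        self.count = 0

    def update(self, reward):
        self.count += 1
        # Running-mean update: V <- V + (r - V) / n
        self.value += (reward - self.value) / self.count
        return self.value


rewards = [1.0, 0.0, 2.0, 1.0]  # made-up rewards, for illustration only
estimator = IncrementalMean()
for r in rewards:
    estimator.update(r)
```

Both approaches reach the same estimate; the incremental one does not need to keep the reward history in memory.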

Steps 

Step 2: new environment

2D environment

Step 3: a better agent

Temporal Difference (TD) agent
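A minimal sketch of a tabular temporal-difference update. The slides do not say which TD variant was used; this example assumes the Q-learning form, and the values of `ALPHA` and `GAMMA` as well as the function name `td_update` are hypothetical.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)


def td_update(q, state, action, reward, next_state, actions,
              alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update in place and return the TD error."""
    # Value of the best action available in the next state
    best_next = max(q[(next_state, a)] for a in actions)
    # TD error: target (r + gamma * max Q(s', a')) minus current estimate
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


q = defaultdict(float)  # Q-table, every entry starts at 0.0
actions = ["up", "down", "left", "right"]
err = td_update(q, (0, 0), "right", 1.0, (0, 1), actions)
```

Unlike a batch average, the estimate is corrected after every single transition, using the current estimate of the next state as part of the target.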

Steps 

Step 4: new environments and GUI

GUI for environments

Variations of 2D grid environment

Simple Tetris

Difficulties and solutions

Decay rate on different environments?

How to represent an environment? Matrix or list...

Understanding the equations

How to choose good rewards? On walls, at environment limits...

2D with walls

Tetris
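A minimal sketch of the matrix option for representing a 2D environment with walls, with one possible reward scheme for walls and environment limits. The grid layout, the reward values and the names `GRID`, `MOVES` and `step` are all assumptions made for illustration, not the project's actual choices.

```python
FREE, WALL, GOAL = 0, 1, 2

# The environment as a matrix (list of rows)
GRID = [
    [FREE, FREE, FREE, FREE],
    [FREE, WALL, FREE, FREE],
    [FREE, FREE, FREE, GOAL],
]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Assumed reward values
STEP_REWARD = -1   # small cost for every move
BUMP_REWARD = -5   # hitting a wall or leaving the grid
GOAL_REWARD = 10   # reaching the goal cell


def step(grid, position, action):
    """Return (new_position, reward, done) for one move."""
    row, col = position
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    outside = not (0 <= new_row < len(grid) and 0 <= new_col < len(grid[0]))
    # The agent stays in place when it hits a wall or an environment limit
    if outside or grid[new_row][new_col] == WALL:
        return position, BUMP_REWARD, False
    if grid[new_row][new_col] == GOAL:
        return (new_row, new_col), GOAL_REWARD, True
    return (new_row, new_col), STEP_REWARD, False
```

For example, `step(GRID, (0, 0), "up")` leaves the agent at `(0, 0)` with the bump reward, since the move would leave the grid.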

Results

GUI

Comparing agents

Q value : 2D - 20 cells

TD : 2D - 20 cells 

Results

Comparing environments

TD : 2D - static walls

TD : 2D - moving walls

Results

Learning curves on simple Tetris

TD : Tetris - 100 actions - 3 columns

TD : Tetris - 100 actions - 5 columns

Results

More information taken into account

TD : Tetris - state based on 3 rows

TD : Tetris - state based on 4 rows
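A minimal sketch of how a Tetris state could be built from a fixed number of rows. The slides do not say which rows were used; this example assumes a window starting at the highest occupied row, and the function name `encode_state` and the sample board are hypothetical.

```python
def encode_state(board, n_rows):
    """Return a hashable state made of n_rows rows of the board.

    The window starts at the highest occupied row and is shifted up
    when fewer than n_rows rows remain below it.
    """
    first_occupied = next(
        (i for i, row in enumerate(board) if any(row)), len(board)
    )
    start = max(0, min(first_occupied, len(board) - n_rows))
    return tuple(tuple(row) for row in board[start:start + n_rows])


# 5 rows x 3 columns, 1 = occupied cell (made-up board)
board = [
    [0, 0, 0],
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]

state_3 = encode_state(board, 3)
state_4 = encode_state(board, 4)
```

A larger window gives the agent more information, but the number of possible states grows quickly (at most 2 to the power of rows times columns), so each state is visited less often during learning.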

Results

Different rewards

TD : Tetris - constant bad reward

TD : Tetris - reward based on the action (good/bad choice)

Conclusion

Discovering RL

First medium project with another person

Deepening knowledge of using Python

Working in English
