By Osvaldo Castellanos
Motivation
Ref: Machine Learning for Everyone
Ref: https://xkcd.com/1838/
RL Model
Ref: https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html#what-is-reinforcement-learning
Markov Decision Processes
Ref: https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html#what-is-reinforcement-learning
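An MDP is the tuple (S, A, P, R, γ). A minimal sketch of one as plain Python data, with hypothetical traffic-flavored states, actions, and probabilities made up for illustration:

# Toy MDP: states, actions, transition probabilities, rewards, discount.
# All names and numbers here are illustrative, not from the simulator.
states = ["low_traffic", "high_traffic"]
actions = ["keep_phase", "switch_phase"]
gamma = 0.9  # discount factor

# P[(s, a)] -> list of (next_state, probability)
P = {
    ("low_traffic", "keep_phase"):    [("low_traffic", 0.8), ("high_traffic", 0.2)],
    ("low_traffic", "switch_phase"):  [("low_traffic", 0.6), ("high_traffic", 0.4)],
    ("high_traffic", "keep_phase"):   [("high_traffic", 0.7), ("low_traffic", 0.3)],
    ("high_traffic", "switch_phase"): [("low_traffic", 0.5), ("high_traffic", 0.5)],
}

# R[(s, a)] -> immediate reward
R = {
    ("low_traffic", "keep_phase"):    1.0,
    ("low_traffic", "switch_phase"):  0.5,
    ("high_traffic", "keep_phase"):  -1.0,
    ("high_traffic", "switch_phase"): 0.0,
}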
Important Concepts:
Ref: https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html#what-is-reinforcement-learning
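Written out, the core quantities the reference defines: the discounted return, and the state- and action-value functions it induces under a policy π:

G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]

Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s, A_t = a \right]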
Backup Diagram
Ref: https://medium.freecodecamp.org/diving-deeper-into-reinforcement-learning-with-q-learning-c18d0db58efe
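A backup diagram is the picture form of the Bellman expectation equation, which "backs up" successor-state values into the value of the current state:

V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\big[ r(s, a, s') + \gamma V^{\pi}(s') \big]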
A Taxonomy of RL Algorithms
Ref: OpenAI Spinning Up in Deep RL, https://spinningup.openai.com/
Approaches:
Deep Q-Network
Ref: https://2.bp.blogspot.com/-bZERYUNyjao/Wa98yt7GjhI/AAAAAAAACt8/SYQjUNrbe1YDtKTMKR6LPt68C0pPqkoowCLcBGAs/s1600/DRL.JPG
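A minimal DQN sketch, assuming PyTorch and CartPole-sized dimensions (neither is specified in the slides): an epsilon-greedy Q-network, a replay buffer, and one-step TD updates toward a periodically synced target network.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

# Q-network: maps a state vector to one Q-value per action.
class QNet(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

obs_dim, n_actions, gamma = 4, 2, 0.99  # CartPole-sized, for illustration
q_net = QNet(obs_dim, n_actions)
target_net = QNet(obs_dim, n_actions)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)  # append (obs, action, reward, next_obs, float(done))

def act(obs, epsilon):
    # Epsilon-greedy action selection.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax().item()

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    obs, acts, rews, next_obs, dones = map(
        np.asarray, zip(*random.sample(replay, batch_size)))
    q = q_net(torch.as_tensor(obs, dtype=torch.float32))
    q = q.gather(1, torch.as_tensor(acts).long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the frozen target network; zero out terminal states.
        next_q = target_net(torch.as_tensor(next_obs, dtype=torch.float32)).max(1).values
        target = (torch.as_tensor(rews, dtype=torch.float32)
                  + gamma * next_q * (1 - torch.as_tensor(dones, dtype=torch.float32)))
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Periodically: target_net.load_state_dict(q_net.state_dict())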
OpenAI Gym

import gym

env = gym.make("CartPole-v1")
env = gym.wrappers.Monitor(env, "cart-pole")  # records episode videos/stats to ./cart-pole
observation = env.reset()
for _ in range(1000):
    env.render()
    action = env.action_space.sample()  # your agent here (this takes random actions)
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()
Main Functions Needed in a Custom Environment to Interface with Gym:
step() returns: observation (object), reward (float), done (bool), info (dict)
import numpy as np
import os
import gym
from gym import error, spaces
from gym import utils
from gym.utils import seeding
from gym_traffic.envs import traffic_simulator
import pygame

class TrEnv(gym.Env):
    metadata = {'render.modes': ['human', 'rgb_array']}

    def __init__(self):
        self.sim = traffic_simulator.TrafficSim()
        # Two actions: which lane pair gets the green light.
        self.action_space = spaces.Discrete(n=2)
        self.observation_space = spaces.Box(low=0, high=1, shape=(4, 7), dtype=np.uint8)

    def step(self, action):
        # Delegate to the simulator; returns the usual Gym 4-tuple.
        ob, reward, done, signal = self.sim.step(action)
        return ob, reward, done, signal

    def _get_obs(self):
        return self.sim.getGameState()

    def reset(self):
        self.sim.reset()
        return self.sim.getGameState()

https://github.com/oscastellanos/gym-traffic/blob/master/gym_traffic/envs/TrEnv.py
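To use the environment through the standard Gym entry point, the package has to register it. A sketch, assuming an id such as 'Traffic-v0' is registered in gym_traffic/__init__.py (the id here is hypothetical; check the repo for the actual one):

import gym
import gym_traffic  # importing the package runs its env registration

env = gym.make('Traffic-v0')  # hypothetical id
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())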
Ref: https://www.pygame.org/wiki/about
traffic_simulator.py
https://github.com/oscastellanos/gym-traffic/blob/master/gym_traffic/envs/traffic_simulator.py
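The simulator is built on pygame, so at its core it is the standard event/update/draw loop. A minimal sketch of that pattern (not the repo's actual code):

import pygame

pygame.init()
screen = pygame.display.set_mode((400, 400))
clock = pygame.time.Clock()

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
    screen.fill((0, 0, 0))   # clear the frame
    # ... update vehicle positions and draw lanes/cars/lights here ...
    pygame.display.flip()    # present the frame
    clock.tick(30)           # cap at 30 FPS
pygame.quit()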
"Deep Reinforcement Learning for Traffic Light Control in Vehicular Networks," Liang et al., (2018), arxiv.org/abs/1803.11115
Ref: https://openai.com/blog/faulty-reward-functions/
| State | Action | Reward |
|---|---|---|
| Position of cars per lane (up to 7 cars, bumper-to-bumper) | 0: horizontal lane green, vertical red; 1: horizontal lane red, vertical green | r = W(t) - W(t+1) |
"Deep Reinforcement Learning for Traffic Light Control in Vehicular Networks," Liang et al., (2018), arxiv.org/abs/1803.11115
# Return value of getGameState(): one (position matrix, velocity matrix) pair
# per direction, plus the episode's running reward and the done flag.
([
  # Vehicles Heading East
  (array([[0., 0., 0., 0., 0., 0., 0.]]), array([[0., 0., 0., 0., 0., 0., 0.]])),
  # Vehicles Heading West
  (array([[0., 0., 0., 0., 0., 0., 0.]]), array([[0., 0., 0., 0., 0., 0., 0.]])),
  # Vehicles Heading North
  (array([[0., 0., 0., 0., 0., 0., 0.]]), array([[0., 0., 0., 0., 0., 0., 0.]])),
  # Vehicles Heading South
  (array([[0., 0., 0., 0., 0., 0., 0.]]), array([[0., 0., 0., 0., 0., 0., 0.]]))
 ],
 # Current total reward of episode
 0,
 # Is the episode done?
 False)

Ref: https://github.com/sarcturus00/Tidy-Reinforcement-learning/blob/master/Pseudo_code/DQN.png
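Before this nested state can feed a Q-network, it has to be flattened into a fixed-size array. A sketch of one way to do that (the stacking order is an arbitrary choice, not from the repo):

import numpy as np

def flatten_state(game_state):
    # game_state = ([(pos, vel) for each of 4 directions], total_reward, done)
    lanes, total_reward, done = game_state
    rows = [m.ravel() for pos, vel in lanes for m in (pos, vel)]
    return np.stack(rows), total_reward, done  # (8, 7) array for the network input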
"Deep Learning for Video Game Playing," Justesen et al. (2019), arxiv.org/pdf/1708.07902.pdf
"Deep Reinforcement Learning for Traffic Light Control in Vehicular Networks," Liang et al., (2018), arxiv.org/abs/1803.11115
To-do list of upcoming changes to the simulator/environment:
For the Poster:
Final report:
References:
Recommendations:
THANK YOU!