Cristian Vargas
Image taken from: http://pngimg.com/download/36890
import gym
from gym import wrappers
from random_agent import RandomAgent
from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

agent = RandomAgent()
env = gym_super_mario_bros.make('SuperMarioBros-1-1-v0')
# Map the NES controller's button combinations to a small discrete action space
env = BinarySpaceToDiscreteSpaceEnv(env, SIMPLE_MOVEMENT)
# Record a video every other episode into the agent's recording folder
env = gym.wrappers.Monitor(env, agent.get_recording_folder(),
                           video_callable=lambda episode_id: episode_id % 2 == 0,
                           force=True)

done = True
episode = 1
while True:
    if done:
        print('Restarting env. Attempt # {}'.format(episode))
        state = env.reset()
        episode += 1
    state, reward, done, info = agent.run(env)
env.close()
class RandomAgent(Agent):  # Agent is the project's base class (not shown here)
    """
    Randomly executes an action from the action space.
    Has no memory nor intelligence at all.
    """

    def get_recording_folder(self):
        return './random'

    def run(self, env):
        # Take one random action from the environment's action space
        state, reward, done, info = env.step(env.action_space.sample())
        return state, reward, done, info
Is this applicable to the Mario game?
Possible number of states, assuming we feed a stack of 4 frames at a time, each resized to 84x84 and converted to grayscale:
84 * 84 * 4 * 256 = 7,225,344
And this is only a rough lower bound; the number of distinct pixel configurations is astronomically larger, so a plain Q-table is not practical here.
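To make the arithmetic explicit (a throwaway check, not part of the project's code):

>>> 84 * 84 * 4 * 256
7225344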
Preprocessing steps
I followed the preprocessing approach proposed in the original DeepMind DQN paper and built on earlier work described in a Toptal blog post.
The approach is to convert each frame to grayscale, resize it to 84x84 pixels, and stack the 4 most recent frames into a single (4, 84, 84) observation.
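A minimal sketch of these preprocessing steps (my own illustration, not the exact code from this project), assuming OpenCV and NumPy are available:

import collections
import cv2
import numpy as np

def preprocess_frame(frame):
    # RGB NES frame -> grayscale, resized to 84x84
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)

# Keep the 4 most recent preprocessed frames; their stack is the network input
frames = collections.deque(maxlen=4)

def stack_frames(new_frame):
    frames.append(preprocess_frame(new_frame))
    while len(frames) < 4:               # pad with copies at the start of an episode
        frames.append(frames[-1])
    return np.stack(frames, axis=0)      # shape (4, 84, 84)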
In short, we approximate the Q-table with a neural network: instead of storing a value for every (state, action) pair, the network learns the function Q(s, a).
Challenge: consecutive frames are highly correlated, breaking the assumption behind stochastic gradient descent that training samples are independent and identically distributed (i.i.d.).
In general, combining neural networks with reinforcement learning is an unstable process.
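To make "approximating the Q-table" concrete, here is a hedged sketch of a single training step; dqn_update, net, optimizer and the sampled mini-batch are illustrative names rather than code from this project, and drawing the batch at random from a replay memory is the usual way to break the frame-to-frame correlation mentioned above:

import torch
import torch.nn.functional as F

def dqn_update(net, optimizer, states, actions, rewards, next_states, dones, gamma=0.99):
    # Q(s, a) predicted by the network for the actions actually taken
    q_values = net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # TD target: r + gamma * max_a Q(s', a), zeroed on terminal transitions
        next_q = net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1 - dones)
    loss = F.mse_loss(q_values, targets)   # regress the prediction toward the target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()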
Network Architecture
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self, number_of_actions):
        super(NeuralNetwork, self).__init__()
        self.number_of_actions = number_of_actions
        self.conv1 = nn.Conv2d(4, 32, 8, 4)     # 4 stacked 84x84 frames in, 32 feature maps out
        self.relu1 = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(32, 64, 4, 2)
        self.relu2 = nn.ReLU(inplace=True)
        self.conv3 = nn.Conv2d(64, 64, 3, 1)
        self.relu3 = nn.ReLU(inplace=True)
        self.fc1 = nn.Linear(3136, 512)         # 64 * 7 * 7 = 3136 after the conv stack
        self.relu4 = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(512, self.number_of_actions)  # one Q-value per action

    def forward(self, x):
        x = self.relu1(self.conv1(x))
        x = self.relu2(self.conv2(x))
        x = self.relu3(self.conv3(x))
        x = x.view(x.size(0), -1)               # flatten to (batch, 3136)
        x = self.relu4(self.fc1(x))
        return self.fc2(x)
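A quick sanity check of the shapes (again just a sketch, not code from the slides): the 3136 in fc1 is the 64 x 7 x 7 feature map the convolutional stack produces for an 84x84 input.

import torch

net = NeuralNetwork(number_of_actions=7)   # e.g. the 7 button combos in SIMPLE_MOVEMENT
dummy = torch.zeros(1, 4, 84, 84)          # a batch with one stacked observation
print(net(dummy).shape)                    # torch.Size([1, 7]), one Q-value per action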
DQN suffers from overestimation: noisy, overconfident value estimates can propagate across the entire Q-function.
This hurts the stability of the algorithm.
A proposed solution, Double Q-Learning, attacks this problem:
Keep two different estimators ("tables"). Pick the best action according to estimator #1, but take the value of that action from estimator #2.
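Applied to DQN (Double DQN), "table #1" and "table #2" become the online and target networks. A hedged sketch of the target computation, where online_net, target_net and the batch tensors are assumed names rather than code from this project:

import torch

with torch.no_grad():
    # 1) choose the greedy action with the online network ("table #1")
    best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    # 2) evaluate that action with the target network ("table #2")
    next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
    targets = rewards + gamma * next_q * (1 - dones)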
Results at EPOCH 0, EPOCH 10000, and EPOCH 16000.