Deep Reinforcement Learning

Matt Lamm (S2 '18) & Sid Shanker (S1 '18)

What is reinforcement learning?

You have some agent in an environment.

 

The agent can take actions to change the "state" of the environment

 

At each step, the environment provides "feedback" to the agent, in the form of a reward.

State = s =

Action = a =

Reward = R(s,a) = 

Q Functions

Representing Q Functions

For tic-tac-toe, a simple table will do.

Mario has way too many states to be represented in a table

Neural Networks

Cat: .7

Dog: .1

Human: .2

...

For games like Mario, you can use Neural Networks to approximate Q functions.

Jump: 100

Continue: 5

Deep Q Functions

How to learn about the world?

  • Early on, you know very little about how the game works, and don't have a sense of what actions are good.
  • Take mostly random steps!
  • Then, as your sense of reward in the world improves, you draw actions from your own model.

Deep RL for congestion 

 

  • State: history of packet round-trip times on the connection
  • Action: increase or decrease window (of packets to send out at a given point in time)
  • Reward: Penalize slower round trip times but reward increases in the congestion window.

Some results

Traditional Approach

RL Episode 1

RL Episode 198

deck

By Sid Shanker

deck

  • 795