Matt Lamm (S2 '18) & Sid Shanker (S1 '18)
You have some agent in an environment.
The agent can take actions to change the "state" of the environment.
At each step, the environment provides "feedback" to the agent, in the form of a reward.
State = s = the current configuration of the environment
Action = a = a choice the agent makes, which changes the state
Reward = R(s,a) = the feedback the agent receives for taking action a in state s
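The loop above can be sketched in a few lines of Python. The environment here is invented for illustration (a number line where reaching position 3 pays reward 1); a real setup would swap in a game like tic-tac-toe or Mario.

```python
# A sketch of the agent-environment loop: the agent acts, the
# environment returns a new state and a reward.
class LineEnv:
    def __init__(self):
        self.state = 0                       # s: the agent's position

    def step(self, action):                  # a: -1 (left) or +1 (right)
        self.state += action
        reward = 1.0 if self.state == 3 else 0.0   # R(s, a)
        done = self.state == 3
        return self.state, reward, done

env = LineEnv()
total_reward, done = 0.0, False
while not done:
    action = +1                              # a fixed "always go right" policy
    state, reward, done = env.step(action)
    total_reward += reward
print(total_reward)   # prints 1.0 after reaching position 3
```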
For tic-tac-toe, a simple table mapping (state, action) pairs to values will do.
Mario has way too many states to represent in a table.
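A minimal sketch of the tabular case, assuming a made-up 1-D world (states 0..4; reaching state 4 pays reward 1). The environment is invented for illustration, but the update rule is the standard Q-learning one.

```python
import random
from collections import defaultdict

ACTIONS = [-1, +1]
ALPHA, GAMMA = 0.5, 0.9

Q = defaultdict(float)          # the "simple table": (state, action) -> value

def step(state, action):
    next_state = min(max(state + action, 0), 4)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward, next_state == 4

random.seed(0)
for episode in range(200):
    state, done = 0, False
    while not done:
        action = random.choice(ACTIONS)   # random exploration; Q-learning is
                                          # off-policy, so it still learns the
                                          # greedy values
        next_state, reward, done = step(state, action)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        # the standard Q-learning update
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

print(round(Q[(3, +1)], 2))   # prints 1.0: one step from the goal
```

After a couple hundred episodes, the table entries near the goal converge to the true values; for a game the size of Mario, this table would be astronomically large, which motivates the function approximation below.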
Neural networks can approximate functions that are too complex for a table, e.g. an image classifier that maps a picture to class probabilities:
Cat: .7
Dog: .1
Human: .2
...
For games like Mario, you can use Neural Networks to approximate Q functions.
Given the current game state, the network outputs a Q-value for each action:
Jump: 100
Continue: 5
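A minimal sketch of such a Q-network: a one-hidden-layer neural net that maps a state vector to one Q-value per action. The weights are random, so the outputs are meaningless until trained; the point is the shape of the computation. The action names and the 4-number state are made up.

```python
import random

ACTIONS = ["jump", "continue"]
STATE_DIM, HIDDEN = 4, 8

rng = random.Random(0)
W1 = [[rng.gauss(0, 0.1) for _ in range(HIDDEN)] for _ in range(STATE_DIM)]
W2 = [[rng.gauss(0, 0.1) for _ in range(len(ACTIONS))] for _ in range(HIDDEN)]

def q_values(state):
    # hidden layer with ReLU activation
    hidden = [max(0.0, sum(s * w for s, w in zip(state, col)))
              for col in zip(*W1)]
    # linear output layer: one Q-value per action
    return [sum(h * w for h, w in zip(hidden, col)) for col in zip(*W2)]

state = [0.5, -1.0, 0.2, 0.0]   # e.g. Mario's position and velocity
q = q_values(state)
best = max(range(len(ACTIONS)), key=lambda i: q[i])
print(ACTIONS[best])            # the greedy action under the current weights
```

Training then nudges the weights so that the output for each action matches the same target the table update used (reward plus discounted best next value), which is the idea behind deep Q-networks.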
Traditional Approach
RL Episode 1
RL Episode 198