- Not about Neural Networks
- Not about Mathematics
- Reinforcement Learning
- But we're still doing it
- We're creating AI that learns a game
- We're not creating AI to be in a game
- Important for reasoning
- Unavoidable for certain problems
- 0 RL Intro
- 1 Dynamic Programming
- 2 Monte Carlo
- 3 Temporal Difference
- 4 Function Approximation
- 5 Deep Q-Learning
- 6 Policy Gradient
- 1 Dynamic Programming: Icy Lake (20%, 5 points)
- 2 Monte Carlo: Blackjack (20%, 5 points)
- 3 Temporal Difference: Taxi (20%, 5 points)
- 4 Function Approximation: Mountain Car (20%, 5 points)
- 5 Deep Q-Learning: Atari 2600 (20%, 5 points)

- Grade 1: 21 - 25 points
- Grade 2: 16 - 20 points
- Grade 3: 11 - 15 points
- Grade 4: 05 - 10 points
- Grade 5: 00 - 04 points
https://openai.com/research/openai-five
https://deepmind.google/technologies/alphago/
https://arxiv.org/pdf/1801.05086.pdf
https://www.roboticsproceedings.org/rss05/p27.pdf
https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
Machine Learning:
- Supervised
- Unsupervised
- Reinforcement
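- Agent: the learner that chooses the actions
- Environment: everything the agent interacts with
- Time: the interaction proceeds in discrete steps t = 0, 1, 2, ...
- State: the situation the agent finds itself in at time t
- Action: what the agent does at time t
- Reward: the numeric feedback the environment gives for the action
- Trail: the sequence of states, actions, and rewards over time
- Goal: maximize the total reward collected over time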
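The interaction between agent and environment can be sketched as a simple loop. This is a minimal illustration, not a reference implementation: `CoinFlipEnv`, `run_episode`, and `discounted_return` are hypothetical names invented here, and the discount factor `gamma` is an assumption (the goal could equally be stated as a plain sum of rewards).

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess a coin flip for a fixed number of steps."""

    def __init__(self, steps=5, seed=0):
        self.steps = steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the state is simply the current time step

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0  # reward for a correct guess
        self.t += 1
        done = self.t >= self.steps
        return self.t, reward, done


def run_episode(env, policy):
    """Run one episode and record the trail as (state, action, reward) tuples."""
    trail = []
    state = env.reset()
    done = False
    while not done:
        action = policy(state)                       # agent acts
        next_state, reward, done = env.step(action)  # environment responds
        trail.append((state, action, reward))
        state = next_state
    return trail


def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting a reward k steps ahead by gamma ** k."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


trail = run_episode(CoinFlipEnv(), policy=lambda state: 0)
rewards = [r for _, _, r in trail]
print(len(trail), discounted_return(rewards))
```

The agent here always guesses 0, so it learns nothing; the point is only to show where state, action, reward, and the recorded trail appear in the loop.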
Markov property: The probability of transitioning to the next state depends only on the current state, not on the sequence of events that preceded it.
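Written as a formula, this statement could look as follows. The notation is an assumption taken from common textbook usage: $S_t$ is the state and $A_t$ the action at time $t$, and the chosen action is included in the condition, as is usual for a decision process.

```latex
\Pr\left(S_{t+1} = s' \mid S_t, A_t\right)
  = \Pr\left(S_{t+1} = s' \mid S_0, A_0, S_1, A_1, \dots, S_t, A_t\right)
```

In words: once the current state and action are known, the earlier history adds no information about what happens next.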
- Goal: maximize the outcome, where the actual outcome for t is the reward collected from time step t onward
- State Value: the expected outcome at the given state
- Action Value: the expected outcome at the given state when taking a given action
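These three quantities are commonly written as shown below. This is a sketch in standard textbook notation, which is an assumption here: $G_t$ is the outcome (return) from time $t$, $\gamma \in [0, 1]$ is an assumed discount factor, and $\pi$ denotes the policy being followed.

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]

q_\pi(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s,\ A_t = a \right]
```

With $\gamma = 1$ the outcome is the plain sum of all future rewards; smaller values weight near-term rewards more heavily.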
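- Policy: the rule by which the agent chooses an action in each state
- Optimal Policy: a policy that achieves the highest expected outcome in every state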
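One common way to turn action values into a policy is to act greedily: in each state, pick the action with the highest action value. The sketch below assumes the action values are already known and stored in a nested dictionary; the function name `greedy_policy` and the numbers in `q` are made up for illustration.

```python
def greedy_policy(q_values):
    """Map each state to the action with the highest action value."""
    return {
        state: max(actions, key=actions.get)
        for state, actions in q_values.items()
    }


# Hypothetical action values q[state][action], invented for this example.
q = {
    "start": {"left": 0.2, "right": 0.8},
    "middle": {"left": 0.5, "right": 0.1},
}

policy = greedy_policy(q)
print(policy)  # {'start': 'right', 'middle': 'left'}
```

If the action values are those of an optimal policy, acting greedily with respect to them yields an optimal policy; with estimated values the result is only as good as the estimates.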