A new paradigm in Artificial Intelligence
1. Monte Carlo Tree Search
2. Neural Networks
What are Decision Trees?
[Diagram: a decision tree with scored leaf nodes]
A
1. Very difficult to write, could sneak in bias
2. Not transferable to other games
3. Only as good as the best humans
1. Our solution can just be "good enough"
2. We can produce random "solutions" easily
3. Given a solution, we can score how good it is
[Diagram: four random solutions scored -10, 5, 3, 0]
Do this a gazillion* times, then keep the highest-scoring solutions
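That loop — generate random solutions, score them, keep the best — can be sketched as follows. The "game", the scoring function, and all names here are illustrative assumptions, not from the talk:

```python
import random

def random_solution(length=5):
    """Generate a random 'solution': here, just a list of random moves."""
    return [random.choice([-1, 0, 1]) for _ in range(length)]

def score(solution):
    """Score a solution. Any cheap evaluation works; here we sum the moves."""
    return sum(solution)

def monte_carlo_search(num_trials=10_000, keep=3):
    """Try many random solutions and keep the highest-scoring ones."""
    trials = [random_solution() for _ in range(num_trials)]
    return sorted(trials, key=score, reverse=True)[:keep]

best = monte_carlo_search(1000)
```

The point is that no hand-written strategy is needed: random tries plus a scoring function are "good enough" when repeated often.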
[Diagram: from the current position, three candidate moves are each played out in 10 random games, winning 1, 6, and 3 of them]
[Diagram: the tree grows unevenly — new leaves at 10 sims / 4 wins and 10 sims / 6 wins, with parent totals of 30 sims / 16 wins and 50 sims / 20 wins]
1. Best win ratio (greedy)
2. Low sim count (curious)
Score = (Avg Sims - # of sims) * curiosity bias + (# of wins - Avg wins) * greediness bias
B
Avg Sims = 20, Avg Wins = 8

Sims  Wins  Score
 10     1      2
 10     3      5
 10     4      6
 10     6      8
 30    16     -2
 50    20    -18
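Using the slide's numbers (an average of 20 sims and roughly 8 wins per node, with both biases assumed to be 1), the score can be computed like this; the function and variable names are mine:

```python
def node_score(sims, wins, avg_sims, avg_wins, curiosity=1.0, greediness=1.0):
    """(Avg Sims - # of sims) * curiosity bias + (# of wins - Avg wins) * greediness bias.
    Under-explored nodes (few sims) and strong nodes (many wins) both score high."""
    return (avg_sims - sims) * curiosity + (wins - avg_wins) * greediness

# The six nodes from the slide, as (sims, wins) pairs.
nodes = [(10, 1), (10, 3), (10, 4), (10, 6), (30, 16), (50, 20)]
avg_sims = sum(s for s, _ in nodes) / len(nodes)  # = 20
avg_wins = 8  # the slide rounds the true average (50 / 6) to 8

# Pick the node to explore next: the highest-scoring one.
best = max(nodes, key=lambda n: node_score(*n, avg_sims, avg_wins))
```

The heavily explored (50 sims, 20 wins) node scores (20 - 50) + (20 - 8) = -18, while the promising, lightly explored (10 sims, 6 wins) node scores 8 and gets explored next. (Because the slide's averages are rounded, its printed scores can differ by one in places.)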
C
How do we know when to actually play a move?
How many random games should we play each time?
How "curious" should it be? How "greedy"?
This doesn't seem to "remember" from game to game... How does it improve/learn?
(but outside the scope of this talk)
Curiosity + Greediness + Experience
Learning How to Learn
Similar to...
Vs. ?
* Animation from 3Blue1Brown's Video
[Diagram: a small neural network turning the inputs 2, 4, -1 into the output -0.13]
[2, 4, -1] => 1
D
[Diagram: the blame of 1.13 at the output is shared backward across the weights]
Target - Result = Blame
1 - (-0.13) = 1.13
E
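A minimal sketch of the forward pass and the "Target - Result = Blame" step, assuming a single neuron squashed by tanh (one sigmoid-shaped curve); the weights are illustrative, not the slide's, and the network's real topology is not shown here:

```python
import math

def forward(inputs, weights):
    """Weighted sum of the inputs, squashed into (-1, 1) by tanh."""
    return math.tanh(sum(x * w for x, w in zip(inputs, weights)))

inputs = [2, 4, -1]          # the slide's training example: [2, 4, -1] => 1
weights = [0.1, -0.2, 0.5]   # illustrative starting weights
target = 1

result = forward(inputs, weights)
blame = target - result      # Target - Result = Blame
```

The blame is positive when the network under-shoots the target, negative when it over-shoots, and it is this quantity that gets shared backward across the weights.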
Weight - % of Blame => New Weight
0.1 - (10% of -0.9) => 0.19
[Diagram: the updated network — e.g. the 0.1 weight becomes 0.19, nudging the output from -0.13 toward the target]
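The "Weight - % of Blame => New Weight" rule above, applied with a 10% step, looks like this in code (a hedged sketch of the update; the function name and step size are assumptions, checked against the slide's 0.1 - (10% of -0.9) => 0.19 example):

```python
def update_weight(weight, blame, step=0.10):
    """New weight = weight - step * blame: move each weight against its share of blame."""
    return weight - step * blame

# Slide example: 0.1 - (10% of -0.9) => 0.19
new_w = update_weight(0.1, -0.9)
```

Repeating forward pass, blame, and update over many examples is what "training" means here: each pass nudges every weight a small step in the direction that shrinks the blame.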
How do we encode our inputs?
How big should our network be?
How much should we tweak the weights?
How much training is enough?
(but outside the scope of this talk)
Bringing it together
[Diagram: three candidate moves with 10 sims each (1, 6, and 3 wins) alongside the network's predicted win chances: 13%, 75%, 59%]
Curiosity + Greediness + Experience = Score
F
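One way to sketch the combined "Curiosity + Greediness + Experience" score: extend the earlier tree-search score with a term from the network's predicted win chance. The weighting and all names here are my own assumptions, not from the talk:

```python
def combined_score(sims, wins, predicted_win_pct,
                   avg_sims, avg_wins,
                   curiosity=1.0, greediness=1.0, experience=10.0):
    """Curiosity + Greediness + Experience = Score."""
    return ((avg_sims - sims) * curiosity        # curiosity: favor under-explored nodes
            + (wins - avg_wins) * greediness     # greediness: favor winning nodes
            + predicted_win_pct * experience)    # experience: trust the trained network

# Slide example: three moves, 10 sims each, network predictions 13%, 75%, 59%
moves = [(10, 1, 0.13), (10, 6, 0.75), (10, 3, 0.59)]
avg_sims, avg_wins = 10, 10 / 3
best = max(moves, key=lambda m: combined_score(*m, avg_sims, avg_wins))
```

With equal sim counts the curiosity term cancels out, so the move with the most wins and the highest predicted win chance — 6 wins at 75% — is explored next.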
Wrapping up
Learned about decision trees and how computers play games
Covered Monte Carlo Tree Search, a novel way of using randomness to search a large decision tree efficiently
How neural networks categorize information, and how we can train them
How we can combine the two to explore a massive amount of strategies independent of the kind of game
A - Alpha-beta Pruning
B - UCT
C - Sub selection
D - Sigmoid curve
E - Back-propagation, gradient descent
F - Fourth attribute, policy network