Machine Learning
James L. Weaver
Developer Advocate
Email: jweaver@pivotal.io
http://JavaFXpert.com
@JavaFXpert
Java Champion, JavaOne Rockstar, plays well with others, etc :-)
Developer Advocate & International Speaker for Pivotal
Author of several Java/JavaFX/RaspPi books
From the introductory video in the Machine Learning course (Stanford University & Coursera) taught by Andrew Ng.
(e.g., market segment discovery and social network analysis)
(using the Iris flower data set)
[Diagram: the Iris measurements are the features (inputs); the species is the label (output)]
(a.k.a. a Deep Belief Network when there are multiple hidden layers)
Spring makes REST services and WebSockets easy as π
forward propagation
For each layer:
Multiply inputs by weights:
(1 x 8.54) + (0 x 8.55) = 8.54
Add bias:
8.54 + (-3.99) = 4.55
Use sigmoid activation function:
1 / (1 + e^(-4.55)) = 0.99
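A minimal sketch of this computation in Java, using the weights, bias, and inputs from the example above:

```java
// One sigmoid neuron computing the forward-propagation example above
public class ForwardPropSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        double[] inputs  = {1.0, 0.0};    // inputs from the example
        double[] weights = {8.54, 8.55};  // weights from the example
        double bias = -3.99;              // bias from the example

        // Multiply inputs by weights, sum, and add the bias
        double z = bias;
        for (int i = 0; i < inputs.length; i++) {
            z += inputs[i] * weights[i];
        }
        // z = (1 x 8.54) + (0 x 8.55) + (-3.99) = 4.55

        // Apply the sigmoid activation function
        System.out.println(sigmoid(z));   // prints ~0.99
    }
}
```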
back propagation (minimize cost function)
(Uses gradient descent to iteratively minimize the cost function)
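To make the idea concrete, here is a minimal gradient descent sketch on a hypothetical one-parameter cost function, cost(w) = (w - 3)^2, rather than a real network cost:

```java
// Gradient descent iteratively steps against the gradient of the cost
public class GradientDescentSketch {
    public static void main(String[] args) {
        double w = 0.0;                     // initial parameter value
        double learningRate = 0.1;

        for (int i = 0; i < 100; i++) {
            double gradient = 2 * (w - 3);  // derivative of (w - 3)^2
            w -= learningRate * gradient;   // step downhill
        }
        System.out.println(w);              // converges toward the minimum at 3.0
    }
}
```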
In iterationDone(), iteration: 0, score: 1.0726
In iterationDone(), iteration: 300, score: 0.2017
In iterationDone(), iteration: 600, score: 0.0482
In iterationDone(), iteration: 900, score: 0.0266
Examples labeled as 0 classified by model as 0: 9 times
Examples labeled as 1 classified by model as 1: 14 times
Examples labeled as 1 classified by model as 2: 3 times
Examples labeled as 2 classified by model as 2: 27 times
==========================Scores========================
Accuracy: 0.9434
Precision: 0.9667
Recall: 0.9412
F1 Score: 0.9538
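Output like the above can be produced with Deeplearning4j; here is a hedged sketch, assuming a DL4J version of the same vintage as this talk and a model, training set, and test set that are already built:

```java
import org.deeplearning4j.eval.Evaluation;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;

class TrainAndEvaluateSketch {
    static void trainAndEvaluate(MultiLayerNetwork model,
                                 DataSet trainingData, DataSet testData) {
        // Log the score (cost) every 300 iterations, as in the output above
        model.setListeners(new ScoreIterationListener(300));
        model.fit(trainingData);

        // Evaluate against the held-out test set: 3 classes (Iris species)
        Evaluation eval = new Evaluation(3);
        INDArray output = model.output(testData.getFeatureMatrix());
        eval.eval(testData.getLabels(), output);
        System.out.println(eval.stats());  // accuracy, precision, recall, F1
    }
}
```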
Let’s use 65% of the 8378 rows for training and 35% for testing
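In Deeplearning4j, one way to make that split is DataSet.splitTestAndTrain; a minimal sketch, assuming the 8378 rows are already loaded into a single DataSet:

```java
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.SplitTestAndTrain;

class SplitSketch {
    static SplitTestAndTrain split(DataSet allData) {
        allData.shuffle();                 // randomize row order first
        // 65% of the rows for training; the remaining 35% for testing
        return allData.splitTestAndTrain(0.65);
        // getTrain() and getTest() on the result yield the two DataSets
    }
}
```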
In this example, all features are continuous values, and the output is a one-hot vector
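For illustration, a one-hot vector has a 1 at the class index and 0s everywhere else; a minimal sketch (the helper is hypothetical):

```java
// Encode a class index as a one-hot vector
class OneHotSketch {
    static double[] oneHot(int classIndex, int numClasses) {
        double[] vector = new double[numClasses];  // all zeros
        vector[classIndex] = 1.0;                  // 1 at the class index
        return vector;
    }
    // oneHot(1, 3) -> [0.0, 1.0, 0.0], e.g. the second Iris species
}
```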
Note that input layer neuron values are normalized
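One way to normalize inputs in Deeplearning4j is NormalizerStandardize (zero mean, unit variance); a hedged sketch, assuming that is the normalization in use:

```java
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;

class NormalizeSketch {
    static void normalize(DataSet trainingData, DataSet testData) {
        NormalizerStandardize normalizer = new NormalizerStandardize();
        normalizer.fit(trainingData);        // gather mean/stdev from training data
        normalizer.transform(trainingData);  // normalize in place
        normalizer.transform(testData);      // reuse the same statistics for test data
    }
}
```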
Features are continuous values, and the output is a continuous value
Input layer: 9 one-hot vectors (27 nodes)
Hidden layer: 54 sigmoid neurons
Output layer: One-hot vector (9 nodes)
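A hedged sketch of how such a 27-54-9 network could be configured in Deeplearning4j; the softmax output and loss function here are assumptions, not necessarily the talk's exact settings:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

class TicTacToeNetSketch {
    static MultiLayerNetwork build() {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .list()
            .layer(0, new DenseLayer.Builder()
                .nIn(27).nOut(54)              // 27 input nodes -> 54 hidden neurons
                .activation(Activation.SIGMOID)
                .build())
            .layer(1, new OutputLayer.Builder(
                    LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(54).nOut(9)               // 9-node one-hot output (cell to play)
                .activation(Activation.SOFTMAX)
                .build())
            .build();
        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        return net;
    }
}
```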
Client developed in JavaFX with Gluon Mobile
[Diagram: a game board one-hot encoded into 0/1 input-node values flowing through the network]
Request:
/player?gameBoard=XXOOXIIOI&strategy=neuralNetwork

Response:
{
  "gameBoard": "XXOOXXIOI",
  ...
}
Java/Spring REST microservice
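A hedged sketch of what such a Spring endpoint could look like; the class and helper names are hypothetical, not necessarily those in the actual service:

```java
import java.util.Collections;
import java.util.Map;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class PlayerController {

    @GetMapping("/player")
    public Map<String, String> player(@RequestParam String gameBoard,
                                      @RequestParam String strategy) {
        // Ask the requested strategy (e.g. neuralNetwork) for the next move
        String updatedBoard = chooseMove(gameBoard, strategy);
        // Spring serializes the map to JSON, e.g. {"gameBoard": "XXOOXXIOI", ...}
        return Collections.singletonMap("gameBoard", updatedBoard);
    }

    private String chooseMove(String board, String strategy) {
        // Placeholder: the real service would consult the trained network
        return board.replaceFirst("I", "X");
    }
}
```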
Each row: the cell to play, followed by the nine game-board cell states before the play (each cell one-hot encoded):

0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0
3, 0,1,0, 0,0,1, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0
3, 0,1,0, 1,0,0, 0,0,1, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0
1, 0,1,0, 1,0,0, 1,0,0, 0,0,1, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0
1, 0,1,0, 1,0,0, 1,0,0, 1,0,0, 0,0,1, 1,0,0, 1,0,0, 1,0,0, 1,0,0
2, 0,1,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 0,0,1, 1,0,0, 1,0,0, 1,0,0
1, 0,1,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 0,0,1, 1,0,0, 1,0,0
2, 0,1,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 0,0,1, 1,0,0
2, 0,1,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 1,0,0, 0,0,1
4, 0,1,0, 0,0,1, 1,0,0, 0,1,0, 1,0,0, 1,0,0, 0,0,1, 1,0,0, 1,0,0
...
Training dataset generated by https://github.com/JavaFXpert/tic-tac-toe-minimax, written in Java by @RoyVanRijn with guidance from the excellent article Tic Tac Toe: Understanding the Minimax Algorithm by @jasonrobertfox
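For reference, a minimal sketch of the minimax idea behind that generator; the scoring and board representation here are assumptions, not the repo's exact code:

```java
// Minimax over a tic-tac-toe board: 9 chars of 'X', 'O', 'I' (I = unoccupied).
// X maximizes the score; O minimizes it.
public class MinimaxSketch {

    // Returns the best achievable score for the player to move
    static int minimax(char[] b, char player) {
        Character w = winner(b);
        if (w != null) return (w == 'X') ? 10 : -10;
        if (isFull(b)) return 0;  // draw

        int best = (player == 'X') ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (int i = 0; i < 9; i++) {
            if (b[i] != 'I') continue;
            b[i] = player;                                 // try the move
            int score = minimax(b, (player == 'X') ? 'O' : 'X');
            b[i] = 'I';                                    // undo the move
            best = (player == 'X') ? Math.max(best, score)
                                   : Math.min(best, score);
        }
        // A dataset generator would record the move index achieving this score
        return best;
    }

    static boolean isFull(char[] b) {
        for (char c : b) if (c == 'I') return false;
        return true;
    }

    static Character winner(char[] b) {
        int[][] lines = {{0,1,2},{3,4,5},{6,7,8},{0,3,6},
                         {1,4,7},{2,5,8},{0,4,8},{2,4,6}};
        for (int[] l : lines) {
            if (b[l[0]] != 'I' && b[l[0]] == b[l[1]] && b[l[1]] == b[l[2]]) {
                return b[l[0]];
            }
        }
        return null;  // no winner yet
    }
}
```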
Excellent article by Preetham V V on neural networks and choosing hyperparameters
Challenge: Given that there is only one state that gives a reward, how can the agent work out what actions will get it to the reward?
(a.k.a. the credit assignment problem)
The goal of an episode is to maximize the total reward
From the BasicBehavior example in https://github.com/jmacglashan/burlap_examples
In this example, all actions are deterministic
State (x, y) | Left | Right | Up | Down
---|---|---|---|---
... | | | |
2, 7 | 2.65 | 4.05 | 0.00 | 3.20
2, 8 | 3.65 | 4.50 | 4.50 | 3.65
2, 9 | 4.05 | 5.00 | 5.00 | 4.05
2, 10 | 4.50 | 4.50 | 5.00 | 3.65
... | | | |

Q-Learning table of expected values (cumulative discounted rewards) for taking an action (column) from a state (row) and then following an optimal policy. The Q-Learning table calculations entry in the resources explains how these values are computed.
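A table like this is filled in by the standard Q-Learning update rule; a minimal sketch (the learning rate and discount factor values are illustrative):

```java
// Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
class QLearningUpdateSketch {
    static final double ALPHA = 0.1;  // learning rate
    static final double GAMMA = 0.9;  // discount factor

    // q[state][action] holds the expected cumulative discounted reward
    static void update(double[][] q, int state, int action,
                       double reward, int nextState) {
        // Best value obtainable from the next state under the current table
        double maxNext = Double.NEGATIVE_INFINITY;
        for (double v : q[nextState]) maxNext = Math.max(maxNext, v);
        // (For a terminal next state, maxNext would be taken as 0 instead)

        // Nudge Q(s,a) toward the observed reward plus discounted future value
        q[state][action] += ALPHA * (reward + GAMMA * maxNext - q[state][action]);
    }
}
```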
Low discount factors cause the agent to prefer immediate rewards
How often should the agent try new paths vs. greedily taking known paths?
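A common answer is epsilon-greedy selection: with probability epsilon the agent explores a random action, and otherwise it greedily exploits the best known one. A minimal sketch:

```java
import java.util.Random;

class EpsilonGreedySketch {
    static final Random RANDOM = new Random();

    static int chooseAction(double[] qValuesForState, double epsilon) {
        if (RANDOM.nextDouble() < epsilon) {
            return RANDOM.nextInt(qValuesForState.length);  // explore: random action
        }
        int best = 0;                                       // exploit: best Q-value
        for (int a = 1; a < qValuesForState.length; a++) {
            if (qValuesForState[a] > qValuesForState[best]) best = a;
        }
        return best;
    }
}
```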
Learning to win from experience rather than by being trained
Our learning agent is the "X" player, receiving +5 for winning, -5 for losing, and -1 for each turn.
The "O" player is part of the environment; the state and reward updates it gives the agent take the "O" plays into account.
States (I = unoccupied cell) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
---|---|---|---|---|---|---|---|---|---
O I X I O X X I O, O won | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A
I I I I I I O I X, in prog | 1.24 | 1.54 | 2.13 | 3.14 | 2.23 | 3.32 | N/A | 1.45 | N/A
I I O I I X O I X, in prog | 2.34 | 1.23 | N/A | 0.12 | 2.45 | N/A | N/A | 2.64 | N/A
I I O O X X O I X, in prog | +4.0 | -6.0 | N/A | N/A | N/A | N/A | N/A | -6.0 | N/A
X I O I I X O I X, X won | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A
... | | | | | | | | |

Q-Learning table of expected values (cumulative discounted rewards) for taking an action (a cell to play, columns 0-8) from a state and then following an optimal policy.
Andrew Ng video:
https://www.coursera.org/learn/machine-learning/lecture/zcAuT/welcome-to-machine-learning
Iris flower dataset:
https://en.wikipedia.org/wiki/Iris_flower_data_set
Visual neural net server:
http://github.com/JavaFXpert/visual-neural-net-server
Visual neural net client:
http://github.com/JavaFXpert/ng2-spring-websocket-client
Deep Learning for Java: http://deeplearning4j.org
Spring initializr: http://start.spring.io
Kaggle datasets: http://kaggle.com
Tic-tac-toe client: https://github.com/JavaFXpert/tic-tac-toe-client
Gluon Mobile: http://gluonhq.com/products/mobile/
Tic-tac-toe REST service: https://github.com/JavaFXpert/tictactoe-player
Java app that generates tic-tac-toe training dataset:
https://github.com/JavaFXpert/tic-tac-toe-minimax
Understanding The Minimax Algorithm article:
http://neverstopbuilding.com/minimax
Optimizing neural networks article:
https://medium.com/autonomous-agents/is-optimizing-your-ann-a-dark-art-79dda77d103
BURLAP library: http://burlap.cs.brown.edu
BURLAP examples including BasicBehavior:
https://github.com/jmacglashan/burlap_examples
Markov Decision Process:
https://en.wikipedia.org/wiki/Markov_decision_process
Q-Learning table calculations: http://artint.info/html/ArtInt_265.html
Exploitation vs. exploration:
https://en.wikipedia.org/wiki/Multi-armed_bandit
Reinforcement Learning: An Introduction:
https://webdocs.cs.ualberta.ca/~sutton/book/bookdraft2016sep.pdf
Tic-tac-toe reinforcement learning app:
https://github.com/JavaFXpert/tic-tac-toe-rl
Machine Learning
James L. Weaver
Developer Advocate
Email: jweaver@pivotal.io
http://JavaFXpert.com
@JavaFXpert