Machine Learning Introduction


Structure

  • Fundamentals
  • Introduction Supervised Learning
    • Neural Nets with AND
    • MNIST
  • Introduction Unsupervised Learning
    • K-Means Clustering
  • Introduction Reinforcement Learning
    • Q-Learning
  • Summary


Fundamentals

  • What is ML?
  • Why do we need it?
  • What are the use cases?
  • Do I need a Ph.D. to understand all of this?


Supervised Learning

[Figure: example pictures labeled "Mammal" and "Not a mammal", plus a new, unlabeled picture asking "Mammal?"]


Logical AND - Problem

x_1  x_2  x_1 ∧ x_2
 0    0       0
 0    1       0
 1    0       0
 1    1       1

[Plot: the four inputs in the (x_1, x_2) plane, three points labeled ∧ = 0 and one labeled ∧ = 1]


Logical AND - Problem

[Plot: a line h = 0 separating the ∧ = 0 points (h < 0) from the ∧ = 1 point (h >= 0) in the (x_1, x_2) plane]

h = a + b \cdot x_1 + c \cdot x_2

h = 0 \Rightarrow x_2 = \frac{a + b \cdot x_1}{-c}

How to learn such a function?


Perceptron

[Diagram: inputs 1, x_1, x_2, \ldots, x_n with weights w_0, w_1, w_2, \ldots, w_n feeding a summation node \sum followed by an activation \sigma]

\sigma(\sum_{i=0} x_i \cdot w_i)

x = input
w = weights
\sigma = activation function


Perceptron for AND

[Diagram: inputs 1, x_1, x_2 with weights w_0, w_1, w_2 feeding \sum followed by \sigma]

h = w_0 + x_1 \cdot w_1 + x_2 \cdot w_2   (compare h = a + x_1 \cdot b + x_2 \cdot c)

\sigma(h): h < 0 \Rightarrow 0 (false), \quad h \geq 0 \Rightarrow 1 (true)


Perceptron for AND

Input:
x_1 = 0, \; x_2 = 1

Randomly chosen weights:
w_0 = -0.3, \; w_1 = 0.3, \; w_2 = 0.5

-0.3 + 0 \cdot 0.3 + 1 \cdot 0.5 = 0.2
\sigma(0.2) = 1

0 = false, 1 = true, so 0 ∧ 1 is currently (wrongly) classified as true.
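
This forward pass can be written in a few lines of Python. A minimal sketch (the function names are ours, the weights and step activation are the ones above):

def sigma(h):
    # step activation: 0 (false) for h < 0, 1 (true) for h >= 0
    return 1 if h >= 0 else 0

def perceptron(x1, x2, w=(-0.3, 0.3, 0.5)):
    w0, w1, w2 = w
    h = w0 * 1 + w1 * x1 + w2 * x2   # the constant 1 is the bias input
    return sigma(h)

print(perceptron(0, 1))   # sigma(0.2) = 1, so 0 AND 1 is (wrongly) classified as true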


Cost Function

x_1  x_2  y  \hat{y}  C
 0    0   0    0       0
 0    1   0    1      -1
 1    0   0    1      -1
 1    1   1    1       0

y = desired result
\hat{y} = our result
\mathcal{M} := misclassified patterns

cost = - \sum_{i\in\mathcal{M}} | y_i - \hat{y_i}| = -2

[Plot: the two misclassified points in the (x_1, x_2) plane, each marked -1]
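
A short sketch that recomputes the table and the cost of -2 with the initial weights (-0.3, 0.3, 0.5); the helper names are ours:

def sigma(h):
    return 1 if h >= 0 else 0

def predict(x1, x2, w=(-0.3, 0.3, 0.5)):
    w0, w1, w2 = w
    return sigma(w0 + w1 * x1 + w2 * x2)

and_table = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

cost = -sum(abs(y - predict(x1, x2))
            for (x1, x2), y in and_table
            if predict(x1, x2) != y)
print(cost)   # -2, from the two misclassified patterns (0,1) and (1,0)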


Backpropagation

\hat{y} = \sigma (\sum_{i = 0} x_i \cdot w_i)

cost = -\sum_{i\in\mathcal{M}} | y_i - \hat{y_i}| = -|\mathcal{M}|

\frac{\partial cost}{\partial w_j} \approx -\sum_{i \in \mathcal{M}} - x_{i,j}

Update rule:

w_j \leftarrow w_j + \eta \cdot (\sum_{i\in\mathcal{M}} -x_{i,j})


Backpropagation

Misclassified (written with the bias input x_0 = 1):
(x_0, x_1, x_2) = (1, 0, 1) \text{ and } (1, 1, 0)

Update rule:
w_j \leftarrow w_j + \eta \cdot (\sum_{i\in\mathcal{M}} -x_{i,j})

Learning rate:
\eta = 0.1

Weights:
w_0 = -0.3, w_1 = 0.3, w_2 = 0.5

w_0 = -0.3 + 0.1 \cdot (-1 - 1) = -0.5
w_1 = 0.3 + 0.1 \cdot (0 - 1) = 0.2
w_2 = 0.5 + 0.1 \cdot (-1 - 0) = 0.4
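
The same update, written out as a small sketch (each misclassified pattern carries its bias input x_0 = 1):

eta = 0.1
w = [-0.3, 0.3, 0.5]
misclassified = [(1, 0, 1), (1, 1, 0)]   # the two wrongly classified AND inputs

# w_j <- w_j + eta * sum over misclassified patterns of (-x_j)
w = [w_j + eta * sum(-x[j] for x in misclassified) for j, w_j in enumerate(w)]
print([round(w_j, 2) for w_j in w])   # [-0.5, 0.2, 0.4]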


Test

[Plot: the four inputs with the learned decision boundary]

x_1 = 0, x_2 = 0: \sigma(-0.5 + 0.2 \cdot 0 + 0.4 \cdot 0) = \sigma(-0.5) = 0
x_1 = 0, x_2 = 1: \sigma(-0.5 + 0.2 \cdot 0 + 0.4 \cdot 1) = \sigma(-0.1) = 0
x_1 = 1, x_2 = 0: \sigma(-0.5 + 0.2 \cdot 1 + 0.4 \cdot 0) = \sigma(-0.3) = 0
x_1 = 1, x_2 = 1: \sigma(-0.5 + 0.2 \cdot 1 + 0.4 \cdot 1) = \sigma(0.1) = 1


Summary


  • Get labeled data (the AND table)
  • Run the data through the perceptron and calculate the error
  • Use the partial derivative of the cost function to create a learning rule
  • For every mislabeled sample, apply the learning rule (see the training sketch below)
  • Hope that the data is linearly separable

[Plot: XOR, the classic example that is not linearly separable]
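
Putting the steps above together, a minimal training-loop sketch for the AND perceptron. It follows the slides' simplified rule rather than any particular library, and the helper names are ours:

def sigma(h):
    return 1 if h >= 0 else 0

def predict(w, x):   # x = (x0, x1, x2) with bias input x0 = 1
    return sigma(sum(w_j * x_j for w_j, x_j in zip(w, x)))

data = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
w, eta = [-0.3, 0.3, 0.5], 0.1

for _ in range(100):
    misclassified = [x for x, y in data if predict(w, x) != y]
    if not misclassified:   # every pattern is classified correctly
        break
    # learning rule: w_j <- w_j + eta * sum(-x_j) over the misclassified patterns
    w = [w_j + eta * sum(-x[j] for x in misclassified) for j, w_j in enumerate(w)]

print([round(w_j, 2) for w_j in w])        # [-0.5, 0.2, 0.4], as on the Test slide
print([predict(w, x) for x, _ in data])    # [0, 0, 0, 1], the AND table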


Example - MNIST

[Figure: an unknown handwritten digit ("?") given to the classifier]

Output: 13% #0, 0% #1, 5% #2, 1% #3, 67% #4, 2% #5, 2% #6, 3% #7, 3% #8, 4% #9



Multiple Perceptrons

[Diagram: the b/w pixel data x = (x_1, x_2, \ldots, x_n) feeds a layer of perceptrons (\sum); their outputs \textbf{z} go through a softmax]

\sigma(\textbf{z})_j = \frac{e^{\textbf{z}_j}}{\sum^{K}_{k = 1} e^{\textbf{z}_k}}

Output: 13% #0, 0% #1, 5% #2, 1% #3, 67% #4, 2% #5, 2% #6, 3% #7, 3% #8, 4% #9
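
The softmax itself is tiny. A sketch with a made-up score vector z, chosen so that the output roughly matches the percentages above (subtracting max(z) only guards against overflow):

import math

def softmax(z):
    # sigma(z)_j = exp(z_j) / sum_k exp(z_k)
    m = max(z)
    exps = [math.exp(z_j - m) for z_j in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [1.3, -2.0, 0.4, -1.1, 3.0, -0.6, -0.7, -0.3, -0.2, 0.1]   # one score per digit 0-9
print([round(p, 2) for p in softmax(z)])   # a distribution summing to 1, peaked at digit 4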


Linearly Separable

[Figure: digit images reduced to b/w pixel rows such as 0,1,0 / 1,1,1 / 1,1,0, with an index over the pixels and the pixel sums = 6, = 4, = 5]


Multiple Layers of Perceptrons

[Diagram: the same b/w pixel input x = (x_1, \ldots, x_n), now passed through several layers of perceptrons (\sum) before the softmax output]

\sigma(\textbf{z})_j = \frac{e^{\textbf{z}_j}}{\sum^{K}_{k = 1} e^{\textbf{z}_k}}

Output: 13% #0, 0% #1, 5% #2, 1% #3, 67% #4, 2% #5, 2% #6, 3% #7, 3% #8, 4% #9
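
A sketch of one forward pass through such a stack of layers, with made-up sizes and random weights (NumPy; the ReLU activation is a stand-in, since the slides only define the step function and the softmax):

import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    # one layer of perceptrons: weighted sums followed by an activation
    return np.maximum(0.0, W @ x + b)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_pixels, n_hidden, n_digits = 784, 32, 10          # made-up layer sizes
W1, b1 = rng.normal(0, 0.1, (n_hidden, n_pixels)), np.zeros(n_hidden)
W2, b2 = rng.normal(0, 0.1, (n_digits, n_hidden)), np.zeros(n_digits)

x = rng.integers(0, 2, n_pixels).astype(float)      # fake b/w pixel vector
z = W2 @ layer(x, W1, b1) + b2
print(softmax(z).round(2))                          # 10 probabilities, one per digit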


Additional Information

  • Convolutional Networks
  • Recurrent Networks
  • LSTM Neurons


Neural Style Transfer

https://handong1587.github.io/deep_learning/2015/10/09/fun-with-deep-learning.html


Neural Photorealistic Style Transfer

https://github.com/luanfujun/deep-photo-styletransfer


Text to Speech

[Audio samples: normal text, randomly generated text, music]

https://deepmind.com/blog/wavenet-generative-model-raw-audio/


Unsupervised Learning

[Figure: two scatter plots of Feature 1 vs. Feature 2, one with unknown structure and one with the structure (clusters) known]


K - Means Clustering

[Plot: the four data points and the two initial cluster centers in the Feature 1 / Feature 2 plane]

X = \{ (1,2), (1,1), (2,3), (3,3) \}
C_1 = \{ (1,1) \}
C_2 = \{ (3,3) \}

euclid(\textbf{x}, \textbf{y}) = \sqrt{\sum_{i = 0}^n (x_i - y_i)^2}

euclid(x_1, C_1) = \sqrt{(1-1)^2 + (2-1)^2} = 1
euclid(x_1, C_2) = \sqrt{(1-3)^2 + (2-3)^2} = 2.24
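
The distance computation as a quick sketch (function name is ours):

import math

def euclid(x, y):
    # square root of the summed squared coordinate differences
    return math.sqrt(sum((x_i - y_i) ** 2 for x_i, y_i in zip(x, y)))

X = [(1, 2), (1, 1), (2, 3), (3, 3)]
c1, c2 = (1, 1), (3, 3)

print(round(euclid(X[0], c1), 2))   # 1.0, so x_1 is closer to C_1
print(round(euclid(X[0], c2), 2))   # 2.24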


K - Means Clustering

[Plot: x_1 = (1,2) is assigned to C_1, and the C_1 center moves to (1, 1.5)]

X = \{ (1,2), (1,1), (2,3), (3,3) \}
C_1 = \{ (1,1) \}
C_2 = \{ (3,3) \}

euclid(x_1, C_1) = \sqrt{(1-1)^2 + (2-1)^2} = 1
euclid(x_1, C_2) = \sqrt{(1-3)^2 + (2-3)^2} = 2.24

C_1 = \{ (\frac{1 + 1}{2}, \frac{1 + 2}{2}) \} = \{(1, 1.5)\}


K - Means Clustering

[Plot: x_3 = (2,3) is assigned to C_2, and the C_2 center moves to (2.5, 3)]

X = \{ (1,2), (1,1), (2,3), (3,3) \}
C_1 = \{ (1,1.5) \}
C_2 = \{ (3,3) \}

euclid(\textbf{x}, \textbf{y}) = \sqrt{\sum_{i = 0}^n (x_i - y_i)^2}

euclid(x_3, C_1) = \sqrt{(2-1)^2 + (3-1.5)^2} = 1.80
euclid(x_3, C_2) = \sqrt{(2-3)^2 + (3-3)^2} = 1

C_2 = \{ (\frac{2 + 3}{2}, \frac{3 + 3}{2}) \} = \{(2.5, 3)\}


K - Means Clustering

[Plot: all four points with the current centers C_1 = (1, 1.5) and C_2 = (2.5, 3)]

X = \{ (1,2), (1,1), (2,3), (3,3) \}
C_1 = \{ (1,1.5) \}
C_2 = \{ (2.5,3) \}

euclid(\textbf{x}, \textbf{y}) = \sqrt{\sum_{i = 0}^n (x_i - y_i)^2}

\forall x \in X: euclid(x, C_1) < euclid(x, C_2) \Rightarrow x \in C_1
\forall x \in X: euclid(x, C_1) > euclid(x, C_2) \Rightarrow x \in C_2

\text{If any assignment changed, recompute the centroids of } C_{1,2} \text{ and repeat.}
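
The whole loop as a compact sketch (plain Python, our own function names), run on the four points above with k = 2:

import math

def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans(X, centers, steps=10):
    for _ in range(steps):
        # assignment step: every point goes to its nearest center
        clusters = [[] for _ in centers]
        for x in X:
            nearest = min(range(len(centers)), key=lambda i: euclid(x, centers[i]))
            clusters[nearest].append(x)
        # update step: recompute the centroids; stop once nothing moves any more
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

X = [(1, 2), (1, 1), (2, 3), (3, 3)]
print(kmeans(X, [(1, 1), (3, 3)]))   # centers (1, 1.5) and (2.5, 3), as on the slides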


K - Means Clustering - Caveats

  • Number of clusters
  • Similarity measure
  • No convergence


Additional Information

  • Principal Component Analysis
  • Support Vector Machines
  • Autoencoder


K-Means Clustering of 40K samples of homework

http://practicalquant.blogspot.de/2013/10/semi-automatic-method-for-grading-a-million-homework-assignments.html


Reinforcement Learning

[Diagram: the agent performs an action on the environment; the environment returns a new state and a reward to the agent]


Formalizing RL

[Diagram: agent-environment loop (Action, State, Reward)]

s_t = \text{current state}
a_t = \text{action}
r_t = \text{reward}
t = \text{timestep}

s_0 \xrightarrow{a_0} s_1, r_1


Formalizing RL

s_0 \xrightarrow{a_0} s_1, r_1
s_1 \xrightarrow{a_1} s_2, r_2
\ldots
s_{n-1} \xrightarrow{a_{n-1}} s_n, r_n


Reward

R = r_1 + r_2 + \ldots + r_n


Timed Reward

R_t = r_t + r_{t+1} + \ldots + r_n


Discount Rate

\gamma = \text{discount rate} \in [0,1]

R_t = r_t + \gamma \cdot (r_{t+1} + \ldots + r_n)


Discount Rate

\gamma = \text{discount rate} \in [0,1]

R_t = r_t + \gamma \cdot r_{t+1} + \gamma^2 \cdot r_{t+2} + \ldots + \gamma^{n-t} \cdot r_n
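
The discounted return as a two-line sketch; the reward list is made up:

def discounted_return(rewards, gamma):
    # R_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    return sum(gamma ** k * r for k, r in enumerate(rewards))

rewards = [1.0, 0.0, 0.0, 5.0]            # made-up rewards r_t, ..., r_n
print(discounted_return(rewards, 0.9))    # 1.0 + 0.9**3 * 5.0 = 4.645
print(discounted_return(rewards, 0.0))    # gamma = 0: only the immediate reward, 1.0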


Short-Sighted Reward

\gamma = 0:

R_t = r_t + 0 \cdot r_{t+1} + 0 \cdot r_{t+2} + \ldots + 0 \cdot r_n \;\; \Rightarrow \;\; R_t = r_t


Balanced Rewards

\gamma = 0.9:

R_t = r_t + 0.9 \cdot r_{t+1} + 0.81 \cdot r_{t+2} + \ldots + (\gamma^{n-t} \ll 0.9) \cdot r_n


Q(uality) - Learning

Q(s_t, a_t) = max(R_{t+1})

Q(s_t, a_t) represents the quality of taking action a_t in state s_t, assuming we keep playing optimally from that point on.


Q(uality) - Learning

Q(s_t, a_t) = max(R_{t+1})

\pi = \text{policy}

\pi(s_t) = argmax_a[Q(s_t, a)]

Problem: How to construct such a Q function?


Bellman Equation

Q(s_t, a_t) = r_{t+1} + \gamma \cdot max_{a_{t+1}}[Q(s_{t+1}, a_{t+1})]

The maximal reward is the immediate reward plus the (discounted) maximum future reward of the next state.


Learning Q-Function

Start in s_0. The Q-table (one row per state s_0 \ldots s_n, one column per action a_0 \ldots a_n) is initialized with zeros:

        a_0   a_1   ...   a_n
s_0      0     0    ...    0
s_1      0     0    ...    0
...     ...   ...   ...   ...
s_n      0     0    ...    0


Learning Q-Function

\pi(s_0) = argmax_a[Q(s_0, a)] = a_0 \quad \text{(all entries are still 0)}

s_1, r_1 = 0.1 \leftarrow \text{execute } a_0

Q[s_0, a_0] = Q[s_0, a_0] + \alpha \cdot (0.1 + \gamma \cdot max_{a}[Q(s_1, a)] - Q(s_0, a_0))

\alpha = \text{learning rate}

        a_0   a_1   ...   a_n
s_0     0.1    0    ...    0
s_1      0     0    ...    0
...     ...   ...   ...   ...
s_n      0     0    ...    0

\text{restart with } s_1
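
One such update as a sketch. The environment step is faked (we simply hand back s_1 and r_1 = 0.1), and the table size, alpha and gamma are assumptions, not values from the slides:

n_states, n_actions = 5, 3                  # made-up sizes
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma = 1.0, 0.9                     # assumed learning rate and discount rate

def policy(s):
    # pi(s) = argmax_a Q[s][a]
    return max(range(n_actions), key=lambda a: Q[s][a])

s0 = 0
a0 = policy(s0)                             # all entries are 0, so a_0 is picked
s1, r1 = 1, 0.1                             # faked environment step: execute a_0

target = r1 + gamma * max(Q[s1])
Q[s0][a0] += alpha * (target - Q[s0][a0])
print(Q[s0][a0])                            # 0.1, the entry shown in the table above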


Learning Q-Function with NNs

[Diagram: the state is fed into a neural net (NN) that outputs one Q value per action: Q(s_0, a_0), Q(s_0, a_1), \ldots, Q(s_0, a_n)]

\pi(s_0) = \text{feedforward } s_0 \text{ and take the action with the largest output}

s_1, r_1 = 0.1 \leftarrow \text{execute } a_0

\text{feedforward } s_1 \rightarrow max_{a}[Q(s_1, a)]

\text{backprop for } s_0 \text{ with } a_0, \text{ using } 0.1 + \gamma \cdot max_{a}[Q(s_1, a)] \text{ as the target}

\text{restart with } s_1
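
The same loop with a function approximator, sketched with a plain linear model standing in for the neural net (sizes, alpha, gamma and the fake environment step are all assumptions):

import numpy as np

rng = np.random.default_rng(0)
n_state_dims, n_actions = 4, 3                      # made-up sizes
W = rng.normal(0, 0.1, (n_actions, n_state_dims))   # linear "network": one row per action
alpha, gamma = 0.1, 0.9                             # assumed learning rate and discount rate

def q_values(s):
    # "feedforward": one Q value per action for state s
    return W @ s

s0 = rng.random(n_state_dims)
a0 = int(np.argmax(q_values(s0)))                   # pi(s_0): pick the largest output

s1, r1 = rng.random(n_state_dims), 0.1              # faked environment step: execute a_0

target = r1 + gamma * np.max(q_values(s1))          # feedforward s_1 for max_a Q(s_1, a)
error = q_values(s0)[a0] - target
W[a0] -= alpha * error * s0                         # "backprop": one gradient step on the squared error
# then restart from s_1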


Additional Information

  • Experience Replay
  • Exploration vs. Exploitation (\epsilon\text{-greedy})

Slides adapted from an excellent tutorial:
https://www.nervanasys.com/demystifying-deep-reinforcement-learning/


TORCS

https://yanpanlau.github.io/2016/10/11/Torcs-Keras.html


Buzzwords we learned today

  • Perceptron
  • Backpropagation
  • Neural Nets
  • Supervised Learning
  • MNIST Dataset
  • Linearly Separable
  • Unsupervised Learning
  • Clustering
  • K-Means
  • Distance Measures
  • Convergence
  • Reinforcement Learning
  • Policy
  • Q-Learning
  • Discount Rate
  • TORCS


Image Sources

  • Cetacea - http://www.toggo.de/media/slider-wal-3-14295-10110.jpg
  • Orca - http://elelur.com/mammals/orca.html
  • Pinniped - http://www.interestingfunfacts.com/amazing-facts-about-pinniped.html
  • Deep Sea Frill Shark - http://images.nationalgeographic.com/wpf/media-live/photos/000/181/cache/deep-sea01-frill-shark_18161_600x450.jpg
  • Shark - http://www.livescience.com/55001-shark-attacks-increasing.html
  • Dolphin - http://weknownyourdreamz.com/dolphin.html