Recurrent Neural Network

By INFOR 28th 李睦樂、洪啟勳、王冠人

What is Machine Learning?

  • getting computers to learn from data
  • mimicking how the human brain learns

What can Machine Learning do?

  • speech recognition
  • email anti-spam
  • handwriting recognition
  • Natural Language Processing (NLP)
  • Computer Vision

Basic Machine Learning Categories

  • Supervised Learning
  • Unsupervised Learning

Supervised Learning

We give the algorithm a data set in which the "right answers" are given; that is, in supervised learning the examples must be labeled.

Unsupervised Learning

In unsupervised learning the examples are not labeled; we don't tell the algorithm anything about the answers, and it clusters the data into different groups on its own.
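
For example, a clustering algorithm such as k-means groups unlabeled points by proximity. A minimal sketch in NumPy (the points, centroids, and cluster count are made up for illustration):

import numpy as np

# Toy 2-D points (no labels) and two initial centroids -- illustrative values only.
points = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

for _ in range(10):  # repeat assignment/update until (roughly) stable
    # Assignment step: each point goes to its nearest centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points.
    centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])

print(labels)  # e.g. [0 0 1 1] -- two groups found without any labels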

Supervised vs Unsupervised Learning

Regression Analysis

  • Linear regression
    • Least squares
    • cost function
    • minimize
  • Logistic regression
    • Sigmoid function
    • cost function
    • minimize

Linear Regression

Linear regression predicts a real-valued output from an input value; for example, predicting a house's price from its size.

h_\theta(x)=\theta_0+\theta_1x

h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2+...+\theta_nx_n
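
Numerically, the multivariate hypothesis is just a dot product once a constant feature x_0 = 1 is prepended. A small NumPy sketch (the parameter and feature values are made up):

import numpy as np

theta = np.array([1.0, 0.5, -0.2])   # theta_0, theta_1, theta_2 (made-up values)
x = np.array([1.0, 3.0, 2.0])        # x_0 = 1 for the intercept, then x_1, x_2

h = theta @ x                        # theta_0 + theta_1*x_1 + theta_2*x_2
print(h)                             # 1.0 + 0.5*3.0 - 0.2*2.0 = 2.1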

Least Squares

Least squares means that the overall solution minimizes the sum of the squares of the errors made in the result of every single equation.

\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2

Cost Function

A cost function (or loss function) maps events, or values of one or more variables, onto a real number that intuitively represents some "cost" associated with the event. An optimization problem seeks to minimize a cost function.

J(\theta)=\frac{1}{m} \sum_{i=1}^m\frac{1}{2}(h_\theta(x^{(i)})-y^{(i)})^2
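
As a sketch with made-up numbers, the cost averages the halved squared errors over all m examples:

import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # m = 3 examples, x_0 = 1 column
y = np.array([1.0, 2.0, 3.0])                       # targets (illustrative)
theta = np.array([0.5, 0.5])                        # current parameters

h = X @ theta                       # predictions h_theta(x^(i)) for every example
J = np.mean(0.5 * (h - y) ** 2)     # J(theta) = (1/m) * sum( (1/2)(h - y)^2 )
print(J)                            # about 0.208 for these values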

Logistic Regression

Logistic regression is a method for classifying data into discrete outcomes. We would like our classifier to output values between 0 and 1, so the hypothesis we use is the "logistic function" (sigmoid function).

h_\theta(x)=\frac{1}{1+e^{-{\theta}^Tx}}

J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^m y^{(i)}\log h_\theta(x^{(i)}) +(1-y^{(i)})\log (1-h_\theta(x^{(i)}))\right]
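
A minimal NumPy sketch of the sigmoid hypothesis and this cost (the parameters, examples, and labels are made up):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.0, 1.5])             # illustrative parameters
X = np.array([[1.0, -2.0], [1.0, 3.0]])  # m = 2 examples, intercept column first
y = np.array([0.0, 1.0])                 # binary labels

h = sigmoid(X @ theta)                   # h_theta(x) lies in (0, 1)
J = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))  # cross-entropy cost
print(h, J)                              # J is about 0.03 here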

Minimize: Gradient Descent

We keep changing the parameters to reduce the cost function, until we end up at a minimum. Then we have found the line that best fits our data.

repeat until convergence {

\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)\quad(\text{for } j=0,1,...,n)

}

with every \theta_j updated simultaneously.
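
A minimal sketch of this loop for one-feature linear regression in NumPy (the data, learning rate, and iteration count are made up):

import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # intercept column + one feature
y = np.array([2.0, 3.0, 4.0])                       # here the true line is 1 + x
theta = np.zeros(2)
alpha = 0.1                                         # learning rate

for _ in range(1000):                               # "repeat until convergence"
    h = X @ theta
    grad = (X.T @ (h - y)) / len(y)                 # dJ/dtheta_j for all j at once
    theta = theta - alpha * grad                    # simultaneous update

print(theta)  # approaches [1.0, 1.0]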

Neural Network

Non-linear Hypothesis

When doing classification, sometimes we have to fit a more sophisticated hypothesis, and neural networks provide a great idea of how to do that.

How Brains Work

The Model

  • Multi-layer
  • Neurons
  • Connected Architecture

Recurrent Neural Network

  • Sequence Learning
    • speech recognition
    • handwriting recognition

A Simple RNN

h_{t} = \theta\varphi(h_{t-1}) + \theta_{x} x_{t}

y_{t} = \theta_{y}\varphi(h_{t})
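
A sketch of these two equations in NumPy, taking φ to be tanh (the sizes, weights, and input sequence are all made up):

import numpy as np

n_h, n_x, n_y = 4, 3, 2                     # illustrative sizes
theta   = np.random.randn(n_h, n_h) * 0.01  # recurrent weights
theta_x = np.random.randn(n_h, n_x) * 0.01  # input weights
theta_y = np.random.randn(n_y, n_h) * 0.01  # output weights

h = np.zeros(n_h)                                # initial hidden state
xs = [np.random.randn(n_x) for _ in range(5)]    # a made-up input sequence

for x_t in xs:
    h = theta @ np.tanh(h) + theta_x @ x_t   # h_t = theta*phi(h_{t-1}) + theta_x*x_t
    y = theta_y @ np.tanh(h)                 # y_t = theta_y * phi(h_t)
    print(y)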

Gated Recurrent Unit

  • gates:

        r (reset gate): scales how much of the previous state feeds the candidate activation \tilde{h}_t

        z (update gate): scales how much of the candidate replaces the previous state

h_{t} = (1-z_{t})h_{t-1} + z_{t}\tilde{h}_{t}
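
A minimal NumPy sketch of one full GRU step under these definitions, taking φ = tanh for the candidate; it mirrors the Theano class shown later (sizes and weights are made up):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

n_u, n_h = 3, 4                               # illustrative sizes
rng = np.random.RandomState(0)
W_xz, W_hz = rng.randn(n_h, n_u) * 0.01, rng.randn(n_h, n_h) * 0.01
W_xr, W_hr = rng.randn(n_h, n_u) * 0.01, rng.randn(n_h, n_h) * 0.01
W_xh, W_hh = rng.randn(n_h, n_u) * 0.01, rng.randn(n_h, n_h) * 0.01

def gru_step(x_t, h_tm1):
    z = sigmoid(W_xz @ x_t + W_hz @ h_tm1)             # update gate
    r = sigmoid(W_xr @ x_t + W_hr @ h_tm1)             # reset gate
    h_cand = np.tanh(W_xh @ x_t + r * (W_hh @ h_tm1))  # candidate state
    return (1 - z) * h_tm1 + z * h_cand                # mix old state and candidate

h = gru_step(rng.randn(n_u), np.zeros(n_h))
print(h)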

Theano

  • Python
  • Machine Learning
  • GPU

Define a function in a mathematical way.

import theano
from theano import tensor as T

X = T.scalar()   # symbolic scalar input
w = T.scalar()   # symbolic scalar input

y = X * w        # a symbolic expression: the product of the two inputs

multiply = theano.function(inputs=[X, w], outputs=y)   # compile into a callable
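
Calling the compiled function evaluates the symbolic expression with concrete values, for example:

print(multiply(2.0, 3.0))   # 6.0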

Linear Regression

import theano
from theano import tensor as T
import numpy as np

X = T.scalar()   # one input value
Y = T.scalar()   # its target value

def model(X, w):
    return X * w                     # hypothesis h(X) = w * X (no intercept here)

w = theano.shared(np.asarray(0., dtype=theano.config.floatX))   # learnable weight
y = model(X, w)

cost = T.mean(T.sqr(y - Y))          # squared-error cost
grad = T.grad(cost=cost, wrt=w)      # symbolic derivative d(cost)/dw
updates = [[w, w - grad * 0.01]]     # gradient-descent step, learning rate 0.01

train = theano.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=y, allow_input_downcast=True)
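
A hedged usage sketch: feed (x, y) pairs drawn from a made-up line y ≈ 2x and watch w converge toward 2:

trX = np.linspace(-1, 1, 101)
trY = 2 * trX + np.random.randn(*trX.shape) * 0.33   # noisy samples of y = 2x

for _ in range(100):                  # several passes over the data
    for x_i, y_i in zip(trX, trY):
        train(x_i, y_i)

print(w.get_value())                  # close to 2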

Logistic Regression

import theano
from theano import tensor as T
import numpy as np

def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    # small random weights stored in a shared (learnable) variable
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

def model(x, w):
    return T.nnet.softmax(T.dot(x, w))   # class probabilities

X = T.fmatrix()   # a batch of inputs, one example per row
Y = T.fmatrix()   # the matching one-hot labels

w = init_weights((784, 10))   # e.g. flattened 28x28 images -> 10 classes

py_x = model(X, w)
pred_y = T.argmax(py_x, axis=1)   # predicted class = the most probable one

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
grad = T.grad(cost=cost, wrt=w)
updates = [[w, w - grad * 0.05]]   # update weights by gradient descent

train = theano.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=pred_y, allow_input_downcast=True)
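
A usage sketch, assuming trX/teX are (n, 784) arrays of flattened images and trY/teY are one-hot labels; how they are loaded is left out here:

# trX: (n, 784) floats, trY: (n, 10) one-hot labels -- loading them is assumed
for start in range(0, len(trX), 128):                   # mini-batches of 128
    train(trX[start:start + 128], trY[start:start + 128])

print(np.mean(np.argmax(teY, axis=1) == predict(teX)))  # test accuracy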

Neural Network

A neural network extends the logistic regression above with a hidden layer; only model and the weight shapes change. A minimal sketch (the hidden size of 625 is an illustrative choice):

import theano
from theano import tensor as T
import numpy as np

def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

def model(x, w_h, w_o):
    h = T.nnet.sigmoid(T.dot(x, w_h))      # hidden layer: nonlinear features
    return T.nnet.softmax(T.dot(h, w_o))   # output layer: class probabilities

X = T.fmatrix()
Y = T.fmatrix()

w_h = init_weights((784, 625))   # input -> hidden
w_o = init_weights((625, 10))    # hidden -> output

py_x = model(X, w_h, w_o)
pred_y = T.argmax(py_x, axis=1)   # find the maximum

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
grads = T.grad(cost=cost, wrt=[w_h, w_o])
updates = [[p, p - g * 0.05] for p, g in zip([w_h, w_o], grads)]   # update both layers

train = theano.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True)
predict = theano.function(inputs=[X], outputs=pred_y, allow_input_downcast=True)

Recurrent Neural Network

import theano
from theano import tensor as T
import numpy as np

class GRU(object):
    def __init__(self, n_u, n_h):

        self.n_u = int(n_u)
        self.n_h = int(n_h)

        # W_xz : n_h  x  n_u
        # W_hz : n_h  x  n_h
        # W_xr : n_h  x  n_u
        # W_hr : n_h  x  n_h
        # W_xh : n_h  x  n_u
        # W_hh : n_h  x  n_h
        #
        # b_z : n_h  x  1
        # b_r : n_h  x  1
        # b_h : n_h  x  1

        # Helper: a shared weight matrix with small uniform random values.
        def uniform_weight(shape, name):
            value = np.asarray(np.random.uniform(size=shape, low=-.01, high=.01),
                               dtype=theano.config.floatX)
            return theano.shared(value=value, name=name)

        # Helper: a shared bias vector initialized to zeros.
        def zero_bias(name):
            return theano.shared(value=np.zeros((n_h,), dtype=theano.config.floatX),
                                 name=name)

        # Update gate weights
        self.W_xz = uniform_weight((n_h, n_u), 'W_xz')
        self.W_hz = uniform_weight((n_h, n_h), 'W_hz')

        # Reset gate weights
        self.W_xr = uniform_weight((n_h, n_u), 'W_xr')
        self.W_hr = uniform_weight((n_h, n_h), 'W_hr')

        # Candidate activation weights
        self.W_xh = uniform_weight((n_h, n_u), 'W_xh')
        self.W_hh = uniform_weight((n_h, n_h), 'W_hh')

        # Gate and hidden-layer biases
        self.b_z = zero_bias('b_z')
        self.b_r = zero_bias('b_r')
        self.b_h = zero_bias('b_h')

        self.params = [self.W_xz, self.W_hz, self.W_xr, self.W_hr, 
                          self.W_xh, self.W_hh, self.b_z, self.b_r, 
                          self.b_h]


    def gru_as_activation_function(self, x_t, h_tm1):
        # update gate
        z_t = T.nnet.sigmoid(T.dot(self.W_xz, x_t) + \
                             T.dot(self.W_hz, h_tm1) + \
                             self.b_z)
        # reset gate
        r_t = T.nnet.sigmoid(T.dot(self.W_xr, x_t) + \
                             T.dot(self.W_hr, h_tm1) + \
                             self.b_r)
        # candidate h_t
        can_h_t = T.tanh(T.dot(self.W_xh, x_t) + \
                         r_t * T.dot(self.W_hh, h_tm1) + \
                         self.b_h)
        # h_t
        h_t = (1 - z_t) * h_tm1 + z_t * can_h_t

        return h_t
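
A hedged sketch of running the unit over a whole sequence with theano.scan; the sizes and input data are made up, and the initial hidden state is taken to be zeros:

import theano
from theano import tensor as T
import numpy as np

gru = GRU(n_u=3, n_h=4)

x = T.matrix('x')                 # the sequence: one row per time step, width n_u
h0 = T.zeros((gru.n_h,))          # initial hidden state

# scan applies the GRU step to each row of x, threading the hidden state through
h_seq, _ = theano.scan(fn=gru.gru_as_activation_function,
                       sequences=x,
                       outputs_info=h0)

run = theano.function(inputs=[x], outputs=h_seq, allow_input_downcast=True)
print(run(np.random.randn(5, 3)))   # hidden state at each of 5 time steps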

Reference

  • K-Means clustering
    http://cmp.felk.cvut.cz/cmp/courses/recognition/Labs/kmeans/index_en.html
  • Supervised learning: predicting an output variable from high-dimensional observations
    http://gaelvaroquaux.github.io/scikit-learn-tutorial/supervised_learning.html
  • Supervised vs unsupervised learning
    https://medium.com/@RobinCRLee/supervised-vs-unsupervised-learning-36d5106c7b0b
Thanks for listening
