Deep Learning Book - Ch 5
Disclaimer
There is math.
There is theory.
There is pseudocode.
A lot of content in 60 minutes
Interact with us!
Training set
Learning Algorithm
hypothesis
input
predicted output
What do we mean by that?
Learning Algorithm
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Tom Mitchell)
Task (T): how an ML system should process an example
Example: a collection of features (quantitatively measured)
Performance (P): a quantitative performance measure of the algorithm
Factors:
a) Accuracy
b) Error rate
P is specific to a task T
Kind of experience during the learning process
Supervised Learning
Unsupervised Learning
Reinforcement Learning
We have an idea of the right answer for what we are asking. Example: given a picture of a person, predict his or her age
We have no idea of the right answer for what we are asking. Example: given a collection of unfamiliar items, try to group them by similarity
Let the machine interact with its context while you provide feedback on its actions.
Example: reduce items in stock by creating dynamic promotions
Sell used cars. Find the best price to sell them (not considering people who collect old cars)
Do we have any idea of known relations?
older -> cheaper
unpopular brands -> cheaper
too many kms -> cheaper
What kind of M.L. Algorithm would you use here? (E)
Supervised Learning
Choose one variable to analyse against what you want to predict (example: year vs price). Price is the variable you want to predict. (T)
Come up with a training set to analyse these variables
input variable or features - x
output variable or target - y
Training set (m examples) → Learning Algorithm → hypothesis h
input x → h → predicted output y
Linear equation
h = ax + b
How do you choose a and b?
From the training set, we have expected values y for a certain x:
Come up with a hypothesis that gives you
the smallest error for all the training set:
h(x): your hypothesis for an input x
y: the output of the training set
h(x) − y: measure the difference

Measure the difference for the entire training set (P):
(h(x_i) − y_i) for every example i

We don't want positive and negative values to cancel, so we square the differences and take the average: the Mean Squared Error (MSE).

Cost Function:
J = (1/2m) · Σ_i (h(x_i) − y_i)²

We want to minimize the difference
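The cost function can be sketched in Python (an illustrative sketch with made-up data; the talk's own demo uses Octave, and the function names here are hypothetical):

```python
def hypothesis(a, b, x):
    """Linear hypothesis h(x) = a*x + b."""
    return a * x + b

def cost(a, b, xs, ys):
    """Mean squared error cost: J = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)
    return sum((hypothesis(a, b, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Points that lie exactly on y = 2x + 1 give zero cost:
print(cost(2, 1, [0, 1, 2], [1, 3, 5]))  # 0.0
```

A hypothesis that misses the data yields a positive cost, and comparing costs is how we compare hypotheses.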
We can come up with different hypotheses (different slopes for the h function)
Each hypothesis gives a different value of the cost function; we look for the minimum value.
On h = ax + b, varying only a gives a curve J(a).
Varying both a and b gives a surface J(a, b).
Minimize any Cost Function.
We start with a guess and 'walk' on the graph towards the min value.
Partial derivatives point towards the min value. From the starting guess, walk on the graph:

Repeat until convergence:
  a := a − α · ∂J/∂a
  b := b − α · ∂J/∂b

α is the learning rate (another guess)
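The update loop can be sketched in Python (a minimal illustration with made-up data and hypothetical names; the talk's own demo below uses Octave):

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Fit h(x) = a*x + b by gradient descent on the MSE cost."""
    a, b = 0.0, 0.0  # initial guess
    m = len(xs)
    for _ in range(iterations):
        errors = [a * x + b - y for x, y in zip(xs, ys)]
        grad_a = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/da
        grad_b = sum(errors) / m                             # dJ/db
        a -= alpha * grad_a  # step towards the minimum
        b -= alpha * grad_b  # alpha is the learning rate
    return a, b

# Data generated by y = 2x + 1; the loop should recover a ≈ 2, b ≈ 1
a, b = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
print(round(a, 2), round(b, 2))  # 2.0 1.0
```

If α is too large the walk overshoots and diverges; too small and convergence is very slow, which is why the slides call it "another guess".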
So far we are only analysing year vs price. We have more factors: model, how much the car was used before, etc.
Consider multiple variables: a, b, c, ... (or, using Greek letters, θ_0, θ_1, θ_2, ...)
Repeat until convergence:
  θ_j := θ_j − α · (1/m) · Σ_i (h(x_i) − y_i) · x_i,j

We can rewrite the hypothesis as:
  predicted output: h(x) = θ^T · x
where θ is the parameters vector and x is the input vector.

(how did we get here?)
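The vector form is just a dot product, sketched here in Python (the parameter values and feature meanings are made up for illustration):

```python
def predict(theta, x):
    """h(x) = theta^T x: dot product of parameter vector and input vector.
    x[0] is fixed to 1 so that theta[0] acts as the intercept."""
    return sum(t * xi for t, xi in zip(theta, x))

theta = [5000.0, -300.0, -0.5]  # intercept, per-year, per-km (made up)
x = [1.0, 10.0, 2000.0]         # 1, age in years, kilometres driven
print(predict(theta, x))        # 5000 - 3000 - 1000 = 1000.0
```

Each extra feature (model, mileage, ...) just adds one more component to both vectors; the hypothesis stays a single dot product.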
Does it look good? The predicted output is not even fitting the training data! (underfitting)
Does it look good? The model fits everything. (overfitting)
Capacity: the ability to fit a wide variety of functions
It is difficult to know the true probability distribution that generates the data; the lowest achievable error rate is called the Bayes error.
The training set works fine, but new predictions are terrible.
The problem might be in the cost function (it only compares the predicted values with the training set).
J = (1/2m) · Σ_i (h(x_i) − y_i)²

J = (1/2m) · [ Σ_i (h(x_i) − y_i)² + λ · Σ_j θ_j² ]

λ is the regularization parameter: it controls the tradeoff and lowers the variance (another mechanism to control the capacity)
Settings that the algorithm can't learn by itself are called hyperparameters (λ is one of them).
Your dataset:
Training set (70%): learn the parameter θ; compute J(θ)
Test set (30%): compute the test set error (identify under/overfitting)
(cross-validation)
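The 70/30 split can be sketched in Python (a minimal version; the fixed seed is only there so the sketch is reproducible):

```python
import random

def train_test_split(data, train_fraction=0.7, seed=42):
    """Shuffle a dataset and split it into training and test sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, not the original
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # 7 3
```

Shuffling before the split matters: if the file is sorted (e.g. by year), an unshuffled split would give training and test sets with different distributions.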
[Plots: error vs capacity, showing the J(θ) train error and J(θ) test error curves]
Bias problem (underfitting):
J(θ) train error is high
J(θ) train ≈ J(θ) test

Variance problem (overfitting):
J(θ) train error is low
J(θ) test >> J(θ) train
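The diagnosis rules above can be sketched as a small helper (the thresholds here are illustrative; in practice "high" is problem-dependent):

```python
def diagnose(train_error, test_error, high_error=1.0):
    """Rough bias/variance diagnosis from train and test errors.
    high_error marks what counts as 'high' for this problem (illustrative)."""
    if train_error > high_error:
        return "bias problem (underfitting)"
    if test_error > 2 * train_error:
        return "variance problem (overfitting)"
    return "looks OK"

print(diagnose(5.0, 5.2))   # bias problem (underfitting)
print(diagnose(0.1, 3.0))   # variance problem (overfitting)
print(diagnose(0.1, 0.15))  # looks OK
```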
A: Maximum likelihood estimation or Bayesian statistics
source: https://github.com/stedy/Machine-Learning-with-R-datasets/blob/master/usedcars.csv
% Initialise
data = load('used_cars.csv'); % year x price
y = data(:, 2);
m = length(y);                % number of training examples
X = [ones(m, 1), data(:, 1)]; % add a column of ones for the intercept
theta = zeros(2, 1);          % linear function
iterations = 1500;
alpha = 0.01;                 % learning rate

% Cost Function (MSE)
predictions = X * theta;
sqErrors = (predictions - y) .^ 2;
J = 1/(2*m) * sum(sqErrors);

% Gradient Descent
J_history = zeros(iterations, 1);
for iter = 1:iterations
  x = X(:, 2);
  delta = theta(1) + (theta(2) * x);
  t0 = theta(1) - alpha * (1/m) * sum(delta - y);
  t1 = theta(2) - alpha * (1/m) * sum((delta - y) .* x);
  theta = [t0; t1];
  J_history(iter) = computeCost(X, y, theta); % computeCost: the MSE above
end
Linear regression with C++:
mlpack_linear_regression --training_file used_cars.csv --test_file used_cars_test.csv -v
Questions?
gustavoergalves@gmail.com
hannelita@gmail.com
@hannelita