Data Analysis for Machine Learning Systems Design

Cristina Morariu

PhD, PMP, Mo2

AGENDA

Types of Machine Learning

Today's Environment

Evaluating a Learning Algorithm

Useful Resources and Conclusions

Let me introduce myself...

PMP since 2009

PhD in Systems Engineering since 2013

Mother of Two

Proud founder of the SV AI Community

Today's Environment

“ML is a core, transformative way by which we’re rethinking how we’re doing everything”

“AI is the new electricity. Just as electricity transformed many industries roughly 100 years ago, AI will also now change nearly every major industry — healthcare, transportation, entertainment, manufacturing — enriching the lives of countless people.”
Andrew Ng, Stanford

Machine Learning

Arthur Samuel (1959): Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.

Tom Mitchell (1998): Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Supervised Learning

Supervised learning is the machine learning task of inferring a function from labeled training data.

The training data consist of a set of training examples. Each example is a pair consisting of an input object (typically a vector) and a desired output value.

Supervised Learning

Regression

(Linear Regression)

Classification

(Logistic Regression)
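As an illustration of inferring a function from labeled pairs, here is a minimal sketch of one-feature linear regression using the closed-form least-squares solution. The function name `fit_line` and the toy data are assumptions made for this example, not part of the original slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b to labeled training pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of inputs and outputs, and variance of the inputs.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Hypothetical training set generated from y = 2x + 1 (no noise).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(w, b)  # 2.0 1.0
```

Each `(x, y)` pair plays the role of one training example: `x` is the input object and `y` the desired output value.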

Unsupervised Learning

Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.

One of the most widely used unsupervised machine learning algorithms is K-means, which splits the data into K clusters.

Unsupervised Learning

Input:

K - number of clusters

Training set

Randomly initialise K cluster centroids u1, u2, ..., uK
Repeat {
    for i = 1 to m
        c(i) := index (from 1 to K) of the cluster centroid closest to x(i)
    for k = 1 to K
        uk := average (mean) of the points assigned to cluster k
}
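The loop above can be sketched in plain Python as follows. This is one possible reading of the pseudocode, not a reference implementation: the names `k_means` and `squared_distance`, the iteration cap, the fixed seed, and the toy points are all assumptions, and clusters are indexed from 0 rather than 1.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100, seed=0):
    """Cluster `points` into `k` groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Random initialisation: pick K distinct training points as centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: c(i) := index of the centroid closest to x(i).
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, c in zip(points, new_assignments) if c == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
    return centroids, assignments


# Hypothetical training set: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
print(labels)
```

The pseudocode repeats indefinitely; stopping when the assignments no longer change is a common choice and is used here as an assumption.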

Feature Scaling

Usage:

In gradient descent algorithms, feature scaling reduces the time needed to converge to a (local) minimum.

In Principal Component Analysis, feature scaling must be applied before running the algorithm.

size = 0 - 2000 sqm:    x1 = size (sqm) / 2000

rooms = 0 - 10:    x2 = number of rooms / 10
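The two scaling rules above can be sketched as a single helper that divides each value by the maximum of its range. The helper name `scale_by_max` and the sample houses are assumptions for illustration.

```python
def scale_by_max(values, max_value):
    """Divide every value by the upper end of its range, giving 0..1."""
    return [v / max_value for v in values]


# Hypothetical houses: size in sqm (range 0 - 2000) and rooms (range 0 - 10).
sizes = [500, 1200, 2000]
rooms = [2, 5, 10]

x1 = scale_by_max(sizes, 2000)
x2 = scale_by_max(rooms, 10)
print(x1)  # [0.25, 0.6, 1.0]
print(x2)  # [0.2, 0.5, 1.0]
```

After scaling, both features lie between 0 and 1, so neither dominates the other purely because of its units.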

Mean Normalization

The objective of mean normalisation is to bring the mean of each feature close to 0.

x1 = (x1 - mean value) / range

range = max - min

The result lies roughly in -0.5 <= x1 <= 0.5
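A minimal sketch of the formula above, assuming the mean and the range are computed from the data itself. The function name `mean_normalise` and the sample values are hypothetical.

```python
def mean_normalise(values):
    """Return (value - mean) / (max - min) for every value."""
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    return [(v - mean) / value_range for v in values]


# Hypothetical house sizes in sqm.
sizes = [0, 500, 1500, 2000]

normalised = mean_normalise(sizes)
print(normalised)  # [-0.5, -0.25, 0.25, 0.5]
```

The normalised values always have mean 0 and span a width of exactly 1. They fall within -0.5 to 0.5 when the mean sits at the midpoint of the range, as in this example.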

Feature Selection

Improving the prediction performance

Faster and more cost effective predictors

Better understanding of the underlying process

Too many features

Independence of the features

Redundant features
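One simple way to spot a redundant feature is to check how strongly it correlates with another feature. The sketch below computes the Pearson correlation coefficient by hand. The function name `pearson` and the two features (the same house size in square metres and in square feet) are assumptions chosen to make the redundancy obvious.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical features: the same size expressed in two different units.
size_sqm = [50.0, 80.0, 120.0, 200.0]
size_sqft = [s * 10.764 for s in size_sqm]

r = pearson(size_sqm, size_sqft)
print(r)  # very close to 1: the second feature adds no new information
```

A coefficient near 1 or -1 suggests that one of the two features could be dropped. Correlation only detects linear redundancy, so it is a first check rather than a complete test.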

Overfitting and Underfitting in Linear Regression

Overfitting and Underfitting in Logistic Regression

Corrective measures

Overfitting situation (High Variance)

- Get more training examples

- Try a smaller set of features

- Try increasing λ, the regularisation parameter

Underfitting situation (High Bias)

- Try decreasing λ, the regularisation parameter
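To make the role of the regularisation parameter concrete, here is a minimal sketch assuming L2 (ridge) regularisation on a one-feature model y = w * x with no intercept. The slides do not state which penalty is used, so the cost function, the name `ridge_weight`, and the data are all assumptions.

```python
def ridge_weight(xs, ys, lam):
    """Minimise sum((y - w*x)^2) + lam * w^2; closed form for one weight."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical data generated from y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_unregularised = ridge_weight(xs, ys, lam=0.0)
w_regularised = ridge_weight(xs, ys, lam=14.0)
print(w_unregularised)  # 2.0
print(w_regularised)    # 1.0
```

Increasing the parameter shrinks the learned weight and makes the model simpler, which counters overfitting. Decreasing it lets the model fit the training data more closely, which counters underfitting.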

Training/Test datasets (70/30)

1. Learn parameters using training data

2. Compute test error
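The two steps above can be sketched as follows, assuming a random 70/30 split, a least-squares line as the model, and mean squared error as the test error. All function names and the noise-free toy data are assumptions for illustration.

```python
import random


def split_train_test(examples, train_fraction=0.7, seed=0):
    """Shuffle the examples and cut them into training and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(pairs):
    """Least-squares fit of y = w * x + b to (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    w = sxy / sxx
    return w, mean_y - w * mean_x


def mean_squared_error(pairs, w, b):
    """Average squared difference between predictions and labels."""
    return sum((w * x + b - y) ** 2 for x, y in pairs) / len(pairs)


# Hypothetical dataset generated from y = 3x + 2 (no noise).
data = [(float(x), 3.0 * x + 2.0) for x in range(20)]

train, test = split_train_test(data)         # 70/30 split
w, b = fit_line(train)                       # 1. learn parameters on training data
test_error = mean_squared_error(test, w, b)  # 2. compute test error
print(len(train), len(test), test_error)
```

Because the toy data has no noise, the test error is essentially zero. On real data it gives an estimate of how well the learned parameters generalise to unseen examples.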

Training/Validation/Test datasets (60/20/20)

1. Learn parameters using the training data for each candidate hypothesis

2. Choose the hypothesis with the lowest validation error

3. Test the chosen hypothesis using the test dataset
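A minimal sketch of the three steps above. Here the candidate hypotheses are assumed to be one-feature ridge models y = w * x that differ only in their regularisation parameter. The slides do not say what distinguishes the hypotheses, so this choice, the function names, and the noise-free data are all assumptions.

```python
import random


def split_three_ways(examples, seed=0):
    """Shuffle and cut into 60% training, 20% validation, 20% test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    first = round(len(shuffled) * 0.6)
    second = round(len(shuffled) * 0.8)
    return shuffled[:first], shuffled[first:second], shuffled[second:]


def ridge_weight(pairs, lam):
    """Weight minimising sum((y - w*x)^2) + lam * w^2."""
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    return sxy / (sxx + lam)


def mean_squared_error(pairs, w):
    """Average squared difference between w * x and the label y."""
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)


# Hypothetical dataset generated from y = 2x (no noise).
data = [(float(x), 2.0 * x) for x in range(1, 21)]
train, validation, test = split_three_ways(data)

# 1. Learn parameters on the training data for each candidate hypothesis.
candidates = [0.0, 10.0, 100.0, 1000.0]
weights = {lam: ridge_weight(train, lam) for lam in candidates}

# 2. Choose the hypothesis with the lowest validation error.
best_lambda = min(
    candidates, key=lambda lam: mean_squared_error(validation, weights[lam])
)

# 3. Test the chosen hypothesis on the test dataset.
test_error = mean_squared_error(test, weights[best_lambda])
print(best_lambda, test_error)
```

The test set is used only once, after the choice has been made, so the reported test error is not biased by the selection step.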

Conclusion

Before choosing an algorithm make sure:

- you have a clearly defined problem you want to solve