Data Analysis for Machine Learning Systems Design

Cristina Morariu

PhD, PMP, Mo2

AGENDA

Types of Machine Learning

Today's Environment

Data Analysis and Adjustments

Evaluating a Learning Algorithm

Useful Resources and Conclusions

Let me introduce myself...

 

PMP since 2009

PhD in Systems Engineering since 2013

Mother of Two

Passionate about AI

Proud founder of the SV AI Community

Today's Environment

“ML is a core, transformative way by which we’re rethinking how we’re doing everything”
Sundar Pichai, Google

“AI is the new electricity. Just as electricity transformed many industries roughly 100 years ago, AI will also now change nearly every major industry — healthcare, transportation, entertainment, manufacturing — enriching the lives of countless people.”
Andrew Ng, Stanford

Machine Learning

Arthur Samuel (1959): Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.

 

 

Tom Mitchell (1998): Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Supervised Learning

Supervised learning is the machine learning task of inferring a function from labeled training data.

 

The training data consist of a set of training examples. Each example is a pair consisting of an input object (typically a vector) and a desired output value.

Supervised Learning

Regression

(Linear Regression)

Classification

(Logistic Regression)
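As a toy illustration of the regression case, a few lines of plain Python can fit a straight line to labelled (input, output) pairs using ordinary least squares. The data below is a hypothetical example, not from the slides:

```python
# Hypothetical labelled training set: (x, y) pairs lying on y = 1 + 2x.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Ordinary least squares for a single feature:
# slope = cov(x, y) / var(x), intercept = mean_y - slope * mean_x
slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / \
        sum((x - mean_x) ** 2 for x, _ in data)
intercept = mean_y - slope * mean_x
```

On this data the fit recovers the slope 2 and intercept 1 exactly; with noisy real-world labels the same formulas give the best-fitting line in the least-squares sense.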

Unsupervised Learning

Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.

 

One of the most widely used unsupervised machine learning algorithms is K-means, which partitions the data into K clusters.

Unsupervised Learning

Input:

K - number of clusters

Training set

Randomly initialise K cluster centroids u1, u2, ..., uK
Repeat {
    for i = 1 to m
        c(i) := index (from 1 to K) of the cluster centroid closest to x(i)
    for k = 1 to K
        uk := average (mean) of points assigned to cluster k
}
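The pseudocode above can be sketched in Python. This is a toy 1-D version for illustration only; real implementations work on vectors and check for convergence:

```python
import random

def k_means(points, k, iters=10, seed=0):
    """Toy 1-D K-means following the slide's pseudocode."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # randomly initialise K centroids
    for _ in range(iters):
        # Assignment step: c(i) := index of the centroid closest to x(i)
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[j].append(x)
        # Update step: u_k := mean of the points assigned to cluster k
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return sorted(centroids)
```

For example, `k_means([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], 2)` converges to centroids near 1.0 and 10.0.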

Feature Scaling

 

Usage:

In gradient descent, it reduces the number of iterations needed to reach the minimum

In Principal Component Analysis, it must be applied before running the algorithm

 

size = 0 - 2000 sqm       x1 = size (sqm) / 2000

rooms = 0 - 10            x2 = number of rooms / 10
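The scaling above can be sketched in a couple of lines. The house data below is a hypothetical example matching the slide's ranges (size in 0-2000 sqm, rooms in 0-10):

```python
# Hypothetical house data: (size in sqm, number of rooms).
houses = [(1200, 3), (450, 2), (2000, 8)]

# Divide each feature by its maximum so both features fall in [0, 1].
scaled = [(size / 2000, rooms / 10) for size, rooms in houses]
```

After scaling, both features live on a comparable scale, so no single feature dominates the gradient descent updates.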

 

 

Mean Normalization

The objective of mean normalisation is to bring each feature's mean close to 0.

 

x1= (x1-mean value)/range

range=max-min

 

approximately -0.5 <= x1 <= 0.5
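The formula above translates directly into a short helper (a minimal sketch, operating on a plain list of values):

```python
def mean_normalise(values):
    """Rescale values so their mean is 0: x := (x - mean) / (max - min)."""
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    return [(v - mean) / value_range for v in values]
```

For example, `mean_normalise([0, 5, 10])` yields `[-0.5, 0.0, 0.5]`: the mean moves to 0 and the values span roughly half a unit on either side.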

Feature Selection

Improving the prediction performance

Faster and more cost effective predictors

Better understanding of the underlying process

 

Watch out for:

- Too many features

- Lack of independence between features

- Redundant features

Overfitting and Underfitting in Linear Regression

Overfitting and Underfitting in Logistic Regression

Corrective measures

Overfitting situation (High Variance)

- Get more training examples

- Try a smaller set of features

- Try increasing λ, the regularisation parameter

 

Underfitting situation (High Bias)

- Try getting additional features

- Try adding polynomial features

- Try decreasing λ

 

Selecting & Evaluating Your Hypothesis

Training/ Test datasets (70/30)

1. Learn parameters using training data

2. Compute test error

 

Training/ Validation/ Test datasets (60/20/20)

1. Learn parameters on the training data for each candidate hypothesis

2. Choose the hypothesis with the lowest validation error

3. Test the chosen hypothesis using the test dataset
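The 60/20/20 split above can be sketched with the standard library alone (the proportions and seed below are illustrative defaults):

```python
import random

def split_dataset(data, train=0.6, val=0.2, seed=42):
    """Shuffle and split data into train/validation/test sets (60/20/20 by default)."""
    rng = random.Random(seed)
    shuffled = data[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Shuffling before splitting matters: if the data is ordered (by date, by class, ...), an unshuffled split gives train and test sets drawn from different distributions.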

Conclusion

Before choosing an algorithm make sure:

- you have a clearly defined problem you want to solve

- you understand your dataset

Useful materials

Thank you!

 

cristina.morariu@softvision.ro

morariu-cristina
