Data Analysis for Machine Learning Systems Design

Cristina Morariu

PhD, PMP, Mo2

AGENDA

Types of Machine Learning

Today's Environment

Data Analysis and Adjustments

Evaluating a Learning Algorithm

Useful Resources and Conclusions

Let me introduce myself...

PMP since 2009

PhD in Systems Engineering since 2013

Mother of Two

Passionate about AI

Proud founder of the SV AI Community

Today's Environment

“ML is a core, transformative way by which we’re rethinking how we’re doing everything”
Sundar Pichai, Google

“AI is the new electricity. Just as electricity transformed many industries roughly 100 years ago, AI will also now change nearly every major industry — healthcare, transportation, entertainment, manufacturing — enriching the lives of countless people.”
Andrew Ng, Stanford

Machine Learning

Arthur Samuel (1959): Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.

Tom Mitchell (1998): Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Supervised Learning

Supervised learning is the machine learning task of inferring a function from labeled training data.

The training data consist of a set of training examples. Each example is a pair consisting of an input object (typically a vector) and a desired output value.
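As a minimal sketch of inferring a function from labeled pairs, the snippet below fits a straight line to (input, output) examples by ordinary least squares. The function name `fit_line` and the sample data are illustrative assumptions, not part of the talk.

```python
def fit_line(examples):
    """Least-squares fit of y = slope * x + intercept.

    `examples` is a list of (input, desired_output) pairs, i.e. labeled data.
    Hypothetical helper for illustration; assumes at least two distinct x values.
    """
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training set generated from y = 2x + 1.
training_examples = [(1, 3), (2, 5), (3, 7)]
slope, intercept = fit_line(training_examples)


def predict(x):
    """Apply the inferred function to a new, unlabeled input."""
    return slope * x + intercept
```

Once the line is fitted, `predict` can be applied to inputs that were not in the training set, which is the point of learning from labeled examples.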

Supervised Learning

Regression

(Linear Regression)

Classification

(Logistic Regression)

Unsupervised Learning

Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.

One of the most widely used unsupervised learning algorithms is K-means, which splits the data into K clusters.

Unsupervised Learning

Input:

K - number of clusters

Training set x(1), x(2), ..., x(m)

Randomly initialise K cluster centroids u1, u2, ..., uK
Repeat {
    for i = 1 to m
        c(i) := index (from 1 to K) of the cluster centroid closest to x(i)
    for k = 1 to K
        uk := average (mean) of the points assigned to cluster k
}
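The loop above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: centroids are initialised by sampling K training points, distance is squared Euclidean, an empty cluster keeps its previous centroid, and the loop stops when assignments no longer change. The names `kmeans` and `squared_distance` and the sample points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iterations=100, seed=0):
    """Return (centroids, assignments) for a list of tuples `points`."""
    rng = random.Random(seed)
    # Randomly initialise K centroids (assumption: pick K training points).
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: c(i) := index of the centroid closest to x(i).
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: u_k := mean of the points assigned to cluster k.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(
                    sum(column) / len(members) for column in zip(*members)
                )
    return centroids, assignments


# Made-up data: two visibly separated groups of 2-D points.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(sample_points, k=2)
```

K-means can converge to different clusterings depending on the initial centroids, so in practice it is commonly run several times with different seeds.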

Feature Scaling

Usage:

In gradient descent, feature scaling reduces the time needed to converge to a (local) minimum.

In Principal Component Analysis, feature scaling must be done before running the algorithm.

Size: 0 to 2000 sqm, so x1 = size (sqm) / 2000

Rooms: 0 to 10, so x2 = number of rooms / 10
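A short sketch of this scaling, dividing each feature by its maximum so both land in the 0 to 1 range. The helper name `scale_by_max` and the sample houses are made up for illustration.

```python
def scale_by_max(values, max_value):
    """Divide every value by the feature's maximum, giving values in [0, 1]."""
    return [v / max_value for v in values]


# Made-up examples: house sizes in sqm and room counts.
sizes_sqm = [500, 1000, 2000]
rooms = [2, 5, 10]

x1 = scale_by_max(sizes_sqm, 2000)  # size (sqm) / 2000
x2 = scale_by_max(rooms, 10)        # number of rooms / 10
```

After scaling, both features share a comparable range, so neither dominates the gradient simply because of its units.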

Mean Normalization

The objective of mean normalisation is to bring each feature's mean close to 0.

x1 := (x1 - mean) / range

range = max - min

-0.5 <= x1 <= 0.5 (approximately, when the mean lies near the middle of the range)
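The formula can be sketched as below. The function name `mean_normalize` and the sample values are illustrative assumptions; returning zeros for a constant feature is a design choice made here to avoid dividing by zero.

```python
def mean_normalize(values):
    """Return (x - mean) / (max - min) for every x in `values`."""
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    if value_range == 0:
        # Constant feature: every value equals the mean (assumed handling).
        return [0.0 for _ in values]
    return [(v - mean) / value_range for v in values]


# Made-up house sizes in sqm, evenly spread over the 0 to 2000 range.
sizes_sqm = [0, 1000, 2000]
normalized = mean_normalize(sizes_sqm)
```

The normalised values always average to 0. They fall within -0.5 to 0.5 when the mean sits at the midpoint of the range, as in this sample; with skewed data they can extend further, but never beyond -1 to 1.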

Feature Selection