Data Analysis for Machine Learning Systems Design
Cristina Morariu
PhD, PMP, Mo2
AGENDA
Types of Machine Learning
Today's Environment
Data Analysis and Adjustments
Evaluating a Learning Algorithm
Useful Resources and Conclusions
Let me introduce myself...
PMP since 2009
PhD in Systems Engineering since 2013
Mother of Two
Passionate about AI
Proud founder of the SV AI Community
Today's Environment
“ML is a core, transformative way by which we’re rethinking how we’re doing everything”
Sundar Pichai, Google
“AI is the new electricity. Just as electricity transformed many industries roughly 100 years ago, AI will also now change nearly every major industry — healthcare, transportation, entertainment, manufacturing — enriching the lives of countless people.”
Andrew Ng, Stanford
Machine Learning
Arthur Samuel (1959): Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
Tom Mitchell (1998): Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Supervised Learning
Supervised learning is the machine learning task of inferring a function from labeled training data.
The training data consist of a set of training examples. Each example is a pair consisting of an input object (typically a vector) and a desired output value.
Supervised Learning
Regression
(Linear Regression)
Classification
(Logistic Regression)
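To make the two task types concrete, here is a minimal sketch in plain Python. The data points, the function names (`fit_line`, `fit_logistic`) and the learning rate and step count are illustrative assumptions, not taken from the slides. Linear regression predicts a continuous value. Logistic regression predicts the probability of a class.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, labels, learning_rate=0.1, steps=5000):
    """Batch gradient descent on the log loss for a single feature."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(w0 + w1 * x) - y
            grad0 += error
            grad1 += error * x
        w0 -= learning_rate * grad0 / n
        w1 -= learning_rate * grad1 / n
    return w0, w1

# Regression: made-up (input, continuous output) pairs.
intercept, slope = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
predicted_value = intercept + slope * 6  # continuous prediction for x = 6

# Classification: made-up (input, 0/1 label) pairs.
w0, w1 = fit_logistic([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
prob_low = sigmoid(w0 + w1 * 1)   # probability of class 1 for x = 1
prob_high = sigmoid(w0 + w1 * 6)  # probability of class 1 for x = 6
```

In both cases the training data are labeled pairs. The only difference is whether the desired output is a number or a class.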
Unsupervised Learning
Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.
One of the most widely used unsupervised machine learning algorithms is K-means, which splits the data into K clusters.
Unsupervised Learning
Input:
K - number of clusters
Training set x(1), x(2), ... x(m)
Randomly initialise K cluster centroids u1, u2, ... uK
Repeat {
    for i = 1 to m
        c(i) := index (from 1 to K) of the cluster centroid closest to x(i)
    for k = 1 to K
        uk := average (mean) of the points assigned to cluster k
}
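The K-means loop can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation. The function names, the fixed random seed, the iteration cap and the sample points are assumptions. Leaving an empty cluster's centroid unchanged is also an assumption, since the pseudocode does not cover that case.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, max_iterations=100, seed=0):
    """Return (centroids, assignments) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct training points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the centroids are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Made-up data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
```

The result depends on the random initialisation, so in practice the algorithm is often run several times with different starting centroids.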
Feature Scaling
Usage:
In gradient descent, it reduces the time needed to converge to a (local) minimum
In Principal Component Analysis, it must be done before running the algorithm
size = 0 - 2000 sqm: x1 = size (sqm) / 2000
rooms = 0 - 10: x2 = number of rooms / 10
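As a small illustration of the two scalings above, the sketch below divides each feature by its maximum value so that both end up in the 0 to 1 range. The sample values and the helper name `scale_by_max` are made up for the example.

```python
def scale_by_max(values, max_value):
    """Divide every value by the feature's maximum, giving numbers in [0, 1]."""
    return [v / max_value for v in values]

sizes_sqm = [50, 120, 800, 2000]  # hypothetical house sizes
rooms = [1, 3, 5, 10]             # hypothetical room counts

x1 = scale_by_max(sizes_sqm, 2000)
x2 = scale_by_max(rooms, 10)
```

After scaling, both features vary over a comparable range, so neither dominates the gradient descent updates.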
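Mean Normalisation
The objective of mean normalisation is to bring the mean of each feature close to 0.
x1 = (x1 - mean value) / range
range = max - min
approximately -0.5 <= x1 <= 0.5 (exact when the mean lies midway between min and max)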
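A minimal sketch of mean normalisation, using made-up values and a hypothetical helper name `mean_normalize`. The normalised values always have mean 0 and span a total width of 1. They fall exactly in the -0.5 to 0.5 interval only when the data are symmetric around the mean.

```python
def mean_normalize(values):
    """Return (v - mean) / (max - min) for every value."""
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    if value_range == 0:
        raise ValueError("all values are identical, so the range is zero")
    return [(v - mean) / value_range for v in values]

# Symmetric sample: the result spans exactly -0.5 .. 0.5.
symmetric = mean_normalize([0, 500, 1000, 1500, 2000])

# Skewed sample: the mean is still 0, but the largest value exceeds 0.5.
skewed = mean_normalize([0, 100, 200, 2000])
```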
Feature Selection
Improving the prediction performance
Faster and more cost effective predictors
Better understanding of the underlying process
Too many features
Independence of the features
Redundant features
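One simple way to spot redundant features is to look for pairs that are almost perfectly correlated. The sketch below is only an illustration of that idea. The 0.95 threshold, the feature names and the data are assumptions, and Pearson correlation only detects linear relationships.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

def redundant_pairs(features, threshold=0.95):
    """List feature pairs whose absolute correlation reaches the threshold."""
    names = list(features)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(features[first], features[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs

size_sqm = [50, 80, 120, 200, 300]
features = {
    "size_sqm": size_sqm,
    "size_sqft": [s * 10.764 for s in size_sqm],  # same information, other unit
    "rooms": [2, 1, 4, 3, 6],
}
flagged = redundant_pairs(features)
```

Here `size_sqft` carries no information beyond `size_sqm`, so one of the two can be dropped without hurting the predictor.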
Overfitting and Underfitting in Linear Regression
Overfitting and Underfitting in Logistic Regression
Corrective measures
Overfitting situation (High Variance)
- Get more training examples
- Try a smaller set of features
- Try increasing λ, the regularisation parameter
Underfitting situation (High Bias)
- Get additional features
- Add polynomial features
- Try decreasing λ
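To see why the regularisation parameter pulls in these two directions, consider a deliberately simplified case. The slides do not state the cost function, so the sketch assumes one weight, no intercept, and an L2 penalty: minimise sum((theta * x - y)^2) + lambda * theta^2. Its closed-form minimiser is theta = sum(x * y) / (sum(x^2) + lambda).

```python
def ridge_slope(xs, ys, lam):
    """Minimiser of sum((theta*x - y)**2) + lam * theta**2 for one weight."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1, 2, 3]
ys = [2, 4, 6]  # made-up data lying exactly on y = 2x

theta_none = ridge_slope(xs, ys, lam=0)      # no regularisation: fits the data exactly
theta_mild = ridge_slope(xs, ys, lam=14)     # weight shrinks towards zero
theta_strong = ridge_slope(xs, ys, lam=126)  # weight shrinks much further
```

A larger value shrinks the weight and makes the model simpler, which counters high variance. A smaller value lets the model follow the training data more closely, which counters high bias.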
Selecting & Evaluating Your Hypothesis
Training/Test datasets (70/30)
1. Learn the parameters using the training data
2. Compute the test error
Training/Validation/Test datasets (60/20/20)
1. Learn the parameters of each candidate hypothesis using the training data
2. Choose the hypothesis with the lowest validation error
3. Test the chosen hypothesis using the test dataset
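The three-step procedure with a 60/20/20 split can be sketched as follows. The synthetic data, the two candidate hypotheses (a constant and a straight line), the mean squared error metric and all function names are assumptions chosen to keep the example short.

```python
import random

def split_60_20_20(examples, seed=0):
    """Shuffle a copy of the data and split it into train/validation/test."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 60 // 100
    n_val = n * 20 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def fit_constant(train):
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    return lambda x: mean_y + slope * (x - mean_x)

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Synthetic examples: a noisy straight line.
rng = random.Random(1)
examples = [(x, 3.0 * x + 2.0 + rng.gauss(0, 1.0)) for x in range(50)]
train, val, test = split_60_20_20(examples)

# 1. Learn the parameters of each candidate on the training data.
models = {"constant": fit_constant(train), "linear": fit_line(train)}
# 2. Choose the hypothesis with the lowest validation error.
val_errors = {name: mse(model, val) for name, model in models.items()}
best = min(val_errors, key=val_errors.get)
# 3. Report the error of the chosen hypothesis on the untouched test data.
test_error = mse(models[best], test)
```

The test set is used only once, at the end, so the reported error is not biased by the model selection step.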
Conclusion
Before choosing an algorithm make sure:
- you have a clearly defined problem you want to solve
- you understand your dataset
Useful materials
Thank you!
cristina.morariu@softvision.ro
morariu-cristina