Intro to Machine Learning

Agenda
- What is Machine Learning
- Types of Machine Learning problems
- Intro to some basic ML algorithms
- Case study: apply ML techniques
What is ML
Machine learning is the study of computer algorithms that improve automatically through experience and by the use of data.
(source: https://en.wikipedia.org/wiki/Machine_learning)
Types of ML

Supervised learning
It is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately.
- Classification uses an algorithm to accurately assign test data into specific categories.
- Regression is used to understand the relationship between dependent and independent variables.
Unsupervised learning
Unsupervised learning is very much the opposite of supervised learning. It features no labels. Instead, the algorithm is fed a lot of data and given the tools to understand the properties of the data.
- Clustering is a data mining technique that groups unlabeled data based on their similarities or differences.
- Dimensionality reduction lowers the number of features (dimensions) in a dataset. While more data generally yields more accurate results, too many features can hurt the performance of machine learning algorithms (e.g. overfitting) and make datasets difficult to visualize.
Regression
Regression is used to understand the relationship between dependent and independent variables.
Examples: predicting outputs, forecasting, analyzing time series, and finding cause-and-effect dependencies between variables.
House price
Problem statement: We have a dataset with the prices of some sold houses. For each house we also have some characteristics: number of rooms, size (m^2).
What can we do with this data?
Build a linear regression model to predict the price.
If the price is a linear function of the number of rooms and the size, then we can build a model to predict the house price. The model is simply a linear function:
Price(house) = a * number of rooms + b * size
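A minimal sketch of fitting this model, assuming made-up houses and a hypothetical helper fit_two_features (neither comes from the slides). It estimates a and b by least squares, solving the 2x2 normal equations directly:

```python
def fit_two_features(x1, x2, y):
    """Least-squares fit of y = a*x1 + b*x2 (no intercept term).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s11 = sum(u * u for u in x1)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s22 = sum(v * v for v in x2)
    t1 = sum(u * t for u, t in zip(x1, y))
    t2 = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("features are linearly dependent; no unique fit")
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


# Made-up houses whose price follows 10000 * rooms + 1500 * size exactly.
rooms = [2, 3, 3, 4, 5]
size = [50, 70, 85, 100, 140]
price = [10_000 * r + 1_500 * s for r, s in zip(rooms, size)]

a, b = fit_two_features(rooms, size, price)
predicted_price = a * 4 + b * 90  # a 4-room, 90 m^2 house
print(a, b, predicted_price)
```

Because the sample prices were generated from the same linear formula, the fit recovers the coefficients used to generate them. Real data would be noisy and would usually also need an intercept term.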
Linear regression
Linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables).
In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data.

House price
What if the response variable is not a linear combination of the predictor variables (features)?
Let's assume that the price increases quadratically with the number of rooms.
Price(house) = a * (number of rooms) ^ 2 + b * size
Can we still use linear regression?
Yes. Instead of using the features (predictors) number of rooms and size, we can give the linear regression algorithm new features: (number of rooms) ^ 2 and size. The model stays linear in the parameters a and b.
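The effect of swapping the feature can be sketched as follows. The data and the helpers fit_two_features and sse are illustrative assumptions, not part of the slides. The same linear fit is run twice, once on the raw number of rooms and once on its square, and the sums of squared errors are compared:

```python
def fit_two_features(x1, x2, y):
    """Least-squares fit of y = a*x1 + b*x2 via the 2x2 normal equations."""
    s11 = sum(u * u for u in x1)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s22 = sum(v * v for v in x2)
    t1 = sum(u * t for u, t in zip(x1, y))
    t2 = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def sse(x1, x2, y, a, b):
    """Sum of squared errors of the model a*x1 + b*x2 on the data."""
    return sum((t - (a * u + b * v)) ** 2 for u, v, t in zip(x1, x2, y))


# Made-up houses whose price grows with the square of the number of rooms.
rooms = [1, 2, 3, 4, 5]
size = [40, 55, 80, 95, 130]
price = [2_000 * r ** 2 + 1_500 * s for r, s in zip(rooms, size)]

# Fit on the raw feature.
a_raw, b_raw = fit_two_features(rooms, size, price)
sse_raw = sse(rooms, size, price, a_raw, b_raw)

# Fit on the engineered feature (rooms squared); the algorithm is unchanged.
rooms_squared = [r ** 2 for r in rooms]
a_sq, b_sq = fit_two_features(rooms_squared, size, price)
sse_squared = sse(rooms_squared, size, price, a_sq, b_sq)

print(sse_raw, sse_squared)
```

Only the input column changes between the two fits, which is the point of the slide: the regression code is the same, the feature is different.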
Working with ML
Hint 1: Feature engineering is an important part of ML
The success of an ML algorithm very often depends on the features you choose.
We can gather useful information to build the features from a "domain expert".
Features != raw data
Linear regression alg
Linear regression can be implemented using least-squares estimation.
Linear regression alg
Imagine you have some points and want the line that best fits them.

Linear regression alg
We have 2 vectors: one is fixed (the observed values) and the other one is variable (the values fitted by the model). We have to find the variable vector that minimizes the distance between the 2 vectors.
The best fit in the least-squares sense minimizes the sum of squared residuals (a residual being the difference between an observed value and the fitted value provided by a model).
https://en.wikipedia.org/wiki/Least_squares
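As a rough illustration of the least-squares idea for a single feature, the sketch below uses the standard closed-form slope and intercept. The sample points and function names are assumptions made for this example:

```python
def fit_line(xs, ys):
    """Closed-form least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed - fitted)^2 over all points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Illustrative points that lie roughly on a line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)

# Any other line gives a larger (or equal) sum of squared residuals.
worse = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
print(slope, intercept, best, worse)
```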
Linear regression alg
https://www.mathsisfun.com/data/least-squares-calculator.html
https://setosa.io/ev/ordinary-least-squares-regression/
https://en.wikipedia.org/wiki/Euclidean_distance
https://www.mathsisfun.com/data/least-squares-regression.html
Linear regression alg
Short Demo: https://www.codingame.com/playgrounds/3771/machine-learning-with-java---part-1-linear-regression
Working with ML
Hint 2: In ML we usually try to minimize a loss function.
https://algs4.cs.princeton.edu/code/edu/princeton/cs/algs4/LinearRegression.java.html
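To make Hint 2 concrete, here is a small sketch that minimizes a loss function iteratively. It assumes mean squared error as the loss and plain gradient descent as the optimizer. The data, learning rate and step count are arbitrary choices for illustration:

```python
def mse(w, b, xs, ys):
    """Mean squared error of the model w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Minimize the MSE by repeatedly stepping against its gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Points generated from y = 2x + 1, so the minimum of the loss is known.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

w, b = gradient_descent(xs, ys)
print(w, b, mse(w, b, xs, ys))
```

For linear regression the closed-form least-squares solution is usually simpler. The iterative version is shown because the same loop applies to losses that have no closed form.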
Clustering
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
The notion of a "cluster" cannot be precisely defined, which is one of the reasons why there are so many clustering algorithms.
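One possible way to see clustering in action is a tiny one-dimensional k-means, sketched below. The choice of k-means, the sample points and the starting centroids are assumptions for this example. Other algorithms define a "cluster" differently:

```python
def kmeans_1d(points, centroids, iterations=10):
    """Basic k-means on numbers: assign to nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
# Start from the smallest and largest point (an arbitrary initialization).
centroids, clusters = kmeans_1d(points, [min(points), max(points)])
print(centroids, clusters)
```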

Clustering
Homework :)
https://en.wikipedia.org/wiki/Cluster_analysis
https://en.wikipedia.org/wiki/K-means_clustering
https://en.wikipedia.org/wiki/DBSCAN
Other terminology in ML
f(x) = a + b * x1 + c * x2
Features: x1, x2. They should be linearly independent of each other, and y should depend linearly on them.
Weights: a, b, c
Overfitting: learning a specific data set too well and failing to generalize
Train set = data set on which we train the model
Cross validation set = data set on which we compare different algorithms/parameters
Test set = the final data set on which we evaluate the chosen algorithm
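The three data sets can be produced with a simple shuffled split, sketched below. The 60/20/20 proportions, the seed and the function name split_dataset are assumptions for this example, not a rule from the slides:

```python
import random


def split_dataset(rows, train_fraction=0.6, validation_fraction=0.2, seed=42):
    """Shuffle the rows and cut them into train / cross validation / test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = int(len(shuffled) * train_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test


rows = list(range(10))  # stand-in for 10 labeled examples
train, validation, test = split_dataset(rows)
print(len(train), len(validation), len(test))
```

Keeping the test set untouched until the very end is what makes its score an honest estimate of how the chosen algorithm generalizes.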
Working in ML
Hint 3: ML is not about using a specific algorithm. It is more about modeling the problem as the right type of ML problem, e.g. for the exact same problem we can use different ML techniques, and for each ML technique we can use different algorithms.
Anomaly detection
Anomaly detection (aka outlier analysis) is a use case of ML.
It is the problem of finding the data points that deviate from a dataset's normal behavior.
It can be modelled using several ML techniques:
- Regression: you build a prediction model and compare the predicted value with the real one.
- Clustering: you cluster the data points based on density; the data points remaining outside the clusters (outliers) are anomalies.
- Classification: you need 2 balanced (same size) data sets, one with normal data points and another with anomalous data points, and you train a classification model to distinguish between them.
- Statistically: you infer the probability distribution of the data set, and any point with a low probability is an anomaly (example: use a normal distribution).
Case study
Problem statement "Short term excessive risk":
- detect excessive betting risk/activity that Kambi takes in a short period of time.
Solution: aggregate over a sliding time window and check whether the aggregated result is greater than a pre-defined threshold.
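A minimal sketch of the sliding-window check, assuming each bet arrives as a (timestamp in seconds, risk) pair and that the aggregation is a plain sum. The class name, window length and threshold are hypothetical and do not describe the production system:

```python
from collections import deque


class SlidingWindowRisk:
    """Sum of risk over the last `window_seconds`, compared to a threshold."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, risk) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, risk):
        """Record one event; return True if the windowed sum exceeds the threshold."""
        self.events.append((timestamp, risk))
        self.total += risk
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            _, old_risk = self.events.popleft()
            self.total -= old_risk
        return self.total > self.threshold


window = SlidingWindowRisk(window_seconds=60, threshold=100)
alerts = [
    window.add(0, 40),    # window sum 40
    window.add(30, 50),   # window sum 90
    window.add(50, 20),   # window sum 110, above the threshold
    window.add(100, 10),  # events at t=0 and t=30 expired, sum 30
]
print(alerts)
```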

Case study
Let's slightly change the problem statement:
- detect suspicious betting risk/activity that Kambi takes in a short period of time.
Case study
Problems with the current solution:
We may get false-positive "suspicious" detections.
Case study - sol 1 (simple data transformation)
Why feature engineering is important:
Volume derivative = Volume(t) - Volume(t-1)
Define a threshold for the derivative above which we have an anomaly.
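The derivative feature and its threshold can be sketched in a few lines. The volumes and the threshold value below are made up for illustration:

```python
def volume_derivatives(volumes):
    """Volume(t) - Volume(t-1) for every t >= 1."""
    return [current - previous for previous, current in zip(volumes, volumes[1:])]


def anomalous_steps(volumes, threshold):
    """Time indices t whose derivative is above the threshold."""
    derivatives = volume_derivatives(volumes)
    return [t for t, d in enumerate(derivatives, start=1) if d > threshold]


volumes = [100, 105, 103, 110, 180, 185]  # made-up aggregated volumes
derivatives = volume_derivatives(volumes)
anomalies = anomalous_steps(volumes, threshold=50)
print(derivatives, anomalies)
```

Note that the last value (185) is high in absolute terms but is not flagged: only the jump at t = 4 is, which is what separates "suspicious" from merely "high".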
Case study - sol 2 (linear regression on raw data points)
- Divide the last hour into 6 equal time intervals (in minutes): 0-10, 10-20, ...
- Aggregate the risk taken in each interval.
- Fit a linear regression model to predict the "aggregated risk" in the next time interval (next 10 minutes).
- Compare the predicted value with the real aggregated risk of the next 10 minutes.
- If the difference is too big, then we have an anomaly.
So... we have an anomaly detection problem solved with linear regression!
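The steps above can be sketched as follows, assuming the 6 aggregated values are already computed. The numbers, the tolerance and the function names are illustrative assumptions:

```python
def fit_line(xs, ys):
    """Closed-form least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def check_next_interval(history, actual_next, tolerance):
    """Predict the next interval from the history and compare with reality."""
    xs = list(range(len(history)))
    slope, intercept = fit_line(xs, history)
    predicted = slope * len(history) + intercept
    return abs(actual_next - predicted) > tolerance, predicted


# Aggregated risk for the 6 intervals of the last hour (made-up values).
history = [100, 110, 120, 130, 140, 150]

normal_flag, predicted = check_next_interval(history, actual_next=165, tolerance=30)
spike_flag, _ = check_next_interval(history, actual_next=400, tolerance=30)
print(predicted, normal_flag, spike_flag)
```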
Case study - sol 3 (linear regression on derivatives points)
- Divide the last hour into 6 equal time intervals (in minutes): 0-10, 10-20, ...
- Aggregate the risk taken in each interval.
- Calculate the derivative for each point (relative to the previous point).
- Fit a linear regression model to predict the "derivative risk" in the next time interval (next 10 minutes).
- Compare the predicted value with the real derivative risk of the next 10 minutes.
- If the difference is too big, then we have an anomaly.
As opposed to the previous version, we will not have problems when the trend is changing (e.g. from ascending to descending).
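A sketch of the derivative variant, assuming a made-up history whose trend turns from ascending to descending. Function names and the tolerance are hypothetical:

```python
def fit_line(xs, ys):
    """Closed-form least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def check_next_derivative(history, actual_next, tolerance):
    """Fit a line to the derivatives and compare the next one with its prediction."""
    derivatives = [b - a for a, b in zip(history, history[1:])]
    xs = list(range(len(derivatives)))
    slope, intercept = fit_line(xs, derivatives)
    predicted_derivative = slope * len(derivatives) + intercept
    actual_derivative = actual_next - history[-1]
    is_anomaly = abs(actual_derivative - predicted_derivative) > tolerance
    return is_anomaly, predicted_derivative


# Aggregated risk that rises, flattens, then starts to fall (made-up values).
history = [100, 130, 150, 160, 160, 150]

normal_flag, predicted_derivative = check_next_derivative(history, 128, tolerance=15)
spike_flag, _ = check_next_derivative(history, 220, tolerance=15)
print(predicted_derivative, normal_flag, spike_flag)
```

Here the derivatives (30, 20, 10, 0, -10) follow a straight line even though the aggregated risk itself does not, so a continued decline is predicted rather than flagged.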
Case study - sol 4 (classification)
Pre-requisites: for this solution we need historical data for both anomalies and normal cases, in equal amounts.
Offline (build model):
- Calculate the historical betting activity for a fixed time interval
- Label each data point as anomaly or normal
- Train a binary classification algorithm
Online (apply model):
- Take the current betting activity in that fixed time interval
- Predict its class using the trained model
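The offline/online split can be sketched with a small binary classifier. The example below assumes logistic regression trained by gradient descent on a single feature (betting activity per interval). The slides do not name a specific classifier, and all numbers are made up:

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, labels, learning_rate=0.5, steps=1000):
    """Offline step: fit weight and bias by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        preds = [sigmoid(w * x + b) for x in xs]
        grad_w = sum((p - y) * x for p, y, x in zip(preds, labels, xs)) / n
        grad_b = sum(p - y for p, y in zip(preds, labels)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Balanced, labeled history: 0 = normal, 1 = anomaly (made-up activity values).
normal = [10, 12, 11, 13, 9]
anomalies = [30, 35, 28, 40, 33]
values = normal + anomalies
labels = [0] * len(normal) + [1] * len(anomalies)

# Scale the feature so gradient descent behaves well.
center = sum(values) / len(values)
spread = max(values) - min(values)


def scale(value):
    return (value - center) / spread


w, b = train_logistic([scale(v) for v in values], labels)


def anomaly_probability(value):
    """Online step: score the current activity with the trained model."""
    return sigmoid(w * scale(value) + b)


print(anomaly_probability(11), anomaly_probability(34))
```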
Case study - sol 5 (statistically)
Pre-requisites: for this solution we need historical data for normal cases.
Advantage vs sol 4: we don't need many historical anomaly cases.
Offline (build model):
- Infer the probability distribution from the data (usually it is a Gaussian distribution), e.g. calculate the mean and standard deviation.
Online (apply model):
- Take the current betting activity in that fixed time interval
- Apply the probability function to get the probability of the current point appearing in the distribution
- If the probability is low => we have an anomaly
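A minimal sketch of the statistical approach, assuming a Gaussian model and using a two-sided tail probability as the "probability of the current point". That interpretation, the 0.01 cut-off and the sample history are assumptions for this example:

```python
from statistics import NormalDist, mean, stdev


def fit_gaussian(samples):
    """Offline step: estimate mean and standard deviation from normal history."""
    return NormalDist(mean(samples), stdev(samples))


def tail_probability(dist, value):
    """Probability of seeing a value at least this far from the mean (two-sided)."""
    distance = abs(value - dist.mean)
    return 2 * (1 - dist.cdf(dist.mean + distance))


def is_anomaly(dist, value, min_probability=0.01):
    """Online step: flag the point if it is too unlikely under the model."""
    return tail_probability(dist, value) < min_probability


# Historical betting activity for normal intervals (made-up values).
history = [100, 104, 98, 101, 97, 103, 99, 102]
model = fit_gaussian(history)

print(model.mean, model.stdev)
print(is_anomaly(model, 101), is_anomaly(model, 130))
```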
Summary
- ML problem types: supervised, unsupervised, reinforcement
- A specific optimisation algorithm (one that minimizes a function) can be used for different classes of ML problems (e.g. gradient descent is used for both linear regression and classification)
- Data transformation and feature engineering are very important
- We can model a problem with different ML techniques... The art is to do this right :) not just writing/choosing an algorithm
Intro to Machine Learning
By Bogdan Posa