UBC Math Grad Seminar
Bernhard Konrad
14 April 2015
We live in very exciting times! We have datasets at our fingertips that we could not have conceived of even a few years ago.
"Uber, the world’s largest taxi company, owns no vehicles. Facebook, the world’s most popular media owner, creates no content. Alibaba, the most valuable retailer, has no inventory. And Airbnb, the world’s largest accommodation provider, owns no real estate."
Uber: How many riders and drivers will you have and need at time X?
Facebook: Out of all the possible stories, which do you show on a user's timeline?
Airbnb: How is growth/uptake in city A different from city B, why, and what can we do about it?
LinkedIn: Detect and stop fraud (non-normal behaviour).
Khan Academy: Millions of math problems are attempted online per day. When is someone proficient?
Netflix: What new show should we make?
Enlitic: Let algorithms help doctors make more accurate diagnoses.
....
Regression (supervised)
Make a quantitative prediction, given input features. Training set with correct outcomes.
Classic example: predict housing prices (e.g. Opendoor)
Linear Regression (supervised)
from sklearn import linear_model
# Training data: three examples with two features each
X = [[50, 3000],
     [60, 2500],
     [150, 4200]]
# Quantitative outcome for each training example
y = [20, 25, 17]
regr = linear_model.LinearRegression().fit(X, y)
print(regr.predict([[70, 3200]]))  # -> [19.74...]
Classification (supervised)
Make a yes-or-no decision, based on features.
Training set with correct labels.
Classic example: handwritten digit recognition
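A hedged sketch of this classic example, using scikit-learn's bundled digits dataset (the train/test split and the logistic-regression classifier are illustrative choices, not necessarily from the talk):
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# 8x8 grey-scale images of the digits 0-9, flattened to 64 features each
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # fraction of test digits labelled correctly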
Logistic regression classification (supervised)
from sklearn import linear_model
# Same features as before, now with a class label for each example
X = [[50, 3000],
     [60, 2500],
     [150, 4200]]
y = [0, 1, 1]
regr = linear_model.LogisticRegression().fit(X, y)
print(regr.predict([[70, 3200]]))  # -> [1]
Clustering (unsupervised)
Find similar data points (for exploration and plotting).
No labels available.
Classic example: Social network analysis
K-means clustering (unsupervised)
Randomly set K cluster centroids.
Repeat until convergence: assign each data point to its nearest centroid, then move each centroid to the mean of the points assigned to it.
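A minimal NumPy sketch of this loop (the toy data, K=2, and the initialization by sampling data points are assumptions for illustration; sklearn.cluster.KMeans does the same job in one call):
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly set K cluster centroids (here: K distinct data points)
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assign each data point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it
        new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
              [5.0, 8.0], [8.0, 8.0], [9.0, 11.0]])
labels, centroids = kmeans(X, K=2)
print(labels)  # e.g. [0 0 0 1 1 1] (cluster numbering depends on the random initialization)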
Example applications:
1. Classification:
Given the statement of a math problem, predict the corresponding topic.
Natural language processing (NLP) and data from Math Education Resources (see the sketch after this list).
2. Clustering:
Compress an input image by using fewer colours.
K-means algorithm on the pixel values of the input image (see the sketch below).
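For application 1, a sketch of how such a topic classifier could be wired together in scikit-learn; the toy problem statements, topic labels, and the bag-of-words plus logistic-regression pipeline are illustrative assumptions, not the actual Math Education Resources setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
# Toy training data: problem statements and their topics (made up for illustration)
statements = [
    "Find the eigenvalues of the matrix A.",
    "Compute the determinant of the 3x3 matrix.",
    "Evaluate the integral of x^2 from 0 to 1.",
    "Find the derivative of sin(x) times e^x.",
]
topics = ["linear_algebra", "linear_algebra", "calculus", "calculus"]
# Bag-of-words features fed into a logistic-regression classifier
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(statements, topics)
print(clf.predict(["Diagonalize the given matrix."]))  # -> ['linear_algebra'] (most likely)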
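For application 2, a sketch of colour compression with sklearn.cluster.KMeans; the file name and the choice of 16 colours are placeholders for illustration.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans
# Load the image ("photo.png" is a placeholder) and flatten it to a list of RGB pixel values
img = np.asarray(Image.open("photo.png").convert("RGB"), dtype=float) / 255.0
pixels = img.reshape(-1, 3)
# Cluster the pixel values into 16 colours and replace each pixel by its cluster centroid
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(pixels)
compressed = kmeans.cluster_centers_[kmeans.labels_].reshape(img.shape)
Image.fromarray((compressed * 255).astype(np.uint8)).save("photo_16_colours.png")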