UBC Math Grad Seminar
Bernhard Konrad
14 April 2015
We live in very exciting times! We have datasets at our fingertips that we could not have conceived of even a few years ago.
"Uber, the world’s largest taxi company, owns no vehicles. Facebook, the world’s most popular media owner, creates no content. Alibaba, the most valuable retailer, has no inventory. And Airbnb, the world’s largest accommodation provider, owns no real estate."
Uber: How many riders and drivers will you have and need at time X?
Facebook: Out of all the possible stories, which do you show on a user's timeline?
Airbnb: How is growth/uptake in city A different from city B, why, and what can we do about it?
LinkedIn: Detect and stop fraud (non-normal behaviour).
Khan Academy: Millions of math problems are attempted online per day. When is someone proficient?
Netflix: What new show should we make?
Enlitic: Let algorithms help doctors make more accurate diagnoses.
....
Regression (supervised)
Make a quantitative prediction, given input features. Training set with correct outcomes.
Classic example: predict housing prices (e.g. Opendoor)
Linear Regression (supervised)
from sklearn import linear_model
# Training data: three examples with two features each
X = [[50, 3000],
     [60, 2500],
     [150, 4200]]
# Quantitative outcome for each training example
y = [20, 25, 17]
regr = linear_model.LinearRegression().fit(X, y)
print(regr.predict([[70, 3200]]))  # -> [19.74...]
Classification (supervised)
Make a yes-or-no decision, based on features.
Training set with correct labels.
Classic example: handwritten digit recognition
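A hedged sketch of this classic example, using scikit-learn's bundled digits dataset (the train/test split and the logistic-regression classifier are illustrative choices, not necessarily from the talk):
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# 8x8 grey-scale images of the digits 0-9, flattened to 64 features each
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # fraction of test digits labelled correctly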
Logistic regression classification (supervised)
from sklearn import linear_model
# Same features as before, now with a class label for each example
X = [[50, 3000],
     [60, 2500],
     [150, 4200]]
y = [0, 1, 1]
regr = linear_model.LogisticRegression().fit(X, y)
print(regr.predict([[70, 3200]]))  # -> [1]
Clustering (unsupervised)
Find similar data points (for exploration and plotting).
No labels available.
Classic example: Social network analysis
K-means clustering (unsupervised)
Randomly set K cluster centroids.
Repeat until convergence: assign each data point to its nearest centroid, then move each centroid to the mean of the points assigned to it.
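A minimal NumPy sketch of this loop (the toy data, K=2, and the initialization by sampling data points are assumptions for illustration; sklearn.cluster.KMeans does the same job in one call):
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly set K cluster centroids (here: K distinct data points)
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Assign each data point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it
        new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
              [5.0, 8.0], [8.0, 8.0], [9.0, 11.0]])
labels, centroids = kmeans(X, K=2)
print(labels)  # e.g. [0 0 0 1 1 1] (cluster numbering depends on the random initialization)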
Example applications:
1. Classification:
Given the statement of a math problem, predict the corresponding topic.
Natural language processing (NLP) and data from Math Education Resources (see the sketch after this list).
2. Clustering:
Compress an input image by using fewer colours.
K-means algorithm on the pixel values of the input image (see the sketch below).
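For application 1, a sketch of how such a topic classifier could be wired together in scikit-learn; the toy problem statements, topic labels, and the bag-of-words plus logistic-regression pipeline are illustrative assumptions, not the actual Math Education Resources setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
# Toy training data: problem statements and their topics (made up for illustration)
statements = [
    "Find the eigenvalues of the matrix A.",
    "Compute the determinant of the 3x3 matrix.",
    "Evaluate the integral of x^2 from 0 to 1.",
    "Find the derivative of sin(x) times e^x.",
]
topics = ["linear_algebra", "linear_algebra", "calculus", "calculus"]
# Bag-of-words features fed into a logistic-regression classifier
clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(statements, topics)
print(clf.predict(["Diagonalize the given matrix."]))  # -> ['linear_algebra'] (most likely)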
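For application 2, a sketch of colour compression with sklearn.cluster.KMeans; the file name and the choice of 16 colours are placeholders for illustration.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans
# Load the image ("photo.png" is a placeholder) and flatten it to a list of RGB pixel values
img = np.asarray(Image.open("photo.png").convert("RGB"), dtype=float) / 255.0
pixels = img.reshape(-1, 3)
# Cluster the pixel values into 16 colours and replace each pixel by its cluster centroid
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(pixels)
compressed = kmeans.cluster_centers_[kmeans.labels_].reshape(img.shape)
Image.fromarray((compressed * 255).astype(np.uint8)).save("photo_16_colours.png")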