Machine Learning
"A breakthrough in machine learning would be worth ten Microsofts" ~Bill Gates
Samarth Bansal
m.Paani | IIT Kanpur
Data Science
Everyone is talking about BIG Data! What is it?
3 V's : Volume, Velocity, Variety
Collection and Storage ==> Analysis ==> Visualization
HIRING!
What is
Artificial Intelligence?
Artificial Intelligence
- One of the most dynamic scientific subjects.
- A chess-playing bot is no longer an interesting problem!
- AI Today - Machine Learning Oriented
Computer Vision, Natural Language Processing, Robotics, Information Retrieval
Computer Vision
Natural Language Processing
Making the computer understand natural language, and derive meaning out of text!
APPLICATIONS
Sentiment Analysis, Auto summary, Language Translation
Machine Learning
Data. Data. Data. It's all about data!
Mathematical Background Required
Linear Algebra
Probability and Statistics
Multi-variable Calculus
Convex Optimization
SUPERVISED LEARNING
vs
UNSUPERVISED LEARNING
Housing Price Prediction
Area | Price
200 | 1000K
300 | 3000K
400 | 7000K
500 | ?
Linear Regression
Fit a curve to this data, and we can predict the price for any area!
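To make the idea concrete, here is a minimal sketch that fits a straight line to the three (area, price) rows from the table by ordinary least squares and then predicts the missing price for area 500. The function names `fit_line` and `predict` are illustrative, not from the talk, and a straight line is only one possible choice of curve.

```python
# Least-squares straight-line fit to the three (area, price) rows from the slide.
# Prices are in thousands, matching the "K" suffix in the table.
areas = [200.0, 300.0, 400.0]
prices_k = [1000.0, 3000.0, 7000.0]


def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(area, slope, intercept):
    """Evaluate the fitted line at a new area."""
    return slope * area + intercept


slope, intercept = fit_line(areas, prices_k)
predicted_500 = predict(500.0, slope, intercept)
print(f"Predicted price for area 500: {predicted_500:.0f}K")
```

With only three points the fit is fragile: the line cannot follow the accelerating trend in the table, which is one reason more data (and richer models) matter.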
We had only 3 data points! That is not a statistically significant sample.
What about 100,000 data points?
That is why, DATA!
"Data is the new oil!"
ONLY AREA?
What about number of bedrooms, locality, theft history, age of house etc?
These parameters are called FEATURES, and the set of features is called a FEATURE VECTOR!
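As a rough sketch of what a feature vector looks like in code, the example below turns one house record into an ordered list of numbers and scores it with a weighted sum. Everything here is assumed for illustration: the `House` class, the feature values, the weights, and the bias are all hypothetical, and in practice the weights would be learned from data. A categorical feature such as locality would first need a numeric encoding, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class House:
    # Feature names follow the slide; the values used below are made up.
    area: float
    bedrooms: int
    house_age_years: float
    thefts_last_year: int

    def feature_vector(self):
        """Flatten the record into an ordered list of numbers."""
        return [
            self.area,
            float(self.bedrooms),
            self.house_age_years,
            float(self.thefts_last_year),
        ]


def linear_prediction(features, weights, bias):
    """Weighted sum of features plus a bias: the multi-feature form of a fitted line."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical house and hypothetical weights, for illustration only.
house = House(area=500.0, bedrooms=3, house_age_years=10.0, thefts_last_year=1)
weights = [30.0, 200.0, -50.0, -100.0]
bias = -5000.0

x = house.feature_vector()
price_k = linear_prediction(x, weights, bias)
print(x, price_k)
```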
Multiple Features. Large Data Sets. Better Predictions.
JUST REGRESSION?
Nope!
Neural Networks!
Another class of problems
Classification Problems
Spam/Non-Spam
Cancer Patient or Not?
Cat/Bike/Elephant/House?
CLASSIFICATION
Text | Spam
You have a million dollars! Give your credit card... | 1
? ~Bezos | 0
Love calculator | 1
Exam tomorrow! | 0
We have huge datasets of text labeled as spam/non-spam, so given any new text, we can determine its category probabilistically!
ALGORITHMS - CLASSIFICATION
Logistic Regression
Naive Bayes Classifier
Support Vector Machines
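Of the three, Naive Bayes is the easiest to sketch from scratch. The example below is a minimal multinomial Naive Bayes with add-one smoothing, trained on three legible rows from the spam table plus one hypothetical non-spam row (marked in the code). The helper names `tokenize`, `train`, and `spam_probability` are illustrative, and a real filter would need far more data than this.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per class. `examples` is a list of (text, label) pairs, label 1 = spam."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def spam_probability(text, model):
    """P(spam | text) under multinomial Naive Bayes with add-one smoothing."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize the two log scores into a probability.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp_scores[1] / (exp_scores[0] + exp_scores[1])


examples = [
    ("You have a million dollars! Give your credit card", 1),
    ("Love calculator", 1),
    ("Exam tomorrow!", 0),
    ("Lecture notes for the exam", 0),  # hypothetical row, not from the slide
]
model = train(examples)

print(spam_probability("Give your credit card", model))
print(spam_probability("Exam tomorrow", model))
```

The first message scores above 0.5 and the second below it, because each word's smoothed frequency is compared across the two classes.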
Clustering
k-Means Clustering.
Quick Check
1. Cartesian Plane
2. Euclidean Distance
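With points on a Cartesian plane and Euclidean distance in hand, k-means can be sketched in a few lines: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. This is a minimal sketch, not a production implementation. The sample points are made up, and seeding the centroids with the first k points is a simplification assumed here (random or smarter seeding is more common).

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(p) for p in points[:k]]  # simplification: first k points as seeds
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
print(centroids)
```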
UNSUPERVISED APPROACH
Date | mclasses | Voda Recharge | Kirana Spent | Redemptions | Balance | Help
4/3/14 | 3 | 100 | 2500 | 0 | 1700 | 4
.. | .. | .. | .. | .. | .. | ..
.. | .. | .. | .. | .. | .. | ..
.. | .. | .. | .. | .. | .. | ..
Mr. Bansal, we want to segment our customers.
Okay, no issues. k-Means Clustering to the rescue!
RECOMMENDATION Engines
Netflix Million Dollar Challenge!
Recommendations make a great business case!
Amazon : Which books should we show a user that they are likely to buy?
Approach : Predict the rating the user will give to that product.
Everywhere
Music, Movies, News, Search Queries, Online Dating, Twitter Followers, Facebook Friends
m.Paani
Reward recommendation. Greater Impact!
Collaborative Filtering
A method of making automatic predictions about a user's interests by collecting preference or taste information from many users (collaborating).
User-Based
Item-Based
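A minimal sketch of the user-based variant: to predict how a user would rate an unseen item, take the ratings other users gave that item and average them, weighting each by how similar that user's tastes are to the target user's. The users, items, and ratings below are hypothetical, and cosine similarity over co-rated items is one common choice of similarity assumed here, not necessarily the one used in any particular system.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "asha": {"book_a": 5, "book_b": 4, "book_c": 1},
    "ravi": {"book_a": 4, "book_b": 5, "book_c": 2, "book_d": 5},
    "meera": {"book_a": 1, "book_b": 2, "book_c": 5, "book_d": 1},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict_rating(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += sim
    if weight_total == 0:
        return None  # nobody comparable has rated this item
    return weighted_sum / weight_total


predicted = predict_rating("asha", "book_d", ratings)
print(predicted)
```

Here asha's tastes are closer to ravi's than to meera's, so the prediction is pulled toward ravi's high rating. The item-based variant applies the same idea with the roles swapped: it compares items by the users who rated them.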
THANK YOU!
Slides : slides.com/samarthbansal/machine-learning