Machine Learning




"A breakthrough in machine learning would be worth ten Microsofts" ~Bill Gates



Samarth Bansal
m.Paani | IIT Kanpur

Data Science


Everyone is talking about BIG Data! What is it?

3 V's : Volume, Velocity, Variety


Collection and Storage ==> Analysis ==> Visualization


HIRING!



What is Artificial Intelligence?

Artificial Intelligence


  • One of the most dynamic scientific subjects. 
  • A chess-playing bot is no longer an interesting problem!
  • AI Today - Machine Learning Oriented



Computer Vision, Natural Language Processing, Robotics, Information Retrieval


Computer Vision






Natural Language Processing


Making the computer understand natural language, and derive meaning out of text!


APPLICATIONS
Sentiment Analysis, Automatic Summarization, Language Translation

Machine Learning


Data. Data. Data. It's all about data!




Mathematical Background Required
Linear Algebra
Probability and Statistics
Multi-variable Calculus
Convex Optimization



SUPERVISED LEARNING

vs 

UNSUPERVISED LEARNING

Housing Price Prediction




Area Price
200 1000K
300 3000K
400 7000K
500 ?



Linear Regression


Fit a line (or curve) to this data, and we can predict the price for any area!
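To make the idea concrete, here is a minimal sketch of an ordinary least-squares line fit on the three rows from the table. The function name `fit_line` is made up for illustration, and a straight line is only a rough model here, since the prices in the table grow faster than linearly.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The three data points from the slide (prices in K).
areas = [200, 300, 400]
prices = [1000, 3000, 7000]

slope, intercept = fit_line(areas, prices)
predicted_500 = slope * 500 + intercept
print(f"price = {slope:.1f} * area + ({intercept:.1f})")
print(f"predicted price for area 500: about {predicted_500:.0f}K")
```

With only three points the fitted line gives roughly 9667K for an area of 500, which is a guess, not a reliable estimate.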



We had only 3 data points! That is not a statistically meaningful sample.
What about 100,000 data points?
That is why we need DATA!


"Data is the new oil!"


ONLY AREA?

What about the number of bedrooms, locality, theft history, age of the house, etc.?

These attributes are called FEATURES, and the set of features describing one example is called a FEATURE VECTOR!
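A minimal sketch of turning one house into a feature vector. The field names, the example values, and the list of localities are all hypothetical; they are not from the talk. Locality is a category rather than a number, so this sketch one-hot encodes it (an assumed design choice).

```python
def to_feature_vector(house, localities):
    """Convert a house record (a dict) into a flat list of numbers."""
    # One slot per known locality: 1.0 for the house's locality, 0.0 elsewhere.
    one_hot = [1.0 if house["locality"] == name else 0.0 for name in localities]
    numeric = [
        float(house["area"]),
        float(house["bedrooms"]),
        float(house["thefts_last_year"]),
        float(house["age_years"]),
    ]
    return numeric + one_hot


# Hypothetical localities and a hypothetical house, for illustration only.
localities = ["north", "south", "central"]
house = {
    "area": 500,
    "bedrooms": 3,
    "thefts_last_year": 2,
    "age_years": 10,
    "locality": "south",
}

feature_vector = to_feature_vector(house, localities)
print(feature_vector)
```

Every house becomes a vector of the same length, which is what a regression or classification algorithm expects as input.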



Multiple Features. Large Data Sets. Better Predictions. 

JUST REGRESSION?

Nope!




Neural Networks!



Another class of problems 

Classification Problems



Spam/Non-Spam
Cancer Patient or Not?
Cat/Bike/Elephant/House?

CLASSIFICATION

Text                                                  Spam (1 = spam, 0 = not spam)
You have a million dollars! Give your credit card...  1
? ~Bezos                                              0
Love calculator                                       1
Exam tomorrow!                                        0

We have huge datasets that label text as spam/non-spam. So given any new text, we can probabilistically determine its category!

ALGORITHMS - CLASSIFICATION


Logistic Regression

Naive-Bayes Classifier

Support Vector Machines
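As a rough illustration of one of the algorithms listed above, here is a toy Naive Bayes spam classifier written from scratch. The training texts are three rows from the spam table in this talk; everything else (function names, tokenization, add-one smoothing) is an assumption made for the sketch. Three examples are far too few for a real classifier.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per class and documents per class."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def spam_probability(text, model):
    """Return P(spam | text) under the Naive Bayes assumption."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in (0, 1):
        # Start from the class prior, then add one log-likelihood per word.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = score
    # Turn the two log scores into a normalized probability.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])


examples = [
    ("You have a million dollars! Give your credit card", 1),
    ("Love calculator", 1),
    ("Exam tomorrow!", 0),
]
model = train(examples)

for message in ("million dollars", "exam tomorrow"):
    print(message, "->", round(spam_probability(message, model), 2))
```

The output is a probability, not a hard yes/no; a threshold such as 0.5 turns it into a spam/non-spam decision.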

Clustering


k-Means Clustering. 



Quick Check
1. Cartesian Plane
2. Euclidean Distance
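Those two ideas are all k-means needs: points on a plane and the distance between them. Below is a minimal sketch on made-up 2D points. The points and the fixed starting centroids are assumptions for illustration; practical implementations usually pick starting centroids at random.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: euclidean(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Step 2: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, cluster in zip(centroids, clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

No labels are given to the algorithm; the groups come only from how close the points are to each other.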

UNSUPERVISED APPROACH


Date     mclasses  Voda Recharge  Kirana Spent  Redemptions  Balance  Help
4/3/14   3         100            2500          0            1700     4
..       ..        ..             ..            ..           ..       ..
..       ..        ..             ..            ..           ..       ..
..       ..        ..             ..            ..           ..       ..

Mr. Bansal, we want to segment our customers.
Okay, no issues. k-Means Clustering to the rescue!


RECOMMENDATION Engines


Netflix Million Dollar Challenge!


Recommendations make a great business case!

Amazon: Which books should we show a user that they are likely to buy?
Approach: Predict the rating the user would give to that product.

Everywhere
Music, Movies, News, Search Queries, Online Dating, Twitter Followers, Facebook Friends

m.Paani 
 Reward recommendation. Greater Impact!




Collaborative Filtering
A method of making automatic predictions about a user's interests by collecting preference or taste information from many users (collaborating).

User-Based
Item-Based
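A minimal sketch of the user-based variant: predict a missing rating as a similarity-weighted average of what other users gave the same item. The users, items, and ratings are invented for illustration, and cosine similarity over co-rated items is one assumed choice among several possible similarity measures.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "asha": {"book_a": 5, "book_b": 4, "book_c": 1},
    "bilal": {"book_a": 4, "book_b": 5, "book_c": 2, "book_d": 5},
    "chen": {"book_a": 1, "book_b": 2, "book_c": 5, "book_d": 1},
}


def cosine_similarity(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(user, item, ratings):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        similarity = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += similarity * their_ratings[item]
        similarity_sum += similarity
    if similarity_sum == 0:
        return None  # nobody comparable has rated this item
    return weighted_sum / similarity_sum


predicted = predict_rating("asha", "book_d", ratings)
print(f"predicted rating for asha on book_d: {predicted:.2f}")
```

Because asha's past ratings look more like bilal's than chen's, the prediction is pulled toward bilal's rating. The item-based variant applies the same idea to similarities between items instead of users.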





THANK YOU!


Slides : slides.com/samarthbansal/machine-learning
