Machine Learning

Frank Qiu

Chris Pang

Haider Shah

Stephanie Zhang

Intro

Algorithms

Business Value

Best Practices

The science of getting computers to act without being explicitly programmed.

-Stanford University CS 229 (Machine Learning) 

What is Machine Learning?

Ng, A, 2015

The Restaurant Case

John

Jack used to come a lot; why hasn't he shown up for a while?

Why is Bill's Bar more popular?

Is Roy a slow server?

Vancouver's best-selling beer

Average cost to run a restaurant

Transaction records related to Jack

......

Popular sauces

Popular restaurants

Query From Computer

Busiest hour of the week
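Queries like "busiest hour of the week" are simple aggregations over transaction records. The sketch below assumes hypothetical ISO 8601 timestamps and a hypothetical helper `busiest_hour_of_week`; a real restaurant would pull the records from its point-of-sale system.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Hypothetical transaction timestamps; a real system would read these
# from the restaurant's point-of-sale records.
transactions = [
    "2015-06-05T19:15:00",  # Friday
    "2015-06-05T19:40:00",  # Friday
    "2015-06-06T19:05:00",  # Saturday
    "2015-06-12T19:30:00",  # Friday
    "2015-06-13T12:10:00",  # Saturday
    "2015-06-14T18:55:00",  # Sunday
]

def busiest_hour_of_week(timestamps):
    """Count transactions per (weekday, hour) slot and return the top slot."""
    slots = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        slots[(WEEKDAYS[t.weekday()], t.hour)] += 1
    return slots.most_common(1)[0]

print(busiest_hour_of_week(transactions))  # (('Friday', 19), 3)
```

The other queries on the slide (best-selling beer, popular sauces) follow the same count-and-rank pattern over different fields.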

Data-Driven Decision

John

People like the happy hour at Bill's Bar so much that they keep going there even when it's not happy hour.

Roy is a great server: he sells more food, and that's why he seems slow.

The Honey BBQ Rib is the best seller, and it has a 10% chance of selling out on weekends, so get enough stock before the weekend.

But is that good enough?

Machine Can Do It Again and Again

 MachineLearning.py

 

 

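import numpy as np
import matplotlib.pyplot as plt

print("I'm MachineLearning")

def ML(models, x, y, mx=None):
    """Plot the data points (x, y) and the curve of each fitted model."""
    plt.scatter(x, y, s=10)
    if models:
        if mx is None:
            # Evaluate the models on 1000 evenly spaced points over the data range
            mx = np.linspace(0, x[-1], 1000)
        colors = ["g", "k", "b", "m", "r"]
        for model, color in zip(models, colors):
            # Each model is assumed to be a callable polynomial such as np.poly1d
            plt.plot(mx, model(mx), c=color)
        # np.poly1d exposes its degree as .order
        plt.legend(["d=%i" % m.order for m in models], loc="upper left")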

[Figure: Top Restaurants and Other Restaurants plotted against Factor 1 and Factor 2]


[Figure: Top Restaurants and Other Restaurants plotted against Factor 3 and Factor 4]

Let the Machine Decide 


[Figure: the machine weighs a huge number of factors at once, labelled Factor 1, Factor 4, Factor 567, Factor 3294923, and ever larger factor numbers]

Machine Is Better

[Figure: KPI versus view span for John's view, Bill's view, and the machine's view]

[Figure: diagram relating Artificial Intelligence, Machine Learning, and Data Mining]

Machine Learning vs. Data Mining

Machine learning is a science that involves the development of self-learning algorithms. These algorithms are generic in nature, so they can be applied to problems in many different domains.

Data mining is the practice of applying algorithms (mostly machine learning algorithms) to the data available in a domain in order to solve problems in that domain.

Machine Learning Algorithms

Supervised Learning

Unsupervised Learning

Reinforcement Learning

Supervised Learning

Algorithms are trained using labelled examples

 

Historical data is used to predict likely future events

Example: Credit Card Transactions
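To make the credit card example concrete, here is a minimal supervised-learning sketch using k-nearest neighbours, one of many possible algorithms (the slide does not name one). The labelled transactions, the two features (dollar amount and hour of day), and the function `predict` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled history: (amount in dollars, hour of day) -> label.
# The values are made up for illustration only.
history = [
    ((25.0, 12), "legit"),
    ((40.0, 19), "legit"),
    ((12.5, 9), "legit"),
    ((60.0, 20), "legit"),
    ((950.0, 3), "fraud"),
    ((1200.0, 2), "fraud"),
    ((880.0, 4), "fraud"),
]

def predict(x, examples, k=3):
    """Label x by majority vote among its k nearest labelled examples."""
    by_distance = sorted(examples, key=lambda ex: math.dist(x, ex[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(predict((1000.0, 3), history))  # fraud  (large purchase at 3 a.m.)
print(predict((30.0, 13), history))   # legit  (small lunchtime purchase)
```

The labels in `history` are what make this supervised. A real fraud model would also scale the features, since here the dollar amount dominates the distance.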

 

 

Unsupervised Learning

No Historical Labels

 

The goal is to explore the data and find structure within it

Example: Customer Segmentation
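Customer segmentation is commonly done with k-means clustering, sketched below as one possible choice (the slide does not name an algorithm). The customers, the two features (visits per month and average spend), and the hand-picked starting centroids are assumptions for illustration; no labels are used anywhere.

```python
import math

# Hypothetical customers: (visits per month, average spend per visit in $).
customers = [
    (2, 15.0), (3, 18.0), (1, 12.0),     # occasional, low spend
    (12, 20.0), (15, 22.0), (10, 19.0),  # regulars
    (4, 90.0), (3, 110.0), (5, 95.0),    # occasional, high spend
]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternately assign points and recompute centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            tuple(sum(vals) / len(cluster) for vals in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hand-picked starting centroids keep the result deterministic.
centroids, clusters = kmeans(customers, [customers[0], customers[3], customers[6]])
for centre, members in zip(centroids, clusters):
    print("segment centred at", centre, "->", members)
```

In practice the starting centroids are usually chosen randomly (or with k-means++) and the features are scaled first; both are omitted here to keep the sketch short.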

 

 

Reinforcement Learning

Learn by trial and error

 

Agent, Environment, Action

 

Example: Robotics, Gaming, Navigation
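The agent, environment, and action loop can be sketched with tabular Q-learning, one standard reinforcement learning algorithm (the slide does not name a specific one). The five-cell corridor environment, its +1 reward at the goal, and the parameter values (`alpha`, `gamma`, `epsilon`) are assumptions chosen for illustration.

```python
import random

# Hypothetical environment: a corridor of cells 0..4. The agent starts in
# cell 0; reaching cell 4 ends the episode with reward +1, and every other
# step gives reward 0. Actions: 0 = move left, 1 = move right.
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)

def step(state, action):
    """Environment: apply an action, return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: sometimes explore at random, otherwise take
            # the best-known action (ties broken at random).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Move Q toward reward + discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(GOAL)]
print(policy)  # ['right', 'right', 'right', 'right']
```

Early episodes wander at random; as the reward propagates back through the Q-table, the greedy policy settles on moving right in every cell, which maximizes the discounted reward.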

 

 

Summary

 

Supervised Learning

Mapping Inputs to Outputs

Unsupervised Learning

Clustering Data

Reinforcement Learning

Maximize Rewards

 

Machine Learning is Everywhere

Customized Recommendation

Ranking Webpages

Face Recognition

Handwriting Recognition

Why Machine Learning: Big Data, Can't Program by Hand, Beyond Human Capability

$48 billion/year

Online Advertising

$11.5 billion/year

Fraud detection systems

Gene prediction for cancer

Natural disaster prediction

Self-driving cars

...

ML Workflow: Best Practices

[Figure: ML workflow diagram with the stages Historical Data, Model Building, Model Optimization, Model Evaluation, New Data, and Answers]

  1. Clearly understand the business problem

  2. Pair the problem type with the best-suited algorithm

  3. Employ data discipline

  4. Consider scalability in production early

  5. Actively monitor the model's performance metrics
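The workflow of historical data, model building, optimization, evaluation, and then answers on new data can be sketched with a deliberately simple model: a single cut-off on one feature stands in for a real algorithm. The data, the 60-reservation rule that generates its labels, and the names `split`, `accuracy`, and `fit_threshold` are hypothetical.

```python
import random

# Hypothetical historical data: (weekend reservations, sold_out?) pairs,
# generated by a made-up rule purely for illustration.
history = [(r, r >= 60) for r in range(20, 100, 4)]

def split(data, test_fraction=0.25, seed=0):
    """Data discipline: hold out part of the history for evaluation only."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def accuracy(threshold, rows):
    """Fraction of rows where 'x >= threshold' matches the recorded label."""
    return sum((x >= threshold) == y for x, y in rows) / len(rows)

def fit_threshold(rows):
    """Model building and optimization: best cut-off on the training rows."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(t, rows))

train, test = split(history)
threshold = fit_threshold(train)
# Model evaluation on data the model never saw during fitting.
print("threshold:", threshold, "held-out accuracy:", accuracy(threshold, test))
# Answers: apply the chosen model to new data.
print({r: r >= threshold for r in (35, 75)})
```

Keeping the test rows out of `fit_threshold` is what makes the held-out accuracy an honest estimate; the same accuracy check, rerun on fresh data over time, is one simple way to monitor a deployed model.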

Evans, J, 2014
SAS, Machine Learning, 2015
Wood, D, 2013

Available Software

Muenchen, R, 2015.

Recent Software Improvements

  1. Intuitive GUIs for building models and for interactive data exploration and visualization

  2. Automated model evaluation to identify the best performers

  3. Easier model deployment, so you can get repeatable, reliable results more quickly

SAS, Machine Learning, 2015.

Kaggle Competitions

Kaggle, All Competitions, 2015.

http://www.research.ibm.com/foiling-financial-fraud.shtml

http://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf

http://www.nytimes.com/2015/02/12/technology/personaltech/googles-time-at-the-top-may-be-nearing-its-end.html

http://www.quora.com/What-are-some-interesting-possible-applications-of-machine-learning

http://www.forbes.com/sites/85broads/2014/01/06/six-novel-machine-learning-applications/

 

http://www.wsj.com/articles/SB10001424052748703834604575365310813948080
