Welcome

Baruch College - Zicklin School of Business

6/9/17

Analytics Seminar

Prof. Arturo Castellanos
Information Systems and Statistics

​arturo.castellanos@baruch.cuny.edu

First Hour

  • Motivation for Analytics
  • Artificial Intelligence
    • Use cases
  • Data Mining techniques
  • Case Study: Santander Bank. Predicting Churn.
  • Discussion: How can your organization leverage on analytics?

Second Hour

  • Hands-on exercise:
    • Download RapidMiner
    • Titanic dataset
  • Group exercise:
    • Predicting patient readmission
  • If time permits, we are going to visualized data in Tableau.

Agenda

Did you notice the ratio of man/machine?

Hidden Figures

Is technology taking away jobs?

Marketing

Marketing

A/B Testing

Marketing

Can we monetize on this concept?

Coat, $4,995 by Calvin Klein Collection

Where does it seem to be the focus of the algorithm?

There's a company in NYC doing this!

Coat, $4,995 by Calvin Klein Collection

As opposed to Pinterest, Clarifai’s ‘Apparel’ model recognizes more (specific) than 100 fashion-related concepts and terms spanning clothing and accessories — from “men’s boots” and “cardigan,” to “wide leg pants” and “romper.”

Don't

Step 1: The training data contains target class information for each record.

Step 2: New records are classified based on the models developed on the training data.

1. Supervised Learning (Estimation and Classification)

Data Mining Techniques: 

2. Unsupervised Learning (Clustering)

The classification of the training data is unknown.

The aim is to construct a set of clusters, given the data.

CRISP-DM Data Mining Process

Case Study: Predicting customer churn

Part II

On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew.

Which are the salient factors of predicting survival?

Download RapidMiner

Steps

  • Import Titanic dataset (download csv)
    • Define variable roles (label, ids, etc.)
  • Create process flow
  • Analyze Results
  • Evaluate and reiterate

Group project: Analyzing Patient Readmission

  • The dataset represents 10 years (1999-2008) of clinical care data from 130 US hospitals.
  • It includes over 50 features representing patient and hospital outcomes.
  • Information was extracted from the database for encounters that satisfied the following criteria.

Business problem:

The task is to develop a classifier that is able to determine whether a patient is readmitted to the ICU

To do (in groups of 3):

  • What is the target variable?
  • How would you transform it to a binary outcome (e.g., 0-no;1-yes)
  • What is the most important factor in determining readmission?
  • Which is the least important factor in determining readmission?
  • What is the accuracy of your model?

The best time to start was yesterday
The second best time is NOW.

Thank you

I

By acast317

I

  • 1,338