Welcome
Baruch College - Zicklin School of Business
6/9/17
Analytics Seminar
Prof. Arturo Castellanos
Information Systems and Statistics
arturo.castellanos@baruch.cuny.edu
First Hour
- Motivation for Analytics
- Artificial Intelligence
- Use cases
- Data Mining techniques
- Case Study: Santander Bank. Predicting Churn.
- Discussion: How can your organization leverage on analytics?
Second Hour
- Hands-on exercise:
- Download RapidMiner
- Titanic dataset
- Group exercise:
- Predicting patient readmission
- If time permits, we are going to visualized data in Tableau.
Agenda
Did you notice the ratio of man/machine?
Hidden Figures
Is technology taking away jobs?
Marketing
Marketing
A/B Testing
Marketing
Can we monetize on this concept?
Coat, $4,995 by Calvin Klein Collection
Where does it seem to be the focus of the algorithm?
There's a company in NYC doing this!
Coat, $4,995 by Calvin Klein Collection
As opposed to Pinterest, Clarifai’s ‘Apparel’ model recognizes more (specific) than 100 fashion-related concepts and terms spanning clothing and accessories — from “men’s boots” and “cardigan,” to “wide leg pants” and “romper.”
Don't
Step 1: The training data contains target class information for each record.
Step 2: New records are classified based on the models developed on the training data.
1. Supervised Learning (Estimation and Classification)
Data Mining Techniques:
2. Unsupervised Learning (Clustering)
The classification of the training data is unknown.
The aim is to construct a set of clusters, given the data.
CRISP-DM Data Mining Process
Case Study: Predicting customer churn
Part II
On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew.
Which are the salient factors of predicting survival?
Download RapidMiner
Steps
- Import Titanic dataset (download csv)
- Define variable roles (label, ids, etc.)
- Create process flow
- Analyze Results
- Evaluate and reiterate
Group project: Analyzing Patient Readmission
- The dataset represents 10 years (1999-2008) of clinical care data from 130 US hospitals.
- It includes over 50 features representing patient and hospital outcomes.
- Information was extracted from the database for encounters that satisfied the following criteria.
Business problem:
The task is to develop a classifier that is able to determine whether a patient is readmitted to the ICU
Link to Dataset - CSV file
To do (in groups of 3):
- What is the target variable?
- How would you transform it to a binary outcome (e.g., 0-no;1-yes)
- What is the most important factor in determining readmission?
- Which is the least important factor in determining readmission?
- What is the accuracy of your model?
The best time to start was yesterday
The second best time is NOW.
Thank you
I
By acast317
I
- 1,338