Black Friday Sales Prediction Challenge Datahack
By: Abhishek Sharma
Introduction
A company, "ABC Private Limited", wants to understand the purchasing behavior of its customers using its existing sales data.
Problem Statement
Build a model to predict the purchase amount of a customer for various products.
Data Description
Train set: 550k rows
Test set: 233k rows
Data Preprocessing
Null values in the dataset per column
Replaced null values with zero
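A minimal sketch of this preprocessing step with pandas, assuming the data is held in a DataFrame. The column names and values below are made up for illustration and are not taken from the competition files.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the training data (hypothetical columns and values).
df = pd.DataFrame({
    "category_a": [3, 1, 12, 8],
    "category_b": [np.nan, 6.0, np.nan, 14.0],
    "category_c": [np.nan, 14.0, np.nan, np.nan],
})

# Count the null values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Replace every null value with zero.
df_filled = df.fillna(0)
print(df_filled)
```

Filling with zero is simple, but it treats "missing" as an ordinary numeric value, so it works best when zero is not already a valid value of the column.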
Feature Engineering and Modelling
Model: Multiple Linear Regression
Rank: 1495
RMSE: 3240.59
Baseline Model
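A minimal sketch of such a baseline with scikit-learn, assuming the features have already been encoded as numbers. The data here is synthetic, so the printed RMSE has nothing to do with the leaderboard score above; the point is only how the model is fitted and how RMSE is computed on a held-out split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic data standing in for the encoded features and purchase amounts.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X @ np.array([3.0, -2.0, 0.5, 1.0]) + 10.0 + rng.normal(scale=0.1, size=500)

# Hold out 20% of the rows to estimate the error on unseen data.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
valid_rmse = rmse(y_valid, model.predict(X_valid))
print(f"validation RMSE: {valid_rmse:.4f}")
```

Keeping the same validation split for later models makes their RMSE values directly comparable with this baseline.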
Time for more Feature Engineering and a new model
Model: Random Forest Regressor
Rank: 773
RMSE: 2732.16
Much more Feature Engineering and another model
Model: XGBoost (default parameters)
Rank: 310
RMSE: 2496.3621
Time for Hyperparameter Tuning
(Used Bayesian Optimization)
XGBoost Hyperparameters:
gamma
colsample_bytree
subsample
reg_alpha
Rank: 279
RMSE: 2487.4789
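The slides do not say which library was used for the Bayesian optimization, so the following is only a toy sketch of the idea, built from scikit-learn and SciPy. It fits a Gaussian process to the scores seen so far and picks the next point by expected improvement. The one-dimensional `objective` is a made-up stand-in; in the real setting it would be the validation RMSE of XGBoost as a function of gamma, colsample_bytree, subsample and reg_alpha.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def objective(x):
    """Toy stand-in for 'validation RMSE as a function of one hyperparameter'."""
    return (x - 0.7) ** 2 + 0.1 * np.sin(15 * x)


def expected_improvement(candidates, gp, best_y):
    """Expected improvement (for minimisation) at each candidate point."""
    mu, sigma = gp.predict(candidates.reshape(-1, 1), return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 201)  # grid of values to choose from

# Start from a few random evaluations of the objective.
xs = [float(x) for x in rng.uniform(0.0, 1.0, size=3)]
ys = [float(objective(x)) for x in xs]

for _ in range(15):
    # Surrogate model of the objective, refitted on all points seen so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
    gp.fit(np.array(xs).reshape(-1, 1), np.array(ys))

    # Evaluate the objective where the expected improvement is largest.
    ei = expected_improvement(candidates, gp, min(ys))
    x_next = float(candidates[int(np.argmax(ei))])
    xs.append(x_next)
    ys.append(float(objective(x_next)))

best_x = xs[int(np.argmin(ys))]
print(f"best x: {best_x:.3f}, best objective: {min(ys):.4f}")
```

Each objective call corresponds to one full model training run, which is why a search that spends its few evaluations carefully can be preferable to a dense grid.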
Again Feature Engineering and LightGBM!
Model: LightGBM (default parameters)
Rank: 310
RMSE: 2574.68 (higher, i.e. worse, than optimized XGBoost at 2487.4789)
Optimized LightGBM
LightGBM Hyperparameters:
reg_alpha
reg_lambda
Finally!!
Rank: 182
RMSE: 2468.2523
Feature Importance (Top 10)
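A minimal sketch of how a top-10 importance ranking can be extracted from a fitted tree model. It uses scikit-learn's RandomForestRegressor on synthetic data as a stand-in for the tuned LightGBM model; the feature names are hypothetical and the ranking says nothing about the actual competition features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(15)]  # hypothetical names
X = pd.DataFrame(rng.normal(size=(300, 15)), columns=feature_names)

# Only the first three features drive the synthetic target.
y = (
    5.0 * X["feature_0"]
    + 3.0 * X["feature_1"]
    - 2.0 * X["feature_2"]
    + rng.normal(scale=0.1, size=300)
)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Pair each importance with its feature name and keep the ten largest.
importances = pd.Series(model.feature_importances_, index=feature_names)
top10 = importances.sort_values(ascending=False).head(10)
print(top10)
```

Any estimator that exposes a `feature_importances_` attribute can be ranked the same way.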
What's next?
What did I learn?
My 1st Online Data Science Competition
Thank You