Black Friday Sales Prediction Challenge Datahack
By:
Abhishek Sharma
Introduction
A company, "ABC Private Limited", wants to understand the purchasing behavior of its customers using existing sales data.
Problem Statement
Build a model to predict the purchase amount of each customer for various products.
Data Description
Train Set 550k rows
Test Set 233k rows
Data Preprocessing
Null Values in Dataset per column
Replaced null values with zero
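A minimal sketch of this step with pandas, on a toy frame. The column names `Product_Category_1` to `Product_Category_3` and the values are assumptions for illustration; the slides do not say which columns contained nulls.

```python
import numpy as np
import pandas as pd

# Toy frame; the column names and values are assumed, not taken from the slides.
train = pd.DataFrame({
    "Product_Category_1": [3, 1, 12, 8],
    "Product_Category_2": [np.nan, 6.0, np.nan, 14.0],
    "Product_Category_3": [np.nan, 14.0, np.nan, np.nan],
})

# Count nulls per column before imputing.
null_counts = train.isnull().sum()

# Replace every null with zero; zero then acts as a "no value" marker.
train_filled = train.fillna(0)
```

Zero works as a placeholder here only if it is not already a legitimate value in the affected columns.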
Feature Engineering and Modelling
- Label Encoded Gender, Age, City_Category
- About 3600+ different Product IDs
- One-hot-encoded top 1000 most frequent Product IDs
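The two encodings above could look roughly like this. It is a sketch on toy data: pandas category codes stand in for whatever label encoder was actually used, `top_n` is a hypothetical name, and it is set to 2 instead of 1000 so the effect is visible.

```python
import pandas as pd

# Toy data; the values are invented for illustration.
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "F", "M", "F"],
    "Age": ["0-17", "26-35", "26-35", "55+", "18-25", "0-17"],
    "City_Category": ["A", "C", "B", "A", "B", "C"],
    "Product_ID": ["P1", "P2", "P1", "P3", "P1", "P2"],
})

# Label encoding: map each category to an integer code (sorted order).
for col in ["Gender", "Age", "City_Category"]:
    df[col] = df[col].astype("category").cat.codes

# One-hot encode only the top_n most frequent Product IDs.
top_n = 2  # assumed small value for the toy data; the slides use 1000
top_ids = df["Product_ID"].value_counts().index[:top_n]

# IDs outside the top_n become NaN, which get_dummies turns into an all-zero row.
kept = df["Product_ID"].where(df["Product_ID"].isin(top_ids))
one_hot = pd.get_dummies(kept, prefix="Product_ID", dtype=int)

encoded = pd.concat([df.drop(columns="Product_ID"), one_hot], axis=1)
```

Capping the one-hot columns at the most frequent IDs keeps the feature matrix manageable when there are thousands of distinct products.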
Model: Multiple Linear Regression
Rank: 1495
RMSE: 3240.59
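For reference, the RMSE reported on these slides is the square root of the mean squared difference between actual and predicted purchase amounts. A small sketch, with made-up numbers:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Invented purchase amounts, only to exercise the function.
actual = [8370, 15200, 1422]
predicted = [8000, 15000, 2000]
score = rmse(actual, predicted)
```

Lower is better, and because errors are squared before averaging, a few large misses cost more than many small ones.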
Baseline Model
Time for more Feature Engineering and a new model
- Reduced number of one-hot encoded features to 20
Model: Random Forest Regressor
Rank: 773
RMSE: 2732.16
Much more Feature Engineering and another model
- Performed Target mean encoding and frequency encoding for Product_ID and User_ID
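A sketch of target mean encoding and frequency encoding with pandas, on toy data. The `Purchase` column name, the `_mean` / `_freq` suffixes, and the fallback for unseen IDs are assumptions; the slides do not describe these details.

```python
import pandas as pd

# Toy train/test frames; values are invented for illustration.
train = pd.DataFrame({
    "User_ID": [1, 1, 2, 2, 3],
    "Product_ID": ["P1", "P2", "P1", "P3", "P1"],
    "Purchase": [100, 200, 300, 400, 500],
})
test = pd.DataFrame({"User_ID": [1, 4], "Product_ID": ["P1", "P9"]})

global_mean = train["Purchase"].mean()

for col in ["User_ID", "Product_ID"]:
    # Target mean encoding: average Purchase per ID, learned on train only.
    means = train.groupby(col)["Purchase"].mean()
    # Frequency encoding: number of training rows per ID.
    counts = train[col].value_counts()
    for frame in (train, test):
        # IDs unseen in training fall back to the global mean and a zero count.
        frame[f"{col}_mean"] = frame[col].map(means).fillna(global_mean)
        frame[f"{col}_freq"] = frame[col].map(counts).fillna(0).astype(int)
```

Note that this naive version lets each training row see its own target in the mean, which can overfit; computing the means out-of-fold is a common way to reduce that leakage.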
Model: XGBoost (default parameters)
Rank: 310
RMSE: 2496.3621
Time for Hyperparameter Tuning
(Used Bayesian Optimization)
XGBoost Hyperparameters:
- n_estimators
- max_depth
- min_child_weight
- gamma
- colsample_bytree
- subsample
- reg_alpha
Rank: 279
RMSE: 2487.4789
Again Feature Engineering and LightGBM !!
- Performed Target min, max encoding for Product_ID and User_ID
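Min and max encoding can be sketched as a single grouped aggregation that is merged back onto the rows. The `Purchase` column and the `Product_ID_purchase_` prefix are assumed names, and only `Product_ID` is shown; `User_ID` would follow the same pattern.

```python
import pandas as pd

# Toy training frame; values are invented for illustration.
train = pd.DataFrame({
    "Product_ID": ["P1", "P1", "P2", "P2", "P2"],
    "Purchase": [100, 300, 50, 250, 150],
})

# Per-product minimum and maximum of the target, computed in one pass.
stats = (
    train.groupby("Product_ID")["Purchase"]
    .agg(["min", "max"])
    .add_prefix("Product_ID_purchase_")
    .reset_index()
)

# Attach the statistics to each row; the same merge would be applied
# to the test set using the statistics learned from train.
train_enc = train.merge(stats, on="Product_ID", how="left")
```

With a left merge, IDs that appear only in the test set end up with missing statistics, so they would need a fallback value.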
Model: LightGBM (default parameters)
Rank: 310
RMSE: 2574.68 (higher, i.e. worse, than optimized XGBoost)
Optimized LightGBM
LightGBM Hyperparameters:
- n_estimators
- num_leaves
- max_depth
- learning_rate
- reg_alpha
- reg_lambda
Finally!!
Rank: 182
RMSE: 2468.2523
Feature Importance (Top 10)
What's next?
- Try to engineer more features
- Create ensembles
- Try Catboost
What I learned?
- Feature Engineering is King
- Target Encoding is key to climbing the leaderboard
- Ensembles are necessary
- How fast LightGBM is
- Don't spend too much time tuning hyperparameters
My 1st Online Data Science Competition
Thank You