Black Friday Sales Prediction Challenge Datahack

By:

Abhishek Sharma

Introduction

A company, "ABC Private Limited", wants to understand the purchasing behavior of its customers from its existing sales data.

Problem Statement

Build a model to predict the purchase amount of customers for various products.


Data Description

  • Train set: 550k rows
  • Test set: 233k rows

Data Preprocessing

Null values per column in the dataset

Replaced null values with zero
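A minimal pandas sketch of this preprocessing step. The toy frame and its column names (Product_Category_2, Product_Category_3) are assumptions for illustration; the slide does not list which columns contained nulls.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the training data; column names are assumed.
train = pd.DataFrame({
    "User_ID": [1000001, 1000002, 1000003],
    "Product_Category_2": [6.0, np.nan, 14.0],
    "Product_Category_3": [np.nan, np.nan, 17.0],
})

# Count null values per column.
print(train.isnull().sum())

# Replace every null value with zero.
train = train.fillna(0)
```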

Feature Engineering and Modelling

  • Label-encoded Gender, Age, and City_Category
  • Over 3,600 distinct Product IDs
  • One-hot encoded the 1,000 most frequent Product IDs
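The two encodings above can be sketched with pandas and scikit-learn as follows. The helper names `encode_categoricals` and `one_hot_top_products` are hypothetical, and this is one possible implementation, not necessarily the author's. Products outside the top N simply get all-zero indicator columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Label-encode Gender, Age and City_Category (hypothetical helper)."""
    out = df.copy()
    for col in ["Gender", "Age", "City_Category"]:
        # LabelEncoder assigns integer codes in sorted order of the labels.
        out[col] = LabelEncoder().fit_transform(out[col])
    return out


def one_hot_top_products(df: pd.DataFrame, top_n: int) -> pd.DataFrame:
    """One-hot encode only the top_n most frequent Product_IDs (hypothetical helper)."""
    top_ids = df["Product_ID"].value_counts().index[:top_n]
    dummies = pd.DataFrame(
        {f"Product_ID_{pid}": (df["Product_ID"] == pid).astype(int) for pid in top_ids},
        index=df.index,
    )
    return pd.concat([df, dummies], axis=1)
```

Because each encoder is fit on whatever frame it receives, train and test should be encoded with the same mapping in practice (for example by fitting on the concatenated frames). The later reduction to 20 one-hot features corresponds to calling this with `top_n=20`.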

Model: Multiple Linear Regression

Rank: 1495

RMSE: 3240.59

 

Baseline Model
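Every result in this deck is reported as RMSE. A minimal sketch of the metric and of a multiple linear regression baseline with scikit-learn; the data here is synthetic, standing in for the encoded features, which are not reproduced.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def rmse(y_true, y_pred) -> float:
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic features and target standing in for the encoded training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=200)

baseline = LinearRegression().fit(X, y)
print(rmse(y, baseline.predict(X)))
```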

 

Time for more Feature Engineering and a new model

 
  • Reduced the number of one-hot encoded features to 20

Model: Random Forest Regressor

Rank: 773

RMSE: 2732.16
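A random forest can capture non-linear effects that a single linear model misses. A minimal scikit-learn sketch on synthetic data with a held-out validation split; the hyperparameters shown are illustrative assumptions, since the slide does not state the ones used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
# Non-linear target: a step in feature 0 plus a squared term in feature 1.
y = np.where(X[:, 0] > 0, 10.0, -10.0) + X[:, 1] ** 2

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Illustrative settings, not the author's.
forest = RandomForestRegressor(n_estimators=100, random_state=0, n_jobs=-1)
forest.fit(X_tr, y_tr)

val_rmse = float(np.sqrt(np.mean((y_val - forest.predict(X_val)) ** 2)))
print(val_rmse)
```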

 

Much more Feature Engineering and another model

 
  • Performed target mean encoding and frequency encoding for Product_ID and User_ID
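Target mean encoding replaces each ID with the average Purchase observed for it, and frequency encoding replaces it with how often the ID occurs. A minimal pandas sketch; the function name is hypothetical, and counting frequencies over train and test combined (rather than train alone) is an assumption.

```python
import pandas as pd


def add_mean_and_frequency_encoding(train: pd.DataFrame, test: pd.DataFrame,
                                    cols=("Product_ID", "User_ID"),
                                    target="Purchase"):
    """Add <col>_mean and <col>_count features (hypothetical helper)."""
    train, test = train.copy(), test.copy()
    global_mean = train[target].mean()
    for col in cols:
        # Mean target per category, learned from the training set only.
        means = train.groupby(col)[target].mean()
        # Occurrence count per category over train and test combined (assumption).
        counts = pd.concat([train[col], test[col]]).value_counts()
        for df in (train, test):
            # IDs unseen in training fall back to the global target mean.
            df[f"{col}_mean"] = df[col].map(means).fillna(global_mean)
            df[f"{col}_count"] = df[col].map(counts)
    return train, test
```

Computing the means on the same rows the model trains on leaks the target into the features; out-of-fold means (computed on the other folds for each row) are a common safeguard.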

Model: XGBoost (default parameters)

Rank: 310

RMSE: 2496.3621

 

Time for Hyperparameter Tuning

 

(Used Bayesian Optimization)

XGBoost Hyperparameters:

 
  • n_estimators
  • max_depth
  • min_child_weight
  • gamma
  • colsample_bytree
  • subsample
  • reg_alpha
Rank: 279

RMSE: 2487.4789

 

Feature Engineering Again, and LightGBM!!

 
  • Performed target min and max encoding for Product_ID and User_ID
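Min/max target encoding follows the same pattern as mean encoding, but records the smallest and largest Purchase seen for each ID. A minimal pandas sketch with a hypothetical function name; IDs unseen in training are left as NaN, which LightGBM handles natively as missing values.

```python
import pandas as pd


def add_min_max_encoding(train: pd.DataFrame, test: pd.DataFrame,
                         cols=("Product_ID", "User_ID"), target="Purchase"):
    """Add <col>_min and <col>_max features (hypothetical helper)."""
    train, test = train.copy(), test.copy()
    for col in cols:
        # Per-category min and max of the target, from the training set only.
        stats = train.groupby(col)[target].agg(["min", "max"])
        for df in (train, test):
            df[f"{col}_min"] = df[col].map(stats["min"])
            df[f"{col}_max"] = df[col].map(stats["max"])
    return train, test
```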

Model: LightGBM (default parameters)

Rank: 310

RMSE: 2574.68 (worse than the optimized XGBoost's 2487.4789)

 

Optimized LightGBM

 

LightGBM Hyperparameters:

 
  • n_estimators
  • num_leaves
  • max_depth
  • learning_rate
  • reg_alpha
  • reg_lambda

Finally!!

 

Rank: 182

RMSE: 2468.2523

 

Feature Importance (Top 10)
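A top-10 importance ranking like this can be read from any fitted tree model that exposes `feature_importances_` (LightGBM's `LGBMRegressor` does, as do scikit-learn forests). This sketch uses a scikit-learn random forest on synthetic data with hypothetical feature names, so it is not the chart from the final model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
feature_names = [f"feature_{i}" for i in range(15)]  # hypothetical names
X = pd.DataFrame(rng.normal(size=(400, 15)), columns=feature_names)
y = 6 * X["feature_3"] + 2 * X["feature_7"] + rng.normal(scale=0.1, size=400)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Rank features by impurity-based importance and keep the top 10.
importances = pd.Series(model.feature_importances_, index=X.columns)
top10 = importances.sort_values(ascending=False).head(10)
print(top10)
```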

 

What's next?

 
  • Try to engineer more features
  • Create ensembles
  • Try CatBoost
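The simplest ensemble is a weighted average of several models' predictions, for example from XGBoost and LightGBM. A minimal NumPy sketch with a hypothetical `blend` helper; in practice the weights would be chosen on a validation set.

```python
import numpy as np


def blend(predictions, weights=None):
    """Weighted average of several models' prediction arrays (hypothetical helper)."""
    preds = np.vstack([np.asarray(p, dtype=float) for p in predictions])
    if weights is None:
        # Equal weights by default.
        weights = np.full(len(preds), 1.0 / len(preds))
    weights = np.asarray(weights, dtype=float)
    # Normalize so the blend stays on the scale of the target.
    weights = weights / weights.sum()
    return weights @ preds
```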

What did I learn?

 
  • Feature Engineering is King
  • Target encoding is key to climbing the leaderboard
  • Ensembles are necessary
  • How fast LightGBM is
  • Don't spend too much time tuning hyperparameters

My First Online Data Science Competition

Thank You

black-friday-datahack
