PV226 ML: Boosting

Content of this session

Where to use gradient boosting

CatBoost

Tree algorithms

So what is gradient boosting?

Gradient boosting involves three elements:

  • A loss function to be optimized.
  • A weak learner to make predictions.
  • An additive model to add weak learners to minimize the loss function.
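The three elements above can be sketched in a few lines. This is a minimal gradient-boosting loop for squared-error loss, assuming scikit-learn is available; the function names and parameters are illustrative, not a real library API.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_stages=50, lr=0.1):
    """Fit an additive model of weak learners to minimize squared error."""
    # Start from a constant model (the mean minimizes squared error).
    base = y.mean()
    pred = np.full(len(y), base)
    stages = []
    for _ in range(n_stages):
        # For loss 1/2*(y - pred)^2, the negative gradient is the residual.
        residual = y - pred
        tree = DecisionTreeRegressor(max_depth=2)  # a weak learner
        tree.fit(X, residual)
        # Additive update: nudge predictions toward the residuals.
        pred += lr * tree.predict(X)
        stages.append(tree)
    return base, stages

def boosted_predict(base, stages, X, lr=0.1):
    pred = np.full(len(X), base)
    for tree in stages:
        pred += lr * tree.predict(X)
    return pred
```

Each stage fits the pointwise negative gradient of the loss, so swapping in a different differentiable loss only changes the `residual` line.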

When to use it?

primarily on heterogeneous (tabular) data

for both classification and regression

that means it is usually your first choice for tabular problems

Commonly used algorithms

Random Forest

XGBoost

LGBM

CatBoost

CatBoost

Why use it?

  • easy to use – like AutoML
  • great default settings – good results fast
  • little data preparation needed
  • blazing fast

Options

  • Regression
  • Multiregression
  • Classification
  • Multiclassification
  • Ranking

Installation

pip install catboost shap ipywidgets scikit-learn
jupyter nbextension enable --py widgetsnbextension

Usage

from catboost import CatBoostClassifier, Pool

# Pool bundles the feature matrix with labels and optional sample weights.
train_data = Pool(data=[[1, 4, 5, 6],
                        [4, 5, 6, 7],
                        [30, 40, 50, 60]],
                  label=[1, 1, -1],
                  weight=[0.1, 0.2, 0.3])

model = CatBoostClassifier(iterations=10)

model.fit(train_data)
preds_class = model.predict(train_data)

Any questions?

PV226: Boosting

By Lukáš Grolig
