PV226 ML: Boosting

Content of this session

Where to use gradient boosting

Catboost

Tree algorithms

So what is gradient boosting?

Gradient boosting involves three elements:

A loss function to be optimized.
A weak learner to make predictions.
An additive model to add weak learners to minimize the loss function.

When to use it?

primarily on heterogeneous data

for both classification and regression

that means it is usualy your first choice

Commonly used algorithms

Random Forest

XGBM

LGBM

CatBoost

Catboost

Why to use it?

easy to use - like AutoML
great default settings
means good results fast
no need for much data preparation
blazing fast

Options

Regression
Multiregression
Classification
Multiclassification
Ranking

Installation

pip install catboost shap ipywidgets sklearn
jupyter nbextension enable --py widgetsnbextension

Usage

from catboost import CatBoostClassifier, Pool

train_data = Pool(data=[[1, 4, 5, 6],
                        [4, 5, 6, 7],
                        [30, 40, 50, 60]],
                  label=[1, 1, -1],
                  weight=[0.1, 0.2, 0.3])

model = CatBoostClassifier(iterations=10)

model.fit(train_data)
preds_class = model.predict(train_data)

PV226 ML: Boosting

Content of this session

So what is gradient boosting?

When to use it?

Commonly used algorithms

Random Forest

XGBM

LGBM

CatBoost

Catboost

Why to use it?

Options

Installation

Usage

Any questions?

PV226: Boosting

PV226: Boosting

Lukáš Grolig

PV226 ML: Boosting

Content of this session

So what is gradient boosting?

When to use it?

Commonly used algorithms

Random Forest

XGBM

LGBM

CatBoost

Catboost

Why to use it?

Options

Installation

Usage

Any questions?

PV226: Boosting

More from Lukáš Grolig