Grab the slides:
slides.com/cheukting_ho/legend-data-boosting-algrithms
Every Monday 5pm UK time
by Cheuk Ting Ho
Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. (Wikipedia)
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
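A minimal sketch of fitting scikit-learn's GradientBoostingClassifier; the synthetic dataset and hyperparameter values are illustrative, not from the talk:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic binary classification data, purely for illustration
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Each stage fits a small regression tree to the gradient of the loss
    clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                     max_depth=3, random_state=42)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))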
AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers.
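A minimal AdaBoost sketch with scikit-learn, using depth-1 trees (decision stumps) as the weak learners; parameter values are illustrative, and the estimator keyword assumes scikit-learn >= 1.2 (older versions call it base_estimator):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Stumps are the classic AdaBoost weak learner; samples the previous
    # round misclassified get larger weights in the next round
    clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # base_estimator before 1.2
        n_estimators=50,
        learning_rate=1.0,
        random_state=42,
    )
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))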
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.
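A minimal sketch using XGBoost's scikit-learn-style wrapper; the data and hyperparameter values are illustrative:

    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # XGBClassifier wraps the core booster with a scikit-learn interface
    clf = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))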
LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages (a usage sketch follows the list):
Faster training speed and higher efficiency.
Lower memory usage.
Better accuracy.
Support of parallel and GPU learning.
Capable of handling large-scale data.
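A minimal LightGBM sketch using its scikit-learn-style wrapper; the data and parameter values are illustrative:

    import lightgbm as lgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # LightGBM grows trees leaf-wise; num_leaves is its main complexity knob
    clf = lgb.LGBMClassifier(n_estimators=100, learning_rate=0.1, num_leaves=31)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))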
XGBoost - pre-sorted split finding: feature values are sorted once, then every candidate split point is scanned exactly
LightGBM - Gradient-based One-Side Sampling (GOSS): keeps the instances with large gradients and randomly samples those with small gradients, so each split scans less data
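LightGBM exposes GOSS as a sampling strategy. A hedged sketch: in LightGBM 4.x this is the data_sample_strategy parameter, while older releases used boosting_type="goss" instead:

    import lightgbm as lgb

    # GOSS: keep all large-gradient samples, randomly sample the small-gradient ones
    clf = lgb.LGBMClassifier(
        data_sample_strategy="goss",  # LightGBM >= 4.0; older: boosting_type="goss"
        n_estimators=100,
    )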
CatBoost is an algorithm for gradient boosting on decision trees.
CatBoost lets you pass the indices of categorical columns, so they can be encoded with one-hot encoding or with a target statistic along the lines of the formula below:
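The formula itself did not survive in these notes; for reference, the CatBoost documentation gives the basic target statistic for a category value as roughly

    (countInClass + prior) / (totalCount + 1)

where countInClass is how often the target is 1 among the preceding objects with this category value, totalCount is the number of preceding objects with this value, and prior is a constant. A minimal usage sketch (the column names and values are made up):

    import pandas as pd
    from catboost import CatBoostClassifier

    # Toy data; the "city" column is categorical (made-up values)
    df = pd.DataFrame({
        "city": ["London", "Paris", "London", "Berlin", "Paris", "Berlin"],
        "age": [25, 32, 47, 51, 38, 29],
        "bought": [1, 0, 1, 0, 1, 0],
    })
    X, y = df[["city", "age"]], df["bought"]

    # Pass the names (or indices) of categorical columns; CatBoost
    # encodes them internally instead of requiring manual preprocessing
    model = CatBoostClassifier(iterations=50, verbose=False)
    model.fit(X, y, cat_features=["city"])
    print(model.predict(X))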
Get the notebooks: https://github.com/Cheukting/legend_data