Ensemble, Bagging and Boosting

Ensembling is a method of combining more than one model to generate a final output (Reference [1]).

Two common ways of doing that are:

Bagging
Boosting

Bagging

We take subsets of the data and train a different model on each subset.
Example: random forest, which takes a subset of the data as well as a subset of the features.
Pros of random forest:
Handles high dimensions
Handles missing values
Cons of random forest:
It won't give precise values for regression, because the final value is the mean of the values from the individual trees. Nonetheless, people use it for regression, depending on the domain.

Boosting

We train different models on the same data, one after another. Each sample is assigned a different weight in each iteration (this is how AdaBoost works; gradient boosting methods such as XGBoost instead fit each new model to the errors of the current ensemble).
Example: AdaBoost, XGBoost
Pros of XGBoost:
Supports different loss functions
Works well with interactions
Cons of XGBoost:
Prone to overfitting
Tuning of hyperparameters is critical
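The bagging idea described above can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the names fit_stump, bagging_fit and bagging_predict are made up for this example, a one-split "stump" on 1-D data stands in for a full decision tree, and there is no feature subsampling as in a real random forest.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D data by minimising squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue  # a split must leave samples on both sides
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to a constant prediction
        constant = mean(ys)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """The ensemble output for regression is the mean of the individual outputs."""
    return mean(model(x) for model in models)


# Toy data (hypothetical): a step from 0 to 10 at x = 2.0
xs = [i / 10 for i in range(40)]
ys = [0.0 if x < 2.0 else 10.0 for x in xs]
models = bagging_fit(xs, ys)
low, high = bagging_predict(models, 0.5), bagging_predict(models, 3.5)
print(low, high)
```

Because the final value is an average of values seen during training, the prediction can never leave the range of the training targets, which is one way to read the regression caveat above.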

Pros-cons of bagging vs boosting:

Bagging is easy to parallelize because its models are trained independently of each other, and hence training is faster
Boosting is more efficient for a fixed number of iterations (classifiers)
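To see why boosting is harder to parallelize, here is a minimal AdaBoost sketch in plain Python. Each round needs the sample weights produced by the previous round, so the rounds have to run one after another. The function names, the 1-D threshold stumps and the toy data are assumptions chosen for illustration, not a reference implementation.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict `polarity` for x <= threshold and `-polarity` otherwise."""
    return polarity if x <= threshold else -polarity


def fit_weighted_stump(xs, ys, weights):
    """Return (error, threshold, polarity) of the stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost_fit(xs, ys, n_rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        # This round depends on the weights left behind by the previous round.
        error, threshold, polarity = fit_weighted_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - error) / error)  # voting weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight, correctly classified ones lose weight.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalise to sum to 1
    return ensemble


def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps; labels are +1 and -1."""
    score = sum(alpha * stump_predict(x, thr, pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): labels that no single threshold can separate
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost_fit(xs, ys, n_rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
print(predictions)
```

A bagging loop, by contrast, has no such dependency between iterations, so its models could be trained in separate processes.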

AdaBoost vs XGBoost

Reference : [2]

[Figure: ada_vs_xg, a comparison of AdaBoost and XGBoost]

Quote from Tianqi Chen, one of the developers of XGBoost:

Adaboost and gradboosting [XGBoost] are two different ways to derive boosters. Both are generic. I like gradboosting better because it works for generic loss functions, while adaboost is derived mainly for classification with exponential loss.
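The "generic loss functions" point in the quote can be illustrated with a small gradient boosting sketch in plain Python: each round fits a stump to the negative gradient of the loss, so changing the loss only means passing a different gradient function. Everything here (function names, 1-D stumps, learning rate, toy data) is an assumption made for illustration. It is not how XGBoost is implemented; XGBoost additionally uses second-order gradient information and regularisation.

```python
from statistics import mean, median


def fit_stump(xs, targets):
    """Least-squares regression stump on 1-D inputs (needs at least two distinct x values)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest x would leave the right side empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def squared_loss_negative_gradient(y, pred):
    return y - pred  # negative derivative of 0.5 * (y - pred)^2 with respect to pred


def absolute_loss_negative_gradient(y, pred):
    return float((y > pred) - (y < pred))  # sign(y - pred), from the loss |y - pred|


def gradient_boost(xs, ys, negative_gradient, init, n_rounds=50, learning_rate=0.1):
    """Start from a constant `init` and repeatedly add a stump fitted to the negative gradient."""
    stumps = []
    preds = [init] * len(xs)
    for _ in range(n_rounds):
        pseudo_residuals = [negative_gradient(y, p) for y, p in zip(ys, preds)]
        stump = fit_stump(xs, pseudo_residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: init + learning_rate * sum(s(x) for s in stumps)


# Toy data (hypothetical): a step from 1 to 5
xs = [float(i) for i in range(10)]
ys = [1.0] * 5 + [5.0] * 5

# Same boosting loop, two different losses
l2_model = gradient_boost(xs, ys, squared_loss_negative_gradient, init=mean(ys))
l1_model = gradient_boost(xs, ys, absolute_loss_negative_gradient, init=median(ys))
print(l2_model(0.0), l2_model(9.0))
print(l1_model(0.0), l1_model(9.0))
```

AdaBoost's sample reweighting, in contrast, comes out of one specific loss (the exponential loss), which is the difference the quote points at.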

References:

[1] https://towardsdatascience.com/decision-tree-ensembles-bagging-and-boosting-266a8ba60fd9

[2] https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781788295758/4/ch04lvl1sec34/comparison-between-adaboosting-versus-gradient-boosting
