AdaBoost

AdaBoost stands for adaptive boosting
Boosting is a technique for creating better classifier form several regular/weak/base classifiers
Thought:
- We want to train second classifier in such a way that training data has more samples for the case where first classifier was wrong
- Third classifier will have more samples where first and second classifier contradicts
How do we accomplish above:
- We assign weights to each sample
  - This weights will be used in error function for each sample
  - Error function would be sum of weights of incorrectly classified samples
  - This weights keep changing in each iteration
- We also assign wight to each base (weak) classifier (α In below algorithm)
  - Formula for calculating α below is valid for binary classification
    - It will be different for multi class classification
When do we stop iteration ?
- Combined classifier is predicting all sample correctly
- No more base classifier left
Most machine learning algorithms allows to apply weight to data
- In logistic regression we can multiply weight in error fucntion
- In decision tree, classification error can be calculated by summing weights of missclassified samples

adboost

Formulations in above algorithm:

Formula of α
- if ε=0.5 then α=0. It is just a random chance classifier.
- Else it can positive/negative.
- ε < 0.5 => negative α => we invert the prediction in weighted average
Formula of D_(t+1)(i)
- y*h is just a sign
  - y and h are {-1, 1}
  - y*h will be positive for correct prediction
- Why it is function of ε
  - Random classifier updating weights does not make sense
By Z_t we are normalizing weights, which ensures numerical stability

Adaboost Theorem

Over-fitting

Contrary to weak classifier (decision stumps) we saw that rate of increase test error is less with boosting
However no of iteration still should be selected via cross validation for small data-set and validation for large data-set

References:

Data Stories