tree | Data Stories

The figure below shows how a decision tree partitions the predictor space into rectangles:

As with the classification tree described earlier, the only differences here are how to produce a numeric output instead of a label and how to define a split metric other than entropy/Gini.

For the output, each leaf predicts the mean of the training observations that fall in it.
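To make the leaf output concrete, here is a minimal sketch in Python with NumPy. The function names `leaf_prediction` and `sse` and the target values are made up for illustration and are not taken from the linked repository. It also shows why the mean is a natural choice: among constant predictions, it gives the lowest sum of squared errors.

```python
import numpy as np

def leaf_prediction(y):
    """Constant predicted by a leaf: the mean of the training targets in it."""
    return float(np.mean(y))

def sse(y, constant):
    """Sum of squared errors when every target in y is predicted as `constant`."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - constant) ** 2))

# Made-up targets that ended up in the same leaf (illustrative only)
y_leaf = [3.0, 5.0, 10.0]

pred = leaf_prediction(y_leaf)
sse_at_mean = sse(y_leaf, pred)    # error when predicting the mean
sse_at_other = sse(y_leaf, 5.0)    # error when predicting another constant

print(pred, sse_at_mean, sse_at_other)
```

Any constant other than the mean can be passed to `sse` to check that the error does not go below the value obtained at the mean.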

For the metric, the CART methodology uses a simple sum of squared errors (SSE).

SSE = ∑ (y_i – y1)² + ∑ (y_i – y2)²
- where y1 and y2 are the means of the two newly created groups, and each sum runs over the observations in its own group.
- It is a plain sum of squares, not a standard deviation.
- We were also concerned with sums of squares while deriving R2 in linear regression.
- The Ward linkage criterion in hierarchical clustering also uses a difference in sums of squares.
diff = SSE_before – SSE_after
- We choose the predictor and split point that reduce SSE the most,
- i.e. the split for which diff is highest.
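The split search described above can be sketched as follows for a single numeric predictor. This is an illustrative sketch, not the code from the linked repository: the names `node_sse` and `best_split`, the choice of midpoints between consecutive distinct values as candidate thresholds, and the toy data are all assumptions.

```python
import numpy as np

def node_sse(y):
    """SSE of a group around its own mean (0.0 for an empty group)."""
    if len(y) == 0:
        return 0.0
    return float(np.sum((y - y.mean()) ** 2))

def best_split(x, y):
    """Scan thresholds on one predictor and return (threshold, diff)
    for the split with the largest reduction in SSE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sse_before = node_sse(y)
    best_threshold, best_diff = None, 0.0
    values = np.unique(x)  # sorted distinct predictor values
    # Assumed candidate thresholds: midpoints between consecutive values
    for t in (values[:-1] + values[1:]) / 2.0:
        left, right = y[x <= t], y[x > t]
        sse_after = node_sse(left) + node_sse(right)
        diff = sse_before - sse_after
        if diff > best_diff:
            best_threshold, best_diff = float(t), diff
    return best_threshold, best_diff

# Toy data with an obvious jump between x = 3 and x = 10
x = [1, 2, 3, 10, 11, 12]
y = [5, 6, 7, 20, 21, 22]
threshold, diff = best_split(x, y)
print(threshold, diff)
```

With several predictors, the same function could be run once per predictor, keeping the predictor whose best split has the largest diff.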

My hand-coded regression tree is available at: https://github.com/arcarchit/datastories/tree/master/regression_tree

Reference

Applied Predictive Modeling by Max Kuhn and Kjell Johnson
