Classification and Regression Tree (CART) - Regression Tree
- a term introduced by Leo Breiman for decision tree algorithms that can be used for classification or regression predictive modeling problems
CART - Model Representation
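the CART model is a binary tree: each internal node tests a single input variable against a split point, and each leaf node holds a prediction (for a regression tree, typically the mean of the training targets that fall into that leaf)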
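A minimal sketch of this representation, assuming numeric inputs and the convention that a row goes left when its value is less than or equal to the split point. The names `Node` and `predict` and the tiny hand-built tree are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # Internal node: feature and threshold are set, value is None.
    # Leaf node: value holds the prediction, everything else is None.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None


def predict(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to a leaf and return the leaf's prediction."""
    while node.value is None:
        # Assumed convention: go left when x[feature] <= threshold.
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Hand-built example tree (made-up numbers, not fitted to any data).
tree = Node(
    feature=0,
    threshold=2.5,
    left=Node(value=1.0),
    right=Node(
        feature=1,
        threshold=0.0,
        left=Node(value=3.0),
        right=Node(value=5.0),
    ),
)
```

Calling `predict(tree, [1.0, 9.0])` follows the left branch at the root and returns the value stored in that leaf.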
CART - Model Learning/Training
the model is built through a process known as binary recursive partitioning: the data is split into two partitions (branches), and each partition is then split again into smaller groups as the method moves down each branch
this involves REPEATING the following processes:
- selecting an input variable among all input variables
- selecting a split/cut point on that variable
Both are selected by a greedy algorithm that picks the variable and split point minimizing the cost function (e.g. the sum of squared residuals) at the current node. The procedure is repeated until a predefined stopping criterion is met (e.g. a minimum number of training instances assigned to each leaf node of the tree)
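The greedy search above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not a reference implementation: candidate split points are taken as midpoints between consecutive distinct values, the cost is the sum of squared residuals (SSR) around each side's mean, and the stopping criterion is a minimum leaf size. The names `ssr`, `best_split`, `grow`, and `min_leaf` are hypothetical.

```python
from typing import List, Optional, Tuple


def ssr(ys: List[float]) -> float:
    """Sum of squared residuals around the mean of ys."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(
    X: List[List[float]], y: List[float], min_leaf: int = 1
) -> Optional[Tuple[int, float, float]]:
    """Return (feature index, split point, cost) with the lowest SSR, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Assumption: candidate cuts are midpoints between distinct values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= cut]
            right = [yi for row, yi in zip(X, y) if row[j] > cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # stopping criterion: leaves must not be too small
            cost = ssr(left) + ssr(right)
            if best is None or cost < best[2]:
                best = (j, cut, cost)
    return best


def grow(X: List[List[float]], y: List[float], min_leaf: int = 1) -> dict:
    """Recursively partition the data; leaves predict the mean target."""
    split = best_split(X, y, min_leaf)
    if split is None or ssr(y) == 0.0:
        return {"value": sum(y) / len(y)}
    j, cut, _ = split
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= cut]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > cut]
    return {
        "feature": j,
        "cut": cut,
        "left": grow([r for r, _ in left], [v for _, v in left], min_leaf),
        "right": grow([r for r, _ in right], [v for _, v in right], min_leaf),
    }


# Made-up toy data: one input variable, two clearly separated groups.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = grow(X, y)
```

On this toy data the search should place the single split between the two groups. The exhaustive scan is written for readability, not speed; practical implementations sort each variable once and update the cost incrementally.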
CART - Model Tree Pruning
cost complexity pruning (like Adjusted R-Squared, it penalizes model size); among the candidate subtrees, the one with the lowest tree score is kept:
- 𝑇𝑟𝑒𝑒 𝑆𝑐𝑜𝑟𝑒 = 𝑆𝑆𝑅 + 𝛼·𝑇
where:
- 𝑆𝑆𝑅 - is the sum of squared residuals
- 𝛼 - is a non-negative hyperparameter (tuned by cross-validation); larger values favor smaller trees
- 𝑇 - is the total number of leaves/terminal-nodes
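A short sketch of how the tree score trades fit against size. The candidate subtrees and their SSR values are made-up numbers for illustration only, and the names `tree_score` and `pick` are hypothetical:

```python
from typing import List, Tuple


def tree_score(ssr: float, n_leaves: int, alpha: float) -> float:
    """Tree Score = SSR + alpha * T, where T is the number of leaves."""
    return ssr + alpha * n_leaves


# (name, SSR on the training data, number of leaves); illustrative values.
candidates: List[Tuple[str, float, int]] = [
    ("full", 10.0, 8),
    ("pruned_once", 14.0, 5),
    ("stump", 40.0, 2),
]


def pick(cands: List[Tuple[str, float, int]], alpha: float) -> str:
    """Return the name of the candidate subtree with the lowest tree score."""
    return min(cands, key=lambda c: tree_score(c[1], c[2], alpha))[0]


for alpha in (0.0, 2.0, 10.0):
    print(alpha, pick(candidates, alpha))
```

With 𝛼 = 0 the penalty vanishes and the full tree has the lowest score; as 𝛼 grows, smaller subtrees win. In practice 𝛼 would be chosen by cross-validation, as noted above, rather than fixed by hand.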