Iterative Dichotomiser 3 (ID3)
- is an algorithm used to generate a decision tree from a dataset
- is the precursor to the C4.5 algorithm
- utilizes entropy and gain ideas from Information Theory
- is uses βgreedy searchβ
ID3 - Pseudocode #1
- Calculate the entropy/information-gain of every attribute π of the data set π
- Partition (βsplitβ) the set πΒ into subsets using the attribute for which the resulting entropy after splitting is minimized; or, equivalently, information gain is maximum.
- Make a decision tree node containing that attribute
- Recurse on subsets using the remaining attributes
ID3 - Pseudocode #2
function ID3(S, A) {
if (all of S are labeled 1) return leaf 1
if (all of S are labeled 0) return leaf 0
if (A = β
) return leaf with value = label majority in S
else:
j = argmax_{iβA} Gain(S,i)
T1 = ID3({(x,y) β S : x_j = 1}, A\{j})
T2 = ID3({(x,y) β S : x_j = 0}, A\{j})
return T.left(T2).right(T1);
}
ID3 - Implementations of Gain Measure
Train Error
Let:
The training error before splitting on feature π is shown below, since we took a majority vote among labels:
Similarly, the error after splitting on feature π is:
Therefore, we can define πΊπππ to be the difference between the two:
Information Gain
The information gain is the difference between the entropy of the label before and after the split, and is achieved by replacing the function πΆ in the previous expression by the entropy function:
Gini Index
ID3 - Pruning
GENERIC TREE PRUNING PROCEDURE
inputs:
function π(π,π) which is the bound/estimate for the generalization error of a decision tree π, based on a sample of size π
tree π
foreach node π in a bottom-up walk on π (from leaves to root):
find π' which minimies π(π',π), where π' is any of the following:
the current tree after replacing node π with a leaf 1
the current tree after replacing node π with a leaf 0
the current tree after replacing node π with its left subtree
the current tree after replacing node π with its right subtree
the current tree
let T := T'