ID3 - Pseudocode #1

Calculate the entropy (or, equivalently, the information gain) of every attribute 𝑎 of the data set 𝑆
Partition (“split”) the set 𝑆 into subsets using the attribute for which the entropy after splitting is minimal or, equivalently, the information gain is maximal
Make a decision tree node containing that attribute
Recurse on each subset using the remaining attributes

ID3 - Pseudocode #2

function ID3(S, A) {
	if (all of S are labeled 1) return leaf 1
	if (all of S are labeled 0) return leaf 0
	if (A = ∅) return leaf with value = majority label in S
	else {
		j = argmax_{i∊A} Gain(S,i)
		T1 = ID3({(x,y) ∊ S : x_j = 1}, A\{j})
		T2 = ID3({(x,y) ∊ S : x_j = 0}, A\{j})
		return tree with root node j, left subtree T2, right subtree T1
	}
}
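
The pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, not a reference implementation: it assumes binary features and binary labels, represents a leaf as a plain `0` or `1` and an internal node as a tuple `(j, left, right)`, and uses entropy-based gain with log base 2. The helper names (`entropy`, `gain`, `majority`, `predict`) and the `default` parameter for empty subsets are illustrative choices that do not appear in the pseudocode.

```python
from collections import Counter
from math import log2

def entropy(p):
    """Binary entropy of probability p, taking 0*log(0) as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def label_entropy(rows):
    """Entropy of the labels in rows; 0 for an empty subset (assumption)."""
    if not rows:
        return 0.0
    return entropy(sum(y for _, y in rows) / len(rows))

def gain(S, i):
    """Information gain of splitting S on binary feature i."""
    ones = [(x, y) for x, y in S if x[i] == 1]
    zeros = [(x, y) for x, y in S if x[i] == 0]
    after = (len(ones) * label_entropy(ones)
             + len(zeros) * label_entropy(zeros)) / len(S)
    return label_entropy(S) - after

def majority(S):
    """Most common label in S (ties go to the label seen first)."""
    return Counter(y for _, y in S).most_common(1)[0][0]

def id3(S, A, default=0):
    """S: list of (x, y) pairs; A: set of feature indices still available."""
    if not S:                      # assumption: empty subset -> parent's majority
        return default
    labels = {y for _, y in S}
    if len(labels) == 1:           # all of S carry the same label
        return labels.pop()
    if not A:                      # no attributes left
        return majority(S)
    j = max(sorted(A), key=lambda i: gain(S, i))
    ones = [(x, y) for x, y in S if x[j] == 1]
    zeros = [(x, y) for x, y in S if x[j] == 0]
    rest, m = A - {j}, majority(S)
    return (j, id3(zeros, rest, m), id3(ones, rest, m))

def predict(tree, x):
    """Walk from the root to a leaf: left on x_j = 0, right on x_j = 1."""
    while isinstance(tree, tuple):
        j, left, right = tree
        tree = right if x[j] == 1 else left
    return tree

# Toy data set: the label is x_0 AND x_1.
S = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
tree = id3(S, {0, 1})
print(tree)
```

On this toy set both features have the same gain, so the tie is broken by the lowest feature index; that tie-breaking rule is also an assumption of the sketch.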

ID3 - Implementations of Gain Measure

Train Error

Let:

$C(a) = \min(a, 1 - a)$

Before splitting on feature 𝑖, the node predicts the majority label in 𝑆, so the training error is:

$C(P_S(y = 1))$

Similarly, the error after splitting on feature 𝑖 is:

$C(P_S(y = 1 \mid x_i = 1)) \, P_S(x_i = 1) + C(P_S(y = 1 \mid x_i = 0)) \, P_S(x_i = 0)$

Therefore, we can define 𝐺𝑎𝑖𝑛 to be the difference between the two:

$\mathrm{Gain}(S, i) := C(P_S(y = 1)) - C(P_S(y = 1 \mid x_i = 1)) \, P_S(x_i = 1) - C(P_S(y = 1 \mid x_i = 0)) \, P_S(x_i = 0)$
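
As a rough illustration, the gain defined above can be computed directly from empirical frequencies. The sketch below assumes 𝑆 is a list of `(x, y)` pairs with binary features and labels; the names `train_error`, `p_label` and `gain` are made up for this example, and an empty subset is treated as contributing zero error.

```python
def train_error(a):
    """C(a) = min(a, 1 - a): error of a majority vote when P(y = 1) = a."""
    return min(a, 1 - a)

def p_label(rows):
    """Empirical P(y = 1) over rows; 0.0 for an empty subset (assumption)."""
    return sum(y for _, y in rows) / len(rows) if rows else 0.0

def gain(S, i, C=train_error):
    """Gain(S, i) = C(P(y=1)) - sum over v in {0, 1} of C(P(y=1 | x_i=v)) * P(x_i=v)."""
    ones = [(x, y) for x, y in S if x[i] == 1]
    zeros = [(x, y) for x, y in S if x[i] == 0]
    before = C(p_label(S))
    after = (C(p_label(ones)) * len(ones)
             + C(p_label(zeros)) * len(zeros)) / len(S)
    return before - after

# Feature 0 determines the label exactly; feature 1 carries no information.
S = [((1, 0), 1), ((1, 1), 1), ((0, 0), 0), ((0, 1), 0)]
print(gain(S, 0))  # 0.5
print(gain(S, 1))  # 0.0
```

Passing a different function as `C` swaps the gain measure without touching the rest of the computation.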

Information Gain

The information gain is the difference between the entropy of the label before and after the split. It is obtained by replacing the function 𝐶 in the previous expression with the entropy function:

$C(a) = -a \log(a) - (1 - a) \log(1 - a)$

Gini Index

$C(a) = 2a(1 - a)$
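
To see how the three choices of 𝐶 can disagree, here is a small sketch that evaluates the gain for one hypothetical split. The numbers are invented for illustration: before the split P(y = 1) = 0.75, the split sends half of the samples to each side, and the conditional probabilities are 1.0 and 0.5. Log base 2 is assumed for the entropy; a different base only rescales the result.

```python
from math import log2

def train_error(a):
    return min(a, 1 - a)

def entropy(a):
    # 0 * log(0) is taken as 0
    return 0.0 if a in (0.0, 1.0) else -a * log2(a) - (1 - a) * log2(1 - a)

def gini(a):
    return 2 * a * (1 - a)

def gain(C, p, p_given_1, p_given_0, w1):
    """Gain for a binary split; w1 = P(x_i = 1), so P(x_i = 0) = 1 - w1."""
    return C(p) - C(p_given_1) * w1 - C(p_given_0) * (1 - w1)

p, p_given_1, p_given_0, w1 = 0.75, 1.0, 0.5, 0.5
for C in (train_error, entropy, gini):
    print(C.__name__, round(gain(C, p, p_given_1, p_given_0, w1), 4))
# train_error 0.0
# entropy 0.3113
# gini 0.125
```

Here the train error sees no benefit in the split, because the majority label is the same on both sides, while entropy and the Gini index both reward the fact that one side becomes pure.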

ID3 - Pruning

GENERIC TREE PRUNING PROCEDURE
inputs:
	function 𝑓(𝑇,𝑚) which is the bound/estimate for the generalization error of a decision tree 𝑇, based on a sample of size 𝑚
	tree 𝑇

foreach node 𝑗 in a bottom-up walk on 𝑇 (from leaves to root):
	find 𝑇' which minimizes 𝑓(𝑇',𝑚), where 𝑇' is any of the following:
		the current tree after replacing node 𝑗 with a leaf 1
		the current tree after replacing node 𝑗 with a leaf 0
		the current tree after replacing node 𝑗 with its left subtree
		the current tree after replacing node 𝑗 with its right subtree
		the current tree
	let 𝑇 := 𝑇'
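
The procedure above might look roughly like this in Python. This is a sketch under several assumptions: a leaf is a plain `0` or `1`, an internal node is a tuple `(j, left, right)`, nodes are addressed by a path of tuple indices (1 for left, 2 for right), and ties are resolved in favour of the current tree. The estimate built by `make_estimate` (held-out error plus a size penalty of `size(T) / m`) is a made-up stand-in for 𝑓, not a bound taken from the text.

```python
def get(tree, path):
    """Return the node reached by following path (1 = left, 2 = right)."""
    for step in path:
        tree = tree[step]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    j, left, right = tree
    if path[0] == 1:
        return (j, replace(left, path[1:], new), right)
    return (j, left, replace(right, path[1:], new))

def prune(tree, f, m):
    def walk(T, path):
        if not isinstance(get(T, path), tuple):
            return T                      # leaves have nothing to prune
        T = walk(T, path + (1,))          # bottom-up: children first
        T = walk(T, path + (2,))
        _, left, right = get(T, path)
        candidates = [T] + [replace(T, path, c) for c in (1, 0, left, right)]
        return min(candidates, key=lambda c: f(c, m))
    return walk(tree, ())

def predict(tree, x):
    while isinstance(tree, tuple):
        j, left, right = tree
        tree = right if x[j] == 1 else left
    return tree

def size(tree):
    """Number of nodes, counting leaves."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + size(tree[1]) + size(tree[2])

def make_estimate(validation):
    """Hypothetical f(T, m): held-out error rate plus size(T) / m."""
    def f(T, m):
        errors = sum(predict(T, x) != y for x, y in validation)
        return errors / len(validation) + size(T) / m
    return f

# The right subtree tests feature 1 but predicts 1 either way.
tree = (0, 0, (1, 1, 1))
validation = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
print(prune(tree, make_estimate(validation), m=100))  # (0, 0, 1)
```

Because 𝑓 scores whole trees, each candidate is built by copying the current tree with one node swapped out, which keeps the sketch close to the pseudocode at the cost of some efficiency.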

／var／log marcus chiu
