Cross-Validation - K-Fold Cross-Validation
- used to tune hyperparameters
- used to estimate the generalization (true) error of a trained model when the dataset is too small for a single train/validation split to give an accurate estimate, because the mean loss on a small test set has high variance
Main Idea
- partition dataset into 𝑘 non-overlapping subsets
- on trial 𝑖, the 𝑖th subset is used as test/validation-set and the rest is used as training set
- the test-error is estimated by taking the average test error across the 𝑘 trials
- caveat: no unbiased estimator of the variance of this average error estimator exists (Bengio & Grandvalet 2004), so approximations are typically used
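As a rough illustration of the averaging step and of one commonly used approximation, the sketch below computes the mean of the per-fold test errors together with a naive standard error. The function name `cv_estimate` and the example fold errors are made up for illustration. The standard error here treats the folds as independent, which they are not (their training sets overlap), so it should be read as an approximation only.

```python
import math

def cv_estimate(fold_errors):
    """Average the per-fold test errors and attach a naive standard error."""
    k = len(fold_errors)
    if k < 2:
        raise ValueError("need at least 2 folds to estimate a spread")
    mean = sum(fold_errors) / k
    # Sample variance of the fold errors around their mean.
    var = sum((e - mean) ** 2 for e in fold_errors) / (k - 1)
    # Naive standard error: assumes independent folds (an approximation,
    # since no unbiased variance estimator exists for this average).
    stderr = math.sqrt(var / k)
    return mean, stderr

# Hypothetical test errors from k = 4 trials.
mean, stderr = cv_estimate([0.20, 0.30, 0.25, 0.25])
print(f"estimated test error: {mean:.3f} +/- {stderr:.3f}")
```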
Pseudocode
KFoldXV(𝐷, 𝐴, 𝐿, 𝑘):
  split 𝐷 into 𝑘 mutually exclusive subsets 𝐷[𝑖] whose union is 𝐷
  for 𝑖 = 1 to 𝑘:
    𝑓[𝑖] = 𝐴(𝐷\𝐷[𝑖])
    for 𝑧[𝑗] in 𝐷[𝑖]:
      𝑒[𝑗] = 𝐿(𝑓[𝑖], 𝑧[𝑗])
  return 𝑒
where:
- 𝐷 - the given dataset with elements {𝑧[1], …, 𝑧[𝑛]}
- 𝐴 - the learning algorithm (a function that takes a dataset as input and outputs a learned function)
- 𝐿 - the loss function (a function that takes a learned function 𝑓 and an example 𝑧[𝑗]∊𝐷 as input and outputs a scalar in ℝ)
- 𝑘 - the number of folds
- 𝑒 - the returned vector of per-example errors, one for each 𝑧[𝑗]; its mean is the estimated test error
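The pseudocode can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the name `k_fold_xv`, the `seed` parameter, the shuffling before the split, and the toy learner `fit_mean` with loss `squared_error` are all assumptions added here. The dataset is assumed to be an indexable sequence.

```python
import random

def k_fold_xv(D, A, L, k, seed=0):
    """Return e, where e[j] is the loss on D[j] from a model not trained on it."""
    n = len(D)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(D)")
    # Shuffle the indices (an added assumption), then deal them into k
    # mutually exclusive folds whose union is the whole index set.
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]

    e = [None] * n
    for fold in folds:
        held_out = set(fold)
        # Train on everything except the current fold: A(D \ D[i]).
        train = [D[j] for j in range(n) if j not in held_out]
        f = A(train)
        # Evaluate the learned function on each held-out example.
        for j in fold:
            e[j] = L(f, D[j])
    return e

# Toy learner (hypothetical): the "learned function" is the training mean.
def fit_mean(train):
    return sum(train) / len(train)

# Toy loss (hypothetical): squared error of the constant prediction.
def squared_error(f, z):
    return (f - z) ** 2

errors = k_fold_xv([1.0, 2.0, 3.0, 4.0], fit_mean, squared_error, k=4)
cv_error = sum(errors) / len(errors)  # average test error across all examples
```

With 𝑘 equal to the dataset size, as in this usage, each fold holds a single example (leave-one-out), so the result does not depend on the shuffle.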