- training data: usually split into a training set and a validation set
  - training set
    - examples that the training algorithm observes
    - used to learn the parameters of the model
    - typically 80% of the training data
  - validation set
    - examples that the training algorithm does NOT see
    - used to estimate the generalization error of the model DURING training
    - used to tune the hyperparameters of the model
    - typically 20% of the training data
- test set/data
  - examples coming from the same distribution as the training data
  - used to estimate the generalization error of the trained model AFTER training is completed
  - examples are not used in any way to make choices about the model, so they must never appear in the validation set either
- training error: the model's error measured on the training set
- validation error: the model's error measured on the validation set
- test error: the model's error measured on the test set
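The split described in the list above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_dataset`, the 20% test fraction, and the fixed seed are assumptions made for the example, not something the notes prescribe.

```python
import random


def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and return (train, validation, test) lists.

    The test set is carved off first, so it can never leak into the
    training or validation sets. The remaining "training data" is then
    split 80/20 into training and validation sets by default.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)

    # Hold back the test set before anything else happens.
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    training_data = shuffled[n_test:]

    # Split what is left into validation and training sets.
    n_val = round(len(training_data) * val_fraction)
    val = training_data[:n_val]
    train = training_data[n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Shuffling before splitting matters when the raw data is ordered (for example by class or by date); for time-dependent data a chronological split is usually the safer choice.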
We can monitor the performance of the model AS it learns from the training set by plotting the training error and the validation error against training time (e.g. per epoch).
As the algorithm learns, the training error goes down and, at first, so does the validation error. If we train for too long, the training error may continue to decrease because the model is overfitting: it is learning irrelevant detail and noise in the training set. At that point the validation error starts to rise, because the model's ability to generalize is getting worse. The sweet spot is the point just before the validation error starts to increase, where the model performs well on both the training set and the unseen validation set.
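One common way to find that sweet spot is early stopping: keep the epoch with the lowest validation error and stop once the error has not improved for a few epochs. The sketch below is only an illustration; the helper `best_epoch`, the `patience` parameter, and the error values are made up for the example.

```python
def best_epoch(val_errors, patience=2):
    """Return the index of the epoch with the lowest validation error.

    Scanning stops once the validation error has failed to improve for
    `patience` consecutive epochs, mimicking early stopping.
    """
    best_idx = 0
    best_err = float("inf")
    waited = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_idx, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx


# Hypothetical per-epoch errors: training error keeps falling,
# validation error falls and then rises as the model overfits.
train_errors = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.18]
val_errors = [0.95, 0.70, 0.58, 0.52, 0.55, 0.61, 0.70]

stop_at = best_epoch(val_errors)
print(stop_at, train_errors[stop_at], val_errors[stop_at])
```

In practice the model weights from the selected epoch are the ones to keep, not the weights from the last epoch that was trained.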
The test set is held back from the learning algorithm until the very end of the training process. Only at this post-training stage do you evaluate the model on the test set, to get a final, objective estimate of how the model might perform on unseen data.
Training Set Size
As the number of training examples increases:
- the expected test error never increases
- a non-parametric model will yield a lower test error until the best possible test error (i.e. the Bayes error) is reached
- a parametric model with less than optimal capacity will asymptote to an error ABOVE the Bayes error
- a parametric model with optimal capacity will yield a lower test error until the best possible test error (i.e. the Bayes error) is reached
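The behaviour in the list above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a proof: the data come from a made-up target y = x^2 plus Gaussian noise, so the irreducible noise variance (0.01) plays the role of the Bayes error; the constant predictor stands in for a parametric model with too little capacity, and a k-nearest-neighbour average with k near sqrt(n) stands in for a non-parametric model. All function names are hypothetical.

```python
import math
import random

NOISE_SD = 0.1  # irreducible noise; best achievable MSE is NOISE_SD ** 2


def sample(n, rng):
    """Draw n points from y = x^2 + Gaussian noise, x uniform on [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, NOISE_SD) for x in xs]
    return xs, ys


def fit_constant(xs, ys):
    """Under-capacity parametric model: always predict the mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_knn(xs, ys):
    """Non-parametric model: average the k nearest neighbours, k ~ sqrt(n)."""
    k = max(1, round(math.sqrt(len(xs))))
    points = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(predict, xs, ys):
    """Mean squared error of a predictor on the given examples."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(sizes, seed=0, n_test=500):
    """Return (n, constant-model test MSE, k-NN test MSE) for each size."""
    rng = random.Random(seed)
    test_x, test_y = sample(n_test, rng)
    rows = []
    for n in sizes:
        train_x, train_y = sample(n, rng)
        rows.append((
            n,
            mse(fit_constant(train_x, train_y), test_x, test_y),
            mse(fit_knn(train_x, train_y), test_x, test_y),
        ))
    return rows


curve = learning_curve([10, 100, 1000])
for n, const_err, knn_err in curve:
    print(f"n={n:5d}  constant={const_err:.4f}  knn={knn_err:.4f}")
```

With enough data the constant model should level off well above the noise variance, while the k-NN error should keep moving toward it. Exact numbers depend on the seed.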