training data - usually split into a training set and a validation set
- training set
  - examples that the training algorithm observes
  - used to learn the parameters of the model
  - typically 80% of training data
- validation set
  - examples that the training algorithm does NOT see
  - used to estimate the generalization error of the trained model DURING training
  - used to select (tune) the hyperparameters of the model
  - typically 20% of training data
test set/data
- examples coming from the same distribution as the training data
- used to estimate the generalization error of the trained model AFTER the training is completed
- examples are never used to make choices about the model (including hyperparameter selection), so none of them may appear in the validation set
training error - error of the model measured over the training set
validation error - error of the model measured over the validation set
test error - error of the model measured over the test set
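A minimal sketch of the split and the three errors, in Python. The data (a noisy line y = 2x + 1), the seeds, and the helper names `split_train_val`, `fit_line`, `mean_squared_error` and `sample` are all hypothetical; the 80/20 ratio is the rule of thumb from the notes. The model's parameters are learned from the training set only, and the same error function is applied to each set to give the training, validation and test errors.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y) pair

def split_train_val(data: Sequence[Example], val_fraction: float = 0.2,
                    seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Shuffle the training data and hold out `val_fraction` of it as the validation set."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_val = int(round(len(items) * val_fraction))
    return items[n_val:], items[:n_val]  # (training set, validation set)

def fit_line(examples: Sequence[Example]) -> Callable[[float], float]:
    """Least-squares line y = a + b*x; the parameters a, b are learned from `examples` only."""
    n = len(examples)
    mx = sum(x for x, _ in examples) / n
    my = sum(y for _, y in examples) / n
    b = (sum((x - mx) * (y - my) for x, y in examples)
         / sum((x - mx) ** 2 for x, _ in examples))
    a = my - b * mx
    return lambda x: a + b * x

def mean_squared_error(model: Callable[[float], float], examples: Sequence[Example]) -> float:
    """One error measure; applied to different sets it gives training/validation/test error."""
    return sum((model(x) - y) ** 2 for x, y in examples) / len(examples)

# Hypothetical data: y = 2x + 1 plus Gaussian noise (std 0.1), so the noise floor is about 0.01.
rng = random.Random(42)

def sample(n: int) -> List[Example]:
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]

training_data = sample(1000)
test_data = sample(250)  # same distribution, never used to make choices about the model

train_set, val_set = split_train_val(training_data)  # 800 / 200
model = fit_line(train_set)

print("training error:  ", mean_squared_error(model, train_set))
print("validation error:", mean_squared_error(model, val_set))
print("test error:      ", mean_squared_error(model, test_data))
```

Because the model here matches the true (linear) function, all three errors should land close to the noise variance of 0.01; a training error far below the validation error would instead point to overfitting.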

We can monitor the performance of the model AS it learns from the training set by plotting both the training error and the validation error (e.g. after every epoch)

As the algorithm learns, the training error goes down and so does the validation error. If we train for too long, the training error may keep decreasing while the model overfits, learning irrelevant detail and noise in the training dataset. At the same time, the validation error starts to rise again as the model’s ability to generalize decreases. The sweet spot is the point just before the validation error starts to increase, where the model performs well on both the training dataset and the unseen validation dataset
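The sweet spot can be picked automatically from the validation curve. The sketch below assumes the per-epoch validation errors are already recorded (the curves are made up) and uses a hypothetical `patience` rule: stop scanning once the validation error has failed to improve for `patience` consecutive epochs, then return the best epoch seen. This stopping rule is commonly called early stopping; in a real training loop you would also keep a copy of the parameters from that epoch.

```python
from typing import Sequence

def early_stopping_epoch(val_errors: Sequence[float], patience: int = 3) -> int:
    """Return the epoch with the lowest validation error seen before `patience`
    consecutive epochs pass without improvement."""
    best_epoch, best_error = 0, float("inf")
    epochs_without_improvement = 0
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error = epoch, err
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs: likely past the sweet spot
    return best_epoch

# Made-up learning curves: training error falls throughout, validation error turns upward.
train_errors = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.18, 0.15, 0.12]
val_errors = [0.95, 0.70, 0.55, 0.48, 0.46, 0.47, 0.50, 0.54, 0.59]

best = early_stopping_epoch(val_errors)
print(best, train_errors[best], val_errors[best])  # 4 0.28 0.46
```

Note that the training error keeps falling after epoch 4 (down to 0.12) even though the validation error has turned upward, which is the overfitting pattern described above.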

A test set is held back from your machine learning algorithms until the very end of the training process. At this post-training stage, you can evaluate the model on the test set to get a final, objective estimate of how the model is likely to perform on unseen data
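A sketch of that workflow, assuming a 1-D k-nearest-neighbour regressor whose hyperparameter is k (the task, the candidate values of k, and all function names are hypothetical). Every choice is made with the validation set; the test set is touched exactly once, at the end.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y)

def knn_predict(train: Sequence[Example], x: float, k: int) -> float:
    """1-D k-nearest-neighbour regression: average the targets of the k closest training points."""
    nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def knn_error(train: Sequence[Example], examples: Sequence[Example], k: int) -> float:
    """Mean squared error over `examples` of k-NN using `train` as its stored examples."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in examples) / len(examples)

# Hypothetical task: y = sin(6x) plus Gaussian noise (std 0.2).
rng = random.Random(0)

def sample(n: int) -> List[Example]:
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, math.sin(6.0 * x) + rng.gauss(0.0, 0.2)) for x in xs]

training_data = sample(300)
test_data = sample(100)  # held back until the very end

# The samples are already in random order, so slicing gives an 80/20 split.
train_set, val_set = training_data[:240], training_data[240:]

# 1) Choose the hyperparameter k using the validation set only.
candidates = [1, 3, 5, 9, 15, 31]
val_errors: Dict[int, float] = {k: knn_error(train_set, val_set, k) for k in candidates}
best_k = min(val_errors, key=val_errors.get)

# 2) Only after every choice is fixed, evaluate once on the test set.
test_error = knn_error(train_set, test_data, best_k)
print("best k:", best_k, "validation error:", val_errors[best_k], "test error:", test_error)
```

A common variant refits the model on the whole training data (training + validation) with the chosen `best_k` before the single test evaluation; either way, the test error is not fed back into any decision.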

Subpages

Training Set Size

as the number of training examples increases:

- the expected test error never increases
- a non-parametric model keeps lowering its test error until it reaches the best possible test error (i.e. the Bayes error)
- a parametric model with less than optimal capacity asymptotes to an error ABOVE the Bayes error
- a parametric model with optimal capacity keeps lowering its test error until it reaches the best possible test error (i.e. the Bayes error)
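A small simulation of the two parametric cases, under stated assumptions: the true function is y = x² with Gaussian noise of standard deviation 0.1, so the Bayes error for squared loss is 0.1² = 0.01. The model y = a + b·x has less capacity than needed, while y = a + b·x² is assumed here to have optimal capacity because its family contains the true function. Only one training set is drawn per size, so individual numbers fluctuate (the bullets describe the expected error), and the non-parametric case is not shown. All names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Example = Tuple[float, float]  # (x, y)
NOISE_STD = 0.1  # Bayes error for squared loss = NOISE_STD ** 2 = 0.01

def sample(n: int, rng: random.Random) -> List[Example]:
    """Draw n examples of y = x^2 + Gaussian noise, with x uniform on [-1, 1]."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, x * x + rng.gauss(0.0, NOISE_STD)))
    return data

def fit_feature(data: List[Example], feature: Callable[[float], float]) -> Callable[[float], float]:
    """Least-squares fit of y = a + b * feature(x): a parametric model with 2 parameters."""
    fs = [feature(x) for x, _ in data]
    ys = [y for _, y in data]
    mf, my = sum(fs) / len(fs), sum(ys) / len(ys)
    b = (sum((f - mf) * (y - my) for f, y in zip(fs, ys))
         / sum((f - mf) ** 2 for f in fs))
    a = my - b * mf
    return lambda x: a + b * feature(x)

def mean_sq_error(model: Callable[[float], float], data: List[Example]) -> float:
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
test_data = sample(20000, rng)  # large test set -> low-variance error estimate
results: Dict[int, Tuple[float, float]] = {}
for n in (10, 100, 1000, 10000):
    train = sample(n, rng)
    linear = fit_feature(train, lambda x: x)         # less than optimal capacity
    quadratic = fit_feature(train, lambda x: x * x)  # family contains the true function
    results[n] = (mean_sq_error(linear, test_data), mean_sq_error(quadratic, test_data))
    print(f"n={n:>5}  linear={results[n][0]:.4f}  quadratic={results[n][1]:.4f}")
```

As n grows, the quadratic model's test error should approach the Bayes error of 0.01, while the linear model's settles near 0.01 + Var(x²) = 0.01 + 4/45 ≈ 0.099, above the Bayes error no matter how much data it sees.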

/var/log marcus chiu

ML - Training/Validation/Test Data/Set - Training/Validation/Test Error
