extrinsic evaluation
intrinsic evaluation
- perplexity: the inverse probability of the test set, normalized by the number of words

Extrinsic (in-vivo) Evaluation

extrinsic evaluation is the best way to compare two language models A and B:

embed each model in an application (spelling corrector, speech recognizer, machine translation system, etc.)
measure the accuracy of A and B on that task, e.g.
- how many misspelled words were corrected properly
- how many words were translated correctly
compare the accuracy of A and B
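A minimal Python sketch of the comparison procedure above. It assumes the task is spelling correction and that each language model has already been wrapped in a corrector function; `task_accuracy`, `compare_extrinsically`, and the lookup-table correctors are hypothetical stand-ins, not a real system (a real speller would use its language model to rank candidate corrections).

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a corrector maps a misspelled word to its proposed fix.
# In a real speller, each corrector would use its language model to rank candidates.
Corrector = Callable[[str], str]


def task_accuracy(corrector: Corrector, misspelled: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of misspelled words that the corrector fixes properly."""
    if not gold or len(misspelled) != len(gold):
        raise ValueError("need non-empty, equally long input and gold lists")
    correct = sum(corrector(word) == target for word, target in zip(misspelled, gold))
    return correct / len(gold)


def compare_extrinsically(a: Corrector, b: Corrector,
                          misspelled: Sequence[str], gold: Sequence[str]) -> Tuple[float, float, str]:
    """Run both models on the same task data and report which one is more accurate."""
    acc_a = task_accuracy(a, misspelled, gold)
    acc_b = task_accuracy(b, misspelled, gold)
    if acc_a > acc_b:
        winner = "A"
    elif acc_b > acc_a:
        winner = "B"
    else:
        winner = "tie"
    return acc_a, acc_b, winner


# Toy task data and toy correctors (lookup tables standing in for LM-backed spellers).
misspelled = ["teh", "recieve", "wierd", "adress"]
gold = ["the", "receive", "weird", "address"]
table_a = {"teh": "the", "recieve": "receive", "wierd": "weird"}
table_b = {"teh": "ten", "recieve": "receive"}


def corrector_a(word: str) -> str:
    return table_a.get(word, word)  # unknown words are left unchanged


def corrector_b(word: str) -> str:
    return table_b.get(word, word)


print(compare_extrinsically(corrector_a, corrector_b, misspelled, gold))
# → (0.75, 0.25, 'A')
```

Both models are scored on exactly the same task data, so the accuracy difference reflects the models rather than the test set.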

downside: it is time-consuming, since every comparison requires running the full task

Intrinsic Evaluation - Perplexity

minimizing perplexity ⇔ maximizing the probability of the test set
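A minimal Python sketch of this relationship, assuming the model has already assigned a probability P(w_i | history) to each of the N test-set words (how the history is defined, e.g. bigram vs trigram, depends on the model). `perplexity` is a hypothetical helper computing PP(W) = P(w_1 ... w_N)^(-1/N) in log space so long test sets do not underflow; the probabilities below are made up for illustration.

```python
import math
from typing import Sequence


def perplexity(word_probs: Sequence[float]) -> float:
    """Perplexity of a test set given P(w_i | history) for each of its N words.

    PP(W) = P(w_1 ... w_N) ** (-1/N), computed as exp(-(1/N) * sum(log p_i))
    so that the product of many small probabilities does not underflow.
    """
    if not word_probs:
        raise ValueError("test set must contain at least one word")
    if any(p <= 0.0 or p > 1.0 for p in word_probs):
        raise ValueError("each probability must be in (0, 1]")
    n = len(word_probs)
    log_prob = sum(math.log(p) for p in word_probs)  # log P(w_1 ... w_N)
    return math.exp(-log_prob / n)


# Made-up per-word probabilities that two models assign to the same 4-word test set.
probs_a = [0.25, 0.25, 0.25, 0.25]  # P(test set) = 0.25**4 ≈ 0.0039
probs_b = [0.5, 0.5, 0.5, 0.5]      # P(test set) = 0.5**4 = 0.0625, i.e. higher

print(perplexity(probs_a))  # ≈ 4.0
print(perplexity(probs_b))  # ≈ 2.0, lower perplexity for the higher-probability model
```

Because x ↦ x^(-1/N) is strictly decreasing for a fixed N, any change that raises the test-set probability lowers perplexity, which is why the two objectives coincide.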

perplexity: