• extrinsic evaluation
  • intrinsic evaluation
    • perplexity - the inverse probability of the test set, normalized by the number of words

Extrinsic (in-vivo) Evaluation

extrinsic evaluation is the most reliable way to compare two language models 𝐴 and 𝐵:

  • put each model in a task (spelling corrector, speech recognizer, machine translation system, etc.)
  • get the accuracy of 𝐴 and 𝐵
    • how many misspelled words were corrected properly
    • how many words were translated correctly
  • compare the accuracy of 𝐴 and 𝐵
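
The comparison steps above can be sketched in a few lines. This is a toy illustration, not a real evaluation harness: `corrector_a`, `corrector_b`, the lookup tables, and the test pairs are all hypothetical stand-ins for spelling correctors built on models 𝐴 and 𝐵.

```python
from typing import Callable, Sequence

# Hypothetical type: a corrector maps a misspelled word to its proposed fix.
Corrector = Callable[[str], str]


def accuracy(correct: Corrector, test_pairs: Sequence[tuple[str, str]]) -> float:
    """Fraction of (misspelled, expected) pairs that the corrector fixes properly."""
    if not test_pairs:
        raise ValueError("test_pairs must not be empty")
    hits = sum(1 for wrong, expected in test_pairs if correct(wrong) == expected)
    return hits / len(test_pairs)


# Toy lookup tables standing in for correctors driven by models A and B
# (assumption: a real corrector would rank candidates with the language model).
fixes_a = {"teh": "the", "recieve": "receive"}
fixes_b = {"teh": "the", "adress": "address", "wich": "which"}


def corrector_a(word: str) -> str:
    return fixes_a.get(word, word)  # unknown words are returned unchanged


def corrector_b(word: str) -> str:
    return fixes_b.get(word, word)


test_pairs = [
    ("teh", "the"),
    ("recieve", "receive"),
    ("adress", "address"),
    ("wich", "which"),
]

acc_a = accuracy(corrector_a, test_pairs)
acc_b = accuracy(corrector_b, test_pairs)
print(f"A: {acc_a:.2f}  B: {acc_b:.2f}")  # the model with higher accuracy wins
```

Both models are scored on the same test pairs, so the two accuracies are directly comparable.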

the downside: running each model end to end in a real task is time-consuming

Intrinsic Evaluation - Perplexity

  • $PP(W) = P(w_1, \ldots, w_n)^{-1/n}$
  • $PP(W) = \left[ \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1}) \right]^{-1/n}$  # chain rule
  • $PP(W) = \left[ \prod_{i=1}^{n} P(w_i \mid w_{i-1}) \right]^{-1/n}$  # bi-grams
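
A minimal sketch of the bi-gram form, computed in log space to avoid numeric underflow on long test sets. The add-one smoothing, the `<s>`/`</s>` markers, and the toy corpus are assumptions made for the example, not part of the notes; `train_bigram` and `perplexity` are illustrative names.

```python
import math
from collections import Counter


def train_bigram(tokens, vocab_size):
    """Return P(word | prev) estimated from tokens with add-one smoothing."""
    contexts = Counter(tokens[:-1])             # how often each word appears as a context
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each (prev, word) pair appears

    def prob(word, prev):
        # Add-one smoothing (an assumption) keeps unseen bi-grams above zero.
        return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)

    return prob


def perplexity(prob, tokens):
    """PP(W) = [prod P(w_i | w_{i-1})]^(-1/n), evaluated via a sum of logs."""
    n = len(tokens) - 1  # the first token serves only as context
    if n < 1:
        raise ValueError("need at least two tokens")
    log_prob = sum(math.log(prob(w, prev)) for prev, w in zip(tokens, tokens[1:]))
    return math.exp(-log_prob / n)


train = "<s> i am sam </s> <s> sam i am </s>".split()
vocab = set(train)
prob = train_bigram(train, len(vocab))

test = "<s> i am sam </s>".split()
pp = perplexity(prob, test)
print(f"perplexity = {pp:.3f}")
```

Here 𝑛 counts the predicted words only, which is one common convention; other treatments of sentence-boundary tokens give slightly different values.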

minimizing perplexity = maximizing the probability of the test set (perplexity is a decreasing function of that probability)
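
To see why the two goals coincide, take the logarithm of the perplexity definition. A short derivation, assuming $0 < P(w_1, \ldots, w_n) \le 1$ and a fixed test-set length $n$:

```latex
\begin{align*}
PP(W)      &= P(w_1, \ldots, w_n)^{-1/n} \\
\log PP(W) &= -\frac{1}{n} \log P(w_1, \ldots, w_n)
\end{align*}
```

Because $n$ is fixed, $\log PP(W)$ falls exactly when $\log P(w_1, \ldots, w_n)$ rises, so the model with the lowest perplexity is the one that assigns the highest probability to the test set.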

perplexity:

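  • is a bad approximation of extrinsic performance unless the test data looks just like the training data
  • can be read as the weighted average branching factor: the effective number of equally likely next words at each position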
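
The branching-factor reading can be checked on a toy "language" of digits. This is a sketch under assumed distributions: the uniform and skewed tables and the test string are made up for illustration, and `sequence_perplexity` is a hypothetical helper that treats each symbol as independent of its context.

```python
import math


def sequence_perplexity(probs, sequence):
    """Perplexity of a sequence under a context-free (unigram) distribution."""
    n = len(sequence)
    log_prob = sum(math.log(probs[symbol]) for symbol in sequence)
    return math.exp(-log_prob / n)


digits = "0123456789"

# Every digit equally likely: 10 choices at each position.
uniform = {d: 0.1 for d in digits}

# Same 10 digits, but "0" is far more likely than the rest (sums to 1.0).
skewed = {d: 0.01 for d in digits}
skewed["0"] = 0.91

test_sequence = "0000030000"

pp_uniform = sequence_perplexity(uniform, test_sequence)
pp_skewed = sequence_perplexity(skewed, test_sequence)
print(f"uniform: {pp_uniform:.2f}  skewed: {pp_skewed:.2f}")
```

Under the uniform table the perplexity equals the plain branching factor of 10. Under the skewed table the branching factor is still 10, but the perplexity on a test string dominated by "0" is much lower, which is the sense in which perplexity is a weighted branching factor.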

Resources