Step 6 of 8

Evaluation

Compute perplexity on the held-out test set and compare LM1 vs LM2.

Evaluation requires a split dataset. Complete the split step first.

Test perplexity

Lower perplexity = better next-token predictions on the held-out test split.

Model  Method         Smoothing      Perplexity  Status
LM1    Backoff        None           -           not run
LM2    Interpolation  add-k (k=0.1)  -           not run

Comparison

The model with the lower test perplexity is the better one.

Run evaluation
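Perplexity is computed in log space: PPL = exp(-(1/N) · Σᵢ log P(wᵢ | hᵢ)), where the sum runs over all 4-gram windows in the test set, N is the number of windows, wᵢ is the last token of window i, and hᵢ is the three tokens that precede it.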
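
The log-space computation can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: the names `perplexity` and `uniform_prob` are hypothetical, the model is assumed to be a callable that returns P(w | h) for a three-token history, and returning infinity on a zero probability is an assumed convention (a model without smoothing can assign zero probability to an unseen event).

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical model interface: prob(history, word) -> P(word | history).
ProbFn = Callable[[Tuple[str, ...], str], float]


def perplexity(tokens: Sequence[str], prob: ProbFn, order: int = 4) -> float:
    """Return exp(-1/N * sum(log P(w_i | h_i))) over all order-gram windows."""
    log_sum = 0.0
    n_windows = 0
    # The first predicted token is the one with a full (order - 1)-token history.
    for i in range(order - 1, len(tokens)):
        history = tuple(tokens[i - order + 1:i])
        p = prob(history, tokens[i])
        if p <= 0.0:
            # Assumed convention: a single zero probability makes perplexity infinite.
            return math.inf
        log_sum += math.log(p)  # summing logs avoids underflow from multiplying small numbers
        n_windows += 1
    if n_windows == 0:
        raise ValueError("test set is shorter than one window")
    return math.exp(-log_sum / n_windows)


def uniform_prob(history: Tuple[str, ...], word: str) -> float:
    """Toy stand-in model: every word in an 8-word vocabulary is equally likely."""
    return 1.0 / 8.0


tokens = "the cat sat on the mat near the door".split()
ppl = perplexity(tokens, uniform_prob)
```

With the toy uniform model, every window contributes log(1/8), so the result is the vocabulary size, 8 (up to floating-point rounding). That is a quick sanity check: a model that spreads probability evenly over V words has perplexity V, and a model that predicts the test tokens better scores lower.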