Step 6 of 8

Evaluation

Compute perplexity on the held-out test set and compare LM1 vs LM2.

Evaluation requires a split dataset. If you have not created one yet, complete the split step first.

Lower perplexity means the model assigns higher probability to the actual next tokens in the held-out test split, so lower is better.

Model	Method	Smoothing	Perplexity	Status
LM1	Backoff	None	—	not run
LM2	Interpolation	add-k (k=0.1)	—	not run

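Perplexity is computed in log space as exp(-(1/N) · Σ log P(wᵢ | h)), where the sum runs over all N 4-gram windows in the test set, wᵢ is the last token of a window, and h is the three tokens that precede it. Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long test sets.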
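The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this step: the function name `perplexity`, the `prob(history, word)` callback interface, and the toy `uniform_prob` model are all assumptions made for the example. It also assumes that a zero probability (possible with an unsmoothed model) should be reported as infinite perplexity.

```python
import math
from typing import Callable, Sequence, Tuple


def perplexity(
    tokens: Sequence[str],
    prob: Callable[[Tuple[str, ...], str], float],  # hypothetical model interface
    order: int = 4,
) -> float:
    """Perplexity over all `order`-gram windows of `tokens`, computed in log space."""
    log_sum = 0.0
    n = 0
    for i in range(order - 1, len(tokens)):
        history = tuple(tokens[i - order + 1 : i])  # the preceding order-1 tokens
        p = prob(history, tokens[i])
        if p <= 0.0:
            # Assumption: a zero-probability token makes perplexity unbounded.
            return math.inf
        log_sum += math.log(p)
        n += 1
    if n == 0:
        raise ValueError("test set has fewer tokens than the n-gram order")
    return math.exp(-log_sum / n)


def uniform_prob(history: Tuple[str, ...], word: str, vocab_size: int = 8) -> float:
    """Toy stand-in model: every word is equally likely regardless of history."""
    return 1.0 / vocab_size


tokens = "the cat sat on the mat today".split()
ppl = perplexity(tokens, uniform_prob)
print(f"perplexity: {ppl:.3f}")
```

A uniform model over a vocabulary of V words always has perplexity V, which makes it a convenient sanity check before plugging in LM1 or LM2.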