Step 2 of 8

Preprocessing

Normalize the corpus, add sentence boundaries, and shrink the vocabulary by replacing rare words with <UNK>.
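The three operations above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name `preprocess`, the boundary markers `<s>` and `</s>`, the lowercase-and-split normalization rule, and the choice to count the vocabulary cap over ordinary words only (excluding the special tokens) are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical marker names; only <UNK> appears in the original text.
BOS, EOS, UNK = "<s>", "</s>", "<UNK>"

def preprocess(text, vocab_cap=1000):
    """Normalize text, add sentence boundaries, and map rare words to <UNK>."""
    # Normalize: lowercase, then split into sentences on ., ! or ?
    sentences = re.split(r"[.!?]+", text.lower())
    # Tokenize: keep runs of letters, digits, and apostrophes
    tokenized = [re.findall(r"[a-z0-9']+", s) for s in sentences]
    tokenized = [sent for sent in tokenized if sent]  # drop empty sentences

    # Keep only the vocab_cap most frequent words (assumed cap semantics)
    counts = Counter(tok for sent in tokenized for tok in sent)
    vocab = {word for word, _ in counts.most_common(vocab_cap)}

    # Wrap each sentence in boundary markers and replace rare words
    return [
        [BOS] + [tok if tok in vocab else UNK for tok in sent] + [EOS]
        for sent in tokenized
    ]
```

For instance, `preprocess("The cat sat. The dog sat! A bird flew.", vocab_cap=2)` keeps only "the" and "sat", so the first sentence becomes `['<s>', 'the', '<UNK>', 'sat', '</s>']` and the third consists entirely of `<UNK>` between the boundary markers. A lower cap shrinks the vocabulary further at the cost of more `<UNK>` tokens.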

No corpus loaded yet. Pick one first.

Settings

Tweak and re-run.

Vocabulary cap: 1000
Click "Apply preprocessing" to see the tokenized output.