Levenshtein Distance

Electronart edited this page Aug 1, 2018 · 2 revisions

The Levenshtein distance between two words is the minimum number of character edits (insertion, deletion, substitution) required to change one word into the other.
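The definition above can be illustrated with a short sketch. This is the standard dynamic-programming approach, written here for illustration only; it is not taken from List Analyser or dtSearch, and the function name `levenshtein` is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning i chars of a into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3 (two substitutions and one insertion), and `levenshtein("running", "run")` is 4 (four deletions).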

By default, List Analyser lists all words for which the Levenshtein Distance between the word in List A and the word in List B is from 0 to 32. Leave it on this setting when measuring Stemming Errors and Similarity.

If the Levenshtein Distance setting is 0 to 0, only the words that are identical in List A and List B are listed. Conversely, with a setting of 1 to 32*, all words that differ between List A and List B are listed. The setting also affects the metrics displayed for List A and B Mean Word Length and for Mean Characters Removed.
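The effect of the range setting can be sketched as a simple filter. This is a hypothetical illustration, not List Analyser's code: it assumes the two lists are compared position by position (for instance, a word and its stem), and the names `filter_pairs`, `low`, and `high` are invented for the example.

```python
def levenshtein(a, b):
    # Compact dynamic-programming edit distance (illustrative helper).
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ch_a != ch_b)))
        previous = current
    return previous[-1]


def filter_pairs(list_a, list_b, low=0, high=32):
    """Keep the pairs whose distance falls within low..high inclusive.

    Assumption: entries are paired by position in the two lists.
    """
    kept = []
    for word_a, word_b in zip(list_a, list_b):
        distance = levenshtein(word_a, word_b)
        if low <= distance <= high:
            kept.append((word_a, word_b, distance))
    return kept


list_a = ["running", "cats", "house"]
list_b = ["run", "cat", "house"]

identical = filter_pairs(list_a, list_b, low=0, high=0)
differing = filter_pairs(list_a, list_b, low=1, high=32)
```

Under these assumptions, `identical` contains only the `house` pair, while `differing` contains the `running`/`run` and `cats`/`cat` pairs. Any average computed over the kept pairs would change with the range, which is consistent with the setting affecting the displayed means.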

The number of words that differ and the Mean Characters Removed are crude metrics for stemmer strength; see Number of Words and Stems that Differ and Mean Characters Removed.

*The default maximum word length in a dtSearch Index is 32 (it can be increased via the API or a setting under Options>Preferences>Letter & Words in the dtSearch Desktop/Network end-user version).