Package org.apache.commons.text.similarity

Provides algorithms for string similarity.

The algorithms that implement the EditDistance interface follow the same simple principle: the more similar (closer) two strings are, the lower the distance between them. For example, the words house and hose are closer to each other than house and trousers.
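
To make the principle concrete, here is a minimal, self-contained sketch of a plain Levenshtein distance. It does not use this package: the class name EditDistanceSketch and the method levenshtein are hypothetical names chosen for illustration, and the library's own implementations may differ in detail.

```java
public class EditDistanceSketch {

    // Hypothetical helper, not part of the package: classic two-row
    // dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int levenshtein(CharSequence left, CharSequence right) {
        int[] previous = new int[right.length() + 1];
        int[] current = new int[right.length() + 1];
        for (int j = 0; j <= right.length(); j++) {
            previous[j] = j; // cost of building right's prefix from an empty string
        }
        for (int i = 1; i <= left.length(); i++) {
            current[0] = i; // cost of deleting the first i characters of left
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[right.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("house", "hose"));     // 1
        System.out.println(levenshtein("house", "trousers")); // 4
    }
}
```

Under this measure, house and hose are one edit apart while house and trousers are four edits apart, so the first pair is the closer one.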

The following algorithms are available at the moment:

The Cosine Distance utilises a regular expression tokenizer (\w+). The Levenshtein Distance's behavior can be changed to take a maximum threshold into consideration.
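
The two options above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the package's code: the class SimilarityOptionsSketch and its methods tokenize, limitedDistance and levenshtein are hypothetical, and returning -1 when the limit is exceeded is an assumed convention for this sketch. For simplicity, the sketch computes the full distance first and only then applies the limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimilarityOptionsSketch {

    // \w+ matches runs of ASCII letters, digits and underscores by default.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Hypothetical tokenizer: collects every \w+ match in order.
    static List<String> tokenize(CharSequence text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    // Hypothetical limit wrapper: returns the distance, or -1 when it
    // exceeds the given limit (assumed convention for this sketch).
    static int limitedDistance(CharSequence left, CharSequence right, int limit) {
        int distance = levenshtein(left, right);
        return distance <= limit ? distance : -1;
    }

    // Plain two-row Levenshtein distance used by limitedDistance.
    static int levenshtein(CharSequence left, CharSequence right) {
        int[] previous = new int[right.length() + 1];
        int[] current = new int[right.length() + 1];
        for (int j = 0; j <= right.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= left.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[right.length()];
    }

    public static void main(String[] args) {
        System.out.println(tokenize("the quick, brown fox!"));
        System.out.println(limitedDistance("house", "hose", 1));
        System.out.println(limitedDistance("house", "trousers", 3));
    }
}
```

A limit of this kind is useful when only near matches matter: callers can treat the sentinel value as "too different" without inspecting the exact distance.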