public class JaroWinklerSimilarity extends Object implements SimilarityScore<Double>
The Jaro measure is the weighted sum of the percentage of matched characters from each string and the transposed characters. Winkler increased this measure for matching initial characters.
This implementation is based on the Jaro Winkler similarity algorithm from http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance.
This code has been adapted from Apache Commons Lang 3.3.
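For orientation, the measure described above is commonly written as follows. This is the textbook form found in the referenced Wikipedia article, not a statement of what this class computes internally. The symbols are assumptions introduced here: m is the number of matching characters, t is half the number of matched characters that are out of order, ℓ is the length of the common prefix (capped at 4), and p is a scaling factor, commonly 0.1.

```latex
\[
  \mathrm{sim}_j(s_1, s_2) =
  \begin{cases}
    0 & \text{if } m = 0, \\[4pt]
    \dfrac{1}{3}\left( \dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m} \right) & \text{otherwise,}
  \end{cases}
\]
\[
  \mathrm{sim}_w(s_1, s_2) = \mathrm{sim}_j + \ell \, p \, \left( 1 - \mathrm{sim}_j \right).
\]
```

As a worked case, "hello" and "hallo" share m = 4 matching characters with t = 0, so sim_j = (4/5 + 4/5 + 4/4) / 3 ≈ 0.867. With ℓ = 1 and p = 0.1, sim_w ≈ 0.867 + 0.1 × 0.133 = 0.88.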
Constructor and Description |
---|
JaroWinklerSimilarity() |
Modifier and Type | Method and Description |
---|---|
Double | apply(CharSequence left, CharSequence right): Computes the Jaro Winkler Similarity between two character sequences. |
protected static int[] | matches(CharSequence first, CharSequence second): Returns an array holding the Jaro-Winkler string matches, half transpositions, and prefix. |
public JaroWinklerSimilarity()
protected static int[] matches(CharSequence first, CharSequence second)
Parameters:
first - the first string to be matched
second - the second string to be matched
public Double apply(CharSequence left, CharSequence right)
sim.apply(null, null) = IllegalArgumentException
sim.apply("foo", null) = IllegalArgumentException
sim.apply(null, "foo") = IllegalArgumentException
sim.apply("", "") = 1.0
sim.apply("foo", "foo") = 1.0
sim.apply("foo", "foo ") = 0.94
sim.apply("foo", "foo  ") = 0.91
sim.apply("foo", " foo ") = 0.87
sim.apply("foo", "  foo") = 0.51
sim.apply("", "a") = 0.0
sim.apply("aaapppp", "") = 0.0
sim.apply("frog", "fog") = 0.93
sim.apply("fly", "ant") = 0.0
sim.apply("elephant", "hippo") = 0.44
sim.apply("hippo", "elephant") = 0.44
sim.apply("hippo", "zzzzzzzz") = 0.0
sim.apply("hello", "hallo") = 0.88
sim.apply("ABC Corporation", "ABC Corp") = 0.91
sim.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.95
sim.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.92
sim.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.88
Specified by:
apply in interface SimilarityScore<Double>
Parameters:
left - the first CharSequence, must not be null
right - the second CharSequence, must not be null
Throws:
IllegalArgumentException - if either CharSequence input is null
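To make the behaviour of apply easier to follow, here is a minimal, self-contained sketch of a Jaro-Winkler computation in plain Java. It is not the library's source: the class name JaroWinklerSketch, the method similarity, the 0.1 scaling factor, the 0.7 boost threshold, and the prefix cap of 4 are all assumptions made for illustration. It mirrors the documented contract (null inputs throw, two empty inputs score 1.0) and agrees with several of the short examples listed above to two decimals, but it is not guaranteed to match the library on every input.

```java
/** Hypothetical sketch of a Jaro-Winkler similarity; not the Apache Commons Text source. */
final class JaroWinklerSketch {

    private static final double SCALING_FACTOR = 0.1;  // assumed Winkler prefix weight
    private static final double BOOST_THRESHOLD = 0.7; // assumed: prefix boost only at or above this Jaro score
    private static final int MAX_PREFIX = 4;           // assumed cap on the common-prefix length

    static double similarity(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("CharSequences must not be null");
        }
        int n1 = left.length();
        int n2 = right.length();
        if (n1 == 0 && n2 == 0) {
            return 1.0; // two empty sequences are treated as identical
        }
        if (n1 == 0 || n2 == 0) {
            return 0.0;
        }

        // Characters match only if equal and no further apart than this window.
        int range = Math.max(Math.max(n1, n2) / 2 - 1, 0);
        boolean[] leftMatched = new boolean[n1];
        boolean[] rightMatched = new boolean[n2];
        int matches = 0;
        for (int i = 0; i < n1; i++) {
            int from = Math.max(0, i - range);
            int to = Math.min(n2 - 1, i + range);
            for (int j = from; j <= to; j++) {
                if (!rightMatched[j] && left.charAt(i) == right.charAt(j)) {
                    leftMatched[i] = true;
                    rightMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }

        // Walk both matched subsequences in order and count positions that disagree.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < n1; i++) {
            if (!leftMatched[i]) {
                continue;
            }
            while (!rightMatched[k]) {
                k++;
            }
            if (left.charAt(i) != right.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }

        double m = matches;
        double jaro = (m / n1 + m / n2 + (m - outOfOrder / 2.0) / m) / 3.0;

        // Length of the common prefix, capped at MAX_PREFIX.
        int prefix = 0;
        int limit = Math.min(MAX_PREFIX, Math.min(n1, n2));
        while (prefix < limit && left.charAt(prefix) == right.charAt(prefix)) {
            prefix++;
        }
        return jaro < BOOST_THRESHOLD ? jaro : jaro + SCALING_FACTOR * prefix * (1.0 - jaro);
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%n", similarity("hello", "hallo"));
        System.out.printf("%.2f%n", similarity("elephant", "hippo"));
    }
}
```

With the library on the classpath, the equivalent call would be new JaroWinklerSimilarity().apply(left, right), which returns a Double as documented above.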
Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.