public class LongestCommonSubsequence extends Object implements SimilarityScore<Integer>
The Longest common subsequence algorithm returns the length of the longest subsequence that two strings have in
common. Two strings that are entirely different, return a value of 0, and two strings that return a value
of the commonly shared length implies that the strings are completely the same in value and position.
Note. Generally this algorithm is fairly inefficient, as for length m, n of the input
CharSequence
's left
and right
respectively, the runtime of the
algorithm is O(m*n).
As of version 1.10, a more space-efficient of the algorithm is implemented. The new algorithm has linear space complexity instead of quadratic. However, time complexity is still quadratic in the size of input strings.
The implementation is based on Hirschberg's Longest Commons Substring algorithm (cited below).
For further reading see:
Constructor and Description |
---|
LongestCommonSubsequence() |
Modifier and Type | Method and Description |
---|---|
Integer |
apply(CharSequence left,
CharSequence right)
Calculates the longest common subsequence similarity score of two
CharSequence 's passed as
input. |
CharSequence |
logestCommonSubsequence(CharSequence left,
CharSequence right)
Deprecated.
Deprecated as of 1.2 due to a typo in the method name.
Use
longestCommonSubsequence(CharSequence, CharSequence) instead.
This method will be removed in 2.0. |
CharSequence |
longestCommonSubsequence(CharSequence left,
CharSequence right)
Computes the longest common subsequence between the two
CharSequence 's passed as
input. |
int[][] |
longestCommonSubstringLengthArray(CharSequence left,
CharSequence right)
Deprecated.
Deprecated as of 1.10. A more efficient implementation for calculating LCS is now available.
Use
longestCommonSubsequence(CharSequence, CharSequence) instead to directly calculate the LCS.
This method will be removed in 2.0. |
public LongestCommonSubsequence()
public Integer apply(CharSequence left, CharSequence right)
CharSequence
's passed as
input.
This method implements a more efficient version of LCS algorithm which has quadratic time and linear space complexity.
This method is based on newly implemented algorithmB(CharSequence, CharSequence)
.
An evaluation using JMH revealed that this method is almost two times faster than its previous version.
apply
in interface SimilarityScore<Integer>
left
- first character sequenceright
- second character sequenceleft
and right
IllegalArgumentException
- if either String input null
@Deprecated public CharSequence logestCommonSubsequence(CharSequence left, CharSequence right)
longestCommonSubsequence(CharSequence, CharSequence)
instead.
This method will be removed in 2.0.CharSequence
's passed as input.
Note, a substring and subsequence are not necessarily the same thing. Indeed, abcxyzqrs
and
xyzghfm
have both the same common substring and subsequence, namely xyz
. However,
axbyczqrs
and abcxyzqtv
have the longest common subsequence xyzq
because a
subsequence need not have adjacent characters.
For reference, we give the definition of a subsequence for the reader: a subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.
left
- first character sequenceright
- second character sequenceIllegalArgumentException
- if either String input null
public CharSequence longestCommonSubsequence(CharSequence left, CharSequence right)
CharSequence
's passed as
input.
This method implements a more efficient version of LCS algorithm which although has quadratic time, it has linear space complexity.
Note, a substring and subsequence are not necessarily the same thing. Indeed, abcxyzqrs
and
xyzghfm
have both the same common substring and subsequence, namely xyz
. However,
axbyczqrs
and abcxyzqtv
have the longest common subsequence xyzq
because a
subsequence need not have adjacent characters.
For reference, we give the definition of a subsequence for the reader: a subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.
left
- first character sequenceright
- second character sequenceIllegalArgumentException
- if either String input null
@Deprecated public int[][] longestCommonSubstringLengthArray(CharSequence left, CharSequence right)
longestCommonSubsequence(CharSequence, CharSequence)
instead to directly calculate the LCS.
This method will be removed in 2.0.left
- first character sequenceright
- second character sequenceCopyright © 2014–2022 The Apache Software Foundation. All rights reserved.