public class Percentile extends AbstractUnivariateStatistic implements Serializable
There are several commonly used methods for estimating percentiles (a.k.a. quantiles) based on sample data. For large samples, the different methods agree closely, but when sample sizes are small, different methods will give significantly different results. The algorithm implemented here works as follows:
1. Let n be the length of the (sorted) array and 0 < p <= 100 be the desired percentile.
2. If n = 1, return the unique array element (regardless of the value of p); otherwise
3. Compute the estimated percentile position pos = p * (n + 1) / 100 and the difference, d, between pos and floor(pos) (i.e. the fractional part of pos).
4. If pos < 1, return the smallest element in the array.
5. Else if pos >= n, return the largest element in the array.
6. Else let lower be the element in position floor(pos) in the array (counting positions from 1) and let upper be the next element in the array. Return lower + d * (upper - lower).
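The steps above can be sketched in plain Java as follows. This is a minimal illustration, not the library's implementation: it sorts a copy of the input rather than using selection, it ignores NaN strategies and estimation types, and the names LegacyPercentileSketch and legacyPercentile are hypothetical. IllegalArgumentException stands in for MathIllegalArgumentException, and returning Double.NaN for an empty array follows the evaluate(double[], double) contract of this class.

```java
import java.util.Arrays;

/** Hypothetical sketch of the legacy percentile estimate; not the library code. */
public final class LegacyPercentileSketch {

    private LegacyPercentileSketch() {
    }

    /**
     * Estimates the p-th percentile (0 < p <= 100) of values using
     * pos = p * (n + 1) / 100 and linear interpolation between neighbours.
     */
    public static double legacyPercentile(double[] values, double p) {
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must satisfy 0 < p <= 100: " + p);
        }
        int n = values.length;
        if (n == 0) {
            return Double.NaN;               // empty input, as in the evaluate contract
        }
        if (n == 1) {
            return values[0];                // step 2: unique element, whatever p is
        }
        double[] sorted = values.clone();    // leave the caller's array untouched
        Arrays.sort(sorted);
        double pos = p * (n + 1) / 100;      // step 3
        double fpos = Math.floor(pos);
        double d = pos - fpos;               // fractional part of pos
        if (pos < 1) {
            return sorted[0];                // step 4: smallest element
        }
        if (pos >= n) {
            return sorted[n - 1];            // step 5: largest element
        }
        int k = (int) fpos;                  // 1-based position of lower
        double lower = sorted[k - 1];
        double upper = sorted[k];            // next element
        return lower + d * (upper - lower);  // step 6
    }

    public static void main(String[] args) {
        double[] data = {3.0, 1.0, 4.0, 2.0};
        System.out.println(legacyPercentile(data, 50));  // 2.5 (pos = 2.5)
        System.out.println(legacyPercentile(data, 10));  // 1.0 (pos = 0.5 < 1)
        System.out.println(legacyPercentile(data, 100)); // 4.0 (pos = 5 >= n)
    }
}
```

Sorting a full copy keeps the sketch short; it gives the same answer as a selection-based approach but costs O(n log n) per call.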
To compute percentiles, the data must be at least partially ordered. Input arrays are copied and recursively partitioned using an ordering definition. The ordering used by Arrays.sort(double[]) is the one determined by Double.compareTo(Double). This ordering makes Double.NaN larger than any other value (including Double.POSITIVE_INFINITY). Therefore, for example, when NaN values are kept in the data (as with NaNStrategy.FIXED), the median (50th percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5.
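A short JDK-only check of the ordering described above: Arrays.sort(double[]) places NaN after positive infinity, and applying the pos = p * (n + 1) / 100 interpolation to the sorted array with the NaN left in place gives 2.5 for the median. The class and helper names are hypothetical, the helper assumes 0 < p <= 100, and it performs no NaN removal or replacement, which the library's NaNStrategy setting may apply before this step.

```java
import java.util.Arrays;

/** Demonstrates the Double.compareTo ordering used by Arrays.sort(double[]). */
public final class NaNOrderingDemo {

    private NaNOrderingDemo() {
    }

    /** Legacy percentile on a sorted copy, with any NaN values kept in the data. */
    public static double legacyPercentileKeepingNaN(double[] values, double p) {
        int n = values.length;
        if (n == 0) {
            return Double.NaN;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);                 // NaN goes after every other value
        double pos = p * (n + 1) / 100;
        if (pos < 1) {
            return sorted[0];
        }
        if (pos >= n) {
            return sorted[n - 1];
        }
        int k = (int) Math.floor(pos);       // 1-based position of the lower neighbour
        double d = pos - k;
        return sorted[k - 1] + d * (sorted[k] - sorted[k - 1]);
    }

    public static void main(String[] args) {
        double[] special = {Double.NaN, Double.POSITIVE_INFINITY, 0.0};
        Arrays.sort(special);
        System.out.println(Arrays.toString(special)); // [0.0, Infinity, NaN]

        double[] data = {0, 1, 2, 3, 4, Double.NaN};
        System.out.println(legacyPercentileKeepingNaN(data, 50)); // 2.5: pos = 3.5, between 2 and 3
        System.out.println(legacyPercentileKeepingNaN(data, 90)); // NaN: pos = 6.3 >= n selects the NaN
    }
}
```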
Since percentile estimation usually involves interpolation between array elements, arrays containing NaN or infinite values will often result in NaN or infinite values being returned.
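To see why, here is the interpolation formula lower + d * (upper - lower) evaluated naively with IEEE 754 doubles. The class name is hypothetical and the method models only the formula, not the library's code path. Note that even d = 0 yields NaN when upper is infinite, because 0 * Infinity is NaN.

```java
/** Shows how lower + d * (upper - lower) treats infinite and NaN operands. */
public final class InterpolationSpecialValues {

    private InterpolationSpecialValues() {
    }

    /** Linear interpolation between lower and upper with fraction d. */
    public static double interpolate(double lower, double upper, double d) {
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        // Finite lower, infinite upper: infinite for any d > 0.
        System.out.println(interpolate(4.0, Double.POSITIVE_INFINITY, 0.5));  // Infinity
        // d = 0 still gives NaN, since 0 * Infinity is NaN.
        System.out.println(interpolate(4.0, Double.POSITIVE_INFINITY, 0.0));  // NaN
        // Opposite infinities: -Infinity + Infinity is NaN.
        System.out.println(interpolate(Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY, 0.5)); // NaN
        // A NaN operand always propagates.
        System.out.println(interpolate(2.0, Double.NaN, 0.5));                // NaN
    }
}
```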
Further, to support the different estimation types (such as R1 and R2) described on the Wikipedia Quantile page, a type-specific NaN handling strategy is used so that results closely match those typically produced by popular tools such as R (types R1 to R9) and Excel (R7).
Since 2.2, Percentile uses only selection instead of complete sorting and caches selection algorithm state between calls to the various evaluate methods. This greatly improves efficiency, both for a single percentile and for multiple percentile computations. To maximize performance when multiple percentiles are computed from the same data, users should set the data array once, using either evaluate(double[], double) or setData(double[]), and thereafter call evaluate(double) with just the percentile.
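The set-once, query-many pattern can be illustrated with a small stand-alone class. SortedOncePercentiles is hypothetical and deliberately swaps in a simpler technique: it fully sorts the data once in setData, whereas Percentile uses selection and caches pivots. The point it shows is the same: the expensive ordering work happens once, and each later evaluate(p) call only reads the prepared data.

```java
import java.util.Arrays;

/** Hypothetical helper: sort once on setData, then answer many percentile queries cheaply. */
public final class SortedOncePercentiles {

    private double[] sorted = new double[0];

    /** Stores a sorted copy of values (null clears the data); the caller's array is not modified. */
    public void setData(double[] values) {
        sorted = (values == null) ? new double[0] : values.clone();
        Arrays.sort(sorted);
    }

    /** Legacy estimate (pos = p * (n + 1) / 100) against the cached sorted data. */
    public double evaluate(double p) {
        if (!(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("p must satisfy 0 < p <= 100: " + p);
        }
        int n = sorted.length;
        if (n == 0) {
            return Double.NaN;
        }
        if (n == 1) {
            return sorted[0];
        }
        double pos = p * (n + 1) / 100;
        if (pos < 1) {
            return sorted[0];
        }
        if (pos >= n) {
            return sorted[n - 1];
        }
        int k = (int) Math.floor(pos);
        double d = pos - k;
        return sorted[k - 1] + d * (sorted[k] - sorted[k - 1]);
    }

    public static void main(String[] args) {
        SortedOncePercentiles stats = new SortedOncePercentiles();
        stats.setData(new double[] {5, 1, 4, 2, 3}); // the sort happens once, here
        for (double p : new double[] {25, 50, 75}) {
            System.out.println(p + " -> " + stats.evaluate(p)); // 1.5, 3.0, 4.5
        }
    }
}
```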
Note that this implementation is not synchronized. If multiple threads access an instance of this class concurrently, and at least one of the threads modifies its state (for example through setData(double[]) or setQuantile(double)), access must be synchronized externally.
Modifier and Type | Class and Description |
---|---|
static class | Percentile.EstimationType: An enum of the percentile estimation strategies described in the Wikipedia article on quantiles, with enum names matching the type names used there. |
Modifier | Constructor and Description |
---|---|
public | Percentile(): Constructs a Percentile with default settings. |
public | Percentile(double quantile): Constructs a Percentile with the specified quantile value and the following defaults: estimation type Percentile.EstimationType.LEGACY, NaN strategy NaNStrategy.REMOVED, and a default KthSelector. |
protected | Percentile(double quantile, Percentile.EstimationType estimationType, NaNStrategy nanStrategy, KthSelector kthSelector): Constructs a Percentile with the specified quantile value, Percentile.EstimationType, NaNStrategy and KthSelector. |
public | Percentile(Percentile original): Copy constructor; creates a new Percentile identical to the original. |
Modifier and Type | Method and Description |
---|---|
Percentile | copy(): Returns a copy of the statistic with the same internal state. |
static void | copy(Percentile source, Percentile dest): Deprecated. As of 3.4 this method does not work anymore, as it fails to copy internal states between instances configured with different estimation type, NaN handling strategies and kthSelector; it therefore always throws MathUnsupportedOperationException. |
double | evaluate(double p): Returns the result of evaluating the statistic over the stored data. |
double | evaluate(double[] values, double p): Returns an estimate of the pth percentile of the values in the values array. |
double | evaluate(double[] values, int start, int length): Returns an estimate of the percentile, selected by the quantile property, of the designated values in the values array. |
double | evaluate(double[] values, int begin, int length, double p): Returns an estimate of the pth percentile of the values in the values array, starting with the element in (0-based) position begin in the array and including length values. |
Percentile.EstimationType | getEstimationType(): Get the estimation type used for computation. |
KthSelector | getKthSelector(): Get the kthSelector used for computation. |
NaNStrategy | getNaNStrategy(): Get the NaN handling strategy used for computation. |
PivotingStrategyInterface | getPivotingStrategy(): Get the PivotingStrategyInterface used in KthSelector for computation. |
double | getQuantile(): Returns the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument). |
protected double[] | getWorkArray(double[] values, int begin, int length): Get the work array to operate on. |
void | setData(double[] values): Set the data array. |
void | setData(double[] values, int begin, int length): Set the data array. |
void | setQuantile(double p): Sets the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument). |
Percentile | withEstimationType(Percentile.EstimationType newEstimationType): Build a new instance similar to the current one except for the estimation type. |
Percentile | withKthSelector(KthSelector newKthSelector): Build a new instance similar to the current one except for the kthSelector instance specifically set. |
Percentile | withNaNStrategy(NaNStrategy newNaNStrategy): Build a new instance similar to the current one except for the NaN handling strategy. |
Methods inherited from class AbstractUnivariateStatistic: evaluate, evaluate, getData, getDataRef, test, test, test, test
public Percentile()
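Constructs a Percentile with the following defaults:
- default quantile: 50.0, can be reset with setQuantile(double)
- default estimation type: Percentile.EstimationType.LEGACY, can be reset with withEstimationType(EstimationType)
- default NaN strategy: NaNStrategy.REMOVED, can be reset with withNaNStrategy(NaNStrategy)
- a KthSelector that makes use of MedianOf3PivotingStrategy, can be reset with withKthSelector(KthSelector)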
public Percentile(double quantile) throws MathIllegalArgumentException
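Constructs a Percentile with the specified quantile value and the following defaults:
- default estimation type: Percentile.EstimationType.LEGACY
- default NaN strategy: NaNStrategy.REMOVED
- a default KthSelector
Parameters:
quantile - the quantile
Throws:
MathIllegalArgumentException - if quantile is not greater than 0 and less than or equal to 100

public Percentile(Percentile original) throws NullArgumentException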
Copy constructor; creates a new Percentile identical to the original.
Parameters:
original - the Percentile instance to copy
Throws:
NullArgumentException - if original is null

protected Percentile(double quantile, Percentile.EstimationType estimationType, NaNStrategy nanStrategy, KthSelector kthSelector) throws MathIllegalArgumentException
Constructs a Percentile with the specified quantile value, Percentile.EstimationType, NaNStrategy and KthSelector.
Parameters:
quantile - the quantile to be computed
estimationType - one of the percentile estimation types
nanStrategy - one of the NaNStrategy values, used to handle NaNs
kthSelector - a KthSelector to use for pivoting during search
Throws:
MathIllegalArgumentException - if quantile is not within (0, 100]
NullArgumentException - if estimationType or nanStrategy is null

public void setData(double[] values)
Set the data array. The stored value is a copy of the parameter array, not the array itself.
Overrides:
setData in class AbstractUnivariateStatistic
Parameters:
values - data array to store (may be null to remove stored data)
See Also:
AbstractUnivariateStatistic.evaluate()

public void setData(double[] values, int begin, int length) throws MathIllegalArgumentException
Set the data array.
Overrides:
setData in class AbstractUnivariateStatistic
Parameters:
values - data array to store
begin - the index of the first element to include
length - the number of elements to include
Throws:
MathIllegalArgumentException - if values is null or the indices are not valid
See Also:
AbstractUnivariateStatistic.evaluate()

public double evaluate(double p) throws MathIllegalArgumentException
Returns the result of evaluating the statistic over the stored data. The stored array is the one which was set by previous calls to setData(double[]).
Parameters:
p - the percentile value to compute
Throws:
MathIllegalArgumentException - if p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)

public double evaluate(double[] values, double p) throws MathIllegalArgumentException
Returns an estimate of the pth percentile of the values in the values array.
Calls to this method do not modify the internal quantile state of this statistic.
- Returns Double.NaN if values has length 0
- Returns (for any value of p) values[0] if values has length 1
- Throws MathIllegalArgumentException if values is null or p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)
See Percentile for a description of the percentile estimation algorithm used.
Parameters:
values - input array of values
p - the percentile value to compute
Throws:
MathIllegalArgumentException - if values is null or p is invalid

public double evaluate(double[] values, int start, int length) throws MathIllegalArgumentException
Returns an estimate of the percentile of the designated values in the values array; which percentile is estimated is determined by the quantile property.
- Returns Double.NaN if length = 0
- Returns (for any value of quantile) values[start] if length = 1
- Throws MathIllegalArgumentException if values is null, or start or length is invalid
See Percentile for a description of the percentile estimation algorithm used.
Specified by:
evaluate in interface UnivariateStatistic
Specified by:
evaluate in interface MathArrays.Function
Specified by:
evaluate in class AbstractUnivariateStatistic
Parameters:
values - the input array
start - index of the first array element to include
length - the number of elements to include
Throws:
MathIllegalArgumentException - if the parameters are not valid

public double evaluate(double[] values, int begin, int length, double p) throws MathIllegalArgumentException
Returns an estimate of the pth percentile of the values in the values array, starting with the element in (0-based) position begin in the array and including length values.
Calls to this method do not modify the internal quantile state of this statistic.
- Returns Double.NaN if length = 0
- Returns (for any value of p) values[begin] if length = 1
- Throws MathIllegalArgumentException if values is null, begin or length is invalid, or p is not a valid quantile value (p must be greater than 0 and less than or equal to 100)
See Percentile for a description of the percentile estimation algorithm used.
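The contract above can be sketched for a plain double[] as follows. The class RangePercentileSketch and its evaluate method are hypothetical: they validate the arguments, apply the length 0 and length 1 rules, and otherwise copy the designated range with Arrays.copyOfRange, sort it, and apply the legacy interpolation. IllegalArgumentException stands in for MathIllegalArgumentException; no NaN strategy or selection caching is modelled.

```java
import java.util.Arrays;

/** Hypothetical sketch of the evaluate(values, begin, length, p) contract. */
public final class RangePercentileSketch {

    private RangePercentileSketch() {
    }

    public static double evaluate(double[] values, int begin, int length, double p) {
        if (values == null) {
            throw new IllegalArgumentException("values is null");
        }
        // Written as length > values.length - begin to avoid int overflow in begin + length.
        if (begin < 0 || length < 0 || length > values.length - begin) {
            throw new IllegalArgumentException("invalid begin/length: " + begin + "/" + length);
        }
        if (!(p > 0 && p <= 100)) {          // also rejects a NaN p
            throw new IllegalArgumentException("p must satisfy 0 < p <= 100: " + p);
        }
        if (length == 0) {
            return Double.NaN;
        }
        if (length == 1) {
            return values[begin];            // p is ignored for a single value
        }
        double[] work = Arrays.copyOfRange(values, begin, begin + length); // input untouched
        Arrays.sort(work);
        double pos = p * (length + 1) / 100;
        if (pos < 1) {
            return work[0];
        }
        if (pos >= length) {
            return work[length - 1];
        }
        int k = (int) Math.floor(pos);
        double d = pos - k;
        return work[k - 1] + d * (work[k] - work[k - 1]);
    }

    public static void main(String[] args) {
        double[] values = {9, 1, 2, 3, 4, 9};
        // Only indices 1..4, i.e. {1, 2, 3, 4}, take part.
        System.out.println(evaluate(values, 1, 4, 50)); // 2.5
        System.out.println(evaluate(values, 2, 1, 80)); // 2.0
        System.out.println(evaluate(values, 0, 0, 50)); // NaN
    }
}
```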
Parameters:
values - array of input values
p - the percentile to compute
begin - the first (0-based) element to include in the computation
length - the number of array elements to include
Throws:
MathIllegalArgumentException - if the parameters are not valid or the input array is null

public double getQuantile()
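Returns the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
Returns:
the quantile, as set at construction or through setQuantile(double)

public void setQuantile(double p) throws MathIllegalArgumentException
Sets the value of the quantile field (determines what percentile is computed when evaluate() is called with no quantile argument).
Parameters:
p - a value satisfying 0 < p <= 100
Throws:
MathIllegalArgumentException - if p is not greater than 0 and less than or equal to 100

public Percentile copy()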
Returns a copy of the statistic with the same internal state.
Specified by:
copy in interface UnivariateStatistic
Specified by:
copy in class AbstractUnivariateStatistic

@Deprecated public static void copy(Percentile source, Percentile dest) throws MathUnsupportedOperationException
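Deprecated. As of 3.4 this method does not work anymore, as it fails to copy internal states between instances configured with different estimation type, NaN handling strategies and kthSelector; it therefore always throws MathUnsupportedOperationException.
Parameters:
source - Percentile to copy
dest - Percentile to copy to
Throws:
MathUnsupportedOperationException - always thrown since 3.4

protected double[] getWorkArray(double[] values, int begin, int length)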
Get the work array to operate on. Makes use of the prior storedData if it exists; otherwise it checks for NaNs and copies the subset of the array defined by the begin and length parameters. The configured nanStrategy is used to retain, remove or replace any NaNs present before returning the resulting array.
Parameters:
values - the array of numbers
begin - index to start reading the array
length - the length of array to be read from the begin index
Throws:
MathIllegalArgumentException - if values or indices are invalid

public Percentile.EstimationType getEstimationType()
Get the estimation type used for computation.
Returns:
the estimationType set

public Percentile withEstimationType(Percentile.EstimationType newEstimationType)
Build a new instance similar to the current one except for the estimation type.
This method is intended to be used as part of a fluent-type builder pattern. Finely tuned instances should be built as follows:
Percentile customized = new Percentile(quantile).withEstimationType(estimationType).withNaNStrategy(nanStrategy).withKthSelector(kthSelector);
If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
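The withXxx style described above returns a new object that differs in one setting while the receiver stays unchanged. The sketch below illustrates that pattern with a hypothetical immutable class; plain strings such as "LEGACY", "R_7" and "MedianOf3" are placeholders standing in for Percentile.EstimationType, NaNStrategy and KthSelector values, and Objects.requireNonNull (a NullPointerException) stands in for the library's NullArgumentException.

```java
import java.util.Objects;

/** Hypothetical immutable settings object illustrating the withXxx builder style. */
public final class PercentileSettings {

    private final double quantile;
    private final String estimationType;
    private final String nanStrategy;
    private final String kthSelector;

    /** Starts from placeholder defaults mirroring the documented ones. */
    public PercentileSettings(double quantile) {
        this(quantile, "LEGACY", "REMOVED", "MedianOf3");
    }

    private PercentileSettings(double quantile, String estimationType,
                               String nanStrategy, String kthSelector) {
        this.quantile = quantile;
        this.estimationType = Objects.requireNonNull(estimationType, "estimationType");
        this.nanStrategy = Objects.requireNonNull(nanStrategy, "nanStrategy");
        this.kthSelector = Objects.requireNonNull(kthSelector, "kthSelector");
    }

    /** Each withXxx returns a new object; this one is left unchanged. */
    public PercentileSettings withEstimationType(String newEstimationType) {
        return new PercentileSettings(quantile, newEstimationType, nanStrategy, kthSelector);
    }

    public PercentileSettings withNaNStrategy(String newNaNStrategy) {
        return new PercentileSettings(quantile, estimationType, newNaNStrategy, kthSelector);
    }

    public PercentileSettings withKthSelector(String newKthSelector) {
        return new PercentileSettings(quantile, estimationType, nanStrategy, newKthSelector);
    }

    public double quantile() { return quantile; }
    public String estimationType() { return estimationType; }
    public String nanStrategy() { return nanStrategy; }
    public String kthSelector() { return kthSelector; }

    @Override
    public String toString() {
        return "quantile=" + quantile + ", estimationType=" + estimationType
                + ", nanStrategy=" + nanStrategy + ", kthSelector=" + kthSelector;
    }

    public static void main(String[] args) {
        PercentileSettings base = new PercentileSettings(50.0);
        // withKthSelector is omitted, so the default selector is kept.
        PercentileSettings customized = base.withEstimationType("R_7").withNaNStrategy("FIXED");
        System.out.println(base);
        System.out.println(customized);
    }
}
```

Because every field is final and each withXxx call builds a fresh object, an instance that is already shared is never altered by later customization.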
Parameters:
newEstimationType - estimation type for the new instance
Throws:
NullArgumentException - when newEstimationType is null

public NaNStrategy getNaNStrategy()
Get the NaN handling strategy used for computation.
Returns:
NaN handling strategy set during construction

public Percentile withNaNStrategy(NaNStrategy newNaNStrategy)
Build a new instance similar to the current one except for the NaN handling strategy.
This method is intended to be used as part of a fluent-type builder pattern. Finely tuned instances should be built as follows:
Percentile customized = new Percentile(quantile).withEstimationType(estimationType).withNaNStrategy(nanStrategy).withKthSelector(kthSelector);
If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
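How a NaN strategy might shape the work array (retain, remove or replace NaNs, as getWorkArray describes) can be sketched as below. The enum NaNHandling is a hypothetical mirror, not the library's NaNStrategy: only REMOVED is named in this document, and the other constant names and the replacement values chosen here (negative infinity for MINIMAL, positive infinity for MAXIMAL, an exception for FAILED) are assumptions about one reasonable reading of those strategies.

```java
import java.util.Arrays;

/** Hypothetical sketch of building a work array under different NaN handling choices. */
public final class NaNHandlingSketch {

    /** Hypothetical mirror of NaN strategy names; not the library enum. */
    public enum NaNHandling { MINIMAL, MAXIMAL, REMOVED, FIXED, FAILED }

    private NaNHandlingSketch() {
    }

    /** Returns a new array derived from values; the input array is never modified. */
    public static double[] workArray(double[] values, NaNHandling handling) {
        switch (handling) {
            case MINIMAL:
                return replaceNaN(values, Double.NEGATIVE_INFINITY); // NaN sorts first
            case MAXIMAL:
                return replaceNaN(values, Double.POSITIVE_INFINITY); // NaN sorts last
            case REMOVED:
                return Arrays.stream(values).filter(v -> !Double.isNaN(v)).toArray();
            case FAILED:
                for (double v : values) {
                    if (Double.isNaN(v)) {
                        throw new IllegalArgumentException("NaN not allowed");
                    }
                }
                return values.clone();
            case FIXED:
            default:
                return values.clone(); // NaN kept as is
        }
    }

    private static double[] replaceNaN(double[] values, double replacement) {
        double[] out = values.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = replacement;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = {0, 1, Double.NaN, 3};
        for (NaNHandling h : new NaNHandling[] {NaNHandling.MINIMAL, NaNHandling.MAXIMAL,
                                                NaNHandling.REMOVED, NaNHandling.FIXED}) {
            System.out.println(h + " " + Arrays.toString(workArray(data, h)));
        }
    }
}
```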
Parameters:
newNaNStrategy - NaN strategy for the new instance
Throws:
NullArgumentException - when newNaNStrategy is null

public KthSelector getKthSelector()
Get the kthSelector used for computation.
Returns:
the kthSelector set

public PivotingStrategyInterface getPivotingStrategy()
Get the PivotingStrategyInterface used in KthSelector for computation.

public Percentile withKthSelector(KthSelector newKthSelector)
Build a new instance similar to the current one except for the kthSelector instance specifically set.
This method is intended to be used as part of a fluent-type builder pattern. Finely tuned instances should be built as follows:
Percentile customized = new Percentile(quantile).withEstimationType(estimationType).withNaNStrategy(nanStrategy).withKthSelector(newKthSelector);
If any of the withXxx methods is omitted, the default value for the corresponding customization parameter will be used.
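To illustrate what a k-th element selector does, here is a generic quickselect with median-of-3 pivot choice, the pivoting idea named by the default MedianOf3PivotingStrategy. This is a textbook sketch with a Lomuto partition, not the library's KthSelector (which, as described for Percentile, also caches pivot state between calls); the class name is hypothetical. It compares with Double.compare so that NaN orders last, consistent with Arrays.sort(double[]).

```java
/** Hypothetical quickselect sketch with median-of-3 pivoting; not the library's KthSelector. */
public final class QuickSelectSketch {

    private QuickSelectSketch() {
    }

    /** Returns the k-th smallest element (0-based) of values, working on a copy. */
    public static double select(double[] values, int k) {
        if (k < 0 || k >= values.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        double[] work = values.clone();
        int lo = 0;
        int hi = work.length - 1;
        while (lo < hi) {                     // invariant: lo <= k <= hi
            int p = partition(work, lo, hi, medianOf3(work, lo, hi));
            if (k == p) {
                return work[k];
            } else if (k < p) {
                hi = p - 1;
            } else {
                lo = p + 1;
            }
        }
        return work[k];
    }

    /** Index of the median of w[lo], w[mid], w[hi] under Double.compare ordering. */
    private static int medianOf3(double[] w, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;
        double a = w[lo], b = w[mid], c = w[hi];
        if (Double.compare(a, b) < 0) {
            if (Double.compare(b, c) < 0) return mid;        // a < b < c
            return Double.compare(a, c) < 0 ? hi : lo;       // max is b
        }
        if (Double.compare(a, c) < 0) return lo;             // b <= a < c
        return Double.compare(b, c) < 0 ? hi : mid;          // max is a
    }

    /** Lomuto partition around w[pivotIndex]; returns the pivot's final index. */
    private static int partition(double[] w, int lo, int hi, int pivotIndex) {
        double pivot = w[pivotIndex];
        swap(w, pivotIndex, hi);              // park the pivot at the end
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (Double.compare(w[i], pivot) < 0) {
                swap(w, i, store++);
            }
        }
        swap(w, store, hi);                   // pivot to its sorted position
        return store;
    }

    private static void swap(double[] w, int i, int j) {
        double t = w[i];
        w[i] = w[j];
        w[j] = t;
    }

    public static void main(String[] args) {
        double[] data = {7, 3, 9, 1, 5};
        System.out.println(select(data, 2)); // 5.0, the median of five values
    }
}
```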
Parameters:
newKthSelector - KthSelector for the new instance
Throws:
NullArgumentException - when newKthSelector is null

Copyright © 2003–2016 The Apache Software Foundation. All rights reserved.