public class EmpiricalDistribution extends AbstractRealDistribution
Represents an empirical probability distribution: a probability distribution derived from observed data without making any assumptions about the functional form of the population distribution that the data come from.
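An EmpiricalDistribution maintains data structures, called distribution digests, that describe empirical distributions and support the following operations: loading the distribution from observed data values (an array, a file, or a URL); dividing the input data into bins and reporting bin frequency counts; reporting univariate statistics for the full data set and for the observations within each bin; and generating random values from the distribution.
Applications can use EmpiricalDistribution to build grouped frequency histograms representing the input data or to generate random values "like" those in the input file, i.e., the values generated will follow the distribution of the values in the file.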
The implementation uses what amounts to the Variable Kernel Method with Gaussian smoothing:
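Digesting the input file: the data are scanned to find the minimum and maximum, the range from min to max is divided into binCount "bins," and counts and summary statistics are computed for each bin. The interval [0,1] is then divided into subintervals associated with the bins, with lengths proportional to bin counts.
EmpiricalDistribution implements the RealDistribution interface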
as follows. Given x within the range of values in the dataset, let B be the bin containing x and let K be the within-bin kernel for B. Let P(B-) be the sum of the probabilities of the bins below B, let P(B) be the probability of B, and let K(B) be the mass of B under K (i.e., the integral of the kernel density over B). Then set P(X < x) = P(B-) + P(B) * K(x) / K(B) where K(x) is the kernel distribution evaluated at x. This results in a cdf that matches the grouped frequency distribution at the bin endpoints and interpolates within bins using within-bin kernels.
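The within-bin interpolation described above can be sketched in plain Java. This is a minimal illustration, not the library's implementation. It assumes that K(x) is read as the kernel mass between the lower endpoint of B and x, which is the reading under which the cdf matches the grouped frequencies at the bin endpoints. The class name, the parameter names, and the logistic stand-in kernel are hypothetical; the library itself uses Gaussian kernels.

```java
final class WithinBinCdfSketch {
    /**
     * Evaluates P(B-) + P(B) * K(x) / K(B) for x inside the bin [lower, upper].
     * Assumption: K(x) is the kernel mass between the lower endpoint and x.
     */
    static double cdf(double x, double lower, double upper,
                      double massBelow, double binMass,
                      java.util.function.DoubleUnaryOperator kernelCdf) {
        double kernelAtLower = kernelCdf.applyAsDouble(lower);
        // K(B): mass of the bin under the kernel.
        double kernelMassOfBin = kernelCdf.applyAsDouble(upper) - kernelAtLower;
        // K(x): kernel mass accumulated from the lower endpoint up to x.
        double kernelMassToX = kernelCdf.applyAsDouble(x) - kernelAtLower;
        return massBelow + binMass * kernelMassToX / kernelMassOfBin;
    }

    /** Hypothetical stand-in kernel: a logistic CDF with the given center and scale. */
    static java.util.function.DoubleUnaryOperator logisticCdf(double center, double scale) {
        return t -> 1.0 / (1.0 + Math.exp(-(t - center) / scale));
    }
}
```

With massBelow = 0.25 and binMass = 0.5, the sketch returns 0.25 at the lower endpoint and 0.75 at the upper endpoint regardless of the kernel, which is the endpoint-matching property the description refers to.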
The binCount is set by default to 1000. A good rule of thumb is to set the bin count to approximately the length of the input file divided by 10.
Nested classes/interfaces inherited from interface RealDistribution: RealDistribution.Sampler
Modifier and Type  Field and Description 

static int 
DEFAULT_BIN_COUNT
Default bin count

Fields inherited from class AbstractRealDistribution: SOLVER_DEFAULT_ABSOLUTE_ACCURACY
Constructor and Description 

EmpiricalDistribution()
Creates a new EmpiricalDistribution with the default bin count.

EmpiricalDistribution(int binCount)
Creates a new EmpiricalDistribution with the specified bin count.

Modifier and Type  Method and Description 

RealDistribution.Sampler 
createSampler(org.apache.commons.rng.UniformRandomProvider rng)
Creates a sampler.

double 
cumulativeProbability(double x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x).

double 
density(double x)
Returns the probability density function (PDF) of this distribution evaluated at the specified point x.

int 
getBinCount()
Returns the number of bins.

List<SummaryStatistics> 
getBinStats()
Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins.

double[] 
getGeneratorUpperBounds()
Returns a fresh copy of the array of upper bounds of the subintervals
of [0,1] used in generating data from the empirical distribution.

protected RealDistribution 
getKernel(SummaryStatistics bStats)
The within-bin smoothing kernel.

double 
getNumericalMean()
Use this method to get the numerical value of the mean of this
distribution.

double 
getNumericalVariance()
Use this method to get the numerical value of the variance of this
distribution.

StatisticalSummary 
getSampleStats()
Returns a StatisticalSummary describing this distribution.

double 
getSupportLowerBound()
Access the lower bound of the support.

double 
getSupportUpperBound()
Access the upper bound of the support.

double[] 
getUpperBounds()
Returns a fresh copy of the array of upper bounds for the bins.

double 
inverseCumulativeProbability(double p)
Computes the quantile function of this distribution.

boolean 
isLoaded()
Property indicating whether or not the distribution has been loaded.

boolean 
isSupportConnected()
Use this method to get information about whether the support is connected, i.e. whether all values between the lower and upper bound of the support are included in the support.

void 
load(double[] in)
Computes the empirical distribution from the provided
array of numbers.

void 
load(File file)
Computes the empirical distribution from the input file.

void 
load(URL url)
Computes the empirical distribution using data read from a URL.

double 
probability(double x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X = x).

Methods inherited from class AbstractRealDistribution: getSolverAbsoluteAccuracy, logDensity, probability, sample
public static final int DEFAULT_BIN_COUNT
public EmpiricalDistribution()
public EmpiricalDistribution(int binCount)
Parameters:
binCount - number of bins. Must be strictly positive.
Throws:
NotStrictlyPositiveException - if binCount <= 0.
public void load(double[] in) throws NullArgumentException
Parameters:
in - the input data array
Throws:
NullArgumentException - if in is null
public void load(URL url) throws IOException, NullArgumentException, ZeroException
The input file must be an ASCII text file containing one valid numeric entry per line.
Parameters:
url - url of the input file
Throws:
IOException - if an IO error occurs
NullArgumentException - if url is null
ZeroException - if URL contains no data
public void load(File file) throws IOException, NullArgumentException
The input file must be an ASCII text file containing one valid numeric entry per line.
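As an illustration of the expected input format, the following sketch parses text with one numeric entry per line into a double array. It is not the library's loader: the class and method names are hypothetical, and rejecting a blank or non-numeric line with a NumberFormatException is this sketch's behavior, not a documented property of load.

```java
final class OneValuePerLineSketch {
    /**
     * Parses text containing one numeric entry per line.
     * Assumption: any blank or non-numeric line is rejected with NumberFormatException.
     */
    static double[] parse(String text) {
        // Split on Unix or Windows line endings; trailing empty lines are dropped by split.
        String[] lines = text.split("\\r?\\n");
        double[] values = new double[lines.length];
        for (int i = 0; i < lines.length; i++) {
            values[i] = Double.parseDouble(lines[i].trim());
        }
        return values;
    }
}
```

An array produced this way could then be passed to load(double[] in).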
Parameters:
file - the input file
Throws:
IOException - if an IO error occurs
NullArgumentException - if file is null
public StatisticalSummary getSampleStats()
Returns a StatisticalSummary describing this distribution.
Preconditions: the distribution must be loaded before invoking this method.
Throws:
IllegalStateException - if the distribution has not been loaded
public int getBinCount()
public List<SummaryStatistics> getBinStats()
Returns a List of SummaryStatistics instances containing statistics describing the values in each of the bins. The list is indexed on the bin number.
public double[] getUpperBounds()
Returns a fresh copy of the array of upper bounds for the bins.
Bins are:
[min,upperBounds[0]],(upperBounds[0],upperBounds[1]],...,
(upperBounds[binCount-2], upperBounds[binCount-1] = max].
Note: In versions 1.0-2.0 of commons-math, this method incorrectly returned the array of probability generator upper bounds now returned by getGeneratorUpperBounds().
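The bin layout above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: it assumes equal-width bins between min and max, and the class and method names are hypothetical. The first bin is closed on both sides and every later bin is open on the left, matching the intervals listed above.

```java
final class BinLayoutSketch {
    /** Upper bounds of binCount equal-width bins spanning [min, max]. */
    static double[] upperBounds(double min, double max, int binCount) {
        double delta = (max - min) / binCount;
        double[] bounds = new double[binCount];
        for (int i = 0; i < binCount - 1; i++) {
            bounds[i] = min + delta * (i + 1);
        }
        // The last upper bound is set to max itself to avoid rounding drift.
        bounds[binCount - 1] = max;
        return bounds;
    }

    /** Index of the bin containing value: [min,u0], (u0,u1], ..., (u[n-2], max]. */
    static int findBin(double value, double min, double max, int binCount) {
        double delta = (max - min) / binCount;
        int raw = (int) Math.ceil((value - min) / delta) - 1;
        // Clamp so that value == min falls in bin 0 and value == max in the last bin.
        return Math.min(Math.max(raw, 0), binCount - 1);
    }
}
```

For example, with min = 0, max = 10 and five bins, the value 4 falls in the second bin (2, 4] and the value 0 falls in the first bin [0, 2].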
public double[] getGeneratorUpperBounds()
Returns a fresh copy of the array of upper bounds of the subintervals of [0,1] used in generating data from the empirical distribution. Subintervals correspond to bins with lengths proportional to bin counts.
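A minimal sketch of how such subinterval bounds can be derived from bin counts, and how a uniform value then selects a bin, is given below. The class and method names are hypothetical. The choice of half-open subintervals, with the uniform value u assumed to lie in [0,1), is an assumption of this sketch and not taken from the library.

```java
final class GeneratorBoundsSketch {
    /** Cumulative fractions of the bin counts: bounds[i] = (count[0] + ... + count[i]) / total. */
    static double[] generatorUpperBounds(long[] binCounts) {
        long total = 0;
        for (long c : binCounts) {
            total += c;
        }
        double[] bounds = new double[binCounts.length];
        long running = 0;
        for (int i = 0; i < binCounts.length; i++) {
            running += binCounts[i];
            bounds[i] = (double) running / total;
        }
        // Guard against rounding: the last subinterval always ends at 1.
        bounds[bounds.length - 1] = 1.0;
        return bounds;
    }

    /** Selects the first bin whose upper bound exceeds u; u is assumed to lie in [0,1). */
    static int selectBin(double u, double[] bounds) {
        for (int i = 0; i < bounds.length; i++) {
            if (u < bounds[i]) {
                return i;
            }
        }
        return bounds.length - 1;
    }
}
```

Because an empty bin gets a subinterval of length zero, the strict comparison in selectBin never selects it.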
Preconditions: a load method must have been called before invoking this method.
In versions 1.0-2.0 of commons-math, this array was (incorrectly) returned by getUpperBounds().
Throws:
NullPointerException - unless a load method has been called beforehand.
public boolean isLoaded()
public double probability(double x)
For a random variable X whose values are distributed according to this distribution, this method returns P(X = x). In other words, this method represents the probability mass function (PMF) for the distribution.
Specified by:
probability in interface RealDistribution
Overrides:
probability in class AbstractRealDistribution
Parameters:
x - the point at which the PMF is evaluated
public double density(double x)
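Returns the probability density function (PDF) of this distribution evaluated at the specified point x. In general, the PDF is the derivative of the CDF. If the derivative does not exist at x, then an appropriate replacement should be returned, e.g. Double.POSITIVE_INFINITY, Double.NaN, or the limit inferior or limit superior of the difference quotient.
Returns the kernel density normalized so that its integral over each bin equals the bin mass.
Algorithm description: find the bin B containing x, compute K(B), the mass of B under the within-bin kernel, and return k(x) * P(B) / K(B), where k is the within-bin kernel density and P(B) is the mass of B.
Parameters:
x - the point at which the PDF is evaluated
Returns:
the value of the probability density function at point x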
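The normalization described for density can be sketched as below: the kernel density is rescaled so that its integral over the bin equals the bin mass. This is an illustration only. The class name and parameters are hypothetical, and the kernel is passed in as a pair of functions (density and distribution function) so that the sketch does not depend on any particular kernel.

```java
final class NormalizedKernelDensitySketch {
    /**
     * Returns k(x) * P(B) / K(B) for x in the bin [lower, upper], where
     * k is the kernel density, P(B) is binMass, and K(B) is the kernel mass of the bin.
     */
    static double density(double x, double lower, double upper, double binMass,
                          java.util.function.DoubleUnaryOperator kernelPdf,
                          java.util.function.DoubleUnaryOperator kernelCdf) {
        // K(B): integral of the kernel density over the bin.
        double kernelMassOfBin = kernelCdf.applyAsDouble(upper) - kernelCdf.applyAsDouble(lower);
        return kernelPdf.applyAsDouble(x) * binMass / kernelMassOfBin;
    }
}
```

Integrating this function numerically over [lower, upper] should recover binMass, up to the accuracy of the integration rule, for any kernel whose kernelPdf is the derivative of its kernelCdf.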
public double cumulativeProbability(double x)
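For a random variable X whose values are distributed according to this distribution, this method returns P(X <= x). In other words, this method represents the (cumulative) distribution function (CDF) for this distribution.
Algorithm description: find the bin B containing x, then interpolate within B using the within-bin kernel, as set out in the class description.
Parameters:
x - the point at which the CDF is evaluated
Returns:
the probability that a random variable with this distribution takes a value less than or equal to x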
public double inverseCumulativeProbability(double p) throws OutOfRangeException
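Computes the quantile function of this distribution. For a random variable X distributed according to this distribution, the returned value is
inf{x in R | P(X<=x) >= p} for 0 < p <= 1,
inf{x in R | P(X<=x) > 0} for p = 0.
The returned value is RealDistribution.getSupportLowerBound() for p = 0 and RealDistribution.getSupportUpperBound() for p = 1.
Algorithm description: find the first bin whose cumulative mass reaches p, then invert the within-bin kernel inside that bin.
Specified by:
inverseCumulativeProbability in interface RealDistribution
Overrides:
inverseCumulativeProbability in class AbstractRealDistribution
Parameters:
p - the cumulative probability
Returns:
the smallest p-quantile of this distribution (largest 0-quantile for p = 0)
Throws:
OutOfRangeException - if p < 0 or p > 1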
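The quantile definition above, inf{x in R | P(X<=x) >= p}, can be illustrated with a generic bisection over any non-decreasing CDF on a bounded support. This is a swapped-in technique chosen for brevity and is not the algorithm this class uses. The class name, the tolerance, and the use of IllegalArgumentException in place of OutOfRangeException are assumptions of the sketch.

```java
final class QuantileBisectionSketch {
    /**
     * Approximates inf{x : cdf(x) >= p} on [lower, upper] by bisection.
     * Assumes cdf is non-decreasing with cdf(upper) = 1.
     */
    static double quantile(java.util.function.DoubleUnaryOperator cdf,
                           double lower, double upper, double p) {
        if (Double.isNaN(p) || p < 0.0 || p > 1.0) {
            // Stand-in for the OutOfRangeException thrown by the documented method.
            throw new IllegalArgumentException("p must lie in [0, 1]: " + p);
        }
        if (p == 0.0) {
            return lower; // support lower bound
        }
        if (p == 1.0) {
            return upper; // support upper bound
        }
        double lo = lower;
        double hi = upper;
        // Invariant: cdf(hi) >= p, so hi is always a valid upper estimate of the quantile.
        for (int i = 0; i < 200 && hi - lo > 1e-12; i++) {
            double mid = 0.5 * (lo + hi);
            if (cdf.applyAsDouble(mid) >= p) {
                hi = mid;
            } else {
                lo = mid;
            }
        }
        return hi;
    }
}
```

For the CDF F(x) = x * x on [0, 1], the sketch returns a value within the tolerance of 0.5 for p = 0.25.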
public double getNumericalMean()
Use this method to get the numerical value of the mean of this distribution.
Returns:
the mean or Double.NaN if it is not defined
public double getNumericalVariance()
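Use this method to get the numerical value of the variance of this distribution.
Returns:
the variance (possibly Double.POSITIVE_INFINITY as for certain cases in TDistribution) or Double.NaN if it is not defined
public double getSupportLowerBound()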
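Access the lower bound of the support. This method must return the same value as inverseCumulativeProbability(0). In other words, this method must return
inf {x in R | P(X <= x) > 0}.
Returns:
lower bound of the support (might be Double.NEGATIVE_INFINITY)
public double getSupportUpperBound()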
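Access the upper bound of the support. This method must return the same value as inverseCumulativeProbability(1). In other words, this method must return
inf {x in R | P(X <= x) = 1}.
Returns:
upper bound of the support (might be Double.POSITIVE_INFINITY)
public boolean isSupportConnected()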
public RealDistribution.Sampler createSampler(org.apache.commons.rng.UniformRandomProvider rng)
Creates a sampler.
Specified by:
createSampler in interface RealDistribution
Overrides:
createSampler in class AbstractRealDistribution
Parameters:
rng - Generator of uniformly distributed numbers.
protected RealDistribution getKernel(SummaryStatistics bStats)
The within-bin smoothing kernel. Returns a Gaussian distribution parameterized by bStats, unless the bin contains only one observation, in which case a constant distribution is returned.
Parameters:
bStats - summary statistics for the bin
Copyright © 2003–2016 The Apache Software Foundation. All rights reserved.