GTest (Apache Commons Math 3.6.1 API)

java.lang.Object
- org.apache.commons.math3.stat.inference.GTest

```
public class GTest
extends Object
```
Implements G Test statistics.
In statistical genetics, the G test is the statistic used in the McDonald-Kreitman test. The implementation handles both known and unknown distributions.

Two-sample tests can be used when the distribution is unknown a priori but provided by one sample, or when the hypothesis under test is that the two samples come from the same underlying distribution.

Since:

3.1

Constructor Summary

Constructors
Constructor and Description
`GTest()`

Method Summary

Methods
Modifier and Type	Method and Description
`double`	`g(double[] expected, long[] observed)` Computes the G statistic for Goodness of Fit comparing `observed` and `expected` frequency counts.
`double`	`gDataSetsComparison(long[] observed1, long[] observed2)` Computes a G (Log-Likelihood Ratio) two sample test statistic for independence comparing frequency counts in `observed1` and `observed2`.
`double`	`gTest(double[] expected, long[] observed)` Returns the observed significance level, or p-value, associated with a G-Test for goodness of fit comparing the `observed` frequency counts to those in the `expected` array.
`boolean`	`gTest(double[] expected, long[] observed, double alpha)` Performs a G-Test (Log-Likelihood Ratio Test) for goodness of fit evaluating the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts, with significance level `alpha`.
`double`	`gTestDataSetsComparison(long[] observed1, long[] observed2)` Returns the observed significance level, or p-value, associated with a G-Value (Log-Likelihood Ratio) for two sample test comparing bin frequency counts in `observed1` and `observed2`.
`boolean`	`gTestDataSetsComparison(long[] observed1, long[] observed2, double alpha)` Performs a G-Test (Log-Likelihood Ratio Test) comparing two binned data sets.
`double`	`gTestIntrinsic(double[] expected, long[] observed)` Returns the intrinsic (Hardy-Weinberg proportions) p-value, as described on pp. 64-69 of McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.).
`double`	`rootLogLikelihoodRatio(long k11, long k12, long k21, long k22)` Calculates the root log-likelihood ratio for two-state data sets.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - GTest
```
public GTest()
```
- Method Detail
  - g
```
public double g(double[] expected,
       long[] observed)
         throws NotPositiveException,
                NotStrictlyPositiveException,
                DimensionMismatchException
```
    Computes the G statistic for Goodness of Fit comparing observed and expected frequency counts.
    This statistic can be used to perform a G test (Log-Likelihood Ratio Test) evaluating the null hypothesis that the observed counts follow the expected distribution.
    
    Preconditions:
    - Expected counts must all be positive.
    - Observed counts must all be ≥ 0.
    - The observed and expected arrays must have the same length and their common length must be at least 2.
    If any of the preconditions are not met, a MathIllegalArgumentException is thrown.
    
    Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.
    Parameters:
    observed - array of observed frequency counts
    expected - array of expected frequency counts
    
    Returns:
    G-Test statistic
    
    Throws:
    
    NotPositiveException - if observed has negative entries
    
    NotStrictlyPositiveException - if expected has entries that are not strictly positive
    
    DimensionMismatchException - if the array lengths do not match or are less than 2.
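
    The computation described above can be sketched in plain Java. This is an illustration of the formula G = 2 * sum(O_i * ln(O_i / E_i)) with the rescaling step, not the library source. The class name GStatisticSketch is hypothetical, IllegalArgumentException stands in for the library's exception types, and treating a zero observed count as contributing 0 to the sum is an assumption of this sketch.

```java
/** Plain-Java sketch of the goodness-of-fit G statistic (not the library source). */
final class GStatisticSketch {

    /** G = 2 * sum(O_i * ln(O_i / E_i)), with expected rescaled to the observed total. */
    static double g(double[] expected, long[] observed) {
        if (expected.length != observed.length || expected.length < 2) {
            throw new IllegalArgumentException("arrays must share a length of at least 2");
        }
        double sumExpected = 0.0;
        double sumObserved = 0.0;
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] <= 0.0) {
                throw new IllegalArgumentException("expected counts must be > 0");
            }
            if (observed[i] < 0) {
                throw new IllegalArgumentException("observed counts must be >= 0");
            }
            sumExpected += expected[i];
            sumObserved += observed[i];
        }
        // Rescale expected so that both arrays sum to the same total.
        double ratio = sumObserved / sumExpected;
        double sum = 0.0;
        for (int i = 0; i < expected.length; i++) {
            if (observed[i] > 0) { // assumption: a zero count contributes 0
                sum += observed[i] * Math.log(observed[i] / (ratio * expected[i]));
            }
        }
        return 2.0 * sum;
    }

    public static void main(String[] args) {
        double[] expected = {0.5, 0.5}; // proportions; rescaled to counts of 20 and 20
        long[] observed = {30, 10};
        System.out.println(g(expected, observed));
    }
}
```

    Because of the rescaling, passing expected proportions or expected counts gives the same statistic in this sketch.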
  - gTest
```
public double gTest(double[] expected,
           long[] observed)
             throws NotPositiveException,
                    NotStrictlyPositiveException,
                    DimensionMismatchException,
                    MaxCountExceededException
```
    Returns the observed significance level, or p-value, associated with a G-Test for goodness of fit comparing the observed frequency counts to those in the expected array.
    The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts.
    
    The probability returned is the tail probability beyond g(expected, observed) in the ChiSquare distribution with degrees of freedom one less than the common length of expected and observed.
    
    Preconditions:
    - Expected counts must all be positive.
    - Observed counts must all be ≥ 0.
    - The observed and expected arrays must have the same length and their common length must be at least 2.
    If any of the preconditions are not met, a MathIllegalArgumentException is thrown.
    
    Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.
    Parameters:
    observed - array of observed frequency counts
    expected - array of expected frequency counts
    
    Returns:
    p-value
    
    Throws:
    
    NotPositiveException - if observed has negative entries
    
    NotStrictlyPositiveException - if expected has entries that are not strictly positive
    
    DimensionMismatchException - if the array lengths do not match or are less than 2.
    
    MaxCountExceededException - if an error occurs computing the p-value.
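
    To make the tail probability concrete, here is a small plain-Java sketch. It covers only the special case of an even number of degrees of freedom (an odd number of bins), where the chi-square upper tail has the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!. The class and method names are hypothetical, and the library itself is not restricted to even degrees of freedom.

```java
/** Sketch of the chi-square upper tail for even degrees of freedom only. */
final class ChiSquareTailSketch {

    /** P(X > x) for a chi-square variable with an even, positive df. */
    static double upperTailEvenDf(double x, int df) {
        if (df <= 0 || df % 2 != 0) {
            throw new IllegalArgumentException("df must be positive and even");
        }
        if (x <= 0.0) {
            return 1.0;
        }
        double half = x / 2.0;
        double term = 1.0; // (x/2)^j / j!, starting at j = 0
        double sum = 1.0;
        for (int j = 1; j < df / 2; j++) {
            term *= half / j;
            sum += term;
        }
        return Math.exp(-half) * sum;
    }

    /** Goodness-of-fit p-value: tail beyond g with df = bins - 1 (bins must be odd here). */
    static double pValue(double g, int bins) {
        return upperTailEvenDf(g, bins - 1);
    }

    public static void main(String[] args) {
        // Three bins give df = 2, where the tail reduces to exp(-g / 2).
        System.out.println(pValue(10.0, 3));
    }
}
```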
  - gTestIntrinsic
```
public double gTestIntrinsic(double[] expected,
                    long[] observed)
                      throws NotPositiveException,
                             NotStrictlyPositiveException,
                             DimensionMismatchException,
                             MaxCountExceededException
```
    Returns the intrinsic (Hardy-Weinberg proportions) p-value, as described on pp. 64-69 of McDonald, J.H. 2009. Handbook of Biological Statistics (2nd ed.). Sparky House Publishing, Baltimore, Maryland.
    The probability returned is the tail probability beyond g(expected, observed) in the ChiSquare distribution with degrees of freedom two less than the common length of expected and observed.
    
    Parameters:
    observed - array of observed frequency counts
    expected - array of expected frequency counts
    
    Returns:
    p-value
    
    Throws:
    
    NotPositiveException - if observed has negative entries
    
    NotStrictlyPositiveException - if expected has entries that are not strictly positive
    
    DimensionMismatchException - if the array lengths do not match or are less than 2.
    
    MaxCountExceededException - if an error occurs computing the p-value.
  - gTest
```
public boolean gTest(double[] expected,
            long[] observed,
            double alpha)
              throws NotPositiveException,
                     NotStrictlyPositiveException,
                     DimensionMismatchException,
                     OutOfRangeException,
                     MaxCountExceededException
```
    Performs a G-Test (Log-Likelihood Ratio Test) for goodness of fit evaluating the null hypothesis that the observed counts conform to the frequency distribution described by the expected counts, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.
    Example:
    To test the hypothesis that observed follows expected at the 99% level, use
    gTest(expected, observed, 0.01)
    
    Returns true iff gTest(expected, observed) < alpha
    
    Preconditions:
    - Expected counts must all be positive.
    - Observed counts must all be ≥ 0.
    - The observed and expected arrays must have the same length and their common length must be at least 2.
    - 0 < alpha ≤ 0.5
    If any of the preconditions are not met, a MathIllegalArgumentException is thrown.
    
    Note: This implementation rescales the expected array if necessary to ensure that the sums of the expected and observed counts are equal.
    Parameters:
    observed - array of observed frequency counts
    expected - array of expected frequency counts
    alpha - significance level of the test
    
    Returns:
    true iff null hypothesis can be rejected with confidence 1 - alpha
    
    Throws:
    
    NotPositiveException - if observed has negative entries
    
    NotStrictlyPositiveException - if expected has entries that are not strictly positive
    
    DimensionMismatchException - if the array lengths do not match or are less than 2.
    
    MaxCountExceededException - if an error occurs computing the p-value.
    
    OutOfRangeException - if alpha is not strictly greater than zero and less than or equal to 0.5
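
    The decision rule itself is small enough to sketch directly. The following plain-Java illustration takes an already computed p-value and applies the documented alpha range and the strict comparison. The class name is hypothetical and IllegalArgumentException stands in for OutOfRangeException.

```java
/** Sketch of the decision rule only; the p-value is supplied by the caller. */
final class GTestDecisionSketch {

    /** Returns true when the null hypothesis is rejected at significance level alpha. */
    static boolean reject(double pValue, double alpha) {
        // Same range as the documented OutOfRangeException: 0 < alpha <= 0.5.
        if (!(alpha > 0.0 && alpha <= 0.5)) {
            throw new IllegalArgumentException("alpha must be in (0, 0.5], got " + alpha);
        }
        return pValue < alpha; // strict inequality, as in the description above
    }

    public static void main(String[] args) {
        System.out.println(reject(0.004, 0.01)); // p below alpha: reject
        System.out.println(reject(0.03, 0.01));  // p above alpha: do not reject
    }
}
```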
  - gDataSetsComparison
```
public double gDataSetsComparison(long[] observed1,
                         long[] observed2)
                           throws DimensionMismatchException,
                                  NotPositiveException,
                                  ZeroException
```
    Computes a G (Log-Likelihood Ratio) two sample test statistic for independence comparing frequency counts in observed1 and observed2. The sums of frequency counts in the two samples are not required to be the same. The formula used to compute the test statistic is
    
    2 * totalSum * [H(rowSums) + H(colSums) - H(k)]
    
    where H is the Shannon Entropy of the random variable formed by viewing the elements of the argument array as incidence counts;
    k is a matrix with rows [observed1, observed2];
    rowSums, colSums are the row/col sums of k;
    and totalSum is the overall sum of all entries in k.
    
    This statistic can be used to perform a G test evaluating the null hypothesis that both observed counts are independent.
    
    Preconditions:
    - Observed counts must be non-negative.
    - Observed counts for a specific bin must not both be zero.
    - Observed counts for a specific sample must not all be 0.
    - The arrays observed1 and observed2 must have the same length and their common length must be at least 2.
    If any of the preconditions are not met, a MathIllegalArgumentException is thrown.
    Parameters:
    observed1 - array of observed frequency counts of the first data set
    observed2 - array of observed frequency counts of the second data set
    
    Returns:
    G-Test statistic
    
    Throws:
    
    DimensionMismatchException - if the lengths of the arrays do not match or their common length is less than 2
    
    NotPositiveException - if any entry in observed1 or observed2 is negative
    
    ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at the same index is zero for both arrays.
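
    The entropy formula above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the library source: the class and method names are hypothetical, natural logarithms are assumed for H, and the precondition checks are omitted for brevity.

```java
/** Sketch of the two-sample G statistic via Shannon entropies (no input validation). */
final class TwoSampleGSketch {

    /** Shannon entropy (natural log) of counts viewed as incidence counts. */
    static double entropy(long... counts) {
        double total = 0.0;
        for (long c : counts) {
            total += c;
        }
        double h = 0.0;
        for (long c : counts) {
            if (c > 0) { // zero cells contribute nothing
                double p = c / total;
                h -= p * Math.log(p);
            }
        }
        return h;
    }

    /** 2 * totalSum * [H(rowSums) + H(colSums) - H(k)] with k = [observed1, observed2]. */
    static double g(long[] observed1, long[] observed2) {
        int n = observed1.length;
        long[] colSums = new long[n];
        long[] cells = new long[2 * n]; // all entries of k, flattened
        long row1 = 0;
        long row2 = 0;
        for (int i = 0; i < n; i++) {
            row1 += observed1[i];
            row2 += observed2[i];
            colSums[i] = observed1[i] + observed2[i];
            cells[i] = observed1[i];
            cells[n + i] = observed2[i];
        }
        double totalSum = (double) row1 + row2;
        return 2.0 * totalSum * (entropy(row1, row2) + entropy(colSums) - entropy(cells));
    }

    public static void main(String[] args) {
        System.out.println(g(new long[]{10, 20}, new long[]{20, 10}));
    }
}
```

    When the two samples have identical proportions (for example {10, 20} and {20, 40}), the three entropies cancel and the statistic is zero up to rounding.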
  - rootLogLikelihoodRatio
```
public double rootLogLikelihoodRatio(long k11,
                            long k12,
                            long k21,
                            long k22)
```
    Calculates the root log-likelihood ratio for two-state data sets. See gDataSetsComparison(long[], long[]).
    Given two events A and B, let k11 be the number of times both events occur, k12 the incidence of B without A, k21 the count of A without B, and k22 the number of times neither A nor B occurs. What is returned by this method is
    
    (sgn) sqrt(gDataSetsComparison({k11, k12}, {k21, k22}))
    
    where sgn is -1 if k11 / (k11 + k12) < k21 / (k21 + k22), and 1 otherwise.
    
    Signed root LLR has two advantages over the basic LLR: (a) it is positive where k11 is larger than expected and negative where it is smaller; (b) if there is no difference, it is asymptotically normally distributed. This allows one to talk about "number of standard deviations", which is a more common frame of reference than the chi^2 distribution.
    
    Parameters:
    k11 - number of times the two events occurred together (AB)
    k12 - number of times the second event occurred WITHOUT the first event (notA,B)
    k21 - number of times the first event occurred WITHOUT the second event (A, notB)
    k22 - number of times something else occurred, i.e. neither of these events (notA, notB)
    
    Returns:
    root log-likelihood ratio
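
    A plain-Java sketch of the signed root follows. It is not the library source: the class name is hypothetical, and the 2x2 statistic is computed with the algebraically equivalent form 2 * [sum(k ln k) - sum(row ln row) - sum(col ln col) + N ln N] rather than by calling gDataSetsComparison.

```java
/** Sketch of the signed root log-likelihood ratio for a 2x2 table. */
final class RootLlrSketch {

    /** x * ln(x), with the convention that 0 * ln(0) = 0. */
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Two-sample G statistic for the rows {k11, k12} and {k21, k22}. */
    static double g(long k11, long k12, long k21, long k22) {
        long r1 = k11 + k12;
        long r2 = k21 + k22;
        long c1 = k11 + k21;
        long c2 = k12 + k22;
        long n = r1 + r2;
        return 2.0 * (xLogX(k11) + xLogX(k12) + xLogX(k21) + xLogX(k22)
                - xLogX(r1) - xLogX(r2) - xLogX(c1) - xLogX(c2) + xLogX(n));
    }

    /** sqrt(G), negated when k11 / (k11 + k12) < k21 / (k21 + k22). */
    static double rootLlr(long k11, long k12, long k21, long k22) {
        // Clamp at zero so rounding noise cannot produce a NaN from sqrt.
        double root = Math.sqrt(Math.max(0.0, g(k11, k12, k21, k22)));
        boolean lower = (double) k11 / (k11 + k12) < (double) k21 / (k21 + k22);
        return lower ? -root : root;
    }

    public static void main(String[] args) {
        System.out.println(rootLlr(10, 20, 20, 10)); // negative: first ratio is smaller
        System.out.println(rootLlr(20, 10, 10, 20)); // positive: first ratio is larger
    }
}
```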
  - gTestDataSetsComparison
```
public double gTestDataSetsComparison(long[] observed1,
                             long[] observed2)
                               throws DimensionMismatchException,
                                      NotPositiveException,
                                      ZeroException,
                                      MaxCountExceededException
```
    Returns the observed significance level, or p-value, associated with a G-Value (Log-Likelihood Ratio) for two sample test comparing bin frequency counts in observed1 and observed2.
    
    The number returned is the smallest significance level at which one can reject the null hypothesis that the observed counts conform to the same distribution.
    
    See gTest(double[], long[]) for details on how the p-value is computed. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.
    
    Preconditions:
    - Observed counts must be non-negative.
    - Observed counts for a specific bin must not both be zero.
    - Observed counts for a specific sample must not all be 0.
    - The arrays observed1 and observed2 must have the same length and their common length must be at least 2.
    If any of the preconditions are not met, a MathIllegalArgumentException is thrown.
    Parameters:
    observed1 - array of observed frequency counts of the first data set
    observed2 - array of observed frequency counts of the second data set
    
    Returns:
    p-value
    
    Throws:
    
    DimensionMismatchException - if the lengths of the arrays do not match or their common length is less than 2
    
    NotPositiveException - if any of the entries in observed1 or observed2 are negative
    
    ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at some index is zero for both arrays
    
    MaxCountExceededException - if an error occurs computing the p-value.
  - gTestDataSetsComparison
```
public boolean gTestDataSetsComparison(long[] observed1,
                              long[] observed2,
                              double alpha)
                                throws DimensionMismatchException,
                                       NotPositiveException,
                                       ZeroException,
                                       OutOfRangeException,
                                       MaxCountExceededException
```
    Performs a G-Test (Log-Likelihood Ratio Test) comparing two binned data sets. The test evaluates the null hypothesis that the two lists of observed counts conform to the same frequency distribution, with significance level alpha. Returns true iff the null hypothesis can be rejected with 100 * (1 - alpha) percent confidence.
    
    See gDataSetsComparison(long[], long[]) for details on the formula used to compute the G (LLR) statistic used in the test and gTest(double[], long[]) for information on how the observed significance level is computed. The number of degrees of freedom used to perform the test is one less than the common length of the input observed count arrays.
    Preconditions:
    - Observed counts must be non-negative.
    - Observed counts for a specific bin must not both be zero.
    - Observed counts for a specific sample must not all be 0.
    - The arrays observed1 and observed2 must have the same length and their common length must be at least 2.
    - 0 < alpha ≤ 0.5
    If any of the preconditions are not met, a MathIllegalArgumentException is thrown.
    Parameters:
    observed1 - array of observed frequency counts of the first data set
    observed2 - array of observed frequency counts of the second data set
    alpha - significance level of the test
    
    Returns:
    true iff null hypothesis can be rejected with confidence 1 - alpha
    
    Throws:
    
    DimensionMismatchException - if the lengths of the arrays do not match
    
    NotPositiveException - if any of the entries in observed1 or observed2 are negative
    
    ZeroException - if either all counts of observed1 or observed2 are zero, or if the count at some index is zero for both arrays
    
    OutOfRangeException - if alpha is not in the range (0, 0.5]
    
    MaxCountExceededException - if an error occurs performing the test
