weka.classifiers.evaluation.ThresholdCurve

All Implemented Interfaces: RevisionHandler

public class ThresholdCurve extends Object implements RevisionHandler

Generates points illustrating the prediction tradeoffs that can be obtained by varying the threshold value between classes. For example, the typical threshold value of 0.5 means that the predicted probability of "positive" must be higher than 0.5 for the instance to be predicted as "positive". The resulting dataset can be used to visualize the precision/recall tradeoff or for ROC curve analysis (true positive rate vs. false positive rate). In each case, Weka simply varies the threshold on the class probability estimates. The AUC is calculated using the Mann-Whitney statistic.

Version: $Revision: 15751 $
Author: Len Trigg (len@reeltwo.com)
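The threshold sweep described above can be sketched in plain Java, without Weka. This is a minimal illustration under stated assumptions, not Weka's implementation: the class ThresholdSweep and its members are hypothetical, the inputs are raw arrays rather than Weka Prediction objects, and an instance is taken to be predicted positive when its score is greater than or equal to the current threshold (the description above phrases the rule as "higher than").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical helper (not part of Weka): sweeps a decision threshold over the
 * predicted probabilities of the "positive" class and records the confusion
 * counts obtained at each distinct threshold.
 */
final class ThresholdSweep {

  /** One curve point: a threshold and the counts it produces. */
  static final class Point {
    final double threshold;
    final int tp, fn, fp, tn;

    Point(double threshold, int tp, int fn, int fp, int tn) {
      this.threshold = threshold;
      this.tp = tp;
      this.fn = fn;
      this.fp = fp;
      this.tn = tn;
    }
  }

  /**
   * @param scores   predicted probability of the positive class, per instance
   * @param positive whether each instance actually belongs to the positive class
   * @return one point per distinct score, from highest to lowest threshold; an
   *         instance counts as predicted positive when its score is >= the
   *         point's threshold (an assumption of this sketch)
   */
  static List<Point> sweep(double[] scores, boolean[] positive) {
    if (scores.length != positive.length) {
      throw new IllegalArgumentException("scores and labels differ in length");
    }
    int n = scores.length;
    Integer[] order = new Integer[n];
    for (int i = 0; i < n; i++) {
      order[i] = i;
    }
    // Visit instances from the highest score to the lowest.
    Arrays.sort(order, Comparator.comparingDouble((Integer i) -> scores[i]).reversed());

    int totalPos = 0;
    for (boolean p : positive) {
      if (p) totalPos++;
    }
    int totalNeg = n - totalPos;

    List<Point> points = new ArrayList<>();
    int tp = 0;
    int fp = 0;
    for (int k = 0; k < n; k++) {
      int i = order[k];
      if (positive[i]) tp++; else fp++;
      // Emit a point only after the last instance sharing this score, because
      // no threshold can separate tied scores.
      boolean lastOfTie = k == n - 1 || scores[order[k + 1]] != scores[i];
      if (lastOfTie) {
        points.add(new Point(scores[i], tp, totalPos - tp, fp, totalNeg - fp));
      }
    }
    return points;
  }

  public static void main(String[] args) {
    double[] scores = {0.9, 0.8, 0.7, 0.6, 0.6, 0.3};
    boolean[] positive = {true, true, false, true, false, false};
    for (Point p : sweep(scores, positive)) {
      System.out.printf("threshold=%.2f TP=%d FN=%d FP=%d TN=%d%n",
          p.threshold, p.tp, p.fn, p.fp, p.tn);
    }
  }
}
```

Each returned point corresponds to one row of a threshold curve; tied scores collapse into a single point because no threshold can split them.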

Field Summary

Fields

Modifier and Type | Field | Description
static final String | FALLOUT_NAME | attribute name: Fallout
static final String | FALSE_NEG_NAME | attribute name: False Negatives
static final String | FALSE_POS_NAME | attribute name: False Positives
static final String | FMEASURE_NAME | attribute name: FMeasure
static final String | FP_RATE_NAME | attribute name: False Positive Rate
static final String | LIFT_NAME | attribute name: Lift
static final String | PRECISION_NAME | attribute name: Precision
static final String | RECALL_NAME | attribute name: Recall
static final String | RELATION_NAME | The name of the relation used in threshold curve datasets
static final String | SAMPLE_SIZE_NAME | attribute name: Sample Size
static final String | THRESHOLD_NAME | attribute name: Threshold
static final String | TP_RATE_NAME | attribute name: True Positive Rate
static final String | TRUE_NEG_NAME | attribute name: True Negatives
static final String | TRUE_POS_NAME | attribute name: True Positives

Constructor Summary

Constructors

Constructor

Description

ThresholdCurve()
Method Summary

Modifier and Type | Method | Description
Instances | getCurve(ArrayList<Prediction> predictions) | Calculates the performance statistics for the default class and returns the results as a set of Instances.
Instances | getCurve(ArrayList<Prediction> predictions, int classIndex) | Calculates the performance statistics for the desired class and returns the results as a set of Instances.
static double | getNPointPrecision(Instances tcurve, int n) | Calculates the n-point precision, which is the precision averaged over n samples of the curve that are evenly spaced with respect to recall.
static double | getPRCArea(Instances tcurve) | Calculates the area under the precision-recall curve (AUPRC).
String | getRevision() | Returns the revision string.
static double | getROCArea(Instances tcurve) | Calculates the area under the ROC curve as the Wilcoxon-Mann-Whitney statistic.
static int | getThresholdInstance(Instances tcurve, double threshold) | Gets the index of the instance whose threshold value is closest to the desired target.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- RELATION_NAME
  
  public static final String RELATION_NAME
  
  The name of the relation used in threshold curve datasets
  See Also:
  
  Constant Field Values
- TRUE_POS_NAME
  
  public static final String TRUE_POS_NAME
  
  attribute name: True Positives
  See Also:
  
  Constant Field Values
- FALSE_NEG_NAME
  
  public static final String FALSE_NEG_NAME
  
  attribute name: False Negatives
  See Also:
  
  Constant Field Values
- FALSE_POS_NAME
  
  public static final String FALSE_POS_NAME
  
  attribute name: False Positives
  See Also:
  
  Constant Field Values
- TRUE_NEG_NAME
  
  public static final String TRUE_NEG_NAME
  
  attribute name: True Negatives
  See Also:
  
  Constant Field Values
- FP_RATE_NAME
  
  public static final String FP_RATE_NAME
  
  attribute name: False Positive Rate
  See Also:
  
  Constant Field Values
- TP_RATE_NAME
  
  public static final String TP_RATE_NAME
  
  attribute name: True Positive Rate
  See Also:
  
  Constant Field Values
- PRECISION_NAME
  
  public static final String PRECISION_NAME
  
  attribute name: Precision
  See Also:
  
  Constant Field Values
- RECALL_NAME
  
  public static final String RECALL_NAME
  
  attribute name: Recall
  See Also:
  
  Constant Field Values
- FALLOUT_NAME
  
  public static final String FALLOUT_NAME
  
  attribute name: Fallout
  See Also:
  
  Constant Field Values
- FMEASURE_NAME
  
  public static final String FMEASURE_NAME
  
  attribute name: FMeasure
  See Also:
  
  Constant Field Values
- SAMPLE_SIZE_NAME
  
  public static final String SAMPLE_SIZE_NAME
  
  attribute name: Sample Size
  See Also:
  
  Constant Field Values
- LIFT_NAME
  
  public static final String LIFT_NAME
  
  attribute name: Lift
  See Also:
  
  Constant Field Values
- THRESHOLD_NAME
  
  public static final String THRESHOLD_NAME
  
  attribute name: Threshold
  See Also:
  
  Constant Field Values
Constructor Details
- ThresholdCurve
  
  public ThresholdCurve()
Method Details
- getCurve
  
  public Instances getCurve(ArrayList<Prediction> predictions)
  Calculates the performance statistics for the default class and returns the results as a set of Instances. The structure of these Instances is as follows:
  
  True Positives
  False Negatives
  False Positives
  True Negatives
  False Positive Rate
  True Positive Rate
  Precision
  Recall
  Fallout
  Threshold (the probability threshold that gives rise to the preceding performance values)
  
  For the definitions of these measures, see TwoClassStats.
  Parameters:
  
  predictions - the predictions to base the curve on
  
  Returns:
  
  the data points as a set of Instances, or null if no predictions have been made.
  
  See Also:
  
  TwoClassStats
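  The rate columns listed above can be derived from the four counts in each row. The sketch below uses the standard definitions (true positive rate = TP / (TP + FN), false positive rate = FP / (FP + TN), precision = TP / (TP + FP), recall equal to the true positive rate, fallout equal to the false positive rate, F-measure as the harmonic mean of precision and recall). The class CurveRates is hypothetical and is not Weka's TwoClassStats; returning 0 for a zero denominator is an assumption of this sketch.

  ```java
  /**
   * Hypothetical helper (not Weka's TwoClassStats): derives the rate columns of
   * one threshold-curve row from its confusion counts.
   */
  final class CurveRates {

    /** Returns num / den, or 0 when den is 0 (an assumption of this sketch). */
    private static double ratio(double num, double den) {
      return den == 0 ? 0.0 : num / den;
    }

    /** TP / (TP + FN) */
    static double truePositiveRate(int tp, int fn) {
      return ratio(tp, tp + fn);
    }

    /** FP / (FP + TN) */
    static double falsePositiveRate(int fp, int tn) {
      return ratio(fp, fp + tn);
    }

    /** TP / (TP + FP) */
    static double precision(int tp, int fp) {
      return ratio(tp, tp + fp);
    }

    /** Recall is the same quantity as the true positive rate. */
    static double recall(int tp, int fn) {
      return truePositiveRate(tp, fn);
    }

    /** Fallout is the same quantity as the false positive rate. */
    static double fallout(int fp, int tn) {
      return falsePositiveRate(fp, tn);
    }

    /** F-measure: harmonic mean of precision and recall. */
    static double fMeasure(int tp, int fn, int fp) {
      double p = precision(tp, fp);
      double r = recall(tp, fn);
      return ratio(2 * p * r, p + r);
    }

    public static void main(String[] args) {
      int tp = 3, fn = 1, fp = 2, tn = 4;
      System.out.println("TPR/recall  = " + truePositiveRate(tp, fn));
      System.out.println("FPR/fallout = " + falsePositiveRate(fp, tn));
      System.out.println("precision   = " + precision(tp, fp));
      System.out.println("F-measure   = " + fMeasure(tp, fn, fp));
    }
  }
  ```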
- getCurve
  
  public Instances getCurve(ArrayList<Prediction> predictions, int classIndex)
  
  Calculates the performance statistics for the desired class and returns the results as a set of Instances.
  
  Parameters:
  
  predictions - the predictions to base the curve on
  
  classIndex - index of the class of interest.
  
  Returns:
  
  the data points as a set of Instances.
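  Conceptually, restricting the curve to one class of interest can be viewed as a one-vs-rest reduction: the predicted probability for classIndex becomes the score, and an instance counts as positive only if its actual class is classIndex. A minimal sketch, assuming each prediction is available as a probability distribution plus the index of its actual class; the class OneVsRest and its Binary holder are hypothetical, not Weka types.

  ```java
  /**
   * Hypothetical helper (not part of Weka): reduces multi-class predictions to
   * the two-class view needed for one class of interest.
   */
  final class OneVsRest {

    /** Parallel arrays: score for the class of interest, and whether it is the true class. */
    static final class Binary {
      final double[] scores;
      final boolean[] positive;

      Binary(double[] scores, boolean[] positive) {
        this.scores = scores;
        this.positive = positive;
      }
    }

    /**
     * @param distributions per-instance predicted class probability distributions
     * @param actual        per-instance index of the true class
     * @param classIndex    the class treated as "positive"
     */
    static Binary forClass(double[][] distributions, int[] actual, int classIndex) {
      if (distributions.length != actual.length) {
        throw new IllegalArgumentException("distributions and labels differ in length");
      }
      int n = actual.length;
      double[] scores = new double[n];
      boolean[] positive = new boolean[n];
      for (int i = 0; i < n; i++) {
        // The probability assigned to the class of interest becomes the score.
        scores[i] = distributions[i][classIndex];
        // All other classes are lumped together as "negative".
        positive[i] = actual[i] == classIndex;
      }
      return new Binary(scores, positive);
    }

    public static void main(String[] args) {
      double[][] dists = {{0.7, 0.2, 0.1}, {0.1, 0.6, 0.3}, {0.2, 0.2, 0.6}};
      int[] actual = {0, 2, 2};
      Binary b = forClass(dists, actual, 2);
      for (int i = 0; i < b.scores.length; i++) {
        System.out.println(b.scores[i] + " positive=" + b.positive[i]);
      }
    }
  }
  ```

  The resulting scores and labels can then be treated exactly like a two-class problem.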
- getNPointPrecision
  
  public static double getNPointPrecision(Instances tcurve, int n)
  
  Calculates the n-point precision, which is the precision averaged over n samples of the curve that are evenly spaced with respect to recall.
  
  Parameters:
  
  tcurve - a threshold curve (an Instances object) previously generated by ThresholdCurve.
  
  n - the number of points to average over.
  
  Returns:
  
  the n-point precision.
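  The averaging can be sketched as follows. This is an illustration under assumptions, not Weka's code: the curve is passed as parallel recall and precision arrays, the n samples are taken at recall levels 0, 1/(n-1), ..., 1 (so n must be at least 2), and precision between curve points is obtained by linear interpolation. The class NPointPrecision is hypothetical.

  ```java
  import java.util.Arrays;
  import java.util.Comparator;

  /**
   * Hypothetical sketch (not Weka's code) of n-point precision: sample the
   * precision/recall curve at n evenly spaced recall levels and average.
   */
  final class NPointPrecision {

    /**
     * @param recall    recall value of each curve point
     * @param precision precision value of each curve point (parallel to recall)
     * @param n         number of recall levels: 0, 1/(n-1), ..., 1
     * @return the mean interpolated precision, or NaN for an empty curve
     */
    static double compute(double[] recall, double[] precision, int n) {
      if (n < 2) {
        throw new IllegalArgumentException("n must be at least 2");
      }
      if (recall.length != precision.length) {
        throw new IllegalArgumentException("recall and precision differ in length");
      }
      if (recall.length == 0) {
        return Double.NaN;
      }
      // Sort point indices by recall so we can interpolate along the recall axis.
      Integer[] order = new Integer[recall.length];
      for (int i = 0; i < order.length; i++) {
        order[i] = i;
      }
      Arrays.sort(order, Comparator.comparingDouble((Integer i) -> recall[i]));

      double sum = 0;
      for (int k = 0; k < n; k++) {
        double level = (double) k / (n - 1);
        sum += precisionAt(level, recall, precision, order);
      }
      return sum / n;
    }

    /** Precision at a recall level: clamped at both ends, linearly interpolated between points. */
    private static double precisionAt(double level, double[] recall, double[] precision, Integer[] order) {
      int first = order[0];
      int last = order[order.length - 1];
      if (level <= recall[first]) return precision[first];
      if (level >= recall[last]) return precision[last];
      for (int j = 1; j < order.length; j++) {
        int lo = order[j - 1];
        int hi = order[j];
        if (level <= recall[hi]) {
          // Here recall[lo] < level <= recall[hi], so the span is positive.
          double w = (level - recall[lo]) / (recall[hi] - recall[lo]);
          return precision[lo] + w * (precision[hi] - precision[lo]);
        }
      }
      return precision[last]; // not reached: the clamps above cover every other case
    }

    public static void main(String[] args) {
      double[] recall = {0.0, 0.5, 1.0};
      double[] precision = {1.0, 0.8, 0.6};
      System.out.println("11-point precision = " + compute(recall, precision, 11));
    }
  }
  ```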
- getPRCArea
  
  public static double getPRCArea(Instances tcurve)
  
  Calculates the area under the precision-recall curve (AUPRC).
  
  Parameters:
  
  tcurve - a threshold curve (an Instances object) previously generated by ThresholdCurve.
  
  Returns:
  
  the PRC area, or Double.NaN if tcurve was not generated by ThresholdCurve.
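  One common way to compute the area under a precision-recall curve is the trapezoidal rule over points sorted by recall, sketched below. Weka's exact integration scheme may differ, so treat this as an illustration of the idea rather than a reimplementation; the class PrcArea is hypothetical.

  ```java
  import java.util.Arrays;
  import java.util.Comparator;

  /**
   * Hypothetical sketch (not Weka's code): area under a precision-recall curve
   * by the trapezoidal rule over points sorted by recall.
   */
  final class PrcArea {

    /** @return the trapezoidal area, or NaN for an empty curve */
    static double trapezoid(double[] recall, double[] precision) {
      if (recall.length != precision.length) {
        throw new IllegalArgumentException("recall and precision differ in length");
      }
      if (recall.length == 0) {
        return Double.NaN;
      }
      Integer[] order = new Integer[recall.length];
      for (int i = 0; i < order.length; i++) {
        order[i] = i;
      }
      Arrays.sort(order, Comparator.comparingDouble((Integer i) -> recall[i]));

      double area = 0;
      for (int k = 1; k < order.length; k++) {
        int a = order[k - 1];
        int b = order[k];
        // Width along the recall axis times the mean height (precision).
        area += (recall[b] - recall[a]) * (precision[a] + precision[b]) / 2.0;
      }
      return area;
    }

    public static void main(String[] args) {
      double[] recall = {0.0, 0.25, 0.5, 1.0};
      double[] precision = {1.0, 0.9, 0.7, 0.5};
      System.out.println("PRC area = " + trapezoid(recall, precision));
    }
  }
  ```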
- getROCArea
  
  public static double getROCArea(Instances tcurve)
  
  Calculates the area under the ROC curve as the Wilcoxon-Mann-Whitney statistic.
  
  Parameters:
  
  tcurve - a threshold curve (an Instances object) previously generated by ThresholdCurve.
  
  Returns:
  
  the ROC area, or Double.NaN if tcurve was not generated by ThresholdCurve.
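  The Wilcoxon-Mann-Whitney statistic equals the fraction of (positive, negative) instance pairs in which the positive instance receives the higher score, with ties counted as one half. The sketch below computes it directly from raw scores with an O(P x N) double loop, rather than from a threshold curve as getROCArea does. The class MannWhitneyAuc is hypothetical, and returning NaN when only one class is present is an assumption.

  ```java
  /**
   * Hypothetical sketch (not Weka's code): ROC area as the Mann-Whitney
   * statistic, computed directly from scores and true labels.
   */
  final class MannWhitneyAuc {

    /**
     * @param scores   predicted probability of the positive class, per instance
     * @param positive whether each instance actually belongs to the positive class
     * @return fraction of (positive, negative) pairs ranked correctly, ties
     *         counting 1/2; NaN if either class is absent (an assumption)
     */
    static double auc(double[] scores, boolean[] positive) {
      if (scores.length != positive.length) {
        throw new IllegalArgumentException("scores and labels differ in length");
      }
      long nPos = 0;
      long nNeg = 0;
      for (boolean p : positive) {
        if (p) nPos++; else nNeg++;
      }
      if (nPos == 0 || nNeg == 0) {
        return Double.NaN;
      }
      double wins = 0;
      // Compare every positive instance with every negative one: O(P x N).
      for (int i = 0; i < scores.length; i++) {
        if (!positive[i]) continue;
        for (int j = 0; j < scores.length; j++) {
          if (positive[j]) continue;
          if (scores[i] > scores[j]) {
            wins += 1.0;
          } else if (scores[i] == scores[j]) {
            wins += 0.5;
          }
        }
      }
      return wins / ((double) nPos * nNeg);
    }

    public static void main(String[] args) {
      double[] scores = {0.9, 0.8, 0.7, 0.6, 0.6, 0.3};
      boolean[] positive = {true, true, false, true, false, false};
      System.out.println("AUC = " + auc(scores, positive));
    }
  }
  ```

  For large datasets a rank-based formulation (sorting once) avoids the quadratic loop; the pairwise form is used here only because it mirrors the definition directly.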
- getThresholdInstance
  
  public static int getThresholdInstance(Instances tcurve, double threshold)
  
  Gets the index of the instance whose threshold value is closest to the desired target.
  
  Parameters:
  
  tcurve - a set of instances that have been generated by this class
  
  threshold - the target threshold
  
  Returns:
  
  the index of the instance whose threshold is closest to the target, or -1 if no such instance can be found (i.e., there is no data or the threshold target is invalid).
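  A linear-scan version of this lookup is sketched below over a plain array of threshold values. The class ClosestThreshold is hypothetical, and treating NaN or values outside [0, 1] as a "bad threshold target" is an assumption of this sketch; when two thresholds are equally close, the first one wins.

  ```java
  /**
   * Hypothetical sketch (not Weka's code): index of the curve point whose
   * threshold is closest to a target probability.
   */
  final class ClosestThreshold {

    /**
     * @return the index of the closest threshold (first one on ties), or -1 if
     *         there is no data or the target is NaN or outside [0, 1]
     *         (that reading of "bad threshold target" is an assumption)
     */
    static int indexOf(double[] thresholds, double target) {
      if (thresholds.length == 0 || Double.isNaN(target) || target < 0 || target > 1) {
        return -1;
      }
      int best = -1;
      double bestDistance = Double.POSITIVE_INFINITY;
      for (int i = 0; i < thresholds.length; i++) {
        double distance = Math.abs(thresholds[i] - target);
        if (distance < bestDistance) { // strict: earlier entries win ties
          bestDistance = distance;
          best = i;
        }
      }
      return best;
    }

    public static void main(String[] args) {
      double[] thresholds = {0.9, 0.6, 0.3};
      System.out.println(indexOf(thresholds, 0.5)); // prints 1
    }
  }
  ```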
- getRevision
  
  public String getRevision()
  
  Returns the revision string.
  
  Specified by:
  
  getRevision in interface RevisionHandler
  
  Returns:
  
  the revision
