Package weka.classifiers.evaluation
Class ThresholdCurve
java.lang.Object
weka.classifiers.evaluation.ThresholdCurve
- All Implemented Interfaces:
RevisionHandler
Generates points illustrating the prediction tradeoffs that can be obtained by
varying the threshold value between classes. For example, the typical
threshold value of 0.5 means the predicted probability of "positive" must be
higher than 0.5 for the instance to be predicted as "positive". The resulting
dataset can be used to visualize the precision/recall tradeoff or for ROC curve
analysis (true positive rate vs. false positive rate). In each case, Weka simply
varies the threshold on the class probability estimates. The AUC is calculated
with the Mann-Whitney statistic.
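The threshold sweep described above can be illustrated with a small standalone sketch. It does not use or reproduce Weka's code: the class `ThresholdSweepSketch`, its nested `Point` type, and the methods `evaluate` and `sweep` are hypothetical names introduced only for this example. It assumes a two-class problem, with one "positive" probability and one actual label per instance, and uses the strict "higher than" comparison from the description.

```java
/** Standalone illustration of a threshold sweep; this is not Weka code. */
final class ThresholdSweepSketch {

    /** Confusion-matrix counts observed at one threshold. */
    static final class Point {
        final double threshold;
        final int tp, fp, tn, fn;

        Point(double threshold, int tp, int fp, int tn, int fn) {
            this.threshold = threshold;
            this.tp = tp;
            this.fp = fp;
            this.tn = tn;
            this.fn = fn;
        }

        /** True positive rate; 0 when there are no actual positives. */
        double tpRate() {
            return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        }

        /** False positive rate; 0 when there are no actual negatives. */
        double fpRate() {
            return fp + tn == 0 ? 0.0 : (double) fp / (fp + tn);
        }
    }

    /** Counts outcomes when an instance is predicted positive iff its probability exceeds the threshold. */
    static Point evaluate(double[] probPositive, boolean[] actualPositive, double threshold) {
        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (int i = 0; i < probPositive.length; i++) {
            boolean predictedPositive = probPositive[i] > threshold;
            if (predictedPositive && actualPositive[i]) {
                tp++;
            } else if (predictedPositive) {
                fp++;
            } else if (actualPositive[i]) {
                fn++;
            } else {
                tn++;
            }
        }
        return new Point(threshold, tp, fp, tn, fn);
    }

    /** Evaluates every candidate threshold, yielding one curve point per threshold. */
    static Point[] sweep(double[] probPositive, boolean[] actualPositive, double[] thresholds) {
        Point[] points = new Point[thresholds.length];
        for (int i = 0; i < thresholds.length; i++) {
            points[i] = evaluate(probPositive, actualPositive, thresholds[i]);
        }
        return points;
    }
}
```

Plotting `fpRate()` against `tpRate()` for the resulting points gives an ROC-style curve. Raising the threshold can only move instances from the predicted-positive side to the predicted-negative side, so both rates shrink or stay the same.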
- Version:
- $Revision: 15751 $
- Author:
- Len Trigg (len@reeltwo.com)
-
Field Summary
Modifier and Type     Field             Description
static final String   FALLOUT_NAME      attribute name: Fallout
static final String   FALSE_NEG_NAME    attribute name: False Negatives
static final String   FALSE_POS_NAME    attribute name: False Positives
static final String   FMEASURE_NAME     attribute name: FMeasure
static final String   FP_RATE_NAME      attribute name: False Positive Rate
static final String   LIFT_NAME         attribute name: Lift
static final String   PRECISION_NAME    attribute name: Precision
static final String   RECALL_NAME       attribute name: Recall
static final String   RELATION_NAME     The name of the relation used in threshold curve datasets
static final String   SAMPLE_SIZE_NAME  attribute name: Sample Size
static final String   THRESHOLD_NAME    attribute name: Threshold
static final String   TP_RATE_NAME      attribute name: True Positive Rate
static final String   TRUE_NEG_NAME     attribute name: True Negatives
static final String   TRUE_POS_NAME     attribute name: True Positives
-
Constructor Summary
-
Method Summary
Modifier and Type   Method   Description
Instances   getCurve(ArrayList<Prediction> predictions)
    Calculates the performance stats for the default class and returns the results as a set of Instances.
Instances   getCurve(ArrayList<Prediction> predictions, int classIndex)
    Calculates the performance stats for the desired class and returns the results as a set of Instances.
static double   getNPointPrecision(Instances tcurve, int n)
    Calculates the n-point precision result, which is the precision averaged over n evenly spaced (w.r.t. recall) samples of the curve.
static double   getPRCArea(Instances tcurve)
    Calculates the area under the precision-recall curve (AUPRC).
String   getRevision()
    Returns the revision string.
static double   getROCArea(Instances tcurve)
    Calculates the area under the ROC curve as the Wilcoxon-Mann-Whitney statistic.
static int   getThresholdInstance(Instances tcurve, double threshold)
    Gets the index of the instance with the closest threshold value to the desired target.
-
Field Details
-
RELATION_NAME
The name of the relation used in threshold curve datasets
-
TRUE_POS_NAME
attribute name: True Positives
-
FALSE_NEG_NAME
attribute name: False Negatives
-
FALSE_POS_NAME
attribute name: False Positives
-
TRUE_NEG_NAME
attribute name: True Negatives
-
FP_RATE_NAME
attribute name: False Positive Rate
-
TP_RATE_NAME
attribute name: True Positive Rate
-
PRECISION_NAME
attribute name: Precision
-
RECALL_NAME
attribute name: Recall
-
FALLOUT_NAME
attribute name: Fallout
-
FMEASURE_NAME
attribute name: FMeasure
-
SAMPLE_SIZE_NAME
attribute name: Sample Size
-
LIFT_NAME
attribute name: Lift
-
THRESHOLD_NAME
attribute name: Threshold
-
-
Constructor Details
-
ThresholdCurve
public ThresholdCurve()
-
-
Method Details
-
getCurve
Calculates the performance stats for the default class and returns the results as a set of Instances. The structure of these Instances is as follows:
- True Positives
- False Negatives
- False Positives
- True Negatives
- False Positive Rate
- True Positive Rate
- Precision
- Recall
- Fallout
- Threshold: contains the probability threshold that gives rise to the previous performance values.
For the definitions of these measures, see TwoClassStats.
- Parameters:
predictions - the predictions to base the curve on
- Returns:
datapoints as a set of instances, or null if no predictions have been made
-
getCurve
Calculates the performance stats for the desired class and returns the results as a set of Instances.
- Parameters:
predictions - the predictions to base the curve on
classIndex - the index of the class of interest
- Returns:
datapoints as a set of instances
-
getNPointPrecision
Calculates the n-point precision result, which is the precision averaged over n evenly spaced (w.r.t. recall) samples of the curve.
- Parameters:
tcurve - a previously extracted threshold curve Instances
n - the number of points to average over
- Returns:
the n-point precision
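As a rough illustration of averaging precision over n evenly spaced recall levels, the standalone sketch below works on plain arrays rather than on a threshold curve Instances object. The class `NPointPrecisionSketch` and its methods are hypothetical. The choice of recall levels 0, 1/(n-1), ..., 1 and the linear interpolation between neighbouring curve points are assumptions made for this example, and Weka's handling of endpoints and ties may differ.

```java
/** Standalone illustration of n-point precision; this is not Weka code. */
final class NPointPrecisionSketch {

    /**
     * Averages precision at n recall levels spaced evenly between 0 and 1.
     * Assumes recall is sorted in ascending order and paired index-by-index with precision.
     */
    static double nPointPrecision(double[] recall, double[] precision, int n) {
        if (recall.length == 0 || recall.length != precision.length || n < 2) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double targetRecall = (double) i / (n - 1);
            sum += precisionAt(recall, precision, targetRecall);
        }
        return sum / n;
    }

    /** Linearly interpolates precision at the requested recall level, clamping at both ends. */
    static double precisionAt(double[] recall, double[] precision, double targetRecall) {
        int last = recall.length - 1;
        if (targetRecall <= recall[0]) {
            return precision[0];
        }
        if (targetRecall >= recall[last]) {
            return precision[last];
        }
        for (int k = 1; k <= last; k++) {
            if (recall[k] >= targetRecall) {
                double span = recall[k] - recall[k - 1];
                if (span == 0.0) {
                    return precision[k];
                }
                double weight = (targetRecall - recall[k - 1]) / span;
                return precision[k - 1] + weight * (precision[k] - precision[k - 1]);
            }
        }
        return precision[last];
    }
}
```

With recall values {0, 0.5, 1} and precision values {1.0, 0.8, 0.4}, a 5-point average samples recall at 0, 0.25, 0.5, 0.75 and 1, giving interpolated precisions of 1.0, 0.9, 0.8, 0.6 and 0.4.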
-
getPRCArea
Calculates the area under the precision-recall curve (AUPRC).
- Parameters:
tcurve - a previously extracted threshold curve Instances
- Returns:
the PRC area, or Double.NaN if tcurve was not generated by ThresholdCurve
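One common way to approximate the area under a precision-recall curve from a finite set of points is the trapezoidal rule, sketched below on plain arrays. This is an assumption chosen for illustration and is not taken from Weka's source, so the value returned by `getPRCArea` may be computed differently. The class `PrcAreaSketch` and its method `trapezoidArea` are hypothetical names.

```java
/** Standalone illustration of a trapezoidal area estimate; this is not Weka code. */
final class PrcAreaSketch {

    /**
     * Sums the trapezoids between consecutive (recall, precision) points.
     * Assumes recall is sorted in ascending order and paired index-by-index with precision.
     */
    static double trapezoidArea(double[] recall, double[] precision) {
        if (recall.length < 2 || recall.length != precision.length) {
            return Double.NaN;
        }
        double area = 0.0;
        for (int i = 1; i < recall.length; i++) {
            double width = recall[i] - recall[i - 1];
            double meanHeight = (precision[i] + precision[i - 1]) / 2.0;
            area += width * meanHeight;
        }
        return area;
    }
}
```

For recall values {0, 0.5, 1} and precision values {1.0, 0.8, 0.4}, the two trapezoids contribute 0.45 and 0.30.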
-
getROCArea
Calculates the area under the ROC curve as the Wilcoxon-Mann-Whitney statistic.
- Parameters:
tcurve - a previously extracted threshold curve Instances
- Returns:
the ROC area, or Double.NaN if tcurve was not generated by ThresholdCurve
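The Wilcoxon-Mann-Whitney statistic can be read as the fraction of (positive, negative) instance pairs in which the positive instance receives the higher score, with ties counted as half. The standalone sketch below computes that fraction directly from raw scores with a simple quadratic loop. It is not Weka's implementation, which operates on a threshold curve Instances object, and the names `MannWhitneyAucSketch` and `auc` are hypothetical.

```java
/** Standalone illustration of AUC as a pairwise ranking statistic; this is not Weka code. */
final class MannWhitneyAucSketch {

    /**
     * Returns the share of positive/negative pairs ranked correctly (ties count as 0.5),
     * or NaN when either class is absent and no pairs can be formed.
     */
    static double auc(double[] scores, boolean[] actualPositive) {
        long pairs = 0;
        double wins = 0.0;
        for (int i = 0; i < scores.length; i++) {
            if (!actualPositive[i]) {
                continue; // outer loop visits positives only
            }
            for (int j = 0; j < scores.length; j++) {
                if (actualPositive[j]) {
                    continue; // inner loop visits negatives only
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        return pairs == 0 ? Double.NaN : wins / pairs;
    }
}
```

For scores {0.9, 0.8, 0.4, 0.3} with actual labels {positive, negative, positive, negative}, three of the four positive/negative pairs are ordered correctly.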
-
getThresholdInstance
Gets the index of the instance with the closest threshold value to the desired target.
- Parameters:
tcurve - a set of instances that have been generated by this class
threshold - the target threshold
- Returns:
the index of the instance whose threshold is closest to the target, or -1 if this could not be found (i.e. no data, or bad threshold target)
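The lookup contract described above (closest threshold, or -1 on failure) can be sketched on a plain array of threshold values. The class `ClosestThresholdSketch` and method `closestIndex` are hypothetical. Treating a target outside the range 0 to 1, or a NaN target, as a "bad threshold target" is an assumption made for this example, and a linear scan is used purely for simplicity.

```java
/** Standalone illustration of a nearest-threshold lookup; this is not Weka code. */
final class ClosestThresholdSketch {

    /**
     * Returns the index of the threshold nearest to the target, or -1 when there is
     * no data or the target is not a probability in [0, 1] (assumed meaning of "bad").
     */
    static int closestIndex(double[] thresholds, double target) {
        if (thresholds.length == 0 || Double.isNaN(target) || target < 0.0 || target > 1.0) {
            return -1;
        }
        int best = 0;
        for (int i = 1; i < thresholds.length; i++) {
            // Strict comparison keeps the earliest index when two distances are equal.
            if (Math.abs(thresholds[i] - target) < Math.abs(thresholds[best] - target)) {
                best = i;
            }
        }
        return best;
    }
}
```

A caller would typically use the returned index to read the other measures (for example precision and recall) recorded at that curve point, after checking for -1.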
-
getRevision
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Returns:
the revision
-