Package weka.classifiers.meta
Class CostSensitiveClassifier
java.lang.Object
weka.classifiers.AbstractClassifier
weka.classifiers.SingleClassifierEnhancer
weka.classifiers.RandomizableSingleClassifierEnhancer
weka.classifiers.meta.CostSensitiveClassifier
- All Implemented Interfaces:
Serializable, Cloneable, Classifier, BatchPredictor, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, Drawable, OptionHandler, Randomizable, RevisionHandler, WeightedInstancesHandler
public class CostSensitiveClassifier
extends RandomizableSingleClassifierEnhancer
implements OptionHandler, Drawable, BatchPredictor, WeightedInstancesHandler
A metaclassifier that makes its base classifier cost sensitive. Two methods can be used to introduce
cost-sensitivity: reweighting training instances according to the total cost assigned to each class;
or predicting the class with minimum expected misclassification cost (rather than the most likely class).
Performance can often be improved by using a bagged classifier to improve the probability estimates of
the base classifier. If the base classifier cannot handle instance weights, and the instance weights are not uniform,
the data will be resampled with replacement based on the weights before being passed to the base classifier.
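The reweighting and resampling steps described above can be sketched in plain Java. This is a minimal illustration rather than Weka's implementation: the class and method names are hypothetical, the cost matrix is assumed to be indexed as cost[actual][predicted], and the per-class factor is assumed to be the row sum of the cost matrix, rescaled so that the total instance weight is unchanged.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; not part of Weka.
class CostReweightingSketch {

    // Assumption: cost[actual][predicted]; the total cost of a class is its row sum.
    // Each instance weight is scaled by its class's total cost, then all weights
    // are rescaled so that their sum stays the same as before.
    static double[] reweight(double[][] cost, int[] labels, double[] weights) {
        int numClasses = cost.length;
        double[] weightPerClass = new double[numClasses];
        double totalWeight = 0.0;
        for (int i = 0; i < labels.length; i++) {
            weightPerClass[labels[i]] += weights[i];
            totalWeight += weights[i];
        }
        double[] factor = new double[numClasses];
        double normalizer = 0.0;
        for (int c = 0; c < numClasses; c++) {
            for (int j = 0; j < numClasses; j++) {
                factor[c] += cost[c][j];
            }
            normalizer += factor[c] * weightPerClass[c];
        }
        if (normalizer <= 0.0) {
            throw new IllegalArgumentException("cost matrix assigns no cost to any observed class");
        }
        double[] result = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            result[i] = weights[i] * factor[labels[i]] * totalWeight / normalizer;
        }
        return result;
    }

    // Draws n instance indices with replacement, each with probability
    // proportional to its weight (for base learners that ignore weights).
    static int[] resample(double[] weights, int n, Random rng) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        int[] picked = new int[n];
        for (int i = 0; i < n; i++) {
            double r = rng.nextDouble() * total;
            double cumulative = 0.0;
            int index = weights.length - 1; // fallback guards against rounding
            for (int j = 0; j < weights.length; j++) {
                cumulative += weights[j];
                if (r < cumulative) {
                    index = j;
                    break;
                }
            }
            picked[i] = index;
        }
        return picked;
    }

    public static void main(String[] args) {
        double[][] cost = {{0, 1}, {5, 0}};   // misclassifying class 1 costs five times more
        int[] labels = {0, 0, 0, 1};
        double[] weights = {1, 1, 1, 1};
        double[] reweighted = reweight(cost, labels, weights);
        System.out.println(Arrays.toString(reweighted)); // prints [0.5, 0.5, 0.5, 2.5]
        System.out.println(Arrays.toString(resample(reweighted, 4, new Random(1))));
    }
}
```

In this sketch the single class-1 instance ends up carrying more than half of the total weight, so a weight-aware base learner (or the resampled data) emphasizes the expensive class.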
Valid options are:
-M Minimize expected misclassification cost. Default is to reweight training instances according to costs per class
-C <cost file name> File name of a cost matrix to use. If this is not supplied, a cost matrix will be loaded on demand. The name of the on-demand file is the relation name of the training data plus ".cost", and the path to the on-demand file is specified with the -N option.
-N <directory> Name of a directory to search for cost files when loading costs on demand (default current directory).
-cost-matrix <matrix> The cost matrix in Matlab single line format.
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-W Full name of base classifier. (default: weka.classifiers.rules.ZeroR)
Options specific to classifier weka.classifiers.rules.ZeroR:
-D If set, classifier is run in debug mode and may output additional info to the console
Options after -- are passed to the designated classifier.
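As an illustration of the -cost-matrix option, a matrix in Matlab single line format is written with rows separated by semicolons inside square brackets, for example "[0 1; 5 0]". The parser below is a hypothetical sketch, not Weka's own parsing code; it assumes cells are separated by whitespace and that the matrix is square.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of Weka.
class CostMatrixFormatSketch {

    // Parses text such as "[0 1; 5 0]" into a square matrix.
    // Assumption: rows are separated by ';' and cells by whitespace.
    static double[][] parseSingleLine(String text) {
        String body = text.trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("expected a matrix enclosed in [ ]");
        }
        body = body.substring(1, body.length() - 1).trim();
        String[] rows = body.split(";");
        double[][] matrix = new double[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            String[] cells = rows[r].trim().split("\\s+");
            if (cells.length != rows.length) {
                throw new IllegalArgumentException("cost matrix must be square");
            }
            matrix[r] = new double[cells.length];
            for (int c = 0; c < cells.length; c++) {
                matrix[r][c] = Double.parseDouble(cells[c]);
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[][] m = parseSingleLine("[0 1; 5 0]");
        System.out.println(Arrays.deepToString(m)); // prints [[0.0, 1.0], [5.0, 0.0]]
    }
}
```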
- Version:
- $Revision: 15519 $
- Author:
- Len Trigg (len@reeltwo.com)
-
Field Summary
Modifier and Type Field - Description
static final int MATRIX_ON_DEMAND - load cost matrix on demand
static final int MATRIX_SUPPLIED - use explicit cost matrix
static final Tag[] TAGS_MATRIX_SOURCE - Specify possible sources of the cost matrix
Fields inherited from class weka.classifiers.AbstractClassifier
BATCH_SIZE_DEFAULT, NUM_DECIMAL_PLACES_DEFAULT
Fields inherited from interface weka.core.Drawable
BayesNet, Newick, NOT_DRAWABLE, TREE
-
Constructor Summary
CostSensitiveClassifier() - Default constructor.
-
Method Summary
Modifier and Type Method - Description
String batchSizeTipText() - Tool tip text for this property.
void buildClassifier(Instances data) - Builds the model of the base learner.
double[] distributionForInstance(Instance instance) - Returns class probabilities.
double[][] distributionsForInstances(Instances insts) - Batch scoring method.
String getBatchSize() - Gets the preferred batch size from the base learner if it implements BatchPredictor.
Capabilities getCapabilities() - Returns default capabilities of the classifier.
CostMatrix getCostMatrix() - Gets the misclassification cost matrix.
SelectedTag getCostMatrixSource() - Gets the source location method of the cost matrix.
boolean getMinimizeExpectedCost() - Gets the value of MinimizeExpectedCost.
File getOnDemandDirectory() - Returns the directory that will be searched for cost files when loading on demand.
String[] getOptions() - Gets the current settings of the Classifier.
String getRevision() - Returns the revision string.
String graph() - Returns graph describing the classifier (if possible).
int graphType() - Returns the type of graph this classifier represents.
boolean implementsMoreEfficientBatchPrediction() - Returns true if the base classifier implements BatchPredictor and is able to generate batch predictions efficiently.
Enumeration<Option> listOptions() - Returns an enumeration describing the available options.
static void main(String[] argv) - Main method for testing this class.
void setBatchSize(String size) - Set the batch size to use.
void setCostMatrix(CostMatrix newCostMatrix) - Sets the misclassification cost matrix.
void setCostMatrixSource(SelectedTag newMethod) - Sets the source location of the cost matrix.
void setMinimizeExpectedCost(boolean newMinimizeExpectedCost) - Set the value of MinimizeExpectedCost.
void setOnDemandDirectory(File newDir) - Sets the directory that will be searched for cost files when loading on demand.
void setOptions(String[] options) - Parses a given list of options.
String toString() - Output a representation of this classifier.
Methods inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer
getSeed, seedTipText, setSeed
Methods inherited from class weka.classifiers.SingleClassifierEnhancer
classifierTipText, getClassifier, postExecution, preExecution, setClassifier
Methods inherited from class weka.classifiers.AbstractClassifier
classifyInstance, debugTipText, doNotCheckCapabilitiesTipText, forName, getDebug, getDoNotCheckCapabilities, getNumDecimalPlaces, makeCopies, makeCopy, numDecimalPlacesTipText, run, runClassifier, setDebug, setDoNotCheckCapabilities, setNumDecimalPlaces
-
Field Details
-
MATRIX_ON_DEMAND
public static final int MATRIX_ON_DEMAND
load cost matrix on demand
-
MATRIX_SUPPLIED
public static final int MATRIX_SUPPLIED
use explicit cost matrix
-
TAGS_MATRIX_SOURCE
public static final Tag[] TAGS_MATRIX_SOURCE
Specify possible sources of the cost matrix
-
-
Constructor Details
-
CostSensitiveClassifier
public CostSensitiveClassifier()
Default constructor.
-
-
Method Details
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class RandomizableSingleClassifierEnhancer
- Returns:
- an enumeration of all the available options.
-
setOptions
Parses a given list of options. Valid options are:
-M Minimize expected misclassification cost. Default is to reweight training instances according to costs per class
-C <cost file name> File name of a cost matrix to use. If this is not supplied, a cost matrix will be loaded on demand. The name of the on-demand file is the relation name of the training data plus ".cost", and the path to the on-demand file is specified with the -N option.
-N <directory> Name of a directory to search for cost files when loading costs on demand (default current directory).
-cost-matrix <matrix> The cost matrix in Matlab single line format.
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-W Full name of base classifier. (default: weka.classifiers.rules.ZeroR)
Options specific to classifier weka.classifiers.rules.ZeroR:
-D If set, classifier is run in debug mode and may output additional info to the console
Options after -- are passed to the designated classifier.
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class RandomizableSingleClassifierEnhancer
- Parameters:
options
- the list of options as an array of strings
- Throws:
Exception
- if an option is not supported
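The interaction between the -C and -N options can be sketched as follows. This is a hypothetical helper, not Weka code: it only illustrates the rule stated above, that an explicitly named cost file is used when given, and otherwise the file name is the relation name of the training data plus ".cost", looked up in the on-demand directory. The parameter names and the sample relation name are assumptions made for the example.

```java
import java.io.File;

// Hypothetical sketch; not part of Weka.
class OnDemandCostFileSketch {

    // explicitCostFile models the -C argument (null when not supplied).
    // onDemandDir models the -N argument (null means the current directory).
    static File resolveCostFile(String explicitCostFile, File onDemandDir, String relationName) {
        if (explicitCostFile != null) {
            return new File(explicitCostFile);
        }
        File dir = (onDemandDir != null)
            ? onDemandDir
            : new File(System.getProperty("user.dir"));
        return new File(dir, relationName + ".cost");
    }

    public static void main(String[] args) {
        // Relation "credit-g" with -N costs resolves to the file credit-g.cost inside costs.
        System.out.println(resolveCostFile(null, new File("costs"), "credit-g"));
        // An explicit -C value takes precedence over the on-demand lookup.
        System.out.println(resolveCostFile("my-costs.cost", new File("costs"), "credit-g"));
    }
}
```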
-
getOptions
Gets the current settings of the Classifier.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class RandomizableSingleClassifierEnhancer
- Returns:
- an array of strings suitable for passing to setOptions
-
globalInfo
- Returns:
- a description of the classifier suitable for displaying in the explorer/experimenter gui
-
costMatrixSourceTipText
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getCostMatrixSource
Gets the source location method of the cost matrix. Will be one of MATRIX_ON_DEMAND or MATRIX_SUPPLIED.
- Returns:
- the cost matrix source.
-
setCostMatrixSource
Sets the source location of the cost matrix. Values other than MATRIX_ON_DEMAND or MATRIX_SUPPLIED will be ignored.
- Parameters:
newMethod
- the cost matrix location method.
-
onDemandDirectoryTipText
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getOnDemandDirectory
Returns the directory that will be searched for cost files when loading on demand.
- Returns:
- The cost file search directory.
-
setOnDemandDirectory
Sets the directory that will be searched for cost files when loading on demand.
- Parameters:
newDir
- The cost file search directory.
-
minimizeExpectedCostTipText
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMinimizeExpectedCost
public boolean getMinimizeExpectedCost()
Gets the value of MinimizeExpectedCost.
- Returns:
- Value of MinimizeExpectedCost.
-
setMinimizeExpectedCost
public void setMinimizeExpectedCost(boolean newMinimizeExpectedCost)
Set the value of MinimizeExpectedCost.
- Parameters:
newMinimizeExpectedCost
- Value to assign to MinimizeExpectedCost.
-
costMatrixTipText
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getCostMatrix
Gets the misclassification cost matrix.
- Returns:
- the cost matrix
-
setCostMatrix
Sets the misclassification cost matrix.
- Parameters:
newCostMatrix
- the cost matrix
-
getCapabilities
Returns default capabilities of the classifier.
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Specified by:
getCapabilities in interface Classifier
- Overrides:
getCapabilities in class SingleClassifierEnhancer
- Returns:
- the capabilities of this classifier
-
buildClassifier
Builds the model of the base learner.
- Specified by:
buildClassifier in interface Classifier
- Parameters:
data
- the training data
- Throws:
Exception
- if the classifier could not be built successfully
-
distributionForInstance
Returns class probabilities. When the minimum expected cost approach is chosen, returns probability one for the class with the minimum expected misclassification cost. Otherwise, returns the probability distribution produced by the base classifier.
- Specified by:
distributionForInstance in interface Classifier
- Overrides:
distributionForInstance in class AbstractClassifier
- Parameters:
instance
- the instance to be classified
- Returns:
- the computed distribution for the given instance
- Throws:
Exception
- if instance could not be classified successfully
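The minimum expected cost behaviour of distributionForInstance can be illustrated with a small plain Java sketch. It is not Weka's implementation: the names are hypothetical, the cost matrix is assumed to be indexed as cost[actual][predicted], and ties are assumed to be broken in favour of the lowest class index.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of Weka.
class MinExpectedCostSketch {

    // Expected cost of predicting class j is sum over i of probs[i] * cost[i][j],
    // assuming cost[actual][predicted].
    static double[] expectedCosts(double[][] cost, double[] probs) {
        double[] expected = new double[probs.length];
        for (int predicted = 0; predicted < probs.length; predicted++) {
            for (int actual = 0; actual < probs.length; actual++) {
                expected[predicted] += probs[actual] * cost[actual][predicted];
            }
        }
        return expected;
    }

    // Returns a distribution with probability one on the cheapest prediction.
    static double[] minExpectedCostDistribution(double[][] cost, double[] probs) {
        double[] expected = expectedCosts(cost, probs);
        int best = 0;
        for (int j = 1; j < expected.length; j++) {
            if (expected[j] < expected[best]) {
                best = j; // strict comparison keeps the first minimum on ties
            }
        }
        double[] result = new double[probs.length];
        result[best] = 1.0;
        return result;
    }

    public static void main(String[] args) {
        double[][] cost = {{0, 1}, {5, 0}};
        double[] probs = {0.7, 0.3}; // base classifier favours class 0
        // Predicting class 0 risks the cost of 5, so class 1 is cheaper in expectation.
        System.out.println(Arrays.toString(minExpectedCostDistribution(cost, probs))); // prints [0.0, 1.0]
    }
}
```

With these numbers the most likely class is 0, yet the returned distribution selects class 1, which is the difference between the two prediction modes.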
-
distributionsForInstances
Batch scoring method. Calls the appropriate method for the base learner if it implements BatchPredictor. Otherwise it simply calls the distributionForInstance() method repeatedly.
- Specified by:
distributionsForInstances in interface BatchPredictor
- Overrides:
distributionsForInstances in class AbstractClassifier
- Parameters:
insts
- the instances to get predictions for
- Returns:
- an array of probability distributions, one for each instance
- Throws:
Exception
- if a problem occurs
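The delegation described for distributionsForInstances follows a common pattern: use the base learner's batch method when it offers one, and otherwise loop over the single-instance method. The sketch below shows that pattern with hypothetical stand-in interfaces (Scorer and BatchScorer) in place of Weka's Classifier and BatchPredictor, and with instances modelled as plain double arrays.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not part of Weka.
class BatchFallbackSketch {

    // Stand-in for a classifier that scores one instance at a time.
    interface Scorer {
        double[] distributionForInstance(double[] instance);
    }

    // Stand-in for a classifier that can also score a whole batch.
    interface BatchScorer extends Scorer {
        double[][] distributionsForInstances(List<double[]> batch);
    }

    static double[][] score(Scorer base, List<double[]> batch) {
        if (base instanceof BatchScorer) {
            // Base learner has its own batch method: delegate to it.
            return ((BatchScorer) base).distributionsForInstances(batch);
        }
        // Otherwise call the single-instance method repeatedly.
        double[][] result = new double[batch.size()][];
        for (int i = 0; i < batch.size(); i++) {
            result[i] = base.distributionForInstance(batch.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // A toy single-instance scorer with no batch support.
        Scorer single = x -> new double[] {x[0], 1.0 - x[0]};
        List<double[]> batch = Arrays.asList(new double[] {0.25}, new double[] {0.75});
        System.out.println(Arrays.deepToString(score(single, batch)));
    }
}
```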
-
batchSizeTipText
Tool tip text for this property
- Overrides:
batchSizeTipText in class AbstractClassifier
- Returns:
- the tool tip for this property
-
setBatchSize
Set the batch size to use. Gets passed through to the base learner if it implements BatchPredictor. Otherwise it is just ignored.
- Specified by:
setBatchSize in interface BatchPredictor
- Overrides:
setBatchSize in class AbstractClassifier
- Parameters:
size
- the batch size to use
-
getBatchSize
Gets the preferred batch size from the base learner if it implements BatchPredictor. Returns 1 as the preferred batch size otherwise.
- Specified by:
getBatchSize in interface BatchPredictor
- Overrides:
getBatchSize in class AbstractClassifier
- Returns:
- the batch size to use
-
implementsMoreEfficientBatchPrediction
public boolean implementsMoreEfficientBatchPrediction()
Returns true if the base classifier implements BatchPredictor and is able to generate batch predictions efficiently
- Specified by:
implementsMoreEfficientBatchPrediction in interface BatchPredictor
- Overrides:
implementsMoreEfficientBatchPrediction in class AbstractClassifier
- Returns:
- true if the base classifier can generate batch predictions efficiently
-
graphType
public int graphType()
Returns the type of graph this classifier represents.
-
graph
Returns graph describing the classifier (if possible).
-
toString
Output a representation of this classifier
-
getRevision
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class AbstractClassifier
- Returns:
- the revision
-
main
Main method for testing this class.
- Parameters:
argv
- should contain the following arguments: -t training file [-T test file] [-c class index]
-