Package weka.classifiers.meta
Class Bagging
- All Implemented Interfaces:
Serializable, Cloneable, Classifier, AdditionalMeasureProducer, Aggregateable<Bagging>, BatchPredictor, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, PartitionGenerator, Randomizable, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler
- Direct Known Subclasses:
RandomForest
public class Bagging
extends RandomizableParallelIteratedSingleClassifierEnhancer
implements WeightedInstancesHandler, AdditionalMeasureProducer, TechnicalInformationHandler, PartitionGenerator, Aggregateable<Bagging>
Class for bagging a classifier to reduce variance. It can perform classification or regression, depending on the base learner.
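Bagging (bootstrap aggregating) trains each ensemble member on a bootstrap sample, i.e. instances drawn with replacement from the training set, and then combines the members' predictions. The stdlib-only Java sketch below illustrates the idea under stated assumptions. It uses a 1-D regression task, and a hypothetical least-squares line fit (fitLine) stands in for the default base learner REPTree. The names BaggingSketch, fitLine, train and predict are illustrative and are not Weka API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

// Illustrative sketch of bagging for 1-D regression; not Weka's implementation.
// The base learner is a hypothetical least-squares line fit.
class BaggingSketch {

    // Fits y = a + b*x by ordinary least squares; falls back to the mean of y
    // when all sampled x values are identical (zero variance in x).
    static DoubleUnaryOperator fitLine(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
        }
        if (sxx == 0) { final double c = my; return v -> c; }
        final double b = sxy / sxx, a = my - b * mx;
        return v -> a + b * v;
    }

    // Trains numIterations members, each on a bootstrap sample (drawn with
    // replacement) whose size is bagSizePercent% of the training set.
    static List<DoubleUnaryOperator> train(double[] x, double[] y, int numIterations,
                                           int bagSizePercent, long seed) {
        Random rnd = new Random(seed);
        int bagSize = Math.max(1, x.length * bagSizePercent / 100); // at least one draw
        List<DoubleUnaryOperator> members = new ArrayList<>();
        for (int it = 0; it < numIterations; it++) {
            double[] bx = new double[bagSize], by = new double[bagSize];
            for (int j = 0; j < bagSize; j++) {
                int k = rnd.nextInt(x.length);
                bx[j] = x[k];
                by[j] = y[k];
            }
            members.add(fitLine(bx, by));
        }
        return members;
    }

    // Regression prediction: the average of the members' predictions.
    static double predict(List<DoubleUnaryOperator> members, double v) {
        double sum = 0;
        for (DoubleUnaryOperator m : members) sum += m.applyAsDouble(v);
        return sum / members.size();
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) y[i] = 2 * x[i] + 1; // noise-free line
        List<DoubleUnaryOperator> ensemble = train(x, y, 10, 100, 1L);
        System.out.println(predict(ensemble, 4.5));
    }
}
```

With bagSizePercent = 100 and numIterations = 10 (the defaults of -P and -I), each member sees a sample as large as the training set. On average, roughly 63% of the distinct instances appear in each such sample; the rest are "out of bag" for that member.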
For more information, see
Leo Breiman (1996). Bagging predictors. Machine Learning. 24(2):123-140.
BibTeX:
@article{Breiman1996,
   author = {Leo Breiman},
   journal = {Machine Learning},
   number = {2},
   pages = {123-140},
   title = {Bagging predictors},
   volume = {24},
   year = {1996}
}
Valid options are:
-P Size of each bag, as a percentage of the training set size. (default 100)
-O Calculate the out of bag error.
-print Print the individual classifiers in the output
-store-out-of-bag-predictions Whether to store out of bag predictions in internal evaluation object.
-output-out-of-bag-complexity-statistics Whether to output complexity-based statistics when out-of-bag evaluation is performed.
-represent-copies-using-weights Represent copies of instances using weights rather than explicitly.
-S <num> Random number seed. (default 1)
-num-slots <num> Number of execution slots. (default 1 - i.e. no parallelism)
-I <num> Number of iterations. (default 10)
-D If set, classifier is run in debug mode and may output additional info to the console
-W Full name of base classifier. (default: weka.classifiers.trees.REPTree)
Options specific to classifier weka.classifiers.trees.REPTree:
-M <minimum number of instances> Set minimum number of instances per leaf (default 2).
-V <minimum variance for split> Set minimum numeric class variance proportion of train variance for split (default 1e-3).
-N <number of folds> Number of folds for reduced error pruning (default 3).
-S <seed> Seed for random data shuffling (default 1).
-P No pruning.
-L Maximum tree depth (default -1, no maximum)
-I Initial class value count (default 0)
-R Spread initial count over all class values (i.e. don't use 1 per value)
Options after -- are passed to the designated classifier.
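The -- separator matters because some flags mean different things for Bagging and for REPTree. For Bagging, -P is the bag size and -S is the seed. For REPTree, -P disables pruning and -S is its own shuffling seed. The Java sketch below shows how an option array divides at the first --. The helper splitAtDoubleDash is hypothetical and is not Weka's option-parsing code.

```java
import java.util.Arrays;

// Hypothetical helper (not Weka API): everything before the first "--" is meant
// for Bagging itself, everything after it is forwarded to the base classifier
// named by -W.
class OptionSplitSketch {

    static String[][] splitAtDoubleDash(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("--")) {
                return new String[][] {
                    Arrays.copyOfRange(options, 0, i),
                    Arrays.copyOfRange(options, i + 1, options.length)
                };
            }
        }
        // No separator: all options belong to the outer classifier.
        return new String[][] { options.clone(), new String[0] };
    }

    public static void main(String[] args) {
        String[] opts = {"-P", "100", "-O", "-I", "10",
                         "-W", "weka.classifiers.trees.REPTree", "--", "-M", "2", "-P"};
        String[][] parts = splitAtDoubleDash(opts);
        System.out.println("Bagging options: " + Arrays.toString(parts[0]));
        System.out.println("REPTree options: " + Arrays.toString(parts[1]));
    }
}
```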
- Version:
- $Revision: 15800 $
- Author:
- Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (len@reeltwo.com), Richard Kirkby (rkirkby@cs.waikato.ac.nz)
-
Field Summary
Fields inherited from class weka.classifiers.AbstractClassifier
BATCH_SIZE_DEFAULT, NUM_DECIMAL_PLACES_DEFAULT
-
Constructor Summary
Constructor | Description
Bagging() | Constructor.
-
Method Summary
Modifier and Type | Method | Description
Bagging | aggregate(Bagging toAggregate) | Aggregate an object with this one.
String | bagSizePercentTipText() | Returns the tip text for this property.
String | batchSizeTipText() | Tool tip text for this property.
void | buildClassifier(Instances data) | Bagging method.
String | calcOutOfBagTipText() | Returns the tip text for this property.
double[] | distributionForInstance(Instance instance) | Calculates the class membership probabilities for the given test instance.
double[][] | distributionsForInstances(Instances insts) | Batch scoring method.
Enumeration<String> | enumerateMeasures() | Returns an enumeration of the additional measure names.
void | finalizeAggregation() | Call to complete the aggregation process.
void | generatePartition(Instances data) | Builds the classifier to generate a partition.
int | getBagSizePercent() | Gets the size of each bag, as a percentage of the training set size.
String | getBatchSize() | Gets the preferred batch size from the base learner if it implements BatchPredictor.
boolean | getCalcOutOfBag() | Get whether the out of bag error is calculated.
double | getMeasure(String additionalMeasureName) | Returns the value of the named measure.
double[] | getMembershipValues(Instance inst) | Computes an array that indicates leaf membership.
String[] | getOptions() | Gets the current settings of the Classifier.
Evaluation | getOutOfBagEvaluationObject() | Returns the out-of-bag evaluation object.
boolean | getOutputOutOfBagComplexityStatistics() | Gets whether complexity statistics are output when OOB estimation is performed.
boolean | getPrintClassifiers() | Get whether to print the individual ensemble classifiers in the output.
boolean | getRepresentCopiesUsingWeights() | Get whether copies of instances are represented using weights rather than explicitly.
String | getRevision() | Returns the revision string.
boolean | getStoreOutOfBagPredictions() | Get whether the out of bag predictions are stored.
TechnicalInformation | getTechnicalInformation() | Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
String | globalInfo() | Returns a string describing classifier.
boolean | implementsMoreEfficientBatchPrediction() | Returns true if the base classifier implements BatchPredictor and is able to generate batch predictions efficiently.
Enumeration<Option> | listOptions() | Returns an enumeration describing the available options.
static void | main(String[] argv) | Main method for testing this class.
double | measureOutOfBagError() | Gets the out of bag error that was calculated as the classifier was built.
int | numElements() | Returns the number of elements in the partition.
String | outputOutOfBagComplexityStatisticsTipText() | Returns the tip text for this property.
String | printClassifiersTipText() | Returns the tip text for this property.
String | representCopiesUsingWeightsTipText() | Returns the tip text for this property.
void | setBagSizePercent(int newBagSizePercent) | Sets the size of each bag, as a percentage of the training set size.
void | setBatchSize(String size) | Set the batch size to use.
void | setCalcOutOfBag(boolean calcOutOfBag) | Set whether the out of bag error is calculated.
void | setOptions(String[] options) | Parses a given list of options.
void | setOutputOutOfBagComplexityStatistics(boolean b) | Sets whether complexity statistics are output when OOB estimation is performed.
void | setPrintClassifiers(boolean print) | Set whether to print the individual ensemble classifiers in the output.
void | setRepresentCopiesUsingWeights(boolean representUsingWeights) | Set whether copies of instances are represented using weights rather than explicitly.
void | setStoreOutOfBagPredictions(boolean storeOutOfBag) | Set whether the out of bag predictions are stored.
String | storeOutOfBagPredictionsTipText() | Returns the tip text for this property.
String | toString() | Returns description of the bagged classifier.
Methods inherited from class weka.classifiers.RandomizableParallelIteratedSingleClassifierEnhancer
getSeed, seedTipText, setSeed
Methods inherited from class weka.classifiers.ParallelIteratedSingleClassifierEnhancer
getNumExecutionSlots, numExecutionSlotsTipText, setNumExecutionSlots
Methods inherited from class weka.classifiers.IteratedSingleClassifierEnhancer
getNumIterations, numIterationsTipText, setNumIterations
Methods inherited from class weka.classifiers.SingleClassifierEnhancer
classifierTipText, getCapabilities, getClassifier, postExecution, preExecution, setClassifier
Methods inherited from class weka.classifiers.AbstractClassifier
classifyInstance, debugTipText, doNotCheckCapabilitiesTipText, forName, getDebug, getDoNotCheckCapabilities, getNumDecimalPlaces, makeCopies, makeCopy, numDecimalPlacesTipText, run, runClassifier, setDebug, setDoNotCheckCapabilities, setNumDecimalPlaces
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface weka.core.CapabilitiesHandler
getCapabilities
-
Constructor Details
-
Bagging
public Bagging()
Constructor.
-
-
Method Details
-
globalInfo
Returns a string describing the classifier.
- Returns:
- a description suitable for displaying in the Explorer/Experimenter GUI
-
getTechnicalInformation
Returns an instance of a TechnicalInformation object containing detailed information about the technical background of this class, e.g., the paper reference or book this class is based on.
- Specified by:
getTechnicalInformation
in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
listOptions
in interface OptionHandler
- Overrides:
listOptions
in class RandomizableParallelIteratedSingleClassifierEnhancer
- Returns:
- an enumeration of all the available options.
-
setOptions
Parses a given list of options. Valid options are:
-P Size of each bag, as a percentage of the training set size. (default 100)
-O Calculate the out of bag error.
-print Print the individual classifiers in the output
-store-out-of-bag-predictions Whether to store out of bag predictions in internal evaluation object.
-output-out-of-bag-complexity-statistics Whether to output complexity-based statistics when out-of-bag evaluation is performed.
-represent-copies-using-weights Represent copies of instances using weights rather than explicitly.
-S <num> Random number seed. (default 1)
-num-slots <num> Number of execution slots. (default 1 - i.e. no parallelism)
-I <num> Number of iterations. (default 10)
-D If set, classifier is run in debug mode and may output additional info to the console
-W Full name of base classifier. (default: weka.classifiers.trees.REPTree)
Options specific to classifier weka.classifiers.trees.REPTree:
-M <minimum number of instances> Set minimum number of instances per leaf (default 2).
-V <minimum variance for split> Set minimum numeric class variance proportion of train variance for split (default 1e-3).
-N <number of folds> Number of folds for reduced error pruning (default 3).
-S <seed> Seed for random data shuffling (default 1).
-P No pruning.
-L Maximum tree depth (default -1, no maximum)
-I Initial class value count (default 0)
-R Spread initial count over all class values (i.e. don't use 1 per value)
Options after -- are passed to the designated classifier.
- Specified by:
setOptions
in interface OptionHandler
- Overrides:
setOptions
in class RandomizableParallelIteratedSingleClassifierEnhancer
- Parameters:
options
- the list of options as an array of strings
- Throws:
Exception
- if an option is not supported
-
getOptions
Gets the current settings of the Classifier.
- Specified by:
getOptions
in interface OptionHandler
- Overrides:
getOptions
in class RandomizableParallelIteratedSingleClassifierEnhancer
- Returns:
- an array of strings suitable for passing to setOptions
-
bagSizePercentTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
getBagSizePercent
public int getBagSizePercent()
Gets the size of each bag, as a percentage of the training set size.
- Returns:
- the bag size, as a percentage.
-
setBagSizePercent
public void setBagSizePercent(int newBagSizePercent)
Sets the size of each bag, as a percentage of the training set size.
- Parameters:
newBagSizePercent
- the bag size, as a percentage.
-
representCopiesUsingWeightsTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setRepresentCopiesUsingWeights
public void setRepresentCopiesUsingWeights(boolean representUsingWeights)
Set whether copies of instances are represented using weights rather than explicitly.
- Parameters:
representUsingWeights
- whether to represent copies using weights
-
getRepresentCopiesUsingWeights
public boolean getRepresentCopiesUsingWeights()
Get whether copies of instances are represented using weights rather than explicitly.
- Returns:
- whether copies of instances are represented using weights rather than explicitly
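A bootstrap sample can contain the same instance several times. It can be stored explicitly, with one entry per drawn copy, or compactly, with each distinct instance kept once and its weight multiplied by its copy count. The Java sketch below contrasts the two representations. The class and method names are hypothetical, and this is not Weka's code. An instance whose weight ends up as 0 in the compact form is out of bag for that member.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch (not Weka's code) of two representations of one bootstrap sample.
class CopiesAsWeightsSketch {

    // Counts how often each of n instances is drawn in a bag of size bagSize.
    static int[] bootstrapCounts(int n, int bagSize, Random rnd) {
        int[] counts = new int[n];
        for (int j = 0; j < bagSize; j++) counts[rnd.nextInt(n)]++;
        return counts;
    }

    // Explicit representation: each instance index repeated once per copy.
    static int[] explicitCopies(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        int[] out = new int[total];
        int pos = 0;
        for (int i = 0; i < counts.length; i++)
            for (int c = 0; c < counts[i]; c++) out[pos++] = i;
        return out;
    }

    // Weighted representation: each instance once, its original weight scaled
    // by its copy count (0 means the instance is out of bag).
    static double[] weightedCopies(int[] counts, double[] originalWeights) {
        double[] w = new double[counts.length];
        for (int i = 0; i < counts.length; i++) w[i] = counts[i] * originalWeights[i];
        return w;
    }

    public static void main(String[] args) {
        int[] counts = bootstrapCounts(6, 6, new Random(42));
        double[] unitWeights = {1, 1, 1, 1, 1, 1};
        System.out.println("explicit copies: " + Arrays.toString(explicitCopies(counts)));
        System.out.println("weights:         " + Arrays.toString(weightedCopies(counts, unitWeights)));
    }
}
```

The weighted form stores at most one entry per training instance, so it never grows beyond the original data set. The sketch does not model whether a given base learner can consume instance weights.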
-
storeOutOfBagPredictionsTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setStoreOutOfBagPredictions
public void setStoreOutOfBagPredictions(boolean storeOutOfBag)
Set whether the out of bag predictions are stored.
- Parameters:
storeOutOfBag
- whether the out of bag predictions are stored
-
getStoreOutOfBagPredictions
public boolean getStoreOutOfBagPredictions()
Get whether the out of bag predictions are stored.
- Returns:
- whether the out of bag predictions are stored
-
calcOutOfBagTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setCalcOutOfBag
public void setCalcOutOfBag(boolean calcOutOfBag)
Set whether the out of bag error is calculated.
- Parameters:
calcOutOfBag
- whether to calculate the out of bag error
-
getCalcOutOfBag
public boolean getCalcOutOfBag()
Get whether the out of bag error is calculated.
- Returns:
- whether the out of bag error is calculated
-
outputOutOfBagComplexityStatisticsTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
getOutputOutOfBagComplexityStatistics
public boolean getOutputOutOfBagComplexityStatistics()
Gets whether complexity statistics are output when OOB estimation is performed.
- Returns:
- whether statistics are calculated
-
setOutputOutOfBagComplexityStatistics
public void setOutputOutOfBagComplexityStatistics(boolean b)
Sets whether complexity statistics are output when OOB estimation is performed.
- Parameters:
b
- whether statistics are calculated
-
printClassifiersTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the Explorer/Experimenter GUI
-
setPrintClassifiers
public void setPrintClassifiers(boolean print)
Set whether to print the individual ensemble classifiers in the output.
- Parameters:
print
- true if the individual classifiers are to be printed
-
getPrintClassifiers
public boolean getPrintClassifiers()
Get whether to print the individual ensemble classifiers in the output.
- Returns:
- true if the individual classifiers are to be printed
-
measureOutOfBagError
public double measureOutOfBagError()
Gets the out of bag error that was calculated as the classifier was built. This is the error rate for classification and the mean absolute error for regression.
- Returns:
- the out of bag error; -1 if the out of bag error has not been estimated
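For each training instance, an out-of-bag estimate uses only the members whose bootstrap sample did not contain that instance. The Java sketch below computes the mean absolute error for regression and the error rate for classification, and returns -1 when no instance has any out-of-bag member. It rests on several simplifying assumptions. It uses unweighted instances and plain arrays in place of Weka objects. For classification, it uses a majority vote over member predictions; Weka may instead combine class distributions, so treat this as an approximation. The class name OutOfBagSketch is hypothetical.

```java
// Sketch (an assumption, not Weka's implementation) of out-of-bag error.
// inBag[m][i] is true when instance i was drawn into member m's bag;
// pred[m][i] is member m's prediction for instance i.
class OutOfBagSketch {

    // Regression: mean absolute error of the averaged out-of-bag prediction.
    static double oobMeanAbsoluteError(boolean[][] inBag, double[][] pred, double[] actual) {
        double errSum = 0;
        int used = 0;
        for (int i = 0; i < actual.length; i++) {
            double sum = 0;
            int votes = 0;
            for (int m = 0; m < inBag.length; m++) {
                if (!inBag[m][i]) { sum += pred[m][i]; votes++; }
            }
            if (votes == 0) continue; // in every bag: no out-of-bag estimate
            errSum += Math.abs(sum / votes - actual[i]);
            used++;
        }
        return used == 0 ? -1 : errSum / used; // -1 signals "not estimated"
    }

    // Classification: error rate of the majority vote among out-of-bag members.
    static double oobErrorRate(boolean[][] inBag, int[][] pred, int[] actual, int numClasses) {
        int wrong = 0, used = 0;
        for (int i = 0; i < actual.length; i++) {
            int[] votes = new int[numClasses];
            int total = 0;
            for (int m = 0; m < inBag.length; m++) {
                if (!inBag[m][i]) { votes[pred[m][i]]++; total++; }
            }
            if (total == 0) continue;
            int best = 0;
            for (int c = 1; c < numClasses; c++) if (votes[c] > votes[best]) best = c;
            if (best != actual[i]) wrong++;
            used++;
        }
        return used == 0 ? -1 : (double) wrong / used;
    }

    public static void main(String[] args) {
        boolean[][] inBag = {{true, false}, {false, true}};
        double mae = oobMeanAbsoluteError(inBag, new double[][] {{0, 5}, {3, 0}}, new double[] {1, 4});
        System.out.println(mae); // prints 1.5
    }
}
```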
-
enumerateMeasures
Returns an enumeration of the additional measure names.
- Specified by:
enumerateMeasures
in interface AdditionalMeasureProducer
- Returns:
- an enumeration of the measure names
-
getMeasure
Returns the value of the named measure.
- Specified by:
getMeasure
in interface AdditionalMeasureProducer
- Parameters:
additionalMeasureName
- the name of the measure to query for its value
- Returns:
- the value of the named measure
- Throws:
IllegalArgumentException
- if the named measure is not supported
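enumerateMeasures() and getMeasure() together form a simple name-based lookup. The Java sketch below shows the pattern, assuming (as the method name measureOutOfBagError suggests) that the only supported measure name is "measureOutOfBagError" and that names are matched case-insensitively. Both the name and the matching rule are assumptions rather than confirmed Weka behaviour. The class MeasureLookupSketch is hypothetical.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Sketch of the AdditionalMeasureProducer lookup pattern (hypothetical class).
// Assumption: the single supported measure is "measureOutOfBagError".
class MeasureLookupSketch {
    private final double outOfBagError;

    MeasureLookupSketch(double outOfBagError) { this.outOfBagError = outOfBagError; }

    Enumeration<String> enumerateMeasures() {
        return Collections.enumeration(List.of("measureOutOfBagError"));
    }

    double getMeasure(String additionalMeasureName) {
        if (additionalMeasureName.equalsIgnoreCase("measureOutOfBagError")) {
            return outOfBagError;
        }
        // Unknown names are rejected, as the documentation above describes.
        throw new IllegalArgumentException(additionalMeasureName + " not supported");
    }

    public static void main(String[] args) {
        MeasureLookupSketch s = new MeasureLookupSketch(0.25);
        System.out.println(s.getMeasure("measureOutOfBagError"));
    }
}
```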
-
getOutOfBagEvaluationObject
Returns the out-of-bag evaluation object.
- Returns:
- the out-of-bag evaluation object; null if out-of-bag error hasn't been calculated
-
buildClassifier
Bagging method.
- Specified by:
buildClassifier
in interface Classifier
- Overrides:
buildClassifier
in class ParallelIteratedSingleClassifierEnhancer
- Parameters:
data
- the training data to be used for generating the bagged classifier
- Throws:
Exception
- if the classifier could not be built successfully
-
distributionForInstance
Calculates the class membership probabilities for the given test instance.
- Specified by:
distributionForInstance
in interface Classifier
- Overrides:
distributionForInstance
in class AbstractClassifier
- Parameters:
instance
- the instance to be classified
- Returns:
- predicted class probability distribution
- Throws:
Exception
- if distribution can't be computed successfully
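A bagged prediction combines the members' outputs. The Java sketch below shows one plausible combination rule, stated as an assumption rather than as Weka's source. For a nominal class, it sums the members' class distributions and normalises them. For a numeric class, each member is treated as returning a one-element array holding its prediction, and these values are averaged. The class CombineDistributionsSketch is hypothetical.

```java
import java.util.Arrays;

// Sketch (assumption, not Weka's source) of combining ensemble member outputs.
class CombineDistributionsSketch {

    static double[] combine(double[][] memberOutputs, boolean numericClass) {
        int k = memberOutputs[0].length;
        double[] sums = new double[k];
        for (double[] d : memberOutputs)
            for (int c = 0; c < k; c++) sums[c] += d[c];
        if (numericClass) {
            // Regression: mean of the members' predictions.
            for (int c = 0; c < k; c++) sums[c] /= memberOutputs.length;
            return sums;
        }
        // Classification: normalise so the probabilities sum to 1
        // (left as all zeros if no member assigned any probability).
        double total = 0;
        for (double s : sums) total += s;
        if (total > 0)
            for (int c = 0; c < k; c++) sums[c] /= total;
        return sums;
    }

    public static void main(String[] args) {
        double[][] members = {{1.0, 0.0}, {0.5, 0.5}};
        System.out.println(Arrays.toString(combine(members, false))); // prints [0.75, 0.25]
        System.out.println(Arrays.toString(combine(new double[][] {{2.0}, {4.0}}, true))); // prints [3.0]
    }
}
```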
-
batchSizeTipText
Tool tip text for this property.
- Overrides:
batchSizeTipText
in class AbstractClassifier
- Returns:
- the tool tip for this property
-
setBatchSize
Set the batch size to use. The value is passed through to the base learner if the base learner implements BatchPredictor; otherwise it is ignored.
- Specified by:
setBatchSize
in interface BatchPredictor
- Overrides:
setBatchSize
in class AbstractClassifier
- Parameters:
size
- the batch size to use
-
getBatchSize
Gets the preferred batch size from the base learner if it implements BatchPredictor; otherwise returns 1 as the preferred batch size.
- Specified by:
getBatchSize
in interface BatchPredictor
- Overrides:
getBatchSize
in class AbstractClassifier
- Returns:
- the batch size to use
-
distributionsForInstances
Batch scoring method. Calls the corresponding method of the base learner if it implements BatchPredictor; otherwise it calls distributionForInstance() repeatedly, once per instance.
- Specified by:
distributionsForInstances
in interface BatchPredictor
- Overrides:
distributionsForInstances
in class AbstractClassifier
- Parameters:
insts
- the instances to get predictions for
- Returns:
- an array of probability distributions, one for each instance
- Throws:
Exception
- if a problem occurs
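The fallback described for distributionsForInstances can be sketched as a type check followed by either one batch call or a per-instance loop. In the Java sketch below, SingleScorer and BatchScorer are hypothetical stand-ins for Weka's Classifier and BatchPredictor types, and instances are plain double[] feature vectors rather than weka.core.Instance objects.

```java
import java.util.Arrays;

// Sketch of the batch-scoring fallback; the interfaces are hypothetical stand-ins.
class BatchFallbackSketch {

    interface SingleScorer {
        double[] distributionForInstance(double[] instance);
    }

    interface BatchScorer extends SingleScorer {
        double[][] distributionsForInstances(double[][] instances);
    }

    static double[][] distributionsForInstances(SingleScorer model, double[][] instances) {
        if (model instanceof BatchScorer) {
            // The model can score the whole batch itself.
            return ((BatchScorer) model).distributionsForInstances(instances);
        }
        // Fallback: score one instance at a time.
        double[][] out = new double[instances.length][];
        for (int i = 0; i < instances.length; i++) {
            out[i] = model.distributionForInstance(instances[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // A per-instance-only model: P(class 0) is read from the first feature.
        SingleScorer perInstance = inst -> new double[] {inst[0], 1 - inst[0]};
        double[][] preds = distributionsForInstances(perInstance, new double[][] {{0.25}, {1.0}});
        System.out.println(Arrays.deepToString(preds));
    }
}
```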
-
implementsMoreEfficientBatchPrediction
public boolean implementsMoreEfficientBatchPrediction()
Returns true if the base classifier implements BatchPredictor and is able to generate batch predictions efficiently.
- Specified by:
implementsMoreEfficientBatchPrediction
in interface BatchPredictor
- Overrides:
implementsMoreEfficientBatchPrediction
in class AbstractClassifier
- Returns:
- true if the base classifier can generate batch predictions efficiently
-
toString
Returns a description of the bagged classifier.
-
generatePartition
Builds the classifier to generate a partition.
- Specified by:
generatePartition
in interface PartitionGenerator
- Throws:
Exception
-
getMembershipValues
Computes an array that indicates leaf membership.
- Specified by:
getMembershipValues
in interface PartitionGenerator
- Throws:
Exception
-
numElements
Returns the number of elements in the partition.
- Specified by:
numElements
in interface PartitionGenerator
- Throws:
Exception
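For an ensemble acting as a PartitionGenerator, one natural way to produce a leaf-membership vector is to concatenate the members' vectors; the partition size is then the sum of their lengths. The Java sketch below shows that scheme. It assumes each member's membership vector is already available as a double[], and it is an illustration, not a confirmed description of Weka's code. The class PartitionConcatSketch is hypothetical.

```java
import java.util.Arrays;

// Sketch, assuming each member exposes a leaf-membership vector: the ensemble
// vector is their concatenation and numElements is the sum of their lengths.
class PartitionConcatSketch {

    static int numElements(double[][] perMember) {
        int n = 0;
        for (double[] v : perMember) n += v.length;
        return n;
    }

    static double[] membershipValues(double[][] perMember) {
        double[] out = new double[numElements(perMember)];
        int pos = 0;
        for (double[] v : perMember) {
            System.arraycopy(v, 0, out, pos, v.length); // append this member's block
            pos += v.length;
        }
        return out;
    }

    public static void main(String[] args) {
        // Member 1 has 2 leaves, member 2 has 3; the instance falls in leaf 0 and leaf 2.
        double[][] perMember = {{1, 0}, {0, 0, 1}};
        System.out.println(Arrays.toString(membershipValues(perMember)));
    }
}
```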
-
getRevision
Returns the revision string.
- Specified by:
getRevision
in interface RevisionHandler
- Overrides:
getRevision
in class AbstractClassifier
- Returns:
- the revision
-
main
Main method for testing this class.
- Parameters:
argv
- the options
-
aggregate
Aggregate an object with this one.
- Specified by:
aggregate
in interface Aggregateable<Bagging>
- Parameters:
toAggregate
- the object to aggregate
- Returns:
- the result of aggregation
- Throws:
Exception
- if the supplied object can't be aggregated for some reason
-
finalizeAggregation
Call to complete the aggregation process. Allows implementers to do any final processing based on how many objects were aggregated.
- Specified by:
finalizeAggregation
in interface Aggregateable<Bagging>
- Throws:
Exception
- if the aggregation can't be finalized for some reason
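The aggregate/finalizeAggregation pair lets separately built ensembles be merged, for example ensembles trained on different partitions of the data (an assumed use case). The Java sketch below shows the general pattern. aggregate() collects the other ensemble's members. finalizeAggregation() fixes the final ensemble size, which the sketch treats as the member count. The class EnsembleAggregationSketch and its fields are hypothetical, not Weka internals.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the Aggregateable pattern for an ensemble (hypothetical class).
class EnsembleAggregationSketch<M> {
    private final List<M> members = new ArrayList<>();
    private int numIterations;
    private boolean finalized = false;

    EnsembleAggregationSketch(List<M> initialMembers) {
        members.addAll(initialMembers);
        numIterations = initialMembers.size();
    }

    // Adds the other ensemble's members; the reported size changes only on finalize.
    EnsembleAggregationSketch<M> aggregate(EnsembleAggregationSketch<M> other) {
        if (finalized) throw new IllegalStateException("aggregation already finalized");
        members.addAll(other.members);
        return this;
    }

    // Final processing based on how many members were aggregated.
    void finalizeAggregation() {
        numIterations = members.size();
        finalized = true;
    }

    int getNumIterations() { return numIterations; }

    public static void main(String[] args) {
        EnsembleAggregationSketch<String> a = new EnsembleAggregationSketch<>(List.of("tree1", "tree2"));
        a.aggregate(new EnsembleAggregationSketch<>(List.of("tree3")));
        a.finalizeAggregation();
        System.out.println(a.getNumIterations());
    }
}
```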
-