Class Bagging

All Implemented Interfaces:
Serializable, Cloneable, Classifier, AdditionalMeasureProducer, Aggregateable<Bagging>, BatchPredictor, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, PartitionGenerator, Randomizable, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler
Direct Known Subclasses:
RandomForest

Class for bagging a classifier to reduce variance. Can do classification and regression depending on the base learner.
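
As a rough illustration of the idea (not Weka's implementation), the sketch below bags a deliberately trivial base learner that predicts the mean of its training bag. The class name `BaggingSketch`, its methods, and the mean predictor are assumptions made for this example. Only the overall scheme follows the description above: sample with replacement, train one model per bag, and average the predictions.

```java
import java.util.Random;

// Illustrative stand-in only; this is not Weka's Bagging class.
class BaggingSketch {

    // Draws `size` values from `data` uniformly at random, with replacement.
    static double[] bootstrap(double[] data, int size, Random rng) {
        double[] bag = new double[size];
        for (int i = 0; i < size; i++) {
            bag[i] = data[rng.nextInt(data.length)];
        }
        return bag;
    }

    // Hypothetical base learner: "training" returns the mean of the bag,
    // which is then used as a constant prediction.
    static double trainMeanPredictor(double[] bag) {
        double sum = 0.0;
        for (double v : bag) {
            sum += v;
        }
        return sum / bag.length;
    }

    // Trains one base learner per bootstrap bag and averages their predictions.
    static double baggedPrediction(double[] data, int iterations, long seed) {
        Random rng = new Random(seed);
        double total = 0.0;
        for (int i = 0; i < iterations; i++) {
            double[] bag = bootstrap(data, data.length, rng);
            total += trainMeanPredictor(bag);
        }
        return total / iterations;
    }
}
```

Calling `BaggingSketch.baggedPrediction(new double[] {1.0, 2.0, 3.0, 4.0}, 10, 1L)` yields a value between 1.0 and 4.0, and the same seed always reproduces the same value.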

For more information, see

Leo Breiman (1996). Bagging predictors. Machine Learning. 24(2):123-140.

BibTeX:

 @article{Breiman1996,
    author = {Leo Breiman},
    journal = {Machine Learning},
    number = {2},
    pages = {123-140},
    title = {Bagging predictors},
    volume = {24},
    year = {1996}
 }
 

Valid options are:

 -P <num>
  Size of each bag, as a percentage of the
  training set size. (default 100)
 -O
  Calculate the out-of-bag error.
 -print
  Print the individual classifiers in the output.
 -store-out-of-bag-predictions
  Whether to store out-of-bag predictions in the internal evaluation object.
 -output-out-of-bag-complexity-statistics
  Whether to output complexity-based statistics when out-of-bag evaluation is performed.
 -represent-copies-using-weights
  Represent copies of instances using weights rather than explicitly.
 -S <num>
  Random number seed.
  (default 1)
 -num-slots <num>
  Number of execution slots.
  (default 1 - i.e. no parallelism)
 -I <num>
  Number of iterations.
  (default 10)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console.
 -W <classifier name>
  Full name of base classifier.
  (default: weka.classifiers.trees.REPTree)
 
 Options specific to classifier weka.classifiers.trees.REPTree:
 
 -M <minimum number of instances>
  Set minimum number of instances per leaf (default 2).
 -V <minimum variance for split>
  Set minimum numeric class variance proportion
  of train variance for split (default 1e-3).
 -N <number of folds>
  Number of folds for reduced error pruning (default 3).
 -S <seed>
  Seed for random data shuffling (default 1).
 -P
  No pruning.
 -L <maximum depth>
  Maximum tree depth (default -1, no maximum).
 -I <initial count>
  Initial class value count (default 0).
 -R
  Spread initial count over all class values (i.e. don't use 1 per value).
Options after -- are passed to the designated classifier.
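
The last line means that a single option array carries settings for two components: the ensemble itself and its base classifier. A minimal sketch of that split follows. The helper `splitAtDoubleDash` and the class `OptionSplitSketch` are hypothetical names used only for illustration; they are not part of Weka and do not reproduce its option-parsing code.

```java
import java.util.Arrays;

// Illustrative helper, not part of Weka.
class OptionSplitSketch {

    // Returns {ensembleOptions, baseClassifierOptions}: everything before the
    // first "--" and everything after it. Without "--", the second array is empty.
    static String[][] splitAtDoubleDash(String[] options) {
        int sep = Arrays.asList(options).indexOf("--");
        if (sep < 0) {
            return new String[][] { options.clone(), new String[0] };
        }
        String[] ensemble = Arrays.copyOfRange(options, 0, sep);
        String[] base = Arrays.copyOfRange(options, sep + 1, options.length);
        return new String[][] { ensemble, base };
    }
}
```

For the option list `-P 100 -I 10 -W weka.classifiers.trees.REPTree -- -M 2 -N 3`, the first array holds the six Bagging tokens and the second holds `-M 2 -N 3` for REPTree.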

Version:
$Revision: 15800 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (len@reeltwo.com), Richard Kirkby (rkirkby@cs.waikato.ac.nz)
  • Constructor Details

    • Bagging

      public Bagging()
      Constructor.
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing the classifier.
      Returns:
      a description suitable for displaying in the Explorer/Experimenter GUI
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableParallelIteratedSingleClassifierEnhancer
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableParallelIteratedSingleClassifierEnhancer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the Classifier.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableParallelIteratedSingleClassifierEnhancer
      Returns:
      an array of strings suitable for passing to setOptions
    • bagSizePercentTipText

      public String bagSizePercentTipText()
      Returns the tip text for the bagSizePercent property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getBagSizePercent

      public int getBagSizePercent()
      Gets the size of each bag, as a percentage of the training set size.
      Returns:
      the bag size, as a percentage.
    • setBagSizePercent

      public void setBagSizePercent(int newBagSizePercent)
      Sets the size of each bag, as a percentage of the training set size.
      Parameters:
      newBagSizePercent - the bag size, as a percentage.
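
To make the percentage concrete: with 200 training instances and a bag size of 50 percent, each bag holds 100 instances. The sketch below assumes that a fractional result is truncated. That rounding rule and the helper name `bagSize` are assumptions for illustration and are not stated in this documentation.

```java
// Illustrative helper, not part of Weka.
class BagSizeSketch {

    // Converts a bag size given as a percentage of the training set size
    // into a number of instances, truncating any fractional part (assumed).
    static int bagSize(int numTrainingInstances, int bagSizePercent) {
        return (int) (numTrainingInstances * (bagSizePercent / 100.0));
    }
}
```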
    • representCopiesUsingWeightsTipText

      public String representCopiesUsingWeightsTipText()
      Returns the tip text for the representCopiesUsingWeights property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • setRepresentCopiesUsingWeights

      public void setRepresentCopiesUsingWeights(boolean representUsingWeights)
      Set whether copies of instances are represented using weights rather than explicitly.
      Parameters:
      representUsingWeights - whether to represent copies using weights
    • getRepresentCopiesUsingWeights

      public boolean getRepresentCopiesUsingWeights()
      Get whether copies of instances are represented using weights rather than explicitly.
      Returns:
      whether copies of instances are represented using weights rather than explicitly
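
Sampling with replacement draws some instances more than once. A bag can store each draw as a separate copy, or keep one entry per distinct instance and record the number of draws as its weight. The following sketch shows the weight-based form under the assumption that every instance starts with unit weight; `CopyWeightsSketch` and `drawAsWeights` are hypothetical names.

```java
import java.util.Random;

// Illustrative helper, not part of Weka.
class CopyWeightsSketch {

    // Returns, for each training index, how many times it was drawn into a bag
    // of `bagSize` instances sampled with replacement. An index whose weight
    // stays at 0 was never drawn, i.e. it is out of bag.
    static double[] drawAsWeights(int numInstances, int bagSize, Random rng) {
        double[] weights = new double[numInstances];
        for (int i = 0; i < bagSize; i++) {
            weights[rng.nextInt(numInstances)] += 1.0;
        }
        return weights;
    }
}
```

The weights always sum to the bag size, so a base learner that honours instance weights sees the same total mass as it would with explicit copies.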
    • storeOutOfBagPredictionsTipText

      public String storeOutOfBagPredictionsTipText()
      Returns the tip text for the storeOutOfBagPredictions property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • setStoreOutOfBagPredictions

      public void setStoreOutOfBagPredictions(boolean storeOutOfBag)
      Set whether the out of bag predictions are stored.
      Parameters:
      storeOutOfBag - whether the out of bag predictions are stored
    • getStoreOutOfBagPredictions

      public boolean getStoreOutOfBagPredictions()
      Get whether the out of bag predictions are stored.
      Returns:
      whether the out of bag predictions are stored
    • calcOutOfBagTipText

      public String calcOutOfBagTipText()
      Returns the tip text for the calcOutOfBag property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • setCalcOutOfBag

      public void setCalcOutOfBag(boolean calcOutOfBag)
      Set whether the out of bag error is calculated.
      Parameters:
      calcOutOfBag - whether to calculate the out of bag error
    • getCalcOutOfBag

      public boolean getCalcOutOfBag()
      Get whether the out of bag error is calculated.
      Returns:
      whether the out of bag error is calculated
    • outputOutOfBagComplexityStatisticsTipText

      public String outputOutOfBagComplexityStatisticsTipText()
      Returns the tip text for the outputOutOfBagComplexityStatistics property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getOutputOutOfBagComplexityStatistics

      public boolean getOutputOutOfBagComplexityStatistics()
      Gets whether complexity statistics are output when out-of-bag estimation is performed.
      Returns:
      whether complexity statistics are output
    • setOutputOutOfBagComplexityStatistics

      public void setOutputOutOfBagComplexityStatistics(boolean b)
      Sets whether complexity statistics are output when out-of-bag estimation is performed.
      Parameters:
      b - whether complexity statistics are output
    • printClassifiersTipText

      public String printClassifiersTipText()
      Returns the tip text for the printClassifiers property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • setPrintClassifiers

      public void setPrintClassifiers(boolean print)
      Set whether to print the individual ensemble classifiers in the output.
      Parameters:
      print - true if the individual classifiers are to be printed
    • getPrintClassifiers

      public boolean getPrintClassifiers()
      Get whether to print the individual ensemble classifiers in the output.
      Returns:
      true if the individual classifiers are to be printed
    • measureOutOfBagError

      public double measureOutOfBagError()
      Gets the out-of-bag error that was calculated as the classifier was built. Returns the error rate for classification and the mean absolute error for regression.
      Returns:
      the out-of-bag error; -1 if the out-of-bag error has not been estimated
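
An out-of-bag estimate predicts each training instance using only the ensemble members whose bag did not contain it. The sketch below shows this for classification with simple majority voting, which is a simplification chosen for the example and may differ from how Weka combines member predictions. The class name, the array layout, and the tie-breaking rule are all assumptions.

```java
// Illustrative helper, not part of Weka.
class OutOfBagSketch {

    // inBag[m][i]      : true if instance i was drawn into the bag of model m
    // predictions[m][i]: class index that model m predicts for instance i
    // actual[i]        : true class index of instance i
    // Returns the fraction of misclassified instances among those that are
    // out of bag for at least one model, or -1 if there are none.
    static double errorRate(boolean[][] inBag, int[][] predictions,
                            int[] actual, int numClasses) {
        int evaluated = 0;
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            int[] votes = new int[numClasses];
            int voters = 0;
            for (int m = 0; m < inBag.length; m++) {
                if (!inBag[m][i]) {
                    votes[predictions[m][i]]++;
                    voters++;
                }
            }
            if (voters == 0) {
                continue; // instance was in every bag, so no OOB prediction exists
            }
            int best = 0; // ties go to the lowest class index
            for (int c = 1; c < numClasses; c++) {
                if (votes[c] > votes[best]) {
                    best = c;
                }
            }
            evaluated++;
            if (best != actual[i]) {
                wrong++;
            }
        }
        return evaluated == 0 ? -1.0 : (double) wrong / evaluated;
    }
}
```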
    • enumerateMeasures

      public Enumeration<String> enumerateMeasures()
      Returns an enumeration of the additional measure names.
      Specified by:
      enumerateMeasures in interface AdditionalMeasureProducer
      Returns:
      an enumeration of the measure names
    • getMeasure

      public double getMeasure(String additionalMeasureName)
      Returns the value of the named measure.
      Specified by:
      getMeasure in interface AdditionalMeasureProducer
      Parameters:
      additionalMeasureName - the name of the measure to query for its value
      Returns:
      the value of the named measure
      Throws:
      IllegalArgumentException - if the named measure is not supported
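
The contract of `enumerateMeasures` and `getMeasure` can be pictured with a small stand-in: one method lists the accepted names, the other returns the value for a listed name and rejects anything else. `MeasureSketch` below is a hypothetical class, and the measure name `measureOutOfBagError` is assumed to mirror the method of the same name; neither is confirmed by this page.

```java
import java.util.Collections;
import java.util.Enumeration;

// Illustrative stand-in for the contract described above; not Weka code.
class MeasureSketch {

    // Assumed measure name, chosen to mirror the measureOutOfBagError() method.
    static final String OOB_ERROR = "measureOutOfBagError";

    private final double outOfBagError;

    MeasureSketch(double outOfBagError) {
        this.outOfBagError = outOfBagError;
    }

    // Lists the names that getMeasure accepts.
    Enumeration<String> enumerateMeasures() {
        return Collections.enumeration(Collections.singletonList(OOB_ERROR));
    }

    // Returns the value for a known name and rejects anything else.
    double getMeasure(String additionalMeasureName) {
        if (OOB_ERROR.equals(additionalMeasureName)) {
            return outOfBagError;
        }
        throw new IllegalArgumentException(
            additionalMeasureName + " not supported");
    }
}
```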
    • getOutOfBagEvaluationObject

      public Evaluation getOutOfBagEvaluationObject()
      Returns the out-of-bag evaluation object.
      Returns:
      the out-of-bag evaluation object; null if out-of-bag error hasn't been calculated
    • buildClassifier

      public void buildClassifier(Instances data) throws Exception
      Builds the bagged classifier from the given training data.
      Specified by:
      buildClassifier in interface Classifier
      Overrides:
      buildClassifier in class ParallelIteratedSingleClassifierEnhancer
      Parameters:
      data - the training data to be used for generating the bagged classifier.
      Throws:
      Exception - if the classifier could not be built successfully
    • distributionForInstance

      public double[] distributionForInstance(Instance instance) throws Exception
      Calculates the class membership probabilities for the given test instance.
      Specified by:
      distributionForInstance in interface Classifier
      Overrides:
      distributionForInstance in class AbstractClassifier
      Parameters:
      instance - the instance to be classified
      Returns:
      predicted class probability distribution
      Throws:
      Exception - if distribution can't be computed successfully
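
One common way to turn the members' outputs into a single distribution, and the one assumed here, is to sum the per-class probabilities over all ensemble members and then normalize. This page does not state whether Weka uses exactly this rule, so treat `DistributionAverageSketch` as an illustration with hypothetical names.

```java
// Illustrative helper, not part of Weka.
class DistributionAverageSketch {

    // memberDistributions[m][c]: probability that model m assigns to class c.
    // Returns the per-class sums rescaled so that they add up to 1.
    static double[] combine(double[][] memberDistributions) {
        int numClasses = memberDistributions[0].length;
        double[] sums = new double[numClasses];
        for (double[] dist : memberDistributions) {
            for (int c = 0; c < numClasses; c++) {
                sums[c] += dist[c];
            }
        }
        double total = 0.0;
        for (double s : sums) {
            total += s;
        }
        if (total == 0.0) {
            return sums; // nothing to normalize
        }
        for (int c = 0; c < numClasses; c++) {
            sums[c] /= total;
        }
        return sums;
    }
}
```

With two members reporting `{0.8, 0.2}` and `{0.4, 0.6}`, the combined distribution is approximately `{0.6, 0.4}`.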
    • batchSizeTipText

      public String batchSizeTipText()
      Returns the tool tip text for the batchSize property.
      Overrides:
      batchSizeTipText in class AbstractClassifier
      Returns:
      the tool tip for this property
    • setBatchSize

      public void setBatchSize(String size)
      Set the batch size to use. Gets passed through to the base learner if it implements BatchPredictor. Otherwise it is just ignored.
      Specified by:
      setBatchSize in interface BatchPredictor
      Overrides:
      setBatchSize in class AbstractClassifier
      Parameters:
      size - the batch size to use
    • getBatchSize

      public String getBatchSize()
      Gets the preferred batch size from the base learner if it implements BatchPredictor. Returns 1 as the preferred batch size otherwise.
      Specified by:
      getBatchSize in interface BatchPredictor
      Overrides:
      getBatchSize in class AbstractClassifier
      Returns:
      the batch size to use
    • distributionsForInstances

      public double[][] distributionsForInstances(Instances insts) throws Exception
      Batch scoring method. Calls the appropriate method for the base learner if it implements BatchPredictor. Otherwise it simply calls the distributionForInstance() method repeatedly.
      Specified by:
      distributionsForInstances in interface BatchPredictor
      Overrides:
      distributionsForInstances in class AbstractClassifier
      Parameters:
      insts - the instances to get predictions for
      Returns:
      an array of probability distributions, one for each instance
      Throws:
      Exception - if a problem occurs
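
The fallback path described above amounts to a loop over the batch. A minimal sketch, in which plain `double[]` arrays stand in for instances and a `Function` stands in for the per-instance prediction method (both are assumptions made to keep the example self-contained):

```java
import java.util.function.Function;

// Illustrative helper, not part of Weka.
class BatchFallbackSketch {

    // Applies a per-instance prediction function to every instance in turn
    // and collects one distribution per instance, in input order.
    static double[][] distributionsForAll(double[][] instances,
                                          Function<double[], double[]> single) {
        double[][] result = new double[instances.length][];
        for (int i = 0; i < instances.length; i++) {
            result[i] = single.apply(instances[i]);
        }
        return result;
    }
}
```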
    • implementsMoreEfficientBatchPrediction

      public boolean implementsMoreEfficientBatchPrediction()
      Returns true if the base classifier implements BatchPredictor and is able to generate batch predictions efficiently.
      Specified by:
      implementsMoreEfficientBatchPrediction in interface BatchPredictor
      Overrides:
      implementsMoreEfficientBatchPrediction in class AbstractClassifier
      Returns:
      true if the base classifier can generate batch predictions efficiently
    • toString

      public String toString()
      Returns a description of the bagged classifier.
      Overrides:
      toString in class Object
      Returns:
      description of the bagged classifier as a string
    • generatePartition

      public void generatePartition(Instances data) throws Exception
      Builds the classifier to generate a partition.
      Specified by:
      generatePartition in interface PartitionGenerator
      Throws:
      Exception
    • getMembershipValues

      public double[] getMembershipValues(Instance inst) throws Exception
      Computes an array that indicates leaf membership.
      Specified by:
      getMembershipValues in interface PartitionGenerator
      Throws:
      Exception
    • numElements

      public int numElements() throws Exception
      Returns the number of elements in the partition.
      Specified by:
      numElements in interface PartitionGenerator
      Throws:
      Exception
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class AbstractClassifier
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - the options
    • aggregate

      public Bagging aggregate(Bagging toAggregate) throws Exception
      Aggregates an object with this one.
      Specified by:
      aggregate in interface Aggregateable<Bagging>
      Parameters:
      toAggregate - the object to aggregate
      Returns:
      the result of aggregation
      Throws:
      Exception - if the supplied object can't be aggregated for some reason
    • finalizeAggregation

      public void finalizeAggregation() throws Exception
      Call to complete the aggregation process. Allows implementers to do any final processing based on how many objects were aggregated.
      Specified by:
      finalizeAggregation in interface Aggregateable<Bagging>
      Throws:
      Exception - if the aggregation can't be finalized for some reason
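
Together, `aggregate` and `finalizeAggregation` form a two-step protocol: call `aggregate` once per additional object, then call `finalizeAggregation` once at the end. The sketch below illustrates that calling pattern with a hypothetical `EnsembleSketch` whose members are plain strings. Merging by concatenating member lists is an assumption for the example and is not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the aggregation protocol; not Weka code.
class EnsembleSketch {

    private final List<String> members = new ArrayList<>();
    private boolean finalized = false;

    EnsembleSketch(List<String> initialMembers) {
        members.addAll(initialMembers);
    }

    // Adds the other ensemble's members to this one and returns this ensemble.
    EnsembleSketch aggregate(EnsembleSketch other) {
        if (finalized) {
            throw new IllegalStateException("aggregation already finalized");
        }
        members.addAll(other.members);
        return this;
    }

    // Marks the aggregation as complete; further aggregate calls are rejected.
    void finalizeAggregation() {
        finalized = true;
    }

    int size() {
        return members.size();
    }
}
```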