Class RandomForest

All Implemented Interfaces:
Serializable, Cloneable, Classifier, AdditionalMeasureProducer, Aggregateable<Bagging>, BatchPredictor, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, PartitionGenerator, Randomizable, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler

public class RandomForest extends Bagging
Class for constructing a forest of random trees.

For more information see:

Leo Breiman (2001). Random Forests. Machine Learning. 45(1):5-32.

BibTeX:
 @article{Breiman2001,
    author = {Leo Breiman},
    journal = {Machine Learning},
    number = {1},
    pages = {5-32},
    title = {Random Forests},
    volume = {45},
    year = {2001}
 }
 


Valid options are:

 -P
  Size of each bag, as a percentage of the
  training set size. (default 100)
 
 -O
  Calculate the out of bag error.
 
 -store-out-of-bag-predictions
  Whether to store out of bag predictions in internal evaluation object.
 
 -output-out-of-bag-complexity-statistics
  Whether to output complexity-based statistics when out-of-bag evaluation is performed.
 
 -print
  Print the individual classifiers in the output
 
 -attribute-importance
  Compute and output attribute importance (mean impurity decrease method)
 
 -I <num>
  Number of iterations (i.e., the number of trees in the random forest).
  (current value 100)
 
 -num-slots <num>
  Number of execution slots.
  (default 1 - i.e. no parallelism)
  (use 0 to auto-detect number of cores)
 
 -K <number of attributes>
  Number of attributes to randomly investigate. (default 0)
  (<1 = int(log_2(#predictors)+1)).
 
 -M <minimum number of instances>
  Set minimum number of instances per leaf.
  (default 1)
 
 -V <minimum variance for split>
  Set minimum numeric class variance proportion
  of train variance for split (default 1e-3).
 
 -S <num>
  Seed for random number generator.
  (default 1)
 
 -depth <num>
  The maximum depth of the tree, 0 for unlimited.
  (default 0)
 
 -N <num>
  Number of folds for backfitting (default 0, no backfitting).
 
 -U
  Allow unclassified instances.
 
 -B
  Break ties randomly when several attributes look equally good.
 
 -output-debug-info
  If set, classifier is run in debug mode and
  may output additional info to the console
 
 -do-not-check-capabilities
  If set, classifier capabilities are not checked before classifier is built
  (use with caution).
 
 -num-decimal-places
  The number of decimal places for the output of numbers in the model (default 2).
 
 -batch-size
  The desired batch size for batch prediction  (default 100).
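
The -P, -S and -O options above can be pictured with a small bootstrap sketch. This is not Weka's code: the class `BagSketch` and its methods are hypothetical names, and the truncating conversion of the bag size to an integer is an assumption. It only shows how a percentage bag size, a seeded generator, and the resulting out-of-bag set relate to each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of Weka) illustrating -P, -S and out-of-bag sets. */
final class BagSketch {

    /** Number of instances drawn for one bag, given the -P percentage (assumed to truncate). */
    static int bagSize(int trainSize, double bagSizePercent) {
        return (int) (trainSize * (bagSizePercent / 100.0));
    }

    /** Draws instance indices with replacement, as in a bootstrap sample. */
    static int[] drawBag(int trainSize, double bagSizePercent, Random rng) {
        int[] bag = new int[bagSize(trainSize, bagSizePercent)];
        for (int i = 0; i < bag.length; i++) {
            bag[i] = rng.nextInt(trainSize);
        }
        return bag;
    }

    /** Indices that were never drawn into the bag; these form the out-of-bag set. */
    static List<Integer> outOfBag(int trainSize, int[] bag) {
        boolean[] inBag = new boolean[trainSize];
        for (int idx : bag) {
            inBag[idx] = true;
        }
        List<Integer> oob = new ArrayList<>();
        for (int i = 0; i < trainSize; i++) {
            if (!inBag[i]) {
                oob.add(i);
            }
        }
        return oob;
    }

    public static void main(String[] args) {
        Random rng = new Random(1); // same role as the -S seed
        int[] bag = drawBag(10, 100, rng);
        System.out.println("bag size: " + bag.length);
        System.out.println("out-of-bag indices: " + outOfBag(10, bag));
    }
}
```

Because the generator is seeded, two runs with the same seed draw the same bag, which is the reason a fixed -S value makes results repeatable.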
 
Version:
$Revision: 15311 $
Author:
Richard Kirkby (rkirkby@cs.waikato.ac.nz)
  • Constructor Details

    • RandomForest

      public RandomForest()
      Constructor that sets the base classifier for bagging to RandomTree and the default number of iterations to 100.
  • Method Details

    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the base classifier.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Specified by:
      getCapabilities in interface Classifier
      Overrides:
      getCapabilities in class SingleClassifierEnhancer
      Returns:
      the capabilities of the base classifier
    • globalInfo

      public String globalInfo()
      Returns a string describing the classifier.
      Overrides:
      globalInfo in class Bagging
      Returns:
      a description suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Overrides:
      getTechnicalInformation in class Bagging
      Returns:
      the technical information about this class
    • numIterationsTipText

      public String numIterationsTipText()
      Returns the tip text for the number of iterations. Overridden here to be more informative.
      Overrides:
      numIterationsTipText in class IteratedSingleClassifierEnhancer
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setClassifier

      @ProgrammaticProperty public void setClassifier(Classifier newClassifier)
      This method only accepts RandomTree arguments.
      Overrides:
      setClassifier in class SingleClassifierEnhancer
      Parameters:
      newClassifier - the RandomTree to use.
    • setRepresentCopiesUsingWeights

      @ProgrammaticProperty public void setRepresentCopiesUsingWeights(boolean representUsingWeights)
      This method only accepts true as its argument.
      Overrides:
      setRepresentCopiesUsingWeights in class Bagging
      Parameters:
      representUsingWeights - must be set to true.
    • numFeaturesTipText

      public String numFeaturesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getNumFeatures

      public int getNumFeatures()
      Get the number of features used in random selection.
      Returns:
      Value of numFeatures.
    • setNumFeatures

      public void setNumFeatures(int newNumFeatures)
      Set the number of features to use in random selection.
      Parameters:
      newNumFeatures - Value to assign to numFeatures.
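
The option list for this class states that a numFeatures value below 1 means int(log_2(#predictors)+1). The sketch below works that rule out with integer arithmetic. `NumFeaturesSketch` and `effectiveNumFeatures` are hypothetical names, not Weka methods, and any further adjustment Weka may apply to the value is not modelled here.

```java
/** Hypothetical helper (not part of Weka) for the documented default of -K. */
final class NumFeaturesSketch {

    /** floor(log2(n)) for n >= 1, computed without floating point. */
    static int floorLog2(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    /**
     * Returns the number of attributes to investigate at each split.
     * Values below 1 fall back to int(log_2(#predictors) + 1),
     * which equals floor(log2(#predictors)) + 1 for #predictors >= 1.
     */
    static int effectiveNumFeatures(int numFeatures, int numPredictors) {
        if (numPredictors < 1) {
            throw new IllegalArgumentException("need at least one predictor");
        }
        if (numFeatures < 1) {
            return floorLog2(numPredictors) + 1;
        }
        return numFeatures;
    }

    public static void main(String[] args) {
        for (int p : new int[] {4, 10, 100}) {
            System.out.println(p + " predictors -> " + effectiveNumFeatures(0, p));
        }
    }
}
```

With the default value of 0, a dataset with 10 predictors would therefore consider 4 randomly chosen attributes per split under this reading of the rule.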
    • computeAttributeImportanceTipText

      public String computeAttributeImportanceTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setComputeAttributeImportance

      public void setComputeAttributeImportance(boolean computeAttributeImportance)
      Set whether to compute and output attribute importance scores
      Parameters:
      computeAttributeImportance - true to compute attribute importance scores
    • getComputeAttributeImportance

      public boolean getComputeAttributeImportance()
      Get whether to compute and output attribute importance scores
      Returns:
      true if computing attribute importance scores
    • maxDepthTipText

      public String maxDepthTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getMaxDepth

      public int getMaxDepth()
      Get the maximum depth of the tree, 0 for unlimited.
      Returns:
      the maximum depth.
    • setMaxDepth

      public void setMaxDepth(int value)
      Set the maximum depth of the tree, 0 for unlimited.
      Parameters:
      value - the maximum depth.
    • breakTiesRandomlyTipText

      public String breakTiesRandomlyTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getBreakTiesRandomly

      public boolean getBreakTiesRandomly()
      Get whether to break ties randomly.
      Returns:
      true if ties are to be broken randomly.
    • setBreakTiesRandomly

      public void setBreakTiesRandomly(boolean newBreakTiesRandomly)
      Set whether to break ties randomly.
      Parameters:
      newBreakTiesRandomly - true if ties are to be broken randomly
    • setDebug

      public void setDebug(boolean debug)
      Set debugging mode.
      Overrides:
      setDebug in class AbstractClassifier
      Parameters:
      debug - true if debug output should be printed
    • setNumDecimalPlaces

      public void setNumDecimalPlaces(int num)
      Set the number of decimal places.
      Overrides:
      setNumDecimalPlaces in class AbstractClassifier
    • setBatchSize

      public void setBatchSize(String size)
      Set the preferred batch size for batch prediction.
      Specified by:
      setBatchSize in interface BatchPredictor
      Overrides:
      setBatchSize in class Bagging
      Parameters:
      size - the batch size to use
    • setSeed

      public void setSeed(int s)
      Sets the seed for the random number generator.
      Specified by:
      setSeed in interface Randomizable
      Overrides:
      setSeed in class RandomizableParallelIteratedSingleClassifierEnhancer
      Parameters:
      s - the seed to be used
    • toString

      public String toString()
      Returns description of the bagged classifier.
      Overrides:
      toString in class Bagging
      Returns:
      description of the bagged classifier as a string
    • computeAverageImpurityDecreasePerAttribute

      public double[] computeAverageImpurityDecreasePerAttribute(double[] nodeCounts) throws WekaException
      Computes the average impurity decrease per attribute over the trees
      Parameters:
      nodeCounts - an optional array that, if non-null, will hold the count of the number of nodes at which each attribute was used for splitting
      Returns:
      the average impurity decrease per attribute over the trees
      Throws:
      WekaException
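
One plausible reading of this method is sketched below: impurity decreases and split counts are summed per attribute over all trees, and each sum is divided by the number of nodes that split on that attribute. The input layout (`perTree[t][a][0]` for the summed decrease, `perTree[t][a][1]` for the node count) and the class name `ImportanceSketch` are assumptions made for illustration, not Weka's internal representation.

```java
/** Hypothetical helper (not part of Weka) illustrating mean impurity decrease. */
final class ImportanceSketch {

    /**
     * perTree[t][a][0] = summed impurity decrease of attribute a in tree t
     * perTree[t][a][1] = number of nodes in tree t that split on attribute a
     *
     * If nodeCounts is non-null, it receives the total split count per attribute.
     */
    static double[] averageImpurityDecrease(double[][][] perTree, double[] nodeCounts) {
        int numAttributes = perTree[0].length;
        double[] sums = new double[numAttributes];
        double[] counts = new double[numAttributes];

        // Accumulate decreases and split counts over all trees.
        for (double[][] tree : perTree) {
            for (int a = 0; a < numAttributes; a++) {
                sums[a] += tree[a][0];
                counts[a] += tree[a][1];
            }
        }
        // Average only where the attribute was actually used.
        for (int a = 0; a < numAttributes; a++) {
            if (counts[a] > 0) {
                sums[a] /= counts[a];
            }
        }
        if (nodeCounts != null) {
            System.arraycopy(counts, 0, nodeCounts, 0, numAttributes);
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][][] perTree = {
            {{0.5, 2}, {0.0, 0}},  // tree 0
            {{0.25, 1}, {0.5, 1}}  // tree 1
        };
        double[] counts = new double[2];
        double[] avg = averageImpurityDecrease(perTree, counts);
        System.out.println(java.util.Arrays.toString(avg));
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

Passing null for the second argument mirrors the documented behaviour of the optional nodeCounts parameter: the averages are still returned, and no counts are written.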
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Bagging
      Returns:
      an enumeration of all the available options
    • getOptions

      public String[] getOptions()
      Gets the current settings of the forest.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Bagging
      Returns:
      an array of strings suitable for passing to setOptions()
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Bagging
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
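
As an illustration of the array shape that setOptions expects, the sketch below builds an options array from flags documented for this class and reads values back out of it. The helpers `optionValue` and `hasFlag` are hypothetical and only imitate option lookup; they do not call or reproduce Weka's own parsing.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Weka) showing the layout of an options array. */
final class OptionsSketch {

    /** Returns the token following the given flag, or the fallback if the flag is absent. */
    static String optionValue(String[] options, String flag, String fallback) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(flag)) {
                return options[i + 1];
            }
        }
        return fallback;
    }

    /** Returns true if a valueless switch such as -B or -O is present. */
    static boolean hasFlag(String[] options, String flag) {
        return Arrays.asList(options).contains(flag);
    }

    public static void main(String[] args) {
        // Each flag and each value is its own array element.
        String[] options = {
            "-I", "200",            // number of trees
            "-K", "0",              // attributes per split (0 = default rule)
            "-depth", "10",         // maximum tree depth
            "-S", "42",             // random seed
            "-attribute-importance" // switch without a value
        };
        System.out.println("trees: " + optionValue(options, "-I", "100"));
        System.out.println("min per leaf: " + optionValue(options, "-M", "1"));
        System.out.println("importance: " + hasFlag(options, "-attribute-importance"));
    }
}
```

Flags that take a value occupy two consecutive elements, while switches occupy one; the fallback arguments in the usage above repeat the documented defaults for -I and -M.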
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Bagging
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for this class.
      Parameters:
      argv - the options