Package weka.classifiers.trees
Class RandomForest
- All Implemented Interfaces:
Serializable, Cloneable, Classifier, AdditionalMeasureProducer, Aggregateable<Bagging>, BatchPredictor, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, PartitionGenerator, Randomizable, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler
Class for constructing a forest of random trees.
For more information see:
Leo Breiman (2001). Random Forests. Machine Learning. 45(1):5-32.
BibTeX:
@article{Breiman2001, author = {Leo Breiman}, journal = {Machine Learning}, number = {1}, pages = {5-32}, title = {Random Forests}, volume = {45}, year = {2001} }
Valid options are:
-P Size of each bag, as a percentage of the training set size. (default 100)
-O Calculate the out of bag error.
-store-out-of-bag-predictions Whether to store out of bag predictions in internal evaluation object.
-output-out-of-bag-complexity-statistics Whether to output complexity-based statistics when out-of-bag evaluation is performed.
-print Print the individual classifiers in the output
-attribute-importance Compute and output attribute importance (mean impurity decrease method)
-I <num> Number of iterations (i.e., the number of trees in the random forest). (current value 100)
-num-slots <num> Number of execution slots. (default 1 - i.e. no parallelism) (use 0 to auto-detect number of cores)
-K <number of attributes> Number of attributes to randomly investigate. (default 0) (<1 = int(log_2(#predictors)+1)).
-M <minimum number of instances> Set minimum number of instances per leaf. (default 1)
-V <minimum variance for split> Set minimum numeric class variance proportion of train variance for split (default 1e-3).
-S <num> Seed for random number generator. (default 1)
-depth <num> The maximum depth of the tree, 0 for unlimited. (default 0)
-N <num> Number of folds for backfitting (default 0, no backfitting).
-U Allow unclassified instances.
-B Break ties randomly when several attributes look equally good.
-output-debug-info If set, classifier is run in debug mode and may output additional info to the console
-do-not-check-capabilities If set, classifier capabilities are not checked before classifier is built (use with caution).
-num-decimal-places The number of decimal places for the output of numbers in the model (default 2).
-batch-size The desired batch size for batch prediction (default 100).
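The -P, -S and -O options together describe bagging: each tree is built on a bag drawn from the training set, and the instances left out of a bag can be used to estimate error. The following is a minimal, library-independent sketch of that idea, assuming plain sampling with replacement. The names BagSketch, drawBag and outOfBag are hypothetical, and this is not Weka's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class BagSketch {

    /** Draws row indices with replacement; the bag holds bagSizePercent of numInstances. */
    static int[] drawBag(int numInstances, int bagSizePercent, long seed) {
        int bagSize = (int) ((long) numInstances * bagSizePercent / 100);
        Random random = new Random(seed); // same seed, same bag
        int[] bag = new int[bagSize];
        for (int i = 0; i < bagSize; i++) {
            bag[i] = random.nextInt(numInstances);
        }
        return bag;
    }

    /** Returns the indices that never appear in the bag (the out-of-bag instances). */
    static List<Integer> outOfBag(int numInstances, int[] bag) {
        boolean[] inBag = new boolean[numInstances];
        for (int index : bag) {
            inBag[index] = true;
        }
        List<Integer> oob = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            if (!inBag[i]) {
                oob.add(i);
            }
        }
        return oob;
    }
}
```

An out-of-bag error estimate would then score each tree only on the indices returned by outOfBag for that tree's bag.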
- Version:
- $Revision: 15311 $
- Author:
- Richard Kirkby (rkirkby@cs.waikato.ac.nz)
-
Field Summary
Fields inherited from class weka.classifiers.AbstractClassifier
BATCH_SIZE_DEFAULT, NUM_DECIMAL_PLACES_DEFAULT
-
Constructor Summary
Constructor: RandomForest()
Description: Constructor that sets base classifier for bagging to RandomTree and default number of iterations to 100.
-
Method Summary
Modifier and Type, Method: Description
String breakTiesRandomlyTipText(): Returns the tip text for this property.
String computeAttributeImportanceTipText(): Returns the tip text for this property.
double[] computeAverageImpurityDecreasePerAttribute(double[] nodeCounts): Computes the average impurity decrease per attribute over the trees.
boolean getBreakTiesRandomly(): Get whether to break ties randomly.
Capabilities getCapabilities(): Returns default capabilities of the base classifier.
boolean getComputeAttributeImportance(): Get whether to compute and output attribute importance scores.
int getMaxDepth(): Get the maximum depth of the tree, 0 for unlimited.
int getNumFeatures(): Get the number of features used in random selection.
String[] getOptions(): Gets the current settings of the forest.
String getRevision(): Returns the revision string.
TechnicalInformation getTechnicalInformation(): Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
String globalInfo(): Returns a string describing classifier.
Enumeration<Option> listOptions(): Returns an enumeration describing the available options.
static void main(String[] argv): Main method for this class.
String maxDepthTipText(): Returns the tip text for this property.
String numFeaturesTipText(): Returns the tip text for this property.
String numIterationsTipText(): Returns the tip text for the number of iterations.
void setBatchSize(String size): Set the preferred batch size for batch prediction.
void setBreakTiesRandomly(boolean newBreakTiesRandomly): Set whether to break ties randomly.
void setClassifier(Classifier newClassifier): This method only accepts RandomTree arguments.
void setComputeAttributeImportance(boolean computeAttributeImportance): Set whether to compute and output attribute importance scores.
void setDebug(boolean debug): Set debugging mode.
void setMaxDepth(int value): Set the maximum depth of the tree, 0 for unlimited.
void setNumDecimalPlaces(int num): Set the number of decimal places.
void setNumFeatures(int newNumFeatures): Set the number of features to use in random selection.
void setOptions(String[] options): Parses a given list of options.
void setRepresentCopiesUsingWeights(boolean representUsingWeights): This method only accepts true as its argument.
void setSeed(int s): Sets the seed for the random number generator.
String toString(): Returns description of the bagged classifier.
Methods inherited from class weka.classifiers.meta.Bagging
aggregate, bagSizePercentTipText, batchSizeTipText, buildClassifier, calcOutOfBagTipText, distributionForInstance, distributionsForInstances, enumerateMeasures, finalizeAggregation, generatePartition, getBagSizePercent, getBatchSize, getCalcOutOfBag, getMeasure, getMembershipValues, getOutOfBagEvaluationObject, getOutputOutOfBagComplexityStatistics, getPrintClassifiers, getRepresentCopiesUsingWeights, getStoreOutOfBagPredictions, implementsMoreEfficientBatchPrediction, measureOutOfBagError, numElements, outputOutOfBagComplexityStatisticsTipText, printClassifiersTipText, representCopiesUsingWeightsTipText, setBagSizePercent, setCalcOutOfBag, setOutputOutOfBagComplexityStatistics, setPrintClassifiers, setStoreOutOfBagPredictions, storeOutOfBagPredictionsTipText
Methods inherited from class weka.classifiers.RandomizableParallelIteratedSingleClassifierEnhancer
getSeed, seedTipText
Methods inherited from class weka.classifiers.ParallelIteratedSingleClassifierEnhancer
getNumExecutionSlots, numExecutionSlotsTipText, setNumExecutionSlots
Methods inherited from class weka.classifiers.IteratedSingleClassifierEnhancer
getNumIterations, setNumIterations
Methods inherited from class weka.classifiers.SingleClassifierEnhancer
classifierTipText, getClassifier, postExecution, preExecution
Methods inherited from class weka.classifiers.AbstractClassifier
classifyInstance, debugTipText, doNotCheckCapabilitiesTipText, forName, getDebug, getDoNotCheckCapabilities, getNumDecimalPlaces, makeCopies, makeCopy, numDecimalPlacesTipText, run, runClassifier, setDoNotCheckCapabilities
-
Constructor Details
-
RandomForest
public RandomForest()
Constructor that sets base classifier for bagging to RandomTree and default number of iterations to 100.
-
-
Method Details
-
getCapabilities
Returns default capabilities of the base classifier.
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Specified by:
getCapabilities in interface Classifier
- Overrides:
getCapabilities in class SingleClassifierEnhancer
- Returns:
- the capabilities of the base classifier
-
globalInfo
Returns a string describing classifier.
- Overrides:
globalInfo in class Bagging
- Returns:
- a description suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
- Overrides:
getTechnicalInformation in class Bagging
- Returns:
- the technical information about this class
-
numIterationsTipText
Returns the tip text for the number of iterations. Overridden here to be more informative.
- Overrides:
numIterationsTipText in class IteratedSingleClassifierEnhancer
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setClassifier
This method only accepts RandomTree arguments.
- Overrides:
setClassifier in class SingleClassifierEnhancer
- Parameters:
newClassifier - the RandomTree to use.
-
setRepresentCopiesUsingWeights
This method only accepts true as its argument.
- Overrides:
setRepresentCopiesUsingWeights in class Bagging
- Parameters:
representUsingWeights - must be set to true.
-
numFeaturesTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getNumFeatures
public int getNumFeatures()
Get the number of features used in random selection.
- Returns:
- Value of numFeatures.
-
setNumFeatures
public void setNumFeatures(int newNumFeatures)
Set the number of features to use in random selection.
- Parameters:
newNumFeatures - Value to assign to numFeatures.
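The -K option text states that a value below 1 stands for int(log_2(#predictors)+1). The sketch below works through that formula with integer arithmetic so exact powers of two are not affected by floating-point rounding. The class and method names are hypothetical, and capping a request at the number of predictors is an assumption rather than documented behaviour.

```java
final class NumFeaturesSketch {

    /** Resolves the number of attributes examined per split from a -K style value. */
    static int effectiveNumFeatures(int k, int numPredictors) {
        if (numPredictors < 1) {
            throw new IllegalArgumentException("numPredictors must be at least 1");
        }
        if (k < 1) {
            // floor(log2(n)) for n >= 1, computed from the position of the highest set bit
            int floorLog2 = 31 - Integer.numberOfLeadingZeros(numPredictors);
            // int(log2(n) + 1) equals floor(log2(n)) + 1 for n >= 1
            return floorLog2 + 1;
        }
        // Assumption: a request larger than the number of predictors is capped.
        return Math.min(k, numPredictors);
    }
}
```

Under this reading, 10 predictors with the default setting give 4 attributes per split, and 100 predictors give 7.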
-
computeAttributeImportanceTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setComputeAttributeImportance
public void setComputeAttributeImportance(boolean computeAttributeImportance)
Set whether to compute and output attribute importance scores.
- Parameters:
computeAttributeImportance - true to compute attribute importance scores
-
getComputeAttributeImportance
public boolean getComputeAttributeImportance()
Get whether to compute and output attribute importance scores.
- Returns:
- true if computing attribute importance scores
-
maxDepthTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMaxDepth
public int getMaxDepth()
Get the maximum depth of the tree, 0 for unlimited.
- Returns:
- the maximum depth.
-
setMaxDepth
public void setMaxDepth(int value)
Set the maximum depth of the tree, 0 for unlimited.
- Parameters:
value - the maximum depth.
-
breakTiesRandomlyTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getBreakTiesRandomly
public boolean getBreakTiesRandomly()
Get whether to break ties randomly.
- Returns:
- true if ties are to be broken randomly.
-
setBreakTiesRandomly
public void setBreakTiesRandomly(boolean newBreakTiesRandomly)
Set whether to break ties randomly.
- Parameters:
newBreakTiesRandomly - true if ties are to be broken randomly
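The -B option says ties are broken randomly "when several attributes look equally good". A minimal sketch of what that can mean is shown below, assuming each candidate attribute has a numeric score, that ties mean exactly equal scores, and that scores contain no NaN values. TieBreakSketch and pickBest are hypothetical names, not part of Weka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class TieBreakSketch {

    /**
     * Returns the index of the highest score. When several indices share the
     * highest score, the first one wins unless breakTiesRandomly is true,
     * in which case one of the tied indices is drawn uniformly at random.
     */
    static int pickBest(double[] scores, boolean breakTiesRandomly, Random random) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        double best = Double.NEGATIVE_INFINITY;
        List<Integer> tied = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > best) {
                best = scores[i];
                tied.clear();
                tied.add(i);
            } else if (scores[i] == best) {
                tied.add(i);
            }
        }
        if (!breakTiesRandomly || tied.size() == 1) {
            return tied.get(0);
        }
        return tied.get(random.nextInt(tied.size()));
    }
}
```

Without random tie-breaking the earliest attribute always wins a tie, which can bias every tree toward the same attribute; drawing among the tied candidates spreads that choice across trees.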
-
setDebug
public void setDebug(boolean debug)
Set debugging mode.
- Overrides:
setDebug in class AbstractClassifier
- Parameters:
debug - true if debug output should be printed
-
setNumDecimalPlaces
public void setNumDecimalPlaces(int num)
Set the number of decimal places.
- Overrides:
setNumDecimalPlaces in class AbstractClassifier
-
setBatchSize
Set the preferred batch size for batch prediction.
- Specified by:
setBatchSize in interface BatchPredictor
- Overrides:
setBatchSize in class Bagging
- Parameters:
size - the batch size to use
-
setSeed
public void setSeed(int s)
Sets the seed for the random number generator.
- Specified by:
setSeed in interface Randomizable
- Overrides:
setSeed in class RandomizableParallelIteratedSingleClassifierEnhancer
- Parameters:
s - the seed to be used
-
toString
Returns description of the bagged classifier.
-
computeAverageImpurityDecreasePerAttribute
public double[] computeAverageImpurityDecreasePerAttribute(double[] nodeCounts) throws WekaException
Computes the average impurity decrease per attribute over the trees.
- Parameters:
nodeCounts - an optional array that, if non-null, will hold the count of the number of nodes at which each attribute was used for splitting
- Returns:
- the average impurity decrease per attribute over the trees
- Throws:
WekaException
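One plausible reading of "average impurity decrease per attribute over the trees" is the total decrease credited to an attribute, divided by the number of nodes that split on it, with the optional nodeCounts array filled in along the way. The sketch below follows that reading only. The per-tree input arrays, the class name and the method name are hypothetical, and Weka's actual computation may differ.

```java
final class ImpurityAverageSketch {

    /**
     * totals[t][a] is the summed impurity decrease of attribute a in tree t;
     * counts[t][a] is the number of nodes in tree t that split on attribute a.
     * If nodeCounts is non-null it receives the per-attribute node counts.
     */
    static double[] averagePerAttribute(double[][] totals, int[][] counts, double[] nodeCounts) {
        int numAttributes = totals[0].length;
        double[] sum = new double[numAttributes];
        double[] nodes = new double[numAttributes];
        for (int t = 0; t < totals.length; t++) {
            for (int a = 0; a < numAttributes; a++) {
                sum[a] += totals[t][a];
                nodes[a] += counts[t][a];
            }
        }
        double[] average = new double[numAttributes];
        for (int a = 0; a < numAttributes; a++) {
            // An attribute that was never used for a split gets an average of 0.
            average[a] = nodes[a] > 0 ? sum[a] / nodes[a] : 0.0;
            if (nodeCounts != null) {
                nodeCounts[a] = nodes[a];
            }
        }
        return average;
    }
}
```

Passing null for nodeCounts skips the count output, mirroring the "optional array" wording in the parameter description.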
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class Bagging
- Returns:
- an enumeration of all the available options
-
getOptions
Gets the current settings of the forest.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class Bagging
- Returns:
- an array of strings suitable for passing to setOptions()
-
setOptions
Parses a given list of options. Valid options are:
-P Size of each bag, as a percentage of the training set size. (default 100)
-O Calculate the out of bag error.
-store-out-of-bag-predictions Whether to store out of bag predictions in internal evaluation object.
-output-out-of-bag-complexity-statistics Whether to output complexity-based statistics when out-of-bag evaluation is performed.
-print Print the individual classifiers in the output
-attribute-importance Compute and output attribute importance (mean impurity decrease method)
-I <num> Number of iterations (i.e., the number of trees in the random forest). (current value 100)
-num-slots <num> Number of execution slots. (default 1 - i.e. no parallelism) (use 0 to auto-detect number of cores)
-K <number of attributes> Number of attributes to randomly investigate. (default 0) (<1 = int(log_2(#predictors)+1)).
-M <minimum number of instances> Set minimum number of instances per leaf. (default 1)
-V <minimum variance for split> Set minimum numeric class variance proportion of train variance for split (default 1e-3).
-S <num> Seed for random number generator. (default 1)
-depth <num> The maximum depth of the tree, 0 for unlimited. (default 0)
-N <num> Number of folds for backfitting (default 0, no backfitting).
-U Allow unclassified instances.
-B Break ties randomly when several attributes look equally good.
-output-debug-info If set, classifier is run in debug mode and may output additional info to the console
-do-not-check-capabilities If set, classifier capabilities are not checked before classifier is built (use with caution).
-num-decimal-places The number of decimal places for the output of numbers in the model (default 2).
-batch-size The desired batch size for batch prediction (default 100).
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class Bagging
- Parameters:
options - the list of options as an array of strings
- Throws:
Exception - if an option is not supported
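Options are passed as an array of strings in which a flag such as -I is followed by its value and a switch such as -B stands alone. The sketch below shows how such an array can be read, and how a -num-slots value of 0 could be resolved to the number of available cores as the option text describes. It uses only the JDK; OptionSketch and its methods are hypothetical helpers and do not reproduce Weka's own option parsing.

```java
final class OptionSketch {

    /** Returns the value that follows "-flag", or defaultValue if the flag is absent. */
    static String valueOf(String flag, String[] options, String defaultValue) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals("-" + flag)) {
                return options[i + 1];
            }
        }
        return defaultValue;
    }

    /** Returns true if the stand-alone switch "-flag" appears in the options. */
    static boolean hasFlag(String flag, String[] options) {
        for (String option : options) {
            if (option.equals("-" + flag)) {
                return true;
            }
        }
        return false;
    }

    /** Resolves a -num-slots style value: 0 means "use the number of available cores". */
    static int resolveSlots(int requested) {
        return requested == 0 ? Runtime.getRuntime().availableProcessors() : requested;
    }

    public static void main(String[] args) {
        String[] options = {"-I", "200", "-depth", "10", "-num-slots", "0", "-B"};
        int trees = Integer.parseInt(valueOf("I", options, "100"));
        int depth = Integer.parseInt(valueOf("depth", options, "0"));
        int slots = resolveSlots(Integer.parseInt(valueOf("num-slots", options, "1")));
        System.out.println("trees=" + trees + " depth=" + depth
                + " slots=" + slots + " breakTies=" + hasFlag("B", options));
    }
}
```

The defaults passed to valueOf in main are the ones listed in the option descriptions above (100 iterations, unlimited depth, one execution slot).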
-
getRevision
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class Bagging
- Returns:
- the revision
-
main
Main method for this class.
- Parameters:
argv - the options
-