Package weka.classifiers.trees
Class HoeffdingTree
java.lang.Object
weka.classifiers.AbstractClassifier
weka.classifiers.trees.HoeffdingTree
- All Implemented Interfaces:
Serializable
,Cloneable
,Classifier
,UpdateableClassifier
,BatchPredictor
,CapabilitiesHandler
,CapabilitiesIgnorer
,CommandlineRunnable
,Drawable
,OptionHandler
,RevisionHandler
,TechnicalInformationHandler
,WeightedInstancesHandler
public class HoeffdingTree
extends AbstractClassifier
implements UpdateableClassifier, WeightedInstancesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler, Drawable, Serializable
A Hoeffding tree (VFDT) is an incremental, anytime
decision tree induction algorithm that is capable of learning from massive
data streams, assuming that the distribution generating examples does not
change over time. Hoeffding trees exploit the fact that a small sample can
often be enough to choose an optimal splitting attribute. This idea is
supported mathematically by the Hoeffding bound, which quantifies the number
of observations (in our case, examples) needed to estimate some statistics
within a prescribed precision (in our case, the goodness of an attribute).
A theoretically appealing feature of Hoeffding Trees not shared by otherincremental decision tree learners is that it has sound guarantees of performance. Using the Hoeffding bound one can show that its output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. For more information see:
Geoff Hulten, Laurie Spencer, Pedro Domingos: Mining time-changing data streams. In: ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 97-106, 2001. BibTeX:
A theoretically appealing feature of Hoeffding Trees not shared by otherincremental decision tree learners is that it has sound guarantees of performance. Using the Hoeffding bound one can show that its output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. For more information see:
Geoff Hulten, Laurie Spencer, Pedro Domingos: Mining time-changing data streams. In: ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 97-106, 2001. BibTeX:
@inproceedings{Hulten2001, author = {Geoff Hulten and Laurie Spencer and Pedro Domingos}, booktitle = {ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining}, pages = {97-106}, publisher = {ACM Press}, title = {Mining time-changing data streams}, year = {2001} }Valid options are:
-L The leaf prediction strategy to use. 0 = majority class, 1 = naive Bayes, 2 = naive Bayes adaptive. (default = 0)
-S The splitting criterion to use. 0 = Gini, 1 = Info gain (default = 0)
-E The allowable error in a split decision - values closer to zero will take longer to decide (default = 1e-7)
-H Threshold below which a split will be forced to break ties (default = 0.05)
-M Minimum fraction of weight required down at least two branches for info gain splitting (default = 0.01)
-G Grace period - the number of instances a leaf should observe between split attempts (default = 200)
-N The number of instances (weight) a leaf should observe before allowing naive Bayes to make predictions (NB or NB adaptive only) (default = 0)
-P Print leaf models when using naive Bayes at the leaves.
- Version:
- $Revision: 15519 $
- Author:
- Richard Kirkby (rkirkby@cs.waikato.ac.nz), Mark Hall (mhall{[at]}pentaho{[dot]}com)
- See Also:
-
Field Summary
Modifier and TypeFieldDescriptionstatic final int
static final int
static final int
static final int
static final int
static final Tag[]
static final Tag[]
Fields inherited from class weka.classifiers.AbstractClassifier
BATCH_SIZE_DEFAULT, NUM_DECIMAL_PLACES_DEFAULT
Fields inherited from interface weka.core.Drawable
BayesNet, Newick, NOT_DRAWABLE, TREE
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionvoid
buildClassifier
(Instances data) Builds the classifier.double[]
Returns class probabilities for an instance.Returns default capabilities of the classifier.double
Get the number of instances (or total weight of instances) a leaf should observe between split attemptsdouble
Get the threshold below which a split will be forced to break tiesGet the leaf prediction strategy to use (majority class, naive Bayes or naive Bayes adaptive)double
Get the minimum fraction of weight required down at least two branches for info gain splittingdouble
Get the number of instances (weight) a leaf should observe before allowing naive Bayes to make predictionsString[]
Gets the current settings of the Classifier.boolean
Returns the revision string.double
Get the allowable error in a split decision.Get the split criterion to use (either Gini or info gain).Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.Returns a string describing classifierReturns the tip text for this propertygraph()
Returns a string that describes a graph representing the object.int
Returns the type of graph representing the object.Returns the tip text for this propertyReturns the tip text for this propertyReturns an enumeration describing the available options.static void
Returns the tip text for this propertyReturns the tip text for this propertyReturns the tip text for this propertyvoid
setGracePeriod
(double grace) Set the number of instances (or total weight of instances) a leaf should observe between split attemptsvoid
setHoeffdingTieThreshold
(double ht) Set the threshold below which a split will be forced to break tiesvoid
Set the leaf prediction strategy to use (majority class, naive Bayes or naive Bayes adaptive)void
setMinimumFractionOfWeightInfoGain
(double m) Set the minimum fraction of weight required down at least two branches for info gain splittingvoid
setNaiveBayesPredictionThreshold
(double n) Set the number of instances (weight) a leaf should observe before allowing naive Bayes to make predictionsvoid
setOptions
(String[] options) Parses a given list of options.void
setPrintLeafModels
(boolean p) void
setSplitConfidence
(double sc) Set the allowable error in a split decision.void
setSplitCriterion
(SelectedTag crit) Set the split criterion to use (either Gini or info gain).Returns the tip text for this propertyReturns the tip text for this propertytoString()
Return a textual description of the modevoid
updateClassifier
(Instance inst) Updates the classifier with the given instance.Methods inherited from class weka.classifiers.AbstractClassifier
batchSizeTipText, classifyInstance, debugTipText, distributionsForInstances, doNotCheckCapabilitiesTipText, forName, getBatchSize, getDebug, getDoNotCheckCapabilities, getNumDecimalPlaces, implementsMoreEfficientBatchPrediction, makeCopies, makeCopy, numDecimalPlacesTipText, postExecution, preExecution, run, runClassifier, setBatchSize, setDebug, setDoNotCheckCapabilities, setNumDecimalPlaces
-
Field Details
-
GINI_SPLIT
public static final int GINI_SPLIT- See Also:
-
INFO_GAIN_SPLIT
public static final int INFO_GAIN_SPLIT- See Also:
-
TAGS_SELECTION
-
LEAF_MAJ_CLASS
public static final int LEAF_MAJ_CLASS- See Also:
-
LEAF_NB
public static final int LEAF_NB- See Also:
-
LEAF_NB_ADAPTIVE
public static final int LEAF_NB_ADAPTIVE- See Also:
-
TAGS_SELECTION2
-
-
Constructor Details
-
HoeffdingTree
public HoeffdingTree()
-
-
Method Details
-
globalInfo
Returns a string describing classifier- Returns:
- a description suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.- Specified by:
getTechnicalInformation
in interfaceTechnicalInformationHandler
- Returns:
- the technical information about this class
-
getCapabilities
Returns default capabilities of the classifier.- Specified by:
getCapabilities
in interfaceCapabilitiesHandler
- Specified by:
getCapabilities
in interfaceClassifier
- Overrides:
getCapabilities
in classAbstractClassifier
- Returns:
- the capabilities of this classifier
- See Also:
-
listOptions
Returns an enumeration describing the available options.- Specified by:
listOptions
in interfaceOptionHandler
- Overrides:
listOptions
in classAbstractClassifier
- Returns:
- an enumeration of all the available options.
-
setOptions
Parses a given list of options. Valid options are:-L The leaf prediction strategy to use. 0 = majority class, 1 = naive Bayes, 2 = naive Bayes adaptive. (default = 0)
-S The splitting criterion to use. 0 = Gini, 1 = Info gain (default = 0)
-E The allowable error in a split decision - values closer to zero will take longer to decide (default = 1e-7)
-H Threshold below which a split will be forced to break ties (default = 0.05)
-M Minimum fraction of weight required down at least two branches for info gain splitting (default = 0.01)
-G Grace period - the number of instances a leaf should observe between split attempts (default = 200)
-N The number of instances (weight) a leaf should observe before allowing naive Bayes to make predictions (NB or NB adaptive only) (default = 0)
-P Print leaf models when using naive Bayes at the leaves.
- Specified by:
setOptions
in interfaceOptionHandler
- Overrides:
setOptions
in classAbstractClassifier
- Parameters:
options
- the list of options as an array of strings- Throws:
Exception
- if an option is not supported
-
getOptions
Gets the current settings of the Classifier.- Specified by:
getOptions
in interfaceOptionHandler
- Overrides:
getOptions
in classAbstractClassifier
- Returns:
- an array of strings suitable for passing to setOptions
-
printLeafModelsTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setPrintLeafModels
public void setPrintLeafModels(boolean p) -
getPrintLeafModels
public boolean getPrintLeafModels() -
minimumFractionOfWeightInfoGainTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setMinimumFractionOfWeightInfoGain
public void setMinimumFractionOfWeightInfoGain(double m) Set the minimum fraction of weight required down at least two branches for info gain splitting- Parameters:
m
- the minimum fraction of weight
-
getMinimumFractionOfWeightInfoGain
public double getMinimumFractionOfWeightInfoGain()Get the minimum fraction of weight required down at least two branches for info gain splitting- Returns:
- the minimum fraction of weight
-
gracePeriodTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setGracePeriod
public void setGracePeriod(double grace) Set the number of instances (or total weight of instances) a leaf should observe between split attempts- Parameters:
grace
- the grace period
-
getGracePeriod
public double getGracePeriod()Get the number of instances (or total weight of instances) a leaf should observe between split attempts- Returns:
- the grace period
-
hoeffdingTieThresholdTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setHoeffdingTieThreshold
public void setHoeffdingTieThreshold(double ht) Set the threshold below which a split will be forced to break ties- Parameters:
ht
- the threshold
-
getHoeffdingTieThreshold
public double getHoeffdingTieThreshold()Get the threshold below which a split will be forced to break ties- Returns:
- the threshold
-
splitConfidenceTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setSplitConfidence
public void setSplitConfidence(double sc) Set the allowable error in a split decision. Values closer to zero will take longer to decide.- Parameters:
sc
- the split confidence
-
getSplitConfidence
public double getSplitConfidence()Get the allowable error in a split decision. Values closer to zero will take longer to decide.- Returns:
- the split confidence
-
splitCriterionTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setSplitCriterion
Set the split criterion to use (either Gini or info gain).- Parameters:
crit
- the criterion to use
-
getSplitCriterion
Get the split criterion to use (either Gini or info gain).- Returns:
- the criterion to use
-
leafPredictionStrategyTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setLeafPredictionStrategy
Set the leaf prediction strategy to use (majority class, naive Bayes or naive Bayes adaptive)- Parameters:
strat
- the strategy to use
-
getLeafPredictionStrategy
Get the leaf prediction strategy to use (majority class, naive Bayes or naive Bayes adaptive)- Returns:
- the strategy to use
-
naiveBayesPredictionThresholdTipText
Returns the tip text for this property- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setNaiveBayesPredictionThreshold
public void setNaiveBayesPredictionThreshold(double n) Set the number of instances (weight) a leaf should observe before allowing naive Bayes to make predictions- Parameters:
n
- the number/weight of instances
-
getNaiveBayesPredictionThreshold
public double getNaiveBayesPredictionThreshold()Get the number of instances (weight) a leaf should observe before allowing naive Bayes to make predictions- Returns:
- the number/weight of instances
-
buildClassifier
Builds the classifier.- Specified by:
buildClassifier
in interfaceClassifier
- Parameters:
data
- the data to train with- Throws:
Exception
- if classifier can't be built successfully
-
updateClassifier
Updates the classifier with the given instance.- Specified by:
updateClassifier
in interfaceUpdateableClassifier
- Parameters:
inst
- the new training instance to include in the model- Throws:
Exception
- if the instance could not be incorporated in the model.
-
distributionForInstance
Returns class probabilities for an instance.- Specified by:
distributionForInstance
in interfaceClassifier
- Overrides:
distributionForInstance
in classAbstractClassifier
- Parameters:
inst
- the instance to compute the distribution for- Returns:
- the class probabilities
- Throws:
Exception
- if distribution can't be computed successfully
-
toString
Return a textual description of the mode -
getRevision
Returns the revision string.- Specified by:
getRevision
in interfaceRevisionHandler
- Overrides:
getRevision
in classAbstractClassifier
- Returns:
- the revision
-
main
-
graphType
public int graphType()Description copied from interface:Drawable
Returns the type of graph representing the object. -
graph
Description copied from interface:Drawable
Returns a string that describes a graph representing the object. The string should be in XMLBIF ver. 0.3 format if the graph is a BayesNet, otherwise it should be in dotty format.
-