Class HoeffdingTree

java.lang.Object
weka.classifiers.AbstractClassifier
weka.classifiers.trees.HoeffdingTree
All Implemented Interfaces:
Serializable, Cloneable, Classifier, UpdateableClassifier, BatchPredictor, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, Drawable, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler

A Hoeffding tree (VFDT) is an incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time. Hoeffding trees exploit the fact that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound, which quantifies the number of observations (in our case, examples) needed to estimate some statistics within a prescribed precision (in our case, the goodness of an attribute).

A theoretically appealing feature of Hoeffding Trees not shared by otherincremental decision tree learners is that it has sound guarantees of performance. Using the Hoeffding bound one can show that its output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. For more information see:

Geoff Hulten, Laurie Spencer, Pedro Domingos: Mining time-changing data streams. In: ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 97-106, 2001.

BibTeX:

 @inproceedings{Hulten2001,
    author = {Geoff Hulten and Laurie Spencer and Pedro Domingos},
    booktitle = {ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining},
    pages = {97-106},
    publisher = {ACM Press},
    title = {Mining time-changing data streams},
    year = {2001}
 }
 

Valid options are:

 -L
  The leaf prediction strategy to use. 0 = majority class, 1 = naive Bayes, 2 = naive Bayes adaptive.
  (default = 0)
 
 -S
  The splitting criterion to use. 0 = Gini, 1 = Info gain
  (default = 0)
 
 -E
  The allowable error in a split decision - values closer to zero will take longer to decide
  (default = 1e-7)
 
 -H
  Threshold below which a split will be forced to break ties
  (default = 0.05)
 
 -M
  Minimum fraction of weight required down at least two branches for info gain splitting
  (default = 0.01)
 
 -G
  Grace period - the number of instances a leaf should observe between split attempts
  (default = 200)
 
 -N
  The number of instances (weight) a leaf should observe before allowing naive Bayes to make predictions (NB or NB adaptive only)
  (default = 0)
 
 -P
  Print leaf models when using naive Bayes at the leaves.
 
Version:
$Revision: 15519 $
Author:
Richard Kirkby (rkirkby@cs.waikato.ac.nz), Mark Hall (mhall{[at]}pentaho{[dot]}com)
See Also:
  • Field Details

  • Constructor Details

    • HoeffdingTree

      public HoeffdingTree()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing classifier
      Returns:
      a description suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the classifier.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Specified by:
      getCapabilities in interface Classifier
      Overrides:
      getCapabilities in class AbstractClassifier
      Returns:
      the capabilities of this classifier
      See Also:
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class AbstractClassifier
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -L
        The leaf prediction strategy to use. 0 = majority class, 1 = naive Bayes, 2 = naive Bayes adaptive.
        (default = 0)
       
       -S
        The splitting criterion to use. 0 = Gini, 1 = Info gain
        (default = 0)
       
       -E
        The allowable error in a split decision - values closer to zero will take longer to decide
        (default = 1e-7)
       
       -H
        Threshold below which a split will be forced to break ties
        (default = 0.05)
       
       -M
        Minimum fraction of weight required down at least two branches for info gain splitting
        (default = 0.01)
       
       -G
        Grace period - the number of instances a leaf should observe between split attempts
        (default = 200)
       
       -N
        The number of instances (weight) a leaf should observe before allowing naive Bayes to make predictions (NB or NB adaptive only)
        (default = 0)
       
       -P
        Print leaf models when using naive Bayes at the leaves.
       
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class AbstractClassifier
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the Classifier.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class AbstractClassifier
      Returns:
      an array of strings suitable for passing to setOptions
    • printLeafModelsTipText

      public String printLeafModelsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setPrintLeafModels

      public void setPrintLeafModels(boolean p)
    • getPrintLeafModels

      public boolean getPrintLeafModels()
    • minimumFractionOfWeightInfoGainTipText

      public String minimumFractionOfWeightInfoGainTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMinimumFractionOfWeightInfoGain

      public void setMinimumFractionOfWeightInfoGain(double m)
      Set the minimum fraction of weight required down at least two branches for info gain splitting
      Parameters:
      m - the minimum fraction of weight
    • getMinimumFractionOfWeightInfoGain

      public double getMinimumFractionOfWeightInfoGain()
      Get the minimum fraction of weight required down at least two branches for info gain splitting
      Returns:
      the minimum fraction of weight
    • gracePeriodTipText

      public String gracePeriodTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setGracePeriod

      public void setGracePeriod(double grace)
      Set the number of instances (or total weight of instances) a leaf should observe between split attempts
      Parameters:
      grace - the grace period
    • getGracePeriod

      public double getGracePeriod()
      Get the number of instances (or total weight of instances) a leaf should observe between split attempts
      Returns:
      the grace period
    • hoeffdingTieThresholdTipText

      public String hoeffdingTieThresholdTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setHoeffdingTieThreshold

      public void setHoeffdingTieThreshold(double ht)
      Set the threshold below which a split will be forced to break ties
      Parameters:
      ht - the threshold
    • getHoeffdingTieThreshold

      public double getHoeffdingTieThreshold()
      Get the threshold below which a split will be forced to break ties
      Returns:
      the threshold
    • splitConfidenceTipText

      public String splitConfidenceTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSplitConfidence

      public void setSplitConfidence(double sc)
      Set the allowable error in a split decision. Values closer to zero will take longer to decide.
      Parameters:
      sc - the split confidence
    • getSplitConfidence

      public double getSplitConfidence()
      Get the allowable error in a split decision. Values closer to zero will take longer to decide.
      Returns:
      the split confidence
    • splitCriterionTipText

      public String splitCriterionTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSplitCriterion

      public void setSplitCriterion(SelectedTag crit)
      Set the split criterion to use (either Gini or info gain).
      Parameters:
      crit - the criterion to use
    • getSplitCriterion

      public SelectedTag getSplitCriterion()
      Get the split criterion to use (either Gini or info gain).
      Returns:
      the criterion to use
    • leafPredictionStrategyTipText

      public String leafPredictionStrategyTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setLeafPredictionStrategy

      public void setLeafPredictionStrategy(SelectedTag strat)
      Set the leaf prediction strategy to use (majority class, naive Bayes or naive Bayes adaptive)
      Parameters:
      strat - the strategy to use
    • getLeafPredictionStrategy

      public SelectedTag getLeafPredictionStrategy()
      Get the leaf prediction strategy to use (majority class, naive Bayes or naive Bayes adaptive)
      Returns:
      the strategy to use
    • naiveBayesPredictionThresholdTipText

      public String naiveBayesPredictionThresholdTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNaiveBayesPredictionThreshold

      public void setNaiveBayesPredictionThreshold(double n)
      Set the number of instances (weight) a leaf should observe before allowing naive Bayes to make predictions
      Parameters:
      n - the number/weight of instances
    • getNaiveBayesPredictionThreshold

      public double getNaiveBayesPredictionThreshold()
      Get the number of instances (weight) a leaf should observe before allowing naive Bayes to make predictions
      Returns:
      the number/weight of instances
    • buildClassifier

      public void buildClassifier(Instances data) throws Exception
      Builds the classifier.
      Specified by:
      buildClassifier in interface Classifier
      Parameters:
      data - the data to train with
      Throws:
      Exception - if classifier can't be built successfully
    • updateClassifier

      public void updateClassifier(Instance inst) throws Exception
      Updates the classifier with the given instance.
      Specified by:
      updateClassifier in interface UpdateableClassifier
      Parameters:
      inst - the new training instance to include in the model
      Throws:
      Exception - if the instance could not be incorporated in the model.
    • distributionForInstance

      public double[] distributionForInstance(Instance inst) throws Exception
      Returns class probabilities for an instance.
      Specified by:
      distributionForInstance in interface Classifier
      Overrides:
      distributionForInstance in class AbstractClassifier
      Parameters:
      inst - the instance to compute the distribution for
      Returns:
      the class probabilities
      Throws:
      Exception - if distribution can't be computed successfully
    • toString

      public String toString()
      Return a textual description of the mode
      Overrides:
      toString in class Object
      Returns:
      a String describing the model
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class AbstractClassifier
      Returns:
      the revision
    • main

      public static void main(String[] args)
    • graphType

      public int graphType()
      Description copied from interface: Drawable
      Returns the type of graph representing the object.
      Specified by:
      graphType in interface Drawable
      Returns:
      the type of graph representing the object
    • graph

      public String graph() throws Exception
      Description copied from interface: Drawable
      Returns a string that describes a graph representing the object. The string should be in XMLBIF ver. 0.3 format if the graph is a BayesNet, otherwise it should be in dotty format.
      Specified by:
      graph in interface Drawable
      Returns:
      the graph described by a string
      Throws:
      Exception - if the graph can't be computed