Class BVDecomposeSegCVSub

java.lang.Object
weka.classifiers.BVDecomposeSegCVSub
All Implemented Interfaces:
OptionHandler, RevisionHandler, TechnicalInformationHandler

public class BVDecomposeSegCVSub extends Object implements OptionHandler, TechnicalInformationHandler, RevisionHandler
This class performs bias-variance decomposition on any classifier using the sub-sampled cross-validation procedure as specified in (1).
The Kohavi and Wolpert definition of bias and variance is specified in (2).
The Webb definition of bias and variance is specified in (3).

(1) Geoffrey I. Webb, Paul Conilione (2002). Estimating bias and variance from data. School of Computer Science and Software Engineering, Victoria, Australia.

(2) Ron Kohavi, David H. Wolpert: Bias Plus Variance Decomposition for Zero-One Loss Functions. In: Machine Learning: Proceedings of the Thirteenth International Conference, 275-283, 1996.

(3) Geoffrey I. Webb (2000). MultiBoosting: A Technique for Combining Boosting and Wagging. Machine Learning. 40(2):159-196.

BibTeX:

 @misc{Webb2002,
    address = {School of Computer Science and Software Engineering, Victoria, Australia},
    author = {Geoffrey I. Webb and Paul Conilione},
    institution = {Monash University},
    title = {Estimating bias and variance from data},
    year = {2002},
    PDF = {http://www.csse.monash.edu.au/\~webb/Files/WebbConilione04.pdf}
 }

 @inproceedings{Kohavi1996,
    author = {Ron Kohavi and David H. Wolpert},
    booktitle = {Machine Learning: Proceedings of the Thirteenth International Conference},
    editor = {Lorenza Saitta},
    pages = {275-283},
    publisher = {Morgan Kaufmann},
    title = {Bias Plus Variance Decomposition for Zero-One Loss Functions},
    year = {1996},
    PS = {http://robotics.stanford.edu/\~ronnyk/biasVar.ps}
 }

 @article{Webb2000,
    author = {Geoffrey I. Webb},
    journal = {Machine Learning},
    number = {2},
    pages = {159-196},
    title = {MultiBoosting: A Technique for Combining Boosting and Wagging},
    volume = {40},
    year = {2000}
 }
 

Valid options are:

 -c <class index>
  The index of the class attribute.
  (default last)
 -D
  Turn on debugging output.
 -l <num>
  The number of times each instance is classified.
  (default 10)
 -p <proportion of objects in common>
  The average proportion of instances common between any two training sets
 -s <seed>
  The random number seed used.
 -t <name of arff file>
  The name of the arff file used for the decomposition.
 -T <number of instances in training set>
  The number of instances in the training set.
 -W <classifier class name>
  Full class name of the learner used in the decomposition.
  eg: weka.classifiers.bayes.NaiveBayes
 Options specific to learner weka.classifiers.rules.ZeroR:
 
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
Options after -- are passed to the designated sub-learner.
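
The option list above maps directly onto a string array. The following is a minimal sketch, not part of Weka, of how such an array might be assembled; the class BVOptionsSketch, the helper buildOptions, the file name iris.arff and all numeric values are hypothetical placeholders. The resulting array is the kind of input that setOptions(String[]) or main(String[]) of this class accepts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Weka class): assembles an option array
// in the layout described by the option list of BVDecomposeSegCVSub.
final class BVOptionsSketch {

    /** Builds the option array; every value passed in is an illustrative placeholder. */
    static String[] buildOptions(String arffFile, String learner, int classifyIterations,
                                 int trainSize, int seed, String... learnerOptions) {
        List<String> opts = new ArrayList<>(Arrays.asList(
            "-t", arffFile,                            // data file used for the decomposition
            "-W", learner,                             // full class name of the learner
            "-l", String.valueOf(classifyIterations),  // times each instance is classified
            "-T", String.valueOf(trainSize),           // instances in each training set
            "-s", String.valueOf(seed)));              // random number seed
        if (learnerOptions.length > 0) {
            opts.add("--");                            // everything after "--" goes to the sub-learner
            opts.addAll(Arrays.asList(learnerOptions));
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = buildOptions("iris.arff", "weka.classifiers.rules.ZeroR", 10, 50, 1, "-D");
        // With Weka on the classpath, the array could then be used roughly like this:
        //   BVDecomposeSegCVSub bvd = new BVDecomposeSegCVSub();
        //   bvd.setOptions(opts);
        //   bvd.decompose();
        //   System.out.println(bvd);
        System.out.println(String.join(" ", opts));
    }
}
```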

Version:
$Revision: 10141 $
Author:
Paul Conilione (paulc4321@yahoo.com.au)
  • Constructor Summary

    Constructors
    Constructor
    Description
    BVDecomposeSegCVSub()
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    decompose()
    Carry out the bias-variance decomposition using the sub-sampled cross-validation method.
    Vector<Integer>
    findCentralTendencies(double[] predProbs)
    Finds the central tendency, given the classifications for an instance.
    Classifier
    getClassifier()
    Gets the classifier being analysed
    int
    getClassifyIterations()
    Gets the number of times an instance is classified
    int
    getClassIndex()
    Get the index (starting from 1) of the attribute used as the class.
    String
    getDataFileName()
    Get the name of the data file used for the decomposition
    boolean
    getDebug()
    Gets whether debugging is turned on
    double
    getError()
    Get the calculated error rate
    double
    getKWBias()
    Get the calculated bias squared according to the Kohavi and Wolpert definition
    double
    getKWSigma()
    Get the calculated sigma according to the Kohavi and Wolpert definition
    double
    getKWVariance()
    Get the calculated variance according to the Kohavi and Wolpert definition
    String[]
    getOptions()
    Gets the current option settings of this object.
    double
    getP()
    Get the proportion of instances that are common between two training sets.
    String
    getRevision()
    Returns the revision string.
    int
    getSeed()
    Gets the random number seed
    TechnicalInformation
    getTechnicalInformation()
    Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
    int
    getTrainSize()
    Get the training size
    double
    getWBias()
    Get the calculated bias according to the Webb definition
    double
    getWVariance()
    Get the calculated variance according to the Webb definition
    String
    globalInfo()
    Returns a string describing this object
    Enumeration<Option>
    listOptions()
    Returns an enumeration describing the available options.
    static void
    main(String[] args)
    Test method for this class
    final void
    randomize(int[] index, Random random)
    Accepts an array of ints and randomises the order of its values, using the given random number generator.
    void
    setClassifier(Classifier newClassifier)
    Sets the classifier being analysed
    void
    setClassifyIterations(int classifyIterations)
    Sets the number of times an instance is classified
    void
    setClassIndex(int classIndex)
    Sets the index (starting from 1) of the attribute used as the class
    void
    setDataFileName(String dataFileName)
    Sets the name of the dataset file.
    void
    setDebug(boolean debug)
    Sets debugging mode
    void
    setOptions(String[] options)
    Sets the OptionHandler's options using the given list.
    void
    setP(double proportion)
    Set the proportion of instances that are common between two training sets used to train a classifier.
    void
    setSeed(int seed)
    Sets the random number seed
    void
    setTrainSize(int size)
    Set the training size.
    String
    toString()
    Returns description of the bias-variance decomposition results.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • BVDecomposeSegCVSub

      public BVDecomposeSegCVSub()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this object
      Returns:
      a description of the classifier suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e. incremental setting of options is not possible).

      Valid options are:

       -c <class index>
        The index of the class attribute.
        (default last)
       -D
        Turn on debugging output.
       -l <num>
        The number of times each instance is classified.
        (default 10)
       -p <proportion of objects in common>
        The average proportion of instances common between any two training sets
       -s <seed>
        The random number seed used.
       -t <name of arff file>
        The name of the arff file used for the decomposition.
       -T <number of instances in training set>
        The number of instances in the training set.
       -W <classifier class name>
        Full class name of the learner used in the decomposition.
        eg: weka.classifiers.bayes.NaiveBayes
       Options specific to learner weka.classifiers.rules.ZeroR:
       
       -D
        If set, classifier is run in debug mode and
        may output additional info to the console
      Specified by:
      setOptions in interface OptionHandler
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current option settings of this object.
      Specified by:
      getOptions in interface OptionHandler
      Returns:
      an array of strings suitable for passing to setOptions
    • setClassifier

      public void setClassifier(Classifier newClassifier)
      Sets the classifier being analysed
      Parameters:
      newClassifier - the Classifier to use.
    • getClassifier

      public Classifier getClassifier()
      Gets the classifier being analysed
      Returns:
      the classifier being analysed.
    • setDebug

      public void setDebug(boolean debug)
      Sets debugging mode
      Parameters:
      debug - true if debug output should be printed
    • getDebug

      public boolean getDebug()
      Gets whether debugging is turned on
      Returns:
      true if debugging output is on
    • setSeed

      public void setSeed(int seed)
      Sets the random number seed
      Parameters:
      seed - the random number seed
    • getSeed

      public int getSeed()
      Gets the random number seed
      Returns:
      the random number seed
    • setClassifyIterations

      public void setClassifyIterations(int classifyIterations)
      Sets the number of times an instance is classified
      Parameters:
      classifyIterations - number of times an instance is classified
    • getClassifyIterations

      public int getClassifyIterations()
      Gets the number of times an instance is classified
      Returns:
      the maximum number of times an instance is classified
    • setDataFileName

      public void setDataFileName(String dataFileName)
      Sets the name of the dataset file.
      Parameters:
      dataFileName - name of dataset file.
    • getDataFileName

      public String getDataFileName()
      Get the name of the data file used for the decomposition
      Returns:
      the name of the data file
    • getClassIndex

      public int getClassIndex()
      Get the index (starting from 1) of the attribute used as the class.
      Returns:
      the index of the class attribute
    • setClassIndex

      public void setClassIndex(int classIndex)
      Sets the index (starting from 1) of the attribute used as the class
      Parameters:
      classIndex - the index (starting from 1) of the class attribute
    • getKWBias

      public double getKWBias()
      Get the calculated bias squared according to the Kohavi and Wolpert definition
      Returns:
      the bias squared
    • getWBias

      public double getWBias()
      Get the calculated bias according to the Webb definition
      Returns:
      the bias
    • getKWVariance

      public double getKWVariance()
      Get the calculated variance according to the Kohavi and Wolpert definition
      Returns:
      the variance
    • getWVariance

      public double getWVariance()
      Get the calculated variance according to the Webb definition
      Returns:
      the variance according to Webb
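
The Webb bias and variance returned by getWBias() and getWVariance() can be illustrated for a single instance. The sketch below is not the Weka implementation. It assumes the Webb (2000) reading in which bias is the fraction of classifications that are wrong and agree with the central tendency, and variance is the fraction that are wrong and differ from it, so that the two sum to the error. The class WebbDecompositionSketch, the 0-based class indices and the lowest-index tie-break are assumptions made for illustration.

```java
// Hypothetical illustration (not Weka code) of a Webb-style decomposition
// for one instance that has been classified several times.
final class WebbDecompositionSketch {

    /** Returns {bias, variance} for one instance; classes are 0-based indices. */
    static double[] decomposeInstance(int[] predictions, int trueClass, int numClasses) {
        // Count how often each class was predicted.
        int[] counts = new int[numClasses];
        for (int p : predictions) {
            counts[p]++;
        }
        // Central tendency: most frequent prediction (assumption: lowest index wins ties).
        int central = 0;
        for (int c = 1; c < numClasses; c++) {
            if (counts[c] > counts[central]) {
                central = c;
            }
        }
        int biasErrors = 0;
        int varianceErrors = 0;
        for (int p : predictions) {
            if (p == trueClass) {
                continue;              // correct classification, no error contribution
            }
            if (p == central) {
                biasErrors++;          // wrong and equal to the central tendency
            } else {
                varianceErrors++;      // wrong and different from the central tendency
            }
        }
        double n = predictions.length;
        return new double[] { biasErrors / n, varianceErrors / n };
    }

    public static void main(String[] args) {
        // Ten classifications: class 0 twice, class 1 five times, class 2 three times.
        int[] preds = {0, 0, 1, 1, 1, 1, 1, 2, 2, 2};
        double[] bv = decomposeInstance(preds, 2, 3);
        System.out.println("bias=" + bv[0] + " variance=" + bv[1]);
    }
}
```

Averaging the per-instance values over all test instances would give dataset-level figures under these assumptions.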
    • getKWSigma

      public double getKWSigma()
      Get the calculated sigma according to the Kohavi and Wolpert definition
      Returns:
      the sigma
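
For orientation, the Kohavi and Wolpert quantities behind getKWBias(), getKWVariance() and getKWSigma() can be sketched for one instance. This is a simplified illustration, not the Weka code. It assumes each instance has a single observed class, so the target distribution is degenerate and the sigma term is zero, and it uses plain relative frequencies without any finite-sample correction. The class name KohaviWolpertSketch and the 0-based class indices are hypothetical.

```java
// Hypothetical illustration (not Weka code) of the Kohavi-Wolpert
// zero-one-loss decomposition for a single instance.
final class KohaviWolpertSketch {

    /**
     * Returns {biasSquared, variance} given how often each class was predicted.
     * Assumes one observed true class, so the intrinsic noise term is 0.
     */
    static double[] decomposeInstance(int[] counts, int trueClass) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double sumSqDiff = 0.0;   // sum over classes of (P_true(y) - P_pred(y))^2
        double sumSqPred = 0.0;   // sum over classes of P_pred(y)^2
        for (int y = 0; y < counts.length; y++) {
            double pPred = (double) counts[y] / total;
            double pTrue = (y == trueClass) ? 1.0 : 0.0;
            sumSqDiff += (pTrue - pPred) * (pTrue - pPred);
            sumSqPred += pPred * pPred;
        }
        double biasSquared = 0.5 * sumSqDiff;
        double variance = 0.5 * (1.0 - sumSqPred);
        return new double[] { biasSquared, variance };
    }

    public static void main(String[] args) {
        // Class 0 predicted twice, class 1 five times, class 2 three times; true class is 2.
        double[] bv = decomposeInstance(new int[] {2, 5, 3}, 2);
        System.out.println("bias^2=" + bv[0] + " variance=" + bv[1]);
    }
}
```

Under these assumptions the two terms add up to the instance's error rate, here 1 - 3/10.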
    • setTrainSize

      public void setTrainSize(int size)
      Set the training size.
      Parameters:
      size - the size of the training set
    • getTrainSize

      public int getTrainSize()
      Get the training size
      Returns:
      the size of the training set
    • setP

      public void setP(double proportion)
      Set the proportion of instances that are common between two training sets used to train a classifier.
      Parameters:
      proportion - the proportion of instances that are common between training sets.
    • getP

      public double getP()
      Get the proportion of instances that are common between two training sets.
      Returns:
      the proportion
    • getError

      public double getError()
      Get the calculated error rate
      Returns:
      the error rate
    • decompose

      public void decompose() throws Exception
      Carry out the bias-variance decomposition using the sub-sampled cross-validation method.
      Throws:
      Exception - if the decomposition couldn't be carried out
    • findCentralTendencies

      public Vector<Integer> findCentralTendencies(double[] predProbs)
      Finds the central tendency, given the classifications for an instance. The central tendency is defined as the class that was most commonly selected for a given instance.

      For example, suppose instance 'x' can be assigned one of 3 classes y = {1, 2, 3} and is classified 10 times: '1' = 2 times, '2' = 5 times and '3' = 3 times. The central tendency is then '2'.

      Note, however, that this method returns a list of all classes that have the highest number of classifications. If several classes tie for the largest number of classifications, all of them are returned. For example, if 'x' is classified '1' = 4 times, '2' = 4 times and '3' = 2 times, then '1' and '2' are returned.

      Parameters:
      predProbs - the array of classifications for a single instance.
      Returns:
      a Vector containing Integer objects which store the class(es) that form the central tendency.
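
The tie-handling described for findCentralTendencies can be sketched in plain Java. This is an independent illustration, not the Weka source: it assumes the input array holds one classification count per class, indexed from 0, and the names CentralTendencySketch and centralTendencies are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Weka code): return every class index
// that shares the highest classification count.
final class CentralTendencySketch {

    /** classCounts[i] is assumed to be the number of times class i was predicted. */
    static List<Integer> centralTendencies(double[] classCounts) {
        double best = 0.0;
        List<Integer> winners = new ArrayList<>();
        for (int i = 0; i < classCounts.length; i++) {
            if (classCounts[i] > best) {
                // Strictly better count: discard earlier winners.
                best = classCounts[i];
                winners.clear();
                winners.add(i);
            } else if (classCounts[i] == best && best > 0.0) {
                // Tie with the current best: keep both.
                winners.add(i);
            }
        }
        return winners;   // empty if no class received any classification
    }

    public static void main(String[] args) {
        // Counts for classes '1', '2', '3' from the examples in the description.
        System.out.println(centralTendencies(new double[] {2, 5, 3}));
        System.out.println(centralTendencies(new double[] {4, 4, 2}));
    }
}
```

With 0-based indices, the first call yields index 1 (class '2') and the second yields indices 0 and 1 (classes '1' and '2').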
    • toString

      public String toString()
      Returns description of the bias-variance decomposition results.
      Overrides:
      toString in class Object
      Returns:
      the bias-variance decomposition results as a string
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Test method for this class
      Parameters:
      args - the command line arguments
    • randomize

      public final void randomize(int[] index, Random random)
      Accepts an array of ints and randomises the order of its values, using the given random number generator.
      Parameters:
      index - the array of integers
      random - the random number generator
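
An in-place shuffle of an index array, as randomize is described as doing, is commonly written as a Fisher-Yates shuffle. The sketch below shows that technique under the assumption that a seeded java.util.Random is supplied; it is not taken from the Weka source, and ShuffleSketch and shuffle are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration (not Weka code): Fisher-Yates shuffle of an int array.
final class ShuffleSketch {

    /** Randomly permutes the array in place using the supplied generator. */
    static void shuffle(int[] index, Random random) {
        for (int i = index.length - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);   // uniform position in [0, i]
            int tmp = index[i];
            index[i] = index[j];
            index[j] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] index = {0, 1, 2, 3, 4, 5, 6, 7};
        shuffle(index, new Random(1));       // the same seed reproduces the same order
        System.out.println(Arrays.toString(index));
    }
}
```

Seeding the generator makes the permutation reproducible, which matches the role of the -s option.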