Package weka.classifiers.rules
Class JRip
java.lang.Object
weka.classifiers.AbstractClassifier
weka.classifiers.rules.JRip
- All Implemented Interfaces:
Serializable, Cloneable, Classifier, AdditionalMeasureProducer, BatchPredictor, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler
public class JRip
extends AbstractClassifier
implements AdditionalMeasureProducer, WeightedInstancesHandler, TechnicalInformationHandler
This class implements a propositional rule learner,
Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was
proposed by William W. Cohen as an optimized version of IREP.
The algorithm is briefly described as follows:
Initialize RS = {}, and for each class, from the least prevalent to the most frequent, DO:
1. Building stage:
Repeat 1.1 and 1.2 until the description length (DL) of the ruleset and examples is 64 bits greater than the smallest DL found so far, there are no positive examples left, or the error rate is >= 50%.
1.1. Grow phase:
Grow one rule by greedily adding antecedents (or conditions) to the rule until the rule is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with the highest information gain: p(log(p/t) - log(P/T)).
1.2. Prune phase:
Incrementally prune each rule, allowing the pruning of any final sequence of antecedents. The stated pruning metric is (p-n)/(p+n); since this equals 2p/(p+n) - 1, this implementation simply uses p/(p+n) (actually (p+1)/(p+n+2), so that the value is 0.5 when p+n is 0). Both the grow and prune metrics are illustrated in the sketch following the pseudocode below.
2. Optimization stage:
After generating the initial ruleset {Ri}, generate and prune two variants of each rule Ri from randomized data using procedures 1.1 and 1.2. One variant is generated from an empty rule, while the other is generated by greedily adding antecedents to the original rule; here the pruning metric is (TP+TN)/(P+N). The smallest possible DL for each variant and for the original rule is then computed, and the variant with the minimal DL is selected as the final representative of Ri in the ruleset. After all the rules in {Ri} have been examined, if there are still residual positive examples, more rules are generated for them using the building stage again.
3. Delete any rule from the ruleset that would increase the DL of the whole ruleset if it were kept, and add the resulting ruleset to RS.
ENDDO
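
Both selection metrics above can be stated compactly in code. The following is a minimal sketch for illustration only (not Weka's internal implementation), assuming p and n are the positive and negative examples covered by the candidate rule, t = p + n, and P, T are the corresponding counts before the condition is added; base-2 logarithms are assumed here, though the base only rescales the gain:

public final class RipperMetricsSketch {

    // Grow-phase information gain: p * (log(p/t) - log(P/T)).
    static double growGain(double p, double t, double P, double T) {
        return p * (log2(p / t) - log2(P / T));
    }

    // Prune-phase metric: (p + 1) / (p + n + 2), the smoothed form of
    // p / (p + n); it evaluates to 0.5 when p + n = 0, as described above.
    static double pruneMetric(double p, double n) {
        return (p + 1) / (p + n + 2);
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        // Toy counts: before the condition, 30 of 100 covered examples are
        // positive; after, 20 of 25 are. The gain is positive, so the
        // greedy grow phase would keep this condition.
        System.out.println(growGain(20, 25, 30, 100)); // ~ 28.3
        System.out.println(pruneMetric(20, 5));        // ~ 0.78
    }
}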
Note that there appear to be two bugs in the original RIPPER program that slightly affect ruleset size and accuracy. This implementation avoids these bugs and thus differs slightly from Cohen's original implementation. Even after fixing the bugs, since the order of classes with the same frequency is undefined in RIPPER, there remain trivial differences between this implementation and the original, especially for the audiology data in the UCI repository, where there are many classes with few instances.
For details, see:
William W. Cohen: Fast Effective Rule Induction. In: Twelfth International Conference on Machine Learning, 115-123, 1995.
PS. We have compared this implementation with the original RIPPER implementation with respect to accuracy, ruleset size, and running time on both the artificial data "ab+bcd+defg" and UCI datasets. In all these respects it appears quite comparable to the original RIPPER implementation. However, memory-consumption optimization was not considered in this implementation.
BibTeX:
@inproceedings{Cohen1995,
  author = {William W. Cohen},
  booktitle = {Twelfth International Conference on Machine Learning},
  pages = {115-123},
  publisher = {Morgan Kaufmann},
  title = {Fast Effective Rule Induction},
  year = {1995}
}

Valid options are:
-F <number of folds> Set the number of folds for reduced-error pruning; one fold is used as the pruning set. (default: 3)
-N <min. weights> Set the minimal total weight of the instances within a split. (default: 2.0)
-O <number of runs> Set the number of optimization runs. (default: 2)
-D Turn on debug mode. (default: false)
-S <seed> The seed for randomization. (default: 1)
-E Do NOT check whether the error rate >= 0.5 in the stopping criterion. (default: check)
-P Do NOT use pruning. (default: use pruning)
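
For reference, here is a minimal usage sketch (not part of the original documentation) showing the programmatic equivalents of these options; the dataset path "iris.arff" is a placeholder, and the setter values shown simply restate the defaults above:

import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class JRipDemo {
    public static void main(String[] args) throws Exception {
        // Load any ARFF dataset; "iris.arff" is a placeholder path.
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        JRip rip = new JRip();
        rip.setFolds(3);          // -F: folds for reduced-error pruning
        rip.setMinNo(2.0);        // -N: minimal total instance weight per split
        rip.setOptimizations(2);  // -O: number of optimization runs
        rip.setSeed(1);           // -S: randomization seed
        rip.setUsePruning(true);  // -P on the command line disables pruning

        rip.buildClassifier(data);
        System.out.println(rip);  // toString() prints the learned rules
    }
}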
- Version:
- $Revision: 15519 $
- Author:
- Xin Xu (xx5@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
- See Also:
-
Nested Class Summary
Modifier and Type | Class | Description
class | JRip.Antd | The single antecedent in the rule, which is composed of an attribute and the corresponding value.
class | JRip.NominalAntd | The antecedent with nominal attribute.
class | JRip.NumericAntd | The antecedent with numeric attribute.
class | JRip.RipperRule | This class implements a single rule that predicts the specified class.
-
Field Summary
Fields inherited from class weka.classifiers.AbstractClassifier
BATCH_SIZE_DEFAULT, NUM_DECIMAL_PLACES_DEFAULT
-
Constructor Summary
JRip()
-
Method Summary
Modifier and Type | Method | Description
void | buildClassifier(Instances instances) | Builds Ripper in the order of class frequencies.
String | checkErrorRateTipText() | Returns the tip text for this property.
String | debugTipText() | Returns the tip text for this property.
double[] | distributionForInstance(Instance datum) | Classifies the test instance with the rule learner and provides the class distribution.
Enumeration<String> | enumerateMeasures() | Returns an enumeration of the additional measure names.
String | foldsTipText() | Returns the tip text for this property.
Capabilities | getCapabilities() | Returns default capabilities of the classifier.
boolean | getCheckErrorRate() | Gets whether the error-rate check is included in the stopping criterion.
boolean | getDebug() | Gets whether debug information is output to the console.
int | getFolds() | Gets the number of folds.
double | getMeasure(String additionalMeasureName) | Returns the value of the named measure.
double | getMinNo() | Gets the minimum total weight of the instances in a rule.
int | getOptimizations() | Gets the number of optimization runs.
String[] | getOptions() | Gets the current settings of the Classifier.
String | getRevision() | Returns the revision string.
ArrayList<Rule> | getRuleset() | Gets the ruleset generated by Ripper.
RuleStats | getRuleStats(int pos) | Gets the statistics of the ruleset in the given position.
long | getSeed() | Gets the current seed value to use in randomizing the data.
TechnicalInformation | getTechnicalInformation() | Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., the paper or book this class is based on.
boolean | getUsePruning() | Gets whether pruning is performed.
String | globalInfo() | Returns a string describing the classifier.
Enumeration<Option> | listOptions() | Returns an enumeration describing the available options.
static void | main(String[] args) | Main method.
String | minNoTipText() | Returns the tip text for this property.
String | optimizationsTipText() | Returns the tip text for this property.
String | seedTipText() | Returns the tip text for this property.
void | setCheckErrorRate(boolean d) | Sets whether the error-rate check is included in the stopping criterion.
void | setDebug(boolean d) | Sets whether debug information is output to the console.
void | setFolds(int fold) | Sets the number of folds to use.
void | setMinNo(double m) | Sets the minimum total weight of the instances in a rule.
void | setOptimizations(int run) | Sets the number of optimization runs.
void | setOptions(String[] options) | Parses a given list of options.
void | setSeed(long s) | Sets the seed value to use in randomizing the data.
void | setUsePruning(boolean d) | Sets whether pruning is performed.
String | toString() | Prints all the rules of the rule learner.
String | usePruningTipText() | Returns the tip text for this property.

Methods inherited from class weka.classifiers.AbstractClassifier
batchSizeTipText, classifyInstance, distributionsForInstances, doNotCheckCapabilitiesTipText, forName, getBatchSize, getDoNotCheckCapabilities, getNumDecimalPlaces, implementsMoreEfficientBatchPrediction, makeCopies, makeCopy, numDecimalPlacesTipText, postExecution, preExecution, run, runClassifier, setBatchSize, setDoNotCheckCapabilities, setNumDecimalPlaces
-
Constructor Details
-
JRip
public JRip()
-
-
Method Details
-
globalInfo
public String globalInfo()
Returns a string describing the classifier.
- Returns:
- a description suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., the paper or book this class is based on.
- Specified by:
getTechnicalInformation
in interfaceTechnicalInformationHandler
- Returns:
- the technical information about this class
-
listOptions
public Enumeration<Option> listOptions()
Returns an enumeration describing the available options. Valid options are:

-F number
The number of folds for reduced-error pruning. One fold is used as the pruning set. (default: 3)

-N number
The minimal total weight of the instances within a split. (default: 2)

-O number
The number of optimization runs. (default: 2)

-D
Whether to turn on debug mode.

-S number
The seed for randomization used in Ripper. (default: 1)

-E
Whether NOT to check the error rate >= 0.5 in the stopping criterion. (default: check)

-P
Whether NOT to use pruning. (default: use pruning)

- Specified by:
listOptions
in interfaceOptionHandler
- Overrides:
listOptions
in classAbstractClassifier
- Returns:
- an enumeration of all the available options
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a given list of options. Valid options are:

-F <number of folds> Set the number of folds for reduced-error pruning; one fold is used as the pruning set. (default: 3)
-N <min. weights> Set the minimal total weight of the instances within a split. (default: 2.0)
-O <number of runs> Set the number of optimization runs. (default: 2)
-D Turn on debug mode. (default: false)
-S <seed> The seed for randomization. (default: 1)
-E Do NOT check whether the error rate >= 0.5 in the stopping criterion. (default: check)
-P Do NOT use pruning. (default: use pruning)
- Specified by:
setOptions
in interfaceOptionHandler
- Overrides:
setOptions
in classAbstractClassifier
- Parameters:
options
- the list of options as an array of strings- Throws:
Exception
- if an option is not supported
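
As a short hedged sketch (the option values are arbitrary examples), the same options can be parsed from a single command-line style string using weka.core.Utils:

import weka.classifiers.rules.JRip;
import weka.core.Utils;

public class JRipOptionsSketch {
    public static void main(String[] args) throws Exception {
        JRip rip = new JRip();
        // Split the raw string into tokens and hand them to setOptions().
        rip.setOptions(Utils.splitOptions("-F 4 -N 2.0 -O 2 -S 42"));
        // getOptions() returns an array suitable for passing back to setOptions().
        System.out.println(Utils.joinOptions(rip.getOptions()));
    }
}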
-
getOptions
public String[] getOptions()
Gets the current settings of the Classifier.
- Specified by:
getOptions
in interfaceOptionHandler
- Overrides:
getOptions
in classAbstractClassifier
- Returns:
- an array of strings suitable for passing to setOptions
-
enumerateMeasures
public Enumeration<String> enumerateMeasures()
Returns an enumeration of the additional measure names.
- Specified by:
enumerateMeasures
in interfaceAdditionalMeasureProducer
- Returns:
- an enumeration of the measure names
-
getMeasure
public double getMeasure(String additionalMeasureName)
Returns the value of the named measure.
- Specified by:
getMeasure
in interfaceAdditionalMeasureProducer
- Parameters:
additionalMeasureName
- the name of the measure to query for its value
- Returns:
- the value of the named measure
- Throws:
IllegalArgumentException
- if the named measure is not supported
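
A hedged sketch of querying the additional measures on a trained model; enumerateMeasures() supplies the valid names (for JRip this should include the rule count), and "iris.arff" is a placeholder path:

import java.util.Enumeration;
import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class JRipMeasureSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff"); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        JRip rip = new JRip();
        rip.buildClassifier(data);

        // Print every additional measure the classifier exposes.
        Enumeration<String> names = rip.enumerateMeasures();
        while (names.hasMoreElements()) {
            String name = names.nextElement();
            System.out.println(name + " = " + rip.getMeasure(name));
        }
    }
}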
-
foldsTipText
public String foldsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setFolds
public void setFolds(int fold)
Sets the number of folds to use.
- Parameters:
fold
- the number of folds
-
getFolds
public int getFolds()
Gets the number of folds.
- Returns:
- the number of folds
-
minNoTipText
public String minNoTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setMinNo
public void setMinNo(double m)
Sets the minimum total weight of the instances in a rule.
- Parameters:
m
- the minimum total weight of the instances in a rule
-
getMinNo
public double getMinNo()
Gets the minimum total weight of the instances in a rule.
- Returns:
- the minimum total weight of the instances in a rule
-
seedTipText
public String seedTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setSeed
public void setSeed(long s)
Sets the seed value to use in randomizing the data.
- Parameters:
s
- the new seed value
-
getSeed
public long getSeed()
Gets the current seed value to use in randomizing the data.
- Returns:
- the seed value
-
optimizationsTipText
public String optimizationsTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setOptimizations
public void setOptimizations(int run)
Sets the number of optimization runs.
- Parameters:
run
- the number of optimization runs
-
getOptimizations
public int getOptimizations()
Gets the number of optimization runs.
- Returns:
- the number of optimization runs
-
debugTipText
public String debugTipText()
Returns the tip text for this property.
- Overrides:
debugTipText
in classAbstractClassifier
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setDebug
public void setDebug(boolean d)
Sets whether debug information is output to the console.
- Overrides:
setDebug
in classAbstractClassifier
- Parameters:
d
- whether debug information is output to the console
-
getDebug
public boolean getDebug()
Gets whether debug information is output to the console.
- Overrides:
getDebug
in classAbstractClassifier
- Returns:
- whether debug information is output to the console
-
checkErrorRateTipText
public String checkErrorRateTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setCheckErrorRate
public void setCheckErrorRate(boolean d)
Sets whether the error-rate check is included in the stopping criterion.
- Parameters:
d
- whether the error-rate check is included in the stopping criterion
-
getCheckErrorRate
public boolean getCheckErrorRate()
Gets whether the error-rate check is included in the stopping criterion.
- Returns:
- true if the error-rate check is included in the stopping criterion
-
usePruningTipText
public String usePruningTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setUsePruning
public void setUsePruning(boolean d)
Sets whether pruning is performed.
- Parameters:
d
- whether pruning is performed
-
getUsePruning
public boolean getUsePruning()
Gets whether pruning is performed.
- Returns:
- true if pruning is performed
-
getRuleset
public ArrayList<Rule> getRuleset()
Gets the ruleset generated by Ripper.
- Returns:
- the ruleset
-
getRuleStats
public RuleStats getRuleStats(int pos)
Gets the statistics of the ruleset in the given position.
- Parameters:
pos
- the position of the stats, assumed to be valid
- Returns:
- the statistics of the ruleset in the given position
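
A minimal hedged sketch of inspecting the learned ruleset and its statistics after training ("iris.arff" again stands in for a real dataset):

import weka.classifiers.rules.JRip;
import weka.classifiers.rules.RuleStats;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class JRipRulesetSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff"); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        JRip rip = new JRip();
        rip.buildClassifier(data);

        // All induced rules, in the order they were learned.
        System.out.println("Number of rules: " + rip.getRuleset().size());

        // Statistics object for the ruleset at position 0.
        RuleStats stats = rip.getRuleStats(0);
        System.out.println("Rules in first stats object: "
            + stats.getRuleset().size());
    }
}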
-
getCapabilities
public Capabilities getCapabilities()
Returns default capabilities of the classifier.
- Specified by:
getCapabilities
in interfaceCapabilitiesHandler
- Specified by:
getCapabilities
in interfaceClassifier
- Overrides:
getCapabilities
in classAbstractClassifier
- Returns:
- the capabilities of this classifier
- See Also:
-
buildClassifier
public void buildClassifier(Instances instances) throws Exception
Builds Ripper in the order of class frequencies. For each class, the ruleset is built in two stages: building and optimization.
- Specified by:
buildClassifier
in interfaceClassifier
- Parameters:
instances
- the training data
- Throws:
Exception
- if classifier can't be built successfully
-
distributionForInstance
public double[] distributionForInstance(Instance datum)
Classifies the test instance with the rule learner and provides the class distribution.
- Specified by:
distributionForInstance
in interfaceClassifier
- Overrides:
distributionForInstance
in classAbstractClassifier
- Parameters:
datum
- the instance to be classified
- Returns:
- the distribution
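
A hedged usage sketch: after training, the per-class probability distribution for a single instance can be printed as follows ("iris.arff" is a placeholder path):

import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class JRipPredictSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff"); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        JRip rip = new JRip();
        rip.buildClassifier(data);

        // One probability per class value for the first instance.
        double[] dist = rip.distributionForInstance(data.instance(0));
        for (int i = 0; i < dist.length; i++) {
            System.out.printf("%s: %.3f%n", data.classAttribute().value(i), dist[i]);
        }
    }
}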
-
toString
public String toString()
Prints all the rules of the rule learner.
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
getRevision
in interfaceRevisionHandler
- Overrides:
getRevision
in classAbstractClassifier
- Returns:
- the revision
-
main
public static void main(String[] args)
Main method.
- Parameters:
args
- the options for the classifier
-