Class InterquartileRange

All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, WeightedAttributesHandler

public class InterquartileRange extends SimpleBatchFilter implements WeightedAttributesHandler
A filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.

Outliers:
Q3 + OF*IQR < x <= Q3 + EVF*IQR
or
Q1 - EVF*IQR <= x < Q1 - OF*IQR

Extreme values:
x > Q3 + EVF*IQR
or
x < Q1 - EVF*IQR

Key:
Q1 = 25% quartile
Q3 = 75% quartile
IQR = Interquartile Range, i.e. Q3 - Q1
OF = Outlier Factor
EVF = Extreme Value Factor
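
The inequalities above can be sketched in plain Java as follows. This is only an illustration of the documented thresholds, not the filter's actual implementation: the class name IqrThresholds, the Label enum, and the sample quartiles are made up for the example, and Q1 and Q3 are assumed to be already known. It also assumes EVF >= OF.

```java
final class IqrThresholds {
    enum Label { NORMAL, OUTLIER, EXTREME }

    // Classifies x with the inequalities from the class description.
    // Assumes evf >= of, so the extreme-value band lies outside the outlier band.
    static Label classify(double x, double q1, double q3, double of, double evf) {
        double iqr = q3 - q1;
        if (x > q3 + evf * iqr || x < q1 - evf * iqr) {
            return Label.EXTREME;
        }
        if (x > q3 + of * iqr || x < q1 - of * iqr) {
            return Label.OUTLIER;
        }
        return Label.NORMAL;
    }

    public static void main(String[] args) {
        double q1 = 10.0, q3 = 20.0;  // made-up quartiles, IQR = 10
        double of = 3.0, evf = 6.0;   // the documented defaults for -O and -E
        for (double x : new double[] {25.0, 55.0, 80.0, 81.0, -25.0, -51.0}) {
            System.out.println(x + " -> " + classify(x, q1, q3, of, evf));
        }
    }
}
```

With these sample numbers the upper outlier threshold is 50 and the upper extreme-value threshold is 80; a value exactly on the extreme-value threshold still counts as an outlier, matching the "<=" in the definition.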

Valid options are:

 -D
  Turns on output of debugging information.
 
 -R <col1,col2-col4,...>
  Specifies the list of columns on which to base outlier/extreme
  value detection. An instance is tagged as an outlier/extreme value
  if it qualifies as one in at least one of those attributes.
  'first' and 'last' are valid indexes.
  (default: none)
 
 -O <num>
  The factor for outlier detection.
  (default: 3)
 
 -E <num>
  The factor for extreme values detection.
  (default: 2*Outlier Factor)
 
 -E-as-O
  Tags extreme values also as outliers.
  (default: off)
 
 -P
  Generates Outlier/ExtremeValue pair for each numeric attribute in
  the range, not just a single indicator pair for all the attributes.
  (default: off)
 
 -M
  Generates an additional attribute 'Offset' per Outlier/ExtremeValue
  pair that contains the multiplier, in units of IQR, by which the
  value is offset from the median:
     value = median + 'multiplier' * IQR
  Note: implicitly sets '-P'. (default: off)
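
Solving the relation value = median + 'multiplier' * IQR for the multiplier gives (value - median) / IQR. The following minimal sketch shows that arithmetic only; the class and method names are hypothetical, and how the filter itself treats an IQR of zero is not stated here, so the sketch leaves that case to ordinary floating-point division.

```java
final class OffsetMultiplier {
    // Rearranges value = median + multiplier * IQR to obtain the multiplier.
    // An iqr of 0 yields Infinity or NaN under IEEE 754 division.
    static double offset(double value, double median, double iqr) {
        return (value - median) / iqr;
    }

    public static void main(String[] args) {
        double median = 15.0, iqr = 10.0;  // made-up statistics for one attribute
        System.out.println(offset(55.0, median, iqr));
        System.out.println(offset(5.0, median, iqr));
    }
}
```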
 
Thanks to Dale for a few brainstorming sessions.
Version:
$Revision: 15447 $
Author:
Dale Fletcher (dale at cs dot waikato dot ac dot nz), fracpete (fracpete at waikato dot ac dot nz)
  • Field Details

    • NON_NUMERIC

      public static final int NON_NUMERIC
      indicator for non-numeric attributes
  • Constructor Details

    • InterquartileRange

      public InterquartileRange()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this filter
      Specified by:
      globalInfo in class SimpleFilter
      Returns:
      a description of the filter suitable for displaying in the explorer/experimenter gui
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Filter
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a list of options for this object.

      Valid options are:

       -D
        Turns on output of debugging information.
       
       -R <col1,col2-col4,...>
        Specifies the list of columns on which to base outlier/extreme
        value detection. An instance is tagged as an outlier/extreme value
        if it qualifies as one in at least one of those attributes.
        'first' and 'last' are valid indexes.
        (default: none)
       
       -O <num>
        The factor for outlier detection.
        (default: 3)
       
       -E <num>
        The factor for extreme values detection.
        (default: 2*Outlier Factor)
       
       -E-as-O
        Tags extreme values also as outliers.
        (default: off)
       
       -P
        Generates Outlier/ExtremeValue pair for each numeric attribute in
        the range, not just a single indicator pair for all the attributes.
        (default: off)
       
       -M
        Generates an additional attribute 'Offset' per Outlier/ExtremeValue
        pair that contains the multiplier, in units of IQR, by which the
        value is offset from the median:
           value = median + 'multiplier' * IQR
        Note: implicitly sets '-P'. (default: off)
       
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Filter
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
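
      As an illustration of the array shape that setOptions expects, the sketch below assembles the documented flags into a String[] with each flag and each flag value as its own element. The helper class IqrOptions and its build method are hypothetical and not part of the library; only the flag names come from the option list above.

```java
import java.util.ArrayList;
import java.util.List;

final class IqrOptions {
    // Builds an options array using the documented flags -R, -O, -E, -P and -M.
    static String[] build(String range, double outlierFactor, double extremeFactor,
                          boolean perAttribute, boolean offset) {
        List<String> opts = new ArrayList<>();
        opts.add("-R");
        opts.add(range);
        opts.add("-O");
        opts.add(Double.toString(outlierFactor));
        opts.add("-E");
        opts.add(Double.toString(extremeFactor));
        if (perAttribute) {
            opts.add("-P");
        }
        if (offset) {
            opts.add("-M");  // documented to imply -P
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = build("first-3,5", 1.5, 3.0, true, false);
        System.out.println(String.join(" ", options));
    }
}
```

      The resulting array would then be handed to setOptions on a filter instance.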
    • getOptions

      public String[] getOptions()
      Gets the current settings of the filter.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Filter
      Returns:
      an array of strings suitable for passing to setOptions
    • attributeIndicesTipText

      public String attributeIndicesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getAttributeIndices

      public String getAttributeIndices()
      Gets the current range selection
      Returns:
      a string containing a comma separated list of ranges
    • setAttributeIndices

      public void setAttributeIndices(String value)
      Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
      Parameters:
      value - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
      e.g.: first-3,5,6-last
      Throws:
      IllegalArgumentException - if an invalid range list is supplied
    • setAttributeIndicesArray

      public void setAttributeIndicesArray(int[] value)
      Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
      Parameters:
      value - an array containing indexes of attributes to work on. Since the array will typically come from a program, attributes are indexed from 0.
      Throws:
      IllegalArgumentException - if an invalid set of ranges is supplied
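
      The difference between the 1-based range string of setAttributeIndices and the 0-based array of setAttributeIndicesArray can be made concrete with a small conversion sketch. The parser below is a hypothetical stand-in written for this example, not the library's own range handling; it understands only comma-separated single indices and "a-b" ranges with the 'first' and 'last' keywords, and it does not remove duplicates.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSketch {
    // Converts a 1-based range string such as "first-3,5,6-last"
    // into 0-based attribute indices for a dataset with numAttributes columns.
    static int[] toZeroBased(String ranges, int numAttributes) {
        List<Integer> out = new ArrayList<>();
        for (String part : ranges.split(",")) {
            String[] ends = part.trim().split("-");
            int lo = parseIndex(ends[0], numAttributes);
            int hi = ends.length > 1 ? parseIndex(ends[1], numAttributes) : lo;
            if (lo < 1 || hi > numAttributes || lo > hi) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = lo; i <= hi; i++) {
                out.add(i - 1);  // shift from 1-based to 0-based
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    private static int parseIndex(String token, int numAttributes) {
        String t = token.trim();
        if (t.equals("first")) {
            return 1;
        }
        if (t.equals("last")) {
            return numAttributes;
        }
        return Integer.parseInt(t);  // NumberFormatException is an IllegalArgumentException
    }

    public static void main(String[] args) {
        int[] indices = toZeroBased("first-3,5,6-last", 8);
        System.out.println(java.util.Arrays.toString(indices));
    }
}
```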
    • outlierFactorTipText

      public String outlierFactorTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setOutlierFactor

      public void setOutlierFactor(double value)
      Sets the factor for determining the thresholds for outliers.
      Parameters:
      value - the factor.
    • getOutlierFactor

      public double getOutlierFactor()
      Gets the factor for determining the thresholds for outliers.
      Returns:
      the factor.
    • extremeValuesFactorTipText

      public String extremeValuesFactorTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setExtremeValuesFactor

      public void setExtremeValuesFactor(double value)
      Sets the factor for determining the thresholds for extreme values.
      Parameters:
      value - the factor.
    • getExtremeValuesFactor

      public double getExtremeValuesFactor()
      Gets the factor for determining the thresholds for extreme values.
      Returns:
      the factor.
    • extremeValuesAsOutliersTipText

      public String extremeValuesAsOutliersTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setExtremeValuesAsOutliers

      public void setExtremeValuesAsOutliers(boolean value)
      Set whether extreme values are also tagged as outliers.
      Parameters:
      value - whether or not to tag extreme values also as outliers.
    • getExtremeValuesAsOutliers

      public boolean getExtremeValuesAsOutliers()
      Get whether extreme values are also tagged as outliers.
      Returns:
      true if extreme values are also tagged as outliers.
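
      The effect of this setting on the two indicator values can be sketched as below. This is an interpretation of the documented behaviour rather than the filter's code: the class name and method are hypothetical, Q1 and Q3 are assumed to be given, and EVF >= OF is assumed so that every extreme value also lies beyond the outlier threshold.

```java
final class ExtremeAsOutlier {
    // Returns {outlierFlag, extremeFlag} for a single value.
    // When extremeAsOutlier is false, extreme values are not flagged as outliers.
    static boolean[] flags(double x, double q1, double q3, double of, double evf,
                           boolean extremeAsOutlier) {
        double iqr = q3 - q1;
        boolean extreme = x > q3 + evf * iqr || x < q1 - evf * iqr;
        boolean beyondOutlier = x > q3 + of * iqr || x < q1 - of * iqr;
        boolean outlier = extremeAsOutlier ? beyondOutlier : (beyondOutlier && !extreme);
        return new boolean[] {outlier, extreme};
    }

    public static void main(String[] args) {
        // Made-up quartiles Q1 = 10 and Q3 = 20 with the default factors 3 and 6.
        System.out.println(java.util.Arrays.toString(flags(81.0, 10.0, 20.0, 3.0, 6.0, false)));
        System.out.println(java.util.Arrays.toString(flags(81.0, 10.0, 20.0, 3.0, 6.0, true)));
    }
}
```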
    • detectionPerAttributeTipText

      public String detectionPerAttributeTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDetectionPerAttribute

      public void setDetectionPerAttribute(boolean value)
      Set whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
      Parameters:
      value - whether or not to generate indicator attribute pairs for each numeric attribute.
    • getDetectionPerAttribute

      public boolean getDetectionPerAttribute()
      Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
      Returns:
      true if indicator attribute pairs are generated for each numeric attribute.
    • outputOffsetMultiplierTipText

      public String outputOffsetMultiplierTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setOutputOffsetMultiplier

      public void setOutputOffsetMultiplier(boolean value)
      Set whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that contains the multiplier, in units of IQR, by which the value is offset from the median: value = median + 'multiplier' * IQR.
      Parameters:
      value - whether or not to generate the additional attribute.
    • getOutputOffsetMultiplier

      public boolean getOutputOffsetMultiplier()
      Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that contains the multiplier, in units of IQR, by which the value is offset from the median: value = median + 'multiplier' * IQR.
      Returns:
      true if the additional attribute is generated.
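
      Taken together, the per-attribute and offset settings determine how many attributes the filter appends. The following back-of-the-envelope sketch follows the option descriptions (one pair overall, one pair per numeric attribute in the range, or one pair plus an 'Offset' attribute per numeric attribute); the class and method names are hypothetical and the count is an inference from the documentation, not something read from the filter's output.

```java
final class AddedAttributeCount {
    // Estimates the number of appended attributes from the documented options.
    static int added(int numericInRange, boolean perAttribute, boolean offset) {
        boolean effectivePerAttribute = perAttribute || offset;  // -M implies -P
        if (!effectivePerAttribute) {
            return 2;  // a single Outlier/ExtremeValue pair for all attributes
        }
        return numericInRange * (offset ? 3 : 2);
    }

    public static void main(String[] args) {
        System.out.println(added(4, false, false));
        System.out.println(added(4, true, false));
        System.out.println(added(4, false, true));
    }
}
```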
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Filter
      Returns:
      the capabilities of this object
    • getValues

      public double[] getValues(InterquartileRange.ValueType type)
      Returns the values for the specified type.
      Parameters:
      type - the type of values to return
      Returns:
      the values
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Filter
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method for testing this class.
      Parameters:
      args - should contain arguments to the filter: use -h for help