Class InterquartileRange
java.lang.Object
weka.filters.Filter
weka.filters.SimpleFilter
weka.filters.SimpleBatchFilter
weka.filters.unsupervised.attribute.InterquartileRange
- All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, WeightedAttributesHandler
A filter for detecting outliers and extreme values
based on interquartile ranges. The filter skips the class attribute.
Outliers:
Q3 + OF*IQR < x <= Q3 + EVF*IQR
or
Q1 - EVF*IQR <= x < Q1 - OF*IQR
Extreme values:
x > Q3 + EVF*IQR
or
x < Q1 - EVF*IQR
Key:
Q1 = 25% quartile
Q3 = 75% quartile
IQR = Interquartile Range, difference between Q1 and Q3
OF = Outlier Factor
EVF = Extreme Value Factor
Valid options are:
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns on which to base outlier/extreme value detection. If an instance is an outlier/extreme value in at least one of those attributes, it is tagged accordingly. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median. value = median + 'multiplier' * IQR Note: implicitly sets '-P'. (default: off)
Thanks to Dale for a few brainstorming sessions.
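The outlier and extreme-value inequalities in the class description can be sketched in plain Java as follows. This is only an illustration of the documented rules, not Weka's implementation; the class name `IqrThresholdSketch`, the `Tag` enum and the `classify` method are hypothetical, and the quartiles are assumed to be already known.

```java
// Sketch only: illustrates the documented inequalities, not Weka's own code.
// IqrThresholdSketch, Tag and classify are hypothetical names.
final class IqrThresholdSketch {

    enum Tag { NORMAL, OUTLIER, EXTREME_VALUE }

    /** Classifies x given Q1, Q3, the outlier factor (OF) and the extreme value factor (EVF). */
    static Tag classify(double x, double q1, double q3, double of, double evf) {
        double iqr = q3 - q1;                   // IQR = Q3 - Q1
        double upperOutlier = q3 + of * iqr;    // Q3 + OF*IQR
        double upperExtreme = q3 + evf * iqr;   // Q3 + EVF*IQR
        double lowerOutlier = q1 - of * iqr;    // Q1 - OF*IQR
        double lowerExtreme = q1 - evf * iqr;   // Q1 - EVF*IQR

        // Extreme values: x > Q3 + EVF*IQR or x < Q1 - EVF*IQR
        if (x > upperExtreme || x < lowerExtreme) {
            return Tag.EXTREME_VALUE;
        }
        // Outliers: Q3 + OF*IQR < x <= Q3 + EVF*IQR or Q1 - EVF*IQR <= x < Q1 - OF*IQR
        if (x > upperOutlier || x < lowerOutlier) {
            return Tag.OUTLIER;
        }
        return Tag.NORMAL;
    }

    public static void main(String[] args) {
        double q1 = 10.0, q3 = 20.0;  // assumed quartiles, so IQR = 10
        double of = 3.0, evf = 6.0;   // documented defaults: OF = 3, EVF = 2 * OF
        System.out.println(classify(15.0, q1, q3, of, evf));
        System.out.println(classify(55.0, q1, q3, of, evf));
        System.out.println(classify(85.0, q1, q3, of, evf));
    }
}
```

With these assumed quartiles the upper outlier threshold is 50 and the upper extreme value threshold is 80, so a value exactly at 80 still counts as an outlier rather than an extreme value.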
- Version:
- $Revision: 15447 $
- Author:
- Dale Fletcher (dale at cs dot waikato dot ac dot nz), fracpete (fracpete at waikato dot ac dot nz)
Nested Class Summary
Modifier and Type, Class, Description
static enum InterquartileRange.ValueType: enum for obtaining the various determined IQR values.
Field Summary
Modifier and Type, Field, Description
static final int NON_NUMERIC: indicator for non-numeric attributes
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
String attributeIndicesTipText(): Returns the tip text for this property
String detectionPerAttributeTipText(): Returns the tip text for this property
String extremeValuesAsOutliersTipText(): Returns the tip text for this property
String extremeValuesFactorTipText(): Returns the tip text for this property
String getAttributeIndices(): Gets the current range selection
Capabilities getCapabilities(): Returns the Capabilities of this filter.
boolean getDetectionPerAttribute(): Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
boolean getExtremeValuesAsOutliers(): Get whether extreme values are also tagged as outliers.
double getExtremeValuesFactor(): Gets the factor for determining the thresholds for extreme values.
String[] getOptions(): Gets the current settings of the filter.
double getOutlierFactor(): Gets the factor for determining the thresholds for outliers.
boolean getOutputOffsetMultiplier(): Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR.
String getRevision(): Returns the revision string.
double[] getValues(InterquartileRange.ValueType type): Returns the values for the specified type.
String globalInfo(): Returns a string describing this filter
Enumeration<Option> listOptions(): Returns an enumeration describing the available options.
static void main(String[] args): Main method for testing this class.
String outlierFactorTipText(): Returns the tip text for this property
String outputOffsetMultiplierTipText(): Returns the tip text for this property
void setAttributeIndices(String value): Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
void setAttributeIndicesArray(int[] value): Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
void setDetectionPerAttribute(boolean value): Set whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
void setExtremeValuesAsOutliers(boolean value): Set whether extreme values are also tagged as outliers.
void setExtremeValuesFactor(double value): Sets the factor for determining the thresholds for extreme values.
void setOptions(String[] options): Parses a list of options for this object.
void setOutlierFactor(double value): Sets the factor for determining the thresholds for outliers.
void setOutputOffsetMultiplier(boolean value): Set whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR.
Methods inherited from class weka.filters.SimpleBatchFilter
allowAccessToFullInputFormat, batchFinished, input, input
Methods inherited from class weka.filters.SimpleFilter
setInputFormat
Methods inherited from class weka.filters.Filter
batchFilterFile, debugTipText, doNotCheckCapabilitiesTipText, filterFile, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputPeek, postExecution, preExecution, run, runFilter, setDebug, setDoNotCheckCapabilities, toString, useFilter, wekaStaticWrapper
-
Field Details
-
NON_NUMERIC
public static final int NON_NUMERIC
indicator for non-numeric attributes
-
Constructor Details
-
InterquartileRange
public InterquartileRange()
-
-
Method Details
-
globalInfo
public String globalInfo()
Returns a string describing this filter
- Specified by:
globalInfo in class SimpleFilter
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter GUI
-
listOptions
public Enumeration<Option> listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class Filter
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a list of options for this object. Valid options are:
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns on which to base outlier/extreme value detection. If an instance is an outlier/extreme value in at least one of those attributes, it is tagged accordingly. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median. value = median + 'multiplier' * IQR Note: implicitly sets '-P'. (default: off)
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class Filter
- Parameters:
options - the list of options as an array of strings
- Throws:
Exception - if an option is not supported
-
getOptions
public String[] getOptions()
Gets the current settings of the filter.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class Filter
- Returns:
- an array of strings suitable for passing to setOptions
-
attributeIndicesTipText
public String attributeIndicesTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getAttributeIndices
public String getAttributeIndices()
Gets the current range selection
- Returns:
- a string containing a comma separated list of ranges
-
setAttributeIndices
public void setAttributeIndices(String value)
Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
- Parameters:
value - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
e.g.: first-3,5,6-last
- Throws:
IllegalArgumentException - if an invalid range list is supplied
-
setAttributeIndicesArray
public void setAttributeIndicesArray(int[] value)
Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used).
- Parameters:
value - an array containing indexes of attributes to work on. Since the array will typically come from a program, attributes are indexed from 0.
- Throws:
IllegalArgumentException - if an invalid set of ranges is supplied
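The two setters differ only in indexing convention: the range string is 1-based, the array is 0-based. The following plain-Java sketch shows how a range string such as `first-3,5,6-last` maps to 0-based indices. It is not Weka's range parser; `RangeIndexSketch`, `toZeroBased` and the `numAttributes` parameter are hypothetical names introduced for illustration, and only simple ascending ranges are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a simplified stand-in for range parsing, not Weka's implementation.
// RangeIndexSketch and toZeroBased are hypothetical names.
final class RangeIndexSketch {

    /** Converts a 1-based range list (e.g. "first-3,5,6-last") into 0-based indices. */
    static int[] toZeroBased(String ranges, int numAttributes) {
        List<Integer> result = new ArrayList<>();
        for (String part : ranges.split(",")) {
            String[] bounds = part.trim().split("-");
            int from = parseIndex(bounds[0], numAttributes);
            int to = bounds.length > 1 ? parseIndex(bounds[1], numAttributes) : from;
            if (from < 1 || to > numAttributes || from > to) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = from; i <= to; i++) {
                result.add(i - 1);  // shift from 1-based (user) to 0-based (program)
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Resolves 'first', 'last' or a 1-based number to a 1-based index. */
    private static int parseIndex(String token, int numAttributes) {
        String t = token.trim();
        if (t.equals("first")) {
            return 1;
        }
        if (t.equals("last")) {
            return numAttributes;
        }
        return Integer.parseInt(t);  // NumberFormatException is an IllegalArgumentException
    }

    public static void main(String[] args) {
        // Assumes a dataset with 8 attributes.
        int[] indices = toZeroBased("first-3,5,6-last", 8);
        System.out.println(java.util.Arrays.toString(indices));
    }
}
```

For an assumed dataset of 8 attributes, the string `first-3,5,6-last` selects the same attributes as the array `{0, 1, 2, 4, 5, 6, 7}`.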
-
outlierFactorTipText
public String outlierFactorTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setOutlierFactor
public void setOutlierFactor(double value)
Sets the factor for determining the thresholds for outliers.
- Parameters:
value - the factor.
-
getOutlierFactor
public double getOutlierFactor()
Gets the factor for determining the thresholds for outliers.
- Returns:
- the factor.
-
extremeValuesFactorTipText
public String extremeValuesFactorTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setExtremeValuesFactor
public void setExtremeValuesFactor(double value)
Sets the factor for determining the thresholds for extreme values.
- Parameters:
value - the factor.
-
getExtremeValuesFactor
public double getExtremeValuesFactor()
Gets the factor for determining the thresholds for extreme values.
- Returns:
- the factor.
-
extremeValuesAsOutliersTipText
public String extremeValuesAsOutliersTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setExtremeValuesAsOutliers
public void setExtremeValuesAsOutliers(boolean value)
Set whether extreme values are also tagged as outliers.
- Parameters:
value - whether or not to tag extreme values also as outliers.
-
getExtremeValuesAsOutliers
public boolean getExtremeValuesAsOutliers()
Get whether extreme values are also tagged as outliers.
- Returns:
- true if extreme values are also tagged as outliers.
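By the definitions in the class description, the outlier and extreme value ranges do not overlap, so an extreme value is not an outlier unless this property is enabled. A minimal sketch of that rule, under the assumption that the two per-value flags have already been determined; `ExtremeAsOutlierSketch` and `outlierFlag` are hypothetical names, not part of the Weka API.

```java
// Sketch only: illustrates the documented effect of tagging extreme values as outliers.
// ExtremeAsOutlierSketch and outlierFlag are hypothetical names.
final class ExtremeAsOutlierSketch {

    /**
     * Returns the value of the Outlier indicator.
     *
     * @param isOutlier               value lies in the outlier range
     * @param isExtremeValue          value lies in the extreme value range
     * @param extremeValuesAsOutliers whether extreme values are also tagged as outliers
     */
    static boolean outlierFlag(boolean isOutlier, boolean isExtremeValue,
                               boolean extremeValuesAsOutliers) {
        return isOutlier || (extremeValuesAsOutliers && isExtremeValue);
    }

    public static void main(String[] args) {
        // An extreme value with the property off, then on.
        System.out.println(outlierFlag(false, true, false));
        System.out.println(outlierFlag(false, true, true));
    }
}
```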
-
detectionPerAttributeTipText
public String detectionPerAttributeTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDetectionPerAttribute
public void setDetectionPerAttribute(boolean value)
Set whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
- Parameters:
value - whether or not to generate indicator attribute pairs for each numeric attribute.
-
getDetectionPerAttribute
public boolean getDetectionPerAttribute()
Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false").
- Returns:
- true if indicator attribute pairs are generated for each numeric attribute.
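When a single indicator pair covers all selected attributes, the option description states that an instance is tagged if it is flagged in at least one of them. A minimal sketch of that aggregation, assuming the per-attribute flags are already available; `IndicatorPairSketch` and `aggregate` are hypothetical names and do not reflect how Weka stores these flags internally.

```java
// Sketch only: shows "flagged in at least one attribute" as a logical OR.
// IndicatorPairSketch and aggregate are hypothetical names.
final class IndicatorPairSketch {

    /** Returns true if the instance is flagged in at least one selected numeric attribute. */
    static boolean aggregate(boolean[] perAttributeFlags) {
        for (boolean flagged : perAttributeFlags) {
            if (flagged) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical outlier flags of one instance for three numeric attributes.
        boolean[] outlierFlags = {false, true, false};
        System.out.println(aggregate(outlierFlags));
    }
}
```

With per-attribute detection enabled, each element of such a flag array would instead correspond to its own indicator attribute.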
-
outputOffsetMultiplierTipText
public String outputOffsetMultiplierTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setOutputOffsetMultiplier
public void setOutputOffsetMultiplier(boolean value)
Set whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR.
- Parameters:
value - whether or not to generate the additional attribute.
-
getOutputOffsetMultiplier
public boolean getOutputOffsetMultiplier()
Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR.
- Returns:
- true if the additional attribute is generated.
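Rearranging the documented relation value = median + 'multiplier' * IQR gives multiplier = (value - median) / IQR. The sketch below computes that quantity in plain Java. `OffsetMultiplierSketch` and `offsetMultiplier` are hypothetical names, and rejecting an IQR of zero is an assumption of this sketch; how the filter itself treats that case is not stated here.

```java
// Sketch only: rearranges value = median + multiplier * IQR.
// OffsetMultiplierSketch and offsetMultiplier are hypothetical names.
final class OffsetMultiplierSketch {

    /** Returns the multiplier of the IQR by which the value is off the median. */
    static double offsetMultiplier(double value, double median, double iqr) {
        if (iqr == 0.0) {
            // Assumption of this sketch: a zero IQR is treated as invalid input.
            throw new IllegalArgumentException("IQR must be non-zero");
        }
        return (value - median) / iqr;
    }

    public static void main(String[] args) {
        // Assumed statistics: median = 15, IQR = 10.
        System.out.println(offsetMultiplier(35.0, 15.0, 10.0));
        System.out.println(offsetMultiplier(10.0, 15.0, 10.0));
    }
}
```

A negative multiplier indicates a value below the median, a positive one a value above it.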
-
getCapabilities
public Capabilities getCapabilities()
Returns the Capabilities of this filter.
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Overrides:
getCapabilities in class Filter
- Returns:
- the capabilities of this object
-
getValues
public double[] getValues(InterquartileRange.ValueType type)
Returns the values for the specified type.
- Parameters:
type - the type of values to return
- Returns:
- the values
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class Filter
- Returns:
- the revision
-
main
public static void main(String[] args)
Main method for testing this class.
- Parameters:
args - should contain arguments to the filter: use -h for help
-