Class MergeNominalValues
java.lang.Object
weka.filters.Filter
weka.filters.SimpleFilter
weka.filters.SimpleBatchFilter
weka.filters.supervised.attribute.MergeNominalValues
- All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedAttributesHandler, WeightedInstancesHandler, SupervisedFilter
public class MergeNominalValues
extends SimpleBatchFilter
implements SupervisedFilter, WeightedInstancesHandler, WeightedAttributesHandler, TechnicalInformationHandler
Merges values of all nominal attributes among the
specified attributes, excluding the class attribute, using the CHAID method,
but without considering re-splitting of merged subsets. It implements Steps 1 and
2 described by Kass (1980), see
Gordon V. Kass (1980). An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics. 29(2):119-127.
Once attribute values have been merged, a chi-squared test using the Bonferroni correction is applied to check if the resulting attribute is a valid predictor, based on the Bonferroni multiplier in Equation 3.2 in Kass (1980). If an attribute does not pass this test, all remaining values (if any) are merged. Nevertheless, useless predictors can slip through without being fully merged, e.g. identifier attributes.
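To make the Bonferroni step concrete, here is a minimal sketch of the multiplier for a nominal predictor whose c original values have been merged into r groups. It assumes the multiplier takes the form B = sum over i = 0..r-1 of (-1)^i (r-i)^c / (i! (r-i)!), which counts the ways to partition c values into r groups. The class name, method names, and the capping of the adjusted p-value at 1 are hypothetical choices for illustration and are not taken from the Weka source.

```java
// Illustrative sketch only; names and structure are assumptions, not Weka code.
final class BonferroniSketch {

    /** ln(n!) computed by summing logarithms; adequate for small n. */
    static double lnFactorial(int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; k++) {
            sum += Math.log(k);
        }
        return sum;
    }

    /**
     * Assumed form of the multiplier for c original values merged into r groups:
     * B = sum_{i=0}^{r-1} (-1)^i (r-i)^c / (i! (r-i)!).
     * Each term is evaluated in log space to avoid overflow for large c.
     */
    static double multiplier(int c, int r) {
        double sum = 0.0;
        double sign = 1.0;
        for (int i = 0; i < r; i++) {
            double logTerm = c * Math.log(r - i) - lnFactorial(i) - lnFactorial(r - i);
            sum += sign * Math.exp(logTerm);
            sign = -sign;
        }
        return sum;
    }

    /** Bonferroni-adjusted p-value, capped at 1 (the cap is a choice of this sketch). */
    static double adjust(double pValue, int c, int r) {
        return Math.min(1.0, pValue * multiplier(c, r));
    }

    public static void main(String[] args) {
        // Four original values merged into two groups: 2^4/2 - 1 = 7 possible groupings.
        System.out.println(multiplier(4, 2));
        System.out.println(adjust(0.01, 4, 2));
    }
}
```

Under this assumption, an attribute whose merged table has a raw p-value of 0.01 after reducing four values to two groups would be judged against an adjusted value of roughly 0.07, so it would not pass at the default significance level of 0.05.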
The code applies the Yates correction when the chi-squared statistic is computed.
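The merging described above can be sketched as follows, under simplifying assumptions: the class attribute is binary (so every pairwise table is 2 x 2 with one degree of freedom), and a fixed critical value of 3.841 (chi-squared, one degree of freedom, level 0.05) stands in for a p-value computation. The class and method names are hypothetical, and this is an illustration of the idea rather than the filter's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; assumes a binary class and a fixed critical value.
final class ChaidMergeSketch {

    /** Chi-squared statistic of a 2 x 2 table with the Yates continuity correction. */
    static double yatesChiSquared(double[] rowA, double[] rowB) {
        double[][] t = {rowA, rowB};
        double[] rowTotal = new double[2];
        double[] colTotal = new double[2];
        double n = 0.0;
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                rowTotal[i] += t[i][j];
                colTotal[j] += t[i][j];
                n += t[i][j];
            }
        }
        double stat = 0.0;
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                double expected = rowTotal[i] * colTotal[j] / n;
                if (!(expected > 0.0)) {
                    continue; // skip empty rows/columns (also covers n == 0)
                }
                // Yates: shrink |observed - expected| by 0.5, but not below zero.
                double diff = Math.max(0.0, Math.abs(t[i][j] - expected) - 0.5);
                stat += diff * diff / expected;
            }
        }
        return stat;
    }

    /**
     * Repeatedly merges the pair of groups whose class distributions differ least,
     * as long as that difference is below the critical value.
     * Each element of counts holds the two class counts for one attribute value.
     */
    static List<double[]> merge(List<double[]> counts, double criticalValue) {
        List<double[]> groups = new ArrayList<>();
        for (double[] c : counts) {
            groups.add(c.clone()); // leave the caller's arrays untouched
        }
        while (groups.size() > 1) {
            int bestI = -1;
            int bestJ = -1;
            double best = Double.POSITIVE_INFINITY;
            // Every pass looks at all pairs of the remaining groups.
            for (int i = 0; i < groups.size(); i++) {
                for (int j = i + 1; j < groups.size(); j++) {
                    double stat = yatesChiSquared(groups.get(i), groups.get(j));
                    if (stat < best) {
                        best = stat;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            if (best >= criticalValue) {
                break; // the most similar pair is still significantly different
            }
            double[] kept = groups.get(bestI);
            double[] absorbed = groups.remove(bestJ); // bestJ > bestI, so bestI stays valid
            kept[0] += absorbed[0];
            kept[1] += absorbed[1];
        }
        return groups;
    }

    public static void main(String[] args) {
        List<double[]> counts = new ArrayList<>();
        counts.add(new double[] {10, 10});
        counts.add(new double[] {11, 9});
        counts.add(new double[] {2, 18});
        System.out.println(merge(counts, 3.841).size() + " groups remain");
    }
}
```

In this toy input the first two values have nearly identical class distributions and are merged, while the third differs clearly and stays separate. No re-splitting of a merged group is attempted, in line with the description above.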
Note that the algorithm is quadratic in the number of attribute values for an attribute. Valid options are:
-D Turns on output of debugging information.
-L <double> The significance level (default: 0.05).
-R <range> Sets list of attributes to act on (or its inverse). 'first' and 'last' are accepted as well. E.g.: first-5,7,9,20-last (default: first-last)
-V Invert matching sense (i.e. act on all attributes not specified in list)
-O Use short identifiers for merged subsets.
- Version:
- $Revision: 14508 $
- Author:
- Eibe Frank
Constructor Summary
-
Method Summary
Modifier and Type, Method: Description
boolean allowAccessToFullInputFormat(): We need access to the full input data in determineOutputFormat.
String attributeIndicesTipText(): Returns the tip text for this property.
String getAttributeIndices(): Get the current range selection.
Capabilities getCapabilities(): Returns the Capabilities of this filter.
boolean getInvertSelection(): Get whether the supplied attributes are to be acted on or all other attributes.
String[] getOptions(): Gets the current settings of the filter.
String getRevision(): Returns the revision string.
double getSignificanceLevel(): Gets the significance level.
TechnicalInformation getTechnicalInformation(): Returns an instance of a TechnicalInformation object containing detailed information about the technical background of this class, e.g. the paper or book this class is based on.
boolean getUseShortIdentifiers(): Get whether short identifiers are to be output.
String globalInfo(): Returns a string describing this filter.
String invertSelectionTipText(): Returns the tip text for this property.
Enumeration<Option> listOptions(): Returns an enumeration describing the available options.
static void main(String[] args): Runs the filter with the given arguments.
void setAttributeIndices(String rangeList): Set which attributes are to be acted on (or not, if invert is true).
void setAttributeIndicesArray(int[] attributes): Set which attributes are to be acted on (or not, if invert is true).
void setInvertSelection(boolean invert): Set whether selected attributes should be acted on or all other attributes.
void setOptions(String[] options): Parses a given list of options.
void setSignificanceLevel(double sF): Sets the significance level.
void setUseShortIdentifiers(boolean b): Set whether to output short identifiers for merged values.
String significanceLevelTipText(): Returns the tip text for this property.
String useShortIdentifiersTipText(): Returns the tip text for this property.
Methods inherited from class weka.filters.SimpleBatchFilter
batchFinished, input, input
Methods inherited from class weka.filters.SimpleFilter
setInputFormat
Methods inherited from class weka.filters.Filter
batchFilterFile, debugTipText, doNotCheckCapabilitiesTipText, filterFile, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputPeek, postExecution, preExecution, run, runFilter, setDebug, setDoNotCheckCapabilities, toString, useFilter, wekaStaticWrapper
-
Constructor Details
-
MergeNominalValues
public MergeNominalValues()
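A minimal usage sketch follows. It assumes the Weka library is on the classpath and builds a small in-memory dataset; the dataset, attribute names, and the class name `MergeNominalValuesUsage` are invented for illustration. Options are set before `setInputFormat`, and the filter is applied with `Filter.useFilter`.

```java
import java.util.ArrayList;
import java.util.Arrays;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.attribute.MergeNominalValues;

// Illustrative sketch only; requires weka.jar on the classpath.
final class MergeNominalValuesUsage {

    /** Builds a toy dataset with one nominal predictor and a nominal class. */
    static Instances buildData() {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("colour", Arrays.asList("red", "crimson", "blue")));
        atts.add(new Attribute("class", Arrays.asList("yes", "no")));
        Instances data = new Instances("toy", atts, 0);
        data.setClassIndex(1); // the filter is supervised, so a class must be set
        // Each row holds the value indices for (colour, class).
        double[][] rows = {{0, 0}, {0, 0}, {1, 0}, {1, 0}, {2, 1}, {2, 1}, {2, 1}, {0, 1}};
        for (double[] row : rows) {
            data.add(new DenseInstance(1.0, row));
        }
        return data;
    }

    /** Configures the filter and applies it to the toy dataset. */
    static Instances run() {
        try {
            Instances data = buildData();
            MergeNominalValues filter = new MergeNominalValues();
            filter.setSignificanceLevel(0.05);       // same as -L 0.05
            filter.setAttributeIndices("first-last"); // same as -R first-last
            filter.setUseShortIdentifiers(false);     // -O not set
            filter.setInputFormat(data);              // call after setting options
            return Filter.useFilter(data, filter);
        } catch (Exception e) {
            throw new IllegalStateException("filtering failed", e);
        }
    }

    public static void main(String[] args) {
        Instances merged = run();
        // Prints the (possibly merged) definition of the first attribute.
        System.out.println(merged.attribute(0));
    }
}
```

Which values end up merged depends on the data and the significance level, so the printed attribute definition is not shown here.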
-
-
Method Details
-
globalInfo
public String globalInfo()
Returns a string describing this filter.
- Specified by:
globalInfo in class SimpleFilter
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter GUI
-
getTechnicalInformation
public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object containing detailed information about the technical background of this class, e.g. the paper or book this class is based on.
- Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
listOptions
public Enumeration<Option> listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class Filter
- Returns:
- an enumeration of all the available options
-
getOptions
public String[] getOptions()
Gets the current settings of the filter.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class Filter
- Returns:
- an array of strings suitable for passing to setOptions
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a given list of options. Valid options are:
-D Turns on output of debugging information.
-L <double> The significance level (default: 0.05).
-R <range> Sets list of attributes to act on (or its inverse). 'first' and 'last' are accepted as well. E.g.: first-5,7,9,20-last (default: first-last)
-V Invert matching sense (i.e. act on all attributes not specified in list)
-O Use short identifiers for merged subsets.
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class Filter
- Parameters:
options - the list of options as an array of strings
- Throws:
Exception - if an option is not supported
-
significanceLevelTipText
public String significanceLevelTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getSignificanceLevel
public double getSignificanceLevel()
Gets the significance level.
- Returns:
- the significance level as a double
-
setSignificanceLevel
public void setSignificanceLevel(double sF)
Sets the significance level.
- Parameters:
sF - the significance level as a double
-
attributeIndicesTipText
public String attributeIndicesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getAttributeIndices
public String getAttributeIndices()
Get the current range selection.
- Returns:
- a string containing a comma-separated list of ranges
-
setAttributeIndices
public void setAttributeIndices(String rangeList)
Set which attributes are to be acted on (or not, if invert is true).
- Parameters:
rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1, e.g. first-3,5,6-last
-
setAttributeIndicesArray
public void setAttributeIndicesArray(int[] attributes)
Set which attributes are to be acted on (or not, if invert is true).
- Parameters:
attributes - an array containing indexes of attributes to select. Since the array will typically come from a program, attributes are indexed from 0.
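The difference between the 1-based range string and the 0-based index array can be illustrated with a small stand-alone sketch. The helper below is hypothetical and only mimics the documented syntax (`first`, `last`, single numbers, and `a-b` ranges); it does no validation, does not handle inversion, and is not the parser Weka uses.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; a simplified stand-in for range-string handling.
final class RangeSketch {

    /** Resolves one token of a range string to a 1-based attribute number. */
    static int resolve(String token, int numAttributes) {
        String t = token.trim();
        if (t.equals("first")) {
            return 1;
        }
        if (t.equals("last")) {
            return numAttributes;
        }
        return Integer.parseInt(t);
    }

    /** Expands a 1-based range string such as "first-3,5,6-last" to 0-based indices. */
    static int[] toZeroBased(String rangeList, int numAttributes) {
        List<Integer> indices = new ArrayList<>();
        for (String part : rangeList.split(",")) {
            String[] ends = part.trim().split("-");
            int lo = resolve(ends[0], numAttributes);
            int hi = ends.length > 1 ? resolve(ends[1], numAttributes) : lo;
            for (int i = lo; i <= hi; i++) {
                indices.add(i - 1); // shift from user-facing 1-based to 0-based
            }
        }
        int[] result = new int[indices.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = indices.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // With 8 attributes, "first-3,5,6-last" covers every attribute except the fourth.
        System.out.println(java.util.Arrays.toString(toZeroBased("first-3,5,6-last", 8)));
    }
}
```

An array produced this way has the form expected by `setAttributeIndicesArray`, whereas the original string is what `setAttributeIndices` expects.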
-
invertSelectionTipText
public String invertSelectionTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getInvertSelection
public boolean getInvertSelection()
Get whether the supplied attributes are to be acted on or all other attributes.
- Returns:
- true if the selection is inverted, i.e. all attributes other than the supplied ones are acted on
-
setInvertSelection
public void setInvertSelection(boolean invert)
Set whether selected attributes should be acted on or all other attributes.
- Parameters:
invert - the new invert setting
-
useShortIdentifiersTipText
public String useShortIdentifiersTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getUseShortIdentifiers
public boolean getUseShortIdentifiers()
Get whether short identifiers are to be output.
- Returns:
- true if short IDs are output
-
setUseShortIdentifiers
public void setUseShortIdentifiers(boolean b)
Set whether to output short identifiers for merged values.
- Parameters:
b - if true, short IDs are output
-
allowAccessToFullInputFormat
public boolean allowAccessToFullInputFormat()
We need access to the full input data in determineOutputFormat.
- Overrides:
allowAccessToFullInputFormat in class SimpleBatchFilter
- Returns:
- whether determineOutputFormat has access to the full input dataset
-
getCapabilities
public Capabilities getCapabilities()
Returns the Capabilities of this filter.
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Overrides:
getCapabilities in class Filter
- Returns:
- the capabilities of this object
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class Filter
- Returns:
- the revision
-
main
public static void main(String[] args)
Runs the filter with the given arguments.
- Parameters:
args - the command-line arguments
-