Class RemoveFrequentValues
java.lang.Object
weka.filters.Filter
weka.filters.unsupervised.instance.RemoveFrequentValues
- All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, WeightedAttributesHandler, UnsupervisedFilter
public class RemoveFrequentValues
extends Filter
implements OptionHandler, UnsupervisedFilter, WeightedAttributesHandler
Determines which values (frequent or infrequent ones) of a (nominal)
attribute are retained and filters the instances accordingly. Values with
the same frequency are kept in the order in which they appear in the
original Instances object. E.g., if you have the values "1,2,3,4" with the
frequencies "10,5,5,3" and you choose to keep the 2 most common values, the
values "1,2" are returned, since the value "2" comes before "3", even
though they have the same frequency.
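The tie-breaking rule above can be illustrated with a minimal plain-Java sketch. It does not use Weka and is not the filter's actual implementation; the class `FrequentValueSketch` and the method `retainMostFrequent` are hypothetical names introduced only for this example. It relies on a stable sort so that values with equal counts stay in their original order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of Weka.
final class FrequentValueSketch {

    /** Returns the n most frequent values; values with equal counts keep their original order. */
    static List<String> retainMostFrequent(List<String> values, int[] counts, int n) {
        // Collect the value indices 0..size-1.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            order.add(i);
        }
        // Sort by descending count. List.sort is stable, so ties are not reordered.
        order.sort(Comparator.comparingInt((Integer i) -> counts[i]).reversed());

        // Take the first n indices and map them back to value names.
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < Math.min(n, order.size()); i++) {
            kept.add(values.get(order.get(i)));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> values = List.of("1", "2", "3", "4");
        int[] counts = {10, 5, 5, 3};
        System.out.println(retainMostFrequent(values, counts, 2)); // prints [1, 2]
    }
}
```

With the frequencies from the description, "2" and "3" tie at 5, and "2" wins because it appears first.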
Valid options are:
-C <num> Choose attribute to be used for selection.
-N <num> Number of values to retain for the specified attribute, i.e. the ones with the most instances (default 2).
-L Instead of values with the most instances the ones with the least are retained.
-H When selecting on nominal attributes, removes header references to excluded values.
-V Invert matching sense.
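As a rough illustration of how the flags listed above combine into an option array of the kind accepted by setOptions(String[]), here is a plain-Java sketch. The class `OptionArraySketch` and the helper `buildOptions` are hypothetical and not part of Weka; the sketch only assembles strings and does not call the filter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Weka.
final class OptionArraySketch {

    /** Builds an option array using the flags documented for this filter. */
    static String[] buildOptions(String attIndex, int numValues,
                                 boolean useLeast, boolean modifyHeader, boolean invert) {
        List<String> opts = new ArrayList<>();
        opts.add("-C");                       // attribute used for selection
        opts.add(attIndex);
        opts.add("-N");                       // number of values to retain
        opts.add(Integer.toString(numValues));
        if (useLeast) {
            opts.add("-L");                   // retain the least frequent values instead
        }
        if (modifyHeader) {
            opts.add("-H");                   // remove header references to excluded values
        }
        if (invert) {
            opts.add("-V");                   // invert matching sense
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions("1", 3, true, false, false);
        System.out.println(String.join(" ", options)); // prints -C 1 -N 3 -L
    }
}
```

The flags -L, -H and -V take no argument, so they appear in the array only when the corresponding setting is enabled.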
- Version:
- $Revision: 14508 $
- Author:
- FracPete (fracpete at waikato dot ac dot nz)
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
String | attributeIndexTipText() | Returns the tip text for this property
boolean | batchFinished() | Signifies that this batch of input to the filter is finished.
void | determineValues(Instances inst) | Determines the values to retain; the number retained is always at least 1 and at most the number of distinct values
String | getAttributeIndex() | Get the index of the attribute used.
Capabilities | getCapabilities() | Returns the Capabilities of this filter.
boolean | getInvertSelection() | Gets whether the selected values are to be removed or kept
boolean | getModifyHeader() | Gets whether the header will be modified when selecting on nominal attributes.
int | getNumValues() | Gets how many values are retained
String[] | getOptions() | Gets the current settings of the filter.
String | getRevision() | Returns the revision string.
boolean | getUseLeastValues() | Gets whether to use values with least or most instances
String | globalInfo() | Returns a string describing this filter
boolean | input(Instance instance) | Input an instance for filtering.
String | invertSelectionTipText() | Returns the tip text for this property
boolean | isNominal() | Returns true if selection attribute is nominal.
Enumeration<Option> | listOptions() | Returns an enumeration describing the available options.
static void | main(String[] argv) | Main method for testing this class.
String | modifyHeaderTipText() | Returns the tip text for this property
String | numValuesTipText() | Returns the tip text for this property
void | setAttributeIndex(String attIndex) | Sets index of the attribute used.
boolean | setInputFormat(Instances instanceInfo) | Sets the format of the input instances.
void | setInvertSelection(boolean invert) | Set whether selected values should be removed or kept.
void | setModifyHeader(boolean newModifyHeader) | Sets whether the header will be modified when selecting on nominal attributes.
void | setNumValues(int numValues) | Sets how many values are retained
void | setOptions(String[] options) | Parses a given list of options.
void | setUseLeastValues(boolean leastValues) | Sets whether to use values with least or most instances
String | useLeastValuesTipText() | Returns the tip text for this property
Methods inherited from class weka.filters.Filter
batchFilterFile, debugTipText, doNotCheckCapabilitiesTipText, filterFile, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputPeek, postExecution, preExecution, run, runFilter, setDebug, setDoNotCheckCapabilities, toString, useFilter, wekaStaticWrapper
-
Constructor Details
-
RemoveFrequentValues
public RemoveFrequentValues()
-
-
Method Details
-
globalInfo
Returns a string describing this filter.
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter GUI
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
listOptions
in interface OptionHandler
- Overrides:
listOptions
in class Filter
- Returns:
- an enumeration of all the available options.
-
setOptions
Parses a given list of options. Valid options are:
-C <num> Choose attribute to be used for selection.
-N <num> Number of values to retain for the specified attribute, i.e. the ones with the most instances (default 2).
-L Instead of values with the most instances the ones with the least are retained.
-H When selecting on nominal attributes, removes header references to excluded values.
-V Invert matching sense.
- Specified by:
setOptions
in interface OptionHandler
- Overrides:
setOptions
in class Filter
- Parameters:
options
- the list of options as an array of strings
- Throws:
Exception
- if an option is not supported
-
getOptions
Gets the current settings of the filter.
- Specified by:
getOptions
in interface OptionHandler
- Overrides:
getOptions
in class Filter
- Returns:
- an array of strings suitable for passing to setOptions
-
attributeIndexTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getAttributeIndex
Gets the index of the attribute used.
- Returns:
- the index of the attribute
-
setAttributeIndex
Sets the index of the attribute used.
- Parameters:
attIndex
- the index of the attribute
-
numValuesTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getNumValues
public int getNumValues()
Gets how many values are retained.
- Returns:
- how many values are retained
-
setNumValues
public void setNumValues(int numValues)
Sets how many values are retained.
- Parameters:
numValues
- the number of values to retain
-
useLeastValuesTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getUseLeastValues
public boolean getUseLeastValues()
Gets whether to use values with least or most instances.
- Returns:
- true if values with least instances are retained
-
setUseLeastValues
public void setUseLeastValues(boolean leastValues)
Sets whether to use values with least or most instances.
- Parameters:
leastValues
- whether values with least or most instances are retained
-
modifyHeaderTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getModifyHeader
public boolean getModifyHeader()
Gets whether the header will be modified when selecting on nominal attributes.
- Returns:
- true if so.
-
setModifyHeader
public void setModifyHeader(boolean newModifyHeader)
Sets whether the header will be modified when selecting on nominal attributes.
- Parameters:
newModifyHeader
- true if so.
-
invertSelectionTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getInvertSelection
public boolean getInvertSelection()
Gets whether the selected values are to be removed or kept.
- Returns:
- true if the selected values will be kept
-
setInvertSelection
public void setInvertSelection(boolean invert)
Sets whether selected values should be removed or kept. If true the selected values are kept and unselected values are deleted.
- Parameters:
invert
- the new invert setting
-
isNominal
public boolean isNominal()
Returns true if selection attribute is nominal.
- Returns:
- true if selection attribute is nominal
-
determineValues
Determines the values to retain. The number of retained values is always at least 1 and at most the number of distinct values.
- Parameters:
inst
- the Instances from which to determine the values that are kept
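The bound described for determineValues (at least 1, at most the number of distinct values) amounts to clamping the requested count. A minimal sketch of that clamping follows, assuming the attribute has at least one distinct value; `RetainCountSketch` and `effectiveCount` are hypothetical names, not Weka code.

```java
// Hypothetical illustration of the documented bounds; not part of Weka.
final class RetainCountSketch {

    /** Clamps the requested number of values into the range [1, distinctValues]. */
    static int effectiveCount(int requested, int distinctValues) {
        // Assumes distinctValues >= 1.
        return Math.max(1, Math.min(requested, distinctValues));
    }

    public static void main(String[] args) {
        System.out.println(effectiveCount(0, 4)); // prints 1
        System.out.println(effectiveCount(2, 4)); // prints 2
        System.out.println(effectiveCount(9, 4)); // prints 4
    }
}
```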
-
getCapabilities
Returns the Capabilities of this filter.
- Specified by:
getCapabilities
in interface CapabilitiesHandler
- Overrides:
getCapabilities
in class Filter
- Returns:
- the capabilities of this object
-
setInputFormat
Sets the format of the input instances.
- Overrides:
setInputFormat
in class Filter
- Parameters:
instanceInfo
- an Instances object containing the input instance structure (any instances contained in the object are ignored; only the structure is required).
- Returns:
- true if the outputFormat can be collected immediately
- Throws:
UnsupportedAttributeTypeException
- if the specified attribute is not nominal.
Exception
- if the inputFormat can't be set successfully
-
input
Input an instance for filtering. Ordinarily the instance is processed and made available for output immediately. Some filters require all instances be read before producing output.
- Overrides:
input
in class Filter
- Parameters:
instance
- the input instance
- Returns:
- true if the filtered instance may now be collected with output().
- Throws:
IllegalStateException
- if no input format has been set.
-
batchFinished
public boolean batchFinished()
Signifies that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances.
- Overrides:
batchFinished
in class Filter
- Returns:
- true if there are instances pending output
- Throws:
IllegalStateException
- if no input structure has been defined
-
getRevision
Returns the revision string.
- Specified by:
getRevision
in interface RevisionHandler
- Overrides:
getRevision
in class Filter
- Returns:
- the revision
-
main
Main method for testing this class.
- Parameters:
argv
- should contain arguments to the filter: use -h for help
-