Class Discretize
java.lang.Object
weka.filters.Filter
weka.filters.supervised.attribute.Discretize
- All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedAttributesHandler, WeightedInstancesHandler, SupervisedFilter
public class Discretize
extends Filter
implements SupervisedFilter, OptionHandler, WeightedInstancesHandler, WeightedAttributesHandler, TechnicalInformationHandler
An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by Fayyad & Irani's MDL method (the default).
For more information, see:
Usama M. Fayyad, Keki B. Irani: Multi-interval discretization of continuous-valued attributes for classification learning. In: Thirteenth International Joint Conference on Artificial Intelligence, 1022-1027, 1993.
Igor Kononenko: On Biases in Estimating Multi-Valued Attributes. In: 14th International Joint Conference on Artificial Intelligence, 1034-1040, 1995.
BibTeX:
@inproceedings{Fayyad1993,
   author = {Usama M. Fayyad and Keki B. Irani},
   booktitle = {Thirteenth International Joint Conference on Artificial Intelligence},
   pages = {1022-1027},
   publisher = {Morgan Kaufmann Publishers},
   title = {Multi-interval discretization of continuous-valued attributes for classification learning},
   volume = {2},
   year = {1993}
}

@inproceedings{Kononenko1995,
   author = {Igor Kononenko},
   booktitle = {14th International Joint Conference on Artificial Intelligence},
   pages = {1034-1040},
   title = {On Biases in Estimating Multi-Valued Attributes},
   year = {1995},
   PS = {http://ai.fri.uni-lj.si/papers/kononenko95-ijcai.ps.gz}
}

Valid options are:
-R <col1,col2-col4,...> Specifies list of columns to Discretize. First and last are valid indexes. (default none)
-V Invert matching sense of column indexes.
-D Output binary attributes for discretized attributes.
-Y Use bin numbers rather than ranges for discretized attributes.
-E Use better encoding of split point for MDL.
-K Use Kononenko's MDL criterion.
-precision <integer> Precision for bin boundary labels. (default = 6 decimal places).
-spread-attribute-weight When generating binary attributes, spread weight of old attribute across new attributes. Do not give each new attribute the old weight.
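As an illustration of the default method, the sketch below tests whether one candidate cut point passes the Fayyad and Irani MDL acceptance criterion, given the class counts on each side of the cut. The class MdlSplitSketch and its helper methods are hypothetical and use only the standard library; this is not Weka's implementation, and the -E and -K variants are not modelled.

```java
import java.util.Arrays;

/** Illustrative sketch of a single Fayyad-Irani MDL split test; not Weka's code. */
class MdlSplitSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Shannon entropy (in bits) of a vector of class counts. */
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * log2(p);
            }
        }
        return h;
    }

    /** Number of classes with at least one instance. */
    static int classesPresent(int[] counts) {
        return (int) Arrays.stream(counts).filter(c -> c > 0).count();
    }

    /**
     * Returns true if splitting a set into (left, right) satisfies
     * gain > (log2(N - 1) + delta) / N, where
     * delta = log2(3^k - 2) - (k*Ent(S) - k1*Ent(S1) - k2*Ent(S2)).
     */
    static boolean acceptSplit(int[] left, int[] right) {
        int[] all = new int[left.length];
        for (int i = 0; i < all.length; i++) {
            all[i] = left[i] + right[i];
        }
        int n1 = Arrays.stream(left).sum();
        int n2 = Arrays.stream(right).sum();
        int n = n1 + n2;
        if (n1 == 0 || n2 == 0) {
            return false; // a split with an empty side separates nothing
        }
        double ent = entropy(all);
        double ent1 = entropy(left);
        double ent2 = entropy(right);
        double gain = ent - ((double) n1 / n) * ent1 - ((double) n2 / n) * ent2;
        int k = classesPresent(all);
        int k1 = classesPresent(left);
        int k2 = classesPresent(right);
        double delta = log2(Math.pow(3, k) - 2) - (k * ent - k1 * ent1 - k2 * ent2);
        return gain > (log2(n - 1) + delta) / n;
    }
}
```

A full discretizer would apply this test recursively: pick the cut point with the highest gain, keep it only if acceptSplit returns true, then repeat on each side.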
- Version:
- $Revision: 14509 $
- Author:
- Len Trigg (trigg@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
-
Constructor Summary
Discretize()   Constructor - initialises the filter
Method Summary
Modifier and Type   Method   Description
String   attributeIndicesTipText()   Returns the tip text for this property
boolean   batchFinished()   Signifies that this batch of input to the filter is finished.
String   binRangePrecisionTipText()   Returns the tip text for this property
String   getAttributeIndices()   Gets the current range selection
int   getBinRangePrecision()   Get the precision for bin boundaries.
String   getBinRangesString(int attributeIndex)   Gets the bin ranges string for an attribute
Capabilities   getCapabilities()   Returns the Capabilities of this filter.
double[]   getCutPoints(int attributeIndex)   Gets the cut points for an attribute
boolean   getInvertSelection()   Gets whether the attribute selection is inverted.
boolean   getMakeBinary()   Gets whether binary attributes should be made for discretized ones.
String[]   getOptions()   Gets the current settings of the filter.
String   getRevision()   Returns the revision string.
boolean   getSpreadAttributeWeight()   If true, when generating binary attributes, spread weight of old attribute across new attributes.
TechnicalInformation   getTechnicalInformation()   Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
boolean   getUseBetterEncoding()   Gets whether better encoding is to be used for MDL.
boolean   getUseBinNumbers()   Gets whether bin numbers rather than ranges should be used for discretized attributes.
boolean   getUseKononenko()   Gets whether Kononenko's MDL criterion is to be used.
String   globalInfo()   Returns a string describing this filter
boolean   input(Instance instance)   Input an instance for filtering.
String   invertSelectionTipText()   Returns the tip text for this property
Enumeration<Option>   listOptions()   Gets an enumeration describing the available options.
static void   main(String[] argv)   Main method for testing this class.
String   makeBinaryTipText()   Returns the tip text for this property
void   setAttributeIndices(String rangeList)   Sets which attributes are to be Discretized (only numeric attributes among the selection will be Discretized).
void   setAttributeIndicesArray(int[] attributes)   Sets which attributes are to be Discretized (only numeric attributes among the selection will be Discretized).
void   setBinRangePrecision(int p)   Set the precision for bin boundaries.
boolean   setInputFormat(Instances instanceInfo)   Sets the format of the input instances.
void   setInvertSelection(boolean invert)   Sets whether the attribute selection is inverted.
void   setMakeBinary(boolean makeBinary)   Sets whether binary attributes should be made for discretized ones.
void   setOptions(String[] options)   Parses a given list of options.
void   setSpreadAttributeWeight(boolean p)   If true, when generating binary attributes, spread weight of old attribute across new attributes.
void   setUseBetterEncoding(boolean useBetterEncoding)   Sets whether better encoding is to be used for MDL.
void   setUseBinNumbers(boolean useBinNumbers)   Sets whether bin numbers rather than ranges should be used for discretized attributes.
void   setUseKononenko(boolean useKon)   Sets whether Kononenko's MDL criterion is to be used.
String   spreadAttributeWeightTipText()   Returns the tip text for this property
String   useBetterEncodingTipText()   Returns the tip text for this property
String   useBinNumbersTipText()   Returns the tip text for this property
String   useKononenkoTipText()   Returns the tip text for this property
Methods inherited from class weka.filters.Filter
batchFilterFile, debugTipText, doNotCheckCapabilitiesTipText, filterFile, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputPeek, postExecution, preExecution, run, runFilter, setDebug, setDoNotCheckCapabilities, toString, useFilter, wekaStaticWrapper
-
Constructor Details
-
Discretize
public Discretize()
Constructor - initialises the filter
-
-
Method Details
-
listOptions
public Enumeration<Option> listOptions()
Gets an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class Filter
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a given list of options. Valid options are:
-R <col1,col2-col4,...> Specifies list of columns to Discretize. First and last are valid indexes. (default none)
-V Invert matching sense of column indexes.
-D Output binary attributes for discretized attributes.
-Y Use bin numbers rather than ranges for discretized attributes.
-E Use better encoding of split point for MDL.
-K Use Kononenko's MDL criterion.
-precision <integer> Precision for bin boundary labels. (default = 6 decimal places).
-spread-attribute-weight When generating binary attributes, spread weight of old attribute across new attributes. Do not give each new attribute the old weight.
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class Filter
- Parameters:
options
- the list of options as an array of strings
- Throws:
Exception
- if an option is not supported
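To make the option list concrete, the sketch below parses an options array of the documented form into a plain settings object. DiscretizeOptionsSketch is a hypothetical stand-in written with the standard library only; it mirrors the flags listed above but is not Weka's option-handling code, and the defaults are simply those the option list states.

```java
/** Hypothetical holder for the documented Discretize flags; not part of Weka. */
class DiscretizeOptionsSketch {
    String attributeIndices = "";   // -R (documented default: none)
    boolean invertSelection;        // -V
    boolean makeBinary;             // -D
    boolean useBinNumbers;          // -Y
    boolean useBetterEncoding;      // -E
    boolean useKononenko;           // -K
    int binRangePrecision = 6;      // -precision (documented default: 6)
    boolean spreadAttributeWeight;  // -spread-attribute-weight

    /** Parses the flags; throws IllegalArgumentException for anything unknown. */
    static DiscretizeOptionsSketch parse(String[] options) {
        DiscretizeOptionsSketch s = new DiscretizeOptionsSketch();
        for (int i = 0; i < options.length; i++) {
            switch (options[i]) {
                case "-R": s.attributeIndices = argument(options, ++i); break;
                case "-V": s.invertSelection = true; break;
                case "-D": s.makeBinary = true; break;
                case "-Y": s.useBinNumbers = true; break;
                case "-E": s.useBetterEncoding = true; break;
                case "-K": s.useKononenko = true; break;
                case "-precision":
                    s.binRangePrecision = Integer.parseInt(argument(options, ++i));
                    break;
                case "-spread-attribute-weight": s.spreadAttributeWeight = true; break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
        return s;
    }

    /** Returns the value following a flag, or fails if the array ends early. */
    private static String argument(String[] options, int index) {
        if (index >= options.length) {
            throw new IllegalArgumentException("Missing value for " + options[index - 1]);
        }
        return options[index];
    }
}
```

For example, the array {"-R", "first-3", "-D", "-precision", "4"} selects the first three attributes, requests binary output attributes, and sets label precision to four decimal places.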
-
getOptions
public String[] getOptions()
Gets the current settings of the filter.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class Filter
- Returns:
- an array of strings suitable for passing to setOptions
-
getCapabilities
public Capabilities getCapabilities()
Returns the Capabilities of this filter.
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Overrides:
getCapabilities in class Filter
- Returns:
- the capabilities of this object
-
setInputFormat
public boolean setInputFormat(Instances instanceInfo) throws Exception
Sets the format of the input instances.
- Overrides:
setInputFormat in class Filter
- Parameters:
instanceInfo
- an Instances object containing the input instance structure (any instances contained in the object are ignored; only the structure is required).
- Returns:
- true if the outputFormat may be collected immediately
- Throws:
Exception
- if the input format can't be set successfully
-
input
public boolean input(Instance instance)
Input an instance for filtering. Ordinarily the instance is processed and made available for output immediately. Some filters require all instances be read before producing output.
- Overrides:
input in class Filter
- Parameters:
instance
- the input instance
- Returns:
- true if the filtered instance may now be collected with output().
- Throws:
IllegalStateException
- if no input format has been defined.
-
batchFinished
public boolean batchFinished()
Signifies that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances.
- Overrides:
batchFinished in class Filter
- Returns:
- true if there are instances pending output
- Throws:
IllegalStateException
- if no input structure has been defined
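The interplay of input(), batchFinished() and output() for a filter that needs the whole first batch can be sketched in miniature. BatchFilterSketch below is a hypothetical, stdlib-only toy that works on plain double values; it is not the Weka Filter class, and its single cut point is simply the batch mean, used as a placeholder for the MDL computation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model of the batch protocol; not the Weka Filter class. */
class BatchFilterSketch {
    private final List<Double> buffer = new ArrayList<>();
    private final Queue<Integer> pending = new ArrayDeque<>();
    private boolean firstBatchDone = false;
    private double cutPoint; // learned when the first batch is finished

    /** Returns true if a filtered value can be collected with output() right away. */
    boolean input(double value) {
        if (!firstBatchDone) {
            buffer.add(value); // must see the whole batch before discretizing
            return false;
        }
        pending.add(value <= cutPoint ? 0 : 1);
        return true;
    }

    /** Ends the current batch; returns true if there are values pending output. */
    boolean batchFinished() {
        if (!firstBatchDone) {
            // Placeholder for cut point selection: the batch mean, NOT the MDL method.
            cutPoint = buffer.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            for (double v : buffer) {
                pending.add(v <= cutPoint ? 0 : 1);
            }
            buffer.clear();
            firstBatchDone = true;
        }
        return !pending.isEmpty();
    }

    /** Returns the next filtered value (a bin index), or null if none is pending. */
    Integer output() {
        return pending.poll();
    }
}
```

In this toy, values fed in before the first batchFinished() call are only buffered; afterwards each new value is converted immediately using the cut point learned from the first batch.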
-
globalInfo
public String globalInfo()
Returns a string describing this filter
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
spreadAttributeWeightTipText
public String spreadAttributeWeightTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setSpreadAttributeWeight
public void setSpreadAttributeWeight(boolean p)
If true, when generating binary attributes, the weight of the old attribute is spread across the new attributes instead of each new attribute receiving the full old weight.
- Parameters:
p
- whether weight is spread
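A small sketch of what spreading a weight could look like, assuming the old attribute's weight is divided equally among the binary attributes that replace it. SpreadWeightSketch and the equal split are assumptions made for illustration; the exact rule used by the filter is not stated in this documentation.

```java
import java.util.Arrays;

/** Hypothetical illustration of the spread-attribute-weight setting. */
class SpreadWeightSketch {

    /**
     * Returns the weights of the new binary attributes.
     * With spread = true the old weight is divided equally (an assumption);
     * with spread = false every new attribute keeps the full old weight.
     */
    static double[] newWeights(double oldWeight, int numNewAttributes, boolean spread) {
        double[] weights = new double[numNewAttributes];
        Arrays.fill(weights, spread ? oldWeight / numNewAttributes : oldWeight);
        return weights;
    }
}
```

With spreading enabled the new weights sum to the old weight, so a heavily split attribute does not gain influence merely by being replaced with many binary attributes.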
-
getSpreadAttributeWeight
public boolean getSpreadAttributeWeight()
If true, when generating binary attributes, the weight of the old attribute is spread across the new attributes instead of each new attribute receiving the full old weight.
- Returns:
- whether weight is spread
-
binRangePrecisionTipText
public String binRangePrecisionTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setBinRangePrecision
public void setBinRangePrecision(int p)
Set the precision for bin boundaries. Only affects the boundary values used in the labels for the converted attributes; internal cutpoints are at full double precision.
- Parameters:
p
- the precision for bin boundaries
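The distinction between label precision and cut point precision can be sketched as follows. BinLabelSketch is hypothetical, and the label layout (half-open ranges from -inf to inf, with a fixed number of decimals) is an assumption chosen for illustration; the filter's actual label text may differ, for example in how trailing zeros are handled.

```java
import java.util.Locale;

/** Hypothetical label builder; the label layout is an assumption, not Weka's format. */
class BinLabelSketch {

    /** Formats a boundary with a fixed number of decimal places. */
    static String format(double value, int precision) {
        return String.format(Locale.ROOT, "%." + precision + "f", value);
    }

    /** Builds one label per bin; the cut points themselves are never rounded. */
    static String[] labels(double[] cutPoints, int precision) {
        if (cutPoints == null || cutPoints.length == 0) {
            return new String[] {"All"}; // a single bin covering everything
        }
        String[] out = new String[cutPoints.length + 1];
        for (int i = 0; i < out.length; i++) {
            boolean last = (i == cutPoints.length);
            String lower = (i == 0) ? "-inf" : format(cutPoints[i - 1], precision);
            String upper = last ? "inf" : format(cutPoints[i], precision);
            out[i] = "(" + lower + "-" + upper + (last ? ")" : "]");
        }
        return out;
    }
}
```

Calling labels with cut points 2.456789 and 5.0 at precision 2 yields labels that show 2.46 and 5.00, while any comparison against the cut points would still use 2.456789.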
-
getBinRangePrecision
public int getBinRangePrecision()
Get the precision for bin boundaries. Only affects the boundary values used in the labels for the converted attributes; internal cutpoints are at full double precision.
- Returns:
- the precision for bin boundaries
-
makeBinaryTipText
public String makeBinaryTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getMakeBinary
public boolean getMakeBinary()
Gets whether binary attributes should be made for discretized ones.
- Returns:
- true if attributes will be binarized
-
setMakeBinary
public void setMakeBinary(boolean makeBinary)
Sets whether binary attributes should be made for discretized ones.
- Parameters:
makeBinary
- if binary attributes are to be made
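One way to picture binary output is a threshold encoding with one binary attribute per cut point. The sketch below assumes that scheme; BinaryEncodingSketch is hypothetical, and whether the filter uses exactly this encoding should be checked against its actual output.

```java
/** Hypothetical threshold encoding: one binary indicator per cut point. */
class BinaryEncodingSketch {

    /**
     * Encodes a numeric value against ascending cut points.
     * Indicator j is 0 if value <= cutPoints[j], otherwise 1.
     */
    static int[] encode(double value, double[] cutPoints) {
        int[] bits = new int[cutPoints.length];
        for (int j = 0; j < cutPoints.length; j++) {
            bits[j] = (value <= cutPoints[j]) ? 0 : 1;
        }
        return bits;
    }
}
```

Under this scheme an attribute with k bins becomes k - 1 binary attributes, and the ordering of the bins is preserved in the pattern of ones.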
-
useBinNumbersTipText
public String useBinNumbersTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getUseBinNumbers
public boolean getUseBinNumbers()
Gets whether bin numbers rather than ranges should be used for discretized attributes.
- Returns:
- true if bin numbers should be used
-
setUseBinNumbers
public void setUseBinNumbers(boolean useBinNumbers)
Sets whether bin numbers rather than ranges should be used for discretized attributes.
- Parameters:
useBinNumbers
- if bin numbers should be used
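Bin-number labels identify a bin by its position rather than by its range. The sketch below generates such labels in a B<i>of<n> pattern; both the class BinNumberLabelSketch and the exact pattern are assumptions for illustration, not a statement of the filter's label format.

```java
/** Hypothetical bin-number labels; the naming pattern is an assumption. */
class BinNumberLabelSketch {

    /** Returns labels such as B1of3, B2of3, B3of3 for the given number of cut points. */
    static String[] labels(int numCutPoints) {
        int bins = numCutPoints + 1; // k cut points delimit k + 1 bins
        String[] out = new String[bins];
        for (int i = 0; i < bins; i++) {
            out[i] = "B" + (i + 1) + "of" + bins;
        }
        return out;
    }
}
```

Labels of this kind stay stable when the cut points change slightly, which range-based labels do not.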
-
useKononenkoTipText
public String useKononenkoTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getUseKononenko
public boolean getUseKononenko()
Gets whether Kononenko's MDL criterion is to be used.
- Returns:
- true if Kononenko's criterion will be used.
-
setUseKononenko
public void setUseKononenko(boolean useKon)
Sets whether Kononenko's MDL criterion is to be used.
- Parameters:
useKon
- true if Kononenko's criterion is to be used
-
useBetterEncodingTipText
public String useBetterEncodingTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getUseBetterEncoding
public boolean getUseBetterEncoding()
Gets whether better encoding is to be used for MDL.
- Returns:
- true if the better MDL encoding will be used
-
setUseBetterEncoding
public void setUseBetterEncoding(boolean useBetterEncoding)
Sets whether better encoding is to be used for MDL.
- Parameters:
useBetterEncoding
- true if better encoding is to be used
-
invertSelectionTipText
public String invertSelectionTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getInvertSelection
public boolean getInvertSelection()
Gets whether the attribute selection is inverted, i.e. whether the attributes outside the supplied range are the ones to be discretized.
- Returns:
- true if the selection is inverted
-
setInvertSelection
public void setInvertSelection(boolean invert)
Sets whether the attribute selection is inverted. If true, the (numeric) attributes outside the supplied range are discretized and the selected ones are left unchanged. If false, only the selected (numeric) attributes are discretized.
- Parameters:
invert
- the new invert setting
-
attributeIndicesTipText
public String attributeIndicesTipText()
Returns the tip text for this property
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getAttributeIndices
public String getAttributeIndices()
Gets the current range selection
- Returns:
- a string containing a comma separated list of ranges
-
setAttributeIndices
public void setAttributeIndices(String rangeList)
Sets which attributes are to be Discretized (only numeric attributes among the selection will be Discretized).
- Parameters:
rangeList
- a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1, e.g. first-3,5,6-last
- Throws:
IllegalArgumentException
- if an invalid range list is supplied
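To show how a 1-based range list such as first-3,5,6-last maps onto 0-based attribute positions, here is a minimal parser sketch. RangeListSketch is a hypothetical stdlib-only helper written for this illustration; it is not the range class the filter uses internally, and it handles only the syntax shown in the example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for 1-based range lists such as "first-3,5,6-last". */
class RangeListSketch {

    /** Converts "first", "last", or a 1-based number into a 0-based index. */
    static int toIndex(String token, int numAttributes) {
        String t = token.trim();
        if (t.equalsIgnoreCase("first")) {
            return 0;
        }
        if (t.equalsIgnoreCase("last")) {
            return numAttributes - 1;
        }
        // NumberFormatException is itself an IllegalArgumentException.
        int oneBased = Integer.parseInt(t);
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return oneBased - 1;
    }

    /** Expands the range list into 0-based indices, in order, without duplicates. */
    static List<Integer> parse(String rangeList, int numAttributes) {
        List<Integer> result = new ArrayList<>();
        for (String part : rangeList.split(",")) {
            int dash = part.indexOf('-');
            int lo = toIndex(dash < 0 ? part : part.substring(0, dash), numAttributes);
            int hi = dash < 0 ? lo : toIndex(part.substring(dash + 1), numAttributes);
            if (lo > hi) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = lo; i <= hi; i++) {
                if (!result.contains(i)) {
                    result.add(i);
                }
            }
        }
        return result;
    }
}
```

With eight attributes, first-3,5,6-last expands to the 0-based indices 0, 1, 2, 4, 5, 6, 7, which is the form setAttributeIndicesArray expects.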
-
setAttributeIndicesArray
public void setAttributeIndicesArray(int[] attributes)
Sets which attributes are to be Discretized (only numeric attributes among the selection will be Discretized).
- Parameters:
attributes
- an array containing indexes of attributes to Discretize. Since the array will typically come from a program, attributes are indexed from 0.
- Throws:
IllegalArgumentException
- if an invalid set of ranges is supplied
-
getCutPoints
public double[] getCutPoints(int attributeIndex)
Gets the cut points for an attribute
- Parameters:
attributeIndex
- the index (from 0) of the attribute to get the cut points of
- Returns:
- an array containing the cutpoints (or null if the attribute requested isn't being Discretized)
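Given an array of cut points, finding the bin for a value is a simple scan. The sketch below assumes ascending cut points and bins that are closed on the right, so a value equal to a cut point falls in the lower bin; CutPointLookupSketch is hypothetical, and the treatment of null as a single bin is a choice made for this illustration.

```java
/** Hypothetical lookup of a value's bin from an ascending cut point array. */
class CutPointLookupSketch {

    /**
     * Returns the 0-based bin index for a value.
     * Bin i covers (cutPoints[i - 1], cutPoints[i]]; a null array means one bin.
     */
    static int binIndex(double value, double[] cutPoints) {
        if (cutPoints == null) {
            return 0;
        }
        int bin = 0;
        while (bin < cutPoints.length && value > cutPoints[bin]) {
            bin++;
        }
        return bin;
    }
}
```

Because the comparison uses the raw double values, the result does not depend on the label precision setting.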
-
getBinRangesString
public String getBinRangesString(int attributeIndex)
Gets the bin ranges string for an attribute
- Parameters:
attributeIndex
- the index (from 0) of the attribute to get the bin ranges string of
- Returns:
- the bin ranges string (or null if the attribute requested has been discretized into only one interval.)
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class Filter
- Returns:
- the revision
-
main
public static void main(String[] argv)
Main method for testing this class.
- Parameters:
argv
- should contain arguments to the filter: use -h for help
-