Class RandomProjection
java.lang.Object
weka.filters.Filter
weka.filters.unsupervised.attribute.RandomProjection
- All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler, UnsupervisedFilter
public class RandomProjection
extends Filter
implements UnsupervisedFilter, OptionHandler, TechnicalInformationHandler, Randomizable, WeightedInstancesHandler
Reduces the dimensionality of the data by projecting it onto a lower-dimensional subspace using a random matrix with columns of unit length. It reduces the number of attributes in the data while preserving much of its variation, as PCA does, but at a much lower computational cost.
It first applies the NominalToBinary filter to convert all attributes to numeric before reducing the dimension. It preserves the class attribute.
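The projection described above can be illustrated with a minimal stand-alone sketch. It is not the Weka implementation: the class name RandomProjectionSketch, the helper names, and the choice of Gaussian entries are assumptions made for illustration only. It builds a d x k random matrix, scales each column to unit length, and multiplies one numeric row by it.

```java
import java.util.Random;

/** Hypothetical stand-alone sketch; not the Weka implementation. */
final class RandomProjectionSketch {

    /** Builds a d x k Gaussian matrix whose columns are scaled to unit length. */
    static double[][] randomMatrix(int d, int k, long seed) {
        Random rnd = new Random(seed);
        double[][] r = new double[d][k];
        for (int j = 0; j < k; j++) {
            double norm = 0.0;
            for (int i = 0; i < d; i++) {
                r[i][j] = rnd.nextGaussian();
                norm += r[i][j] * r[i][j];
            }
            norm = Math.sqrt(norm);
            for (int i = 0; i < d; i++) {
                r[i][j] /= norm; // column j now has Euclidean length 1
            }
        }
        return r;
    }

    /** Projects one d-dimensional row onto k dimensions: y = x * R. */
    static double[] project(double[] x, double[][] r) {
        int k = r[0].length;
        double[] y = new double[k];
        for (int j = 0; j < k; j++) {
            for (int i = 0; i < x.length; i++) {
                y[j] += x[i] * r[i][j];
            }
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] r = randomMatrix(5, 2, 42L);
        double[] y = project(new double[] {1, 2, 3, 4, 5}, r);
        System.out.println(y.length + " projected values: " + y[0] + ", " + y[1]);
    }
}
```

Each output value is a dot product of the input row with one random column, so the cost per instance is proportional to d times k and no eigendecomposition is needed.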
For more information, see:
Dmitriy Fradkin, David Madigan: Experiments with random projections for machine learning. In: KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA, 517-522, 2003. BibTeX:
@inproceedings{Fradkin003,
  address = {New York, NY, USA},
  author = {Dmitriy Fradkin and David Madigan},
  booktitle = {KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining},
  pages = {517-522},
  publisher = {ACM Press},
  title = {Experiments with random projections for machine learning},
  year = {2003}
}
Valid options are:
-N <number> The number of dimensions (attributes) the data should be reduced to (default 10; exclusive of the class attribute, if it is set).
-D [SPARSE1|SPARSE2|GAUSSIAN] The distribution to use for calculating the random matrix. Sparse1 is: sqrt(3)*{-1 with prob(1/6), 0 with prob(2/3), +1 with prob(1/6)} Sparse2 is: {-1 with prob(1/2), +1 with prob(1/2)}
-P <percent> The percentage of dimensions (attributes) the data should be reduced to (exclusive of the class attribute, if it is set). The -N option is ignored if this option is present and is greater than zero.
-M Replace missing values using the ReplaceMissingValues filter instead of just skipping them.
-R <num> The random seed for the random number generator used for calculating the random matrix (default 42).
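The precedence between -N and -P stated above can be sketched as follows. This is an illustration under assumptions, not the filter's code: the class and method names are hypothetical, and the rounding and the lower bound of one attribute are guesses that the documentation does not specify. Only the rule that a percentage greater than zero overrides -N comes from the option descriptions.

```java
/** Hypothetical sketch of how -N and -P might be combined; not Weka code. */
final class TargetDimensionSketch {

    /**
     * @param numInputAttributes number of input attributes, excluding the class attribute
     * @param n                  value of -N (default 10)
     * @param percent            value of -P; values of zero or less mean "not set"
     * @return the number of attributes to reduce to
     */
    static int targetDimensions(int numInputAttributes, int n, double percent) {
        if (percent > 0) {
            // -P takes precedence; the rounding and the minimum of 1 are assumptions.
            return Math.max(1, (int) Math.round(numInputAttributes * percent / 100.0));
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(targetDimensions(200, 10, 0.0));
        System.out.println(targetDimensions(200, 10, 25.0));
    }
}
```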
- Version:
- $Revision: 14609 $ [1.0 - 22 July 2003 - Initial version (Ashraf M. Kibriya)]
- Author:
- Ashraf M. Kibriya (amk14@cs.waikato.ac.nz)
Field Summary
Modifier and Type, Field, Description
static final int GAUSSIAN
distribution type: gaussian
static final int SPARSE1
distribution type: sparse 1
static final int SPARSE2
distribution type: sparse 2
static final Tag[] TAGS_DSTRS_TYPE
The types of distributions that can be used for calculating the random matrix
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
boolean batchFinished()
Signify that this batch of input to the filter is finished.
String distributionTipText()
Returns the tip text for this property.
Capabilities getCapabilities()
Returns the Capabilities of this filter.
SelectedTag getDistribution()
Returns the current distribution that will be used for calculating the random matrix.
int getNumberOfAttributes()
Gets the current number of attributes (dimensionality) to which the data will be reduced.
String[] getOptions()
Gets the current settings of the filter.
double getPercent()
Gets the percentage of attributes (dimensions) the data will be reduced to.
boolean getReplaceMissingValues()
Gets the current setting for using the ReplaceMissingValues filter.
String getRevision()
Returns the revision string.
int getSeed()
Gets the random seed of the random number generator.
TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
String globalInfo()
Returns a string describing this filter.
boolean input(Instance instance)
Input an instance for filtering.
Enumeration<Option> listOptions()
Returns an enumeration describing the available options.
static void main(String[] argv)
Main method for testing this class.
String numberOfAttributesTipText()
Returns the tip text for this property.
String percentTipText()
Returns the tip text for this property.
String replaceMissingValuesTipText()
Returns the tip text for this property.
String seedTipText()
Returns the tip text for this property.
void setDistribution(SelectedTag newDstr)
Sets the distribution to use for calculating the random matrix.
boolean setInputFormat(Instances instanceInfo)
Sets the format of the input instances.
void setNumberOfAttributes(int newAttNum)
Sets the number of attributes (dimensions) the data should be reduced to.
void setOptions(String[] options)
Parses a given list of options.
void setPercent(double newPercent)
Sets the percentage of attributes (dimensions) the data should be reduced to.
void setReplaceMissingValues(boolean t)
Sets whether to use the ReplaceMissingValues filter.
void setSeed(int seed)
Sets the random seed of the random number generator.
Methods inherited from class weka.filters.Filter
batchFilterFile, debugTipText, doNotCheckCapabilitiesTipText, filterFile, getCapabilities, getCopyOfInputFormat, getDebug, getDoNotCheckCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, mayRemoveInstanceAfterFirstBatchDone, numPendingOutput, output, outputPeek, postExecution, preExecution, run, runFilter, setDebug, setDoNotCheckCapabilities, toString, useFilter, wekaStaticWrapper
-
Field Details
-
SPARSE1
public static final int SPARSE1
distribution type: sparse 1
-
SPARSE2
public static final int SPARSE2
distribution type: sparse 2
-
GAUSSIAN
public static final int GAUSSIAN
distribution type: gaussian
-
TAGS_DSTRS_TYPE
public static final Tag[] TAGS_DSTRS_TYPE
The types of distributions that can be used for calculating the random matrix
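To make the three distribution types concrete, the following stand-alone sketch draws single matrix entries from each. It follows the Sparse1 and Sparse2 definitions given in the option descriptions; treating the Gaussian case as a standard normal draw is an assumption, and the class ProjectionEntrySketch and its Dist enum are hypothetical names unrelated to the int constants above.

```java
import java.util.Random;

/** Hypothetical sketch of sampling one random-matrix entry; not Weka code. */
final class ProjectionEntrySketch {

    /** Illustrative stand-in for the SPARSE1, SPARSE2 and GAUSSIAN constants. */
    enum Dist { SPARSE1, SPARSE2, GAUSSIAN }

    static double sample(Dist dist, Random rnd) {
        switch (dist) {
            case SPARSE1: {
                // sqrt(3) * {-1 with prob 1/6, 0 with prob 2/3, +1 with prob 1/6}
                double u = rnd.nextDouble();
                if (u < 1.0 / 6.0) {
                    return -Math.sqrt(3);
                }
                if (u < 5.0 / 6.0) {
                    return 0.0;
                }
                return Math.sqrt(3);
            }
            case SPARSE2:
                // {-1 with prob 1/2, +1 with prob 1/2}
                return rnd.nextBoolean() ? 1.0 : -1.0;
            default:
                // Assumption: a standard normal draw for the Gaussian case.
                return rnd.nextGaussian();
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (Dist d : Dist.values()) {
            System.out.println(d + ": " + sample(d, rnd));
        }
    }
}
```

Both sparse variants have zero mean and unit variance per entry, and roughly two thirds of Sparse1 entries are zero, so the corresponding multiplications can be skipped.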
-
-
Constructor Details
-
RandomProjection
public RandomProjection()
-
-
Method Details
-
listOptions
public Enumeration<Option> listOptions()
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class Filter
- Returns:
- an enumeration of all the available options.
-
setOptions
public void setOptions(String[] options) throws Exception
Parses a given list of options. Valid options are:
-N <number> The number of dimensions (attributes) the data should be reduced to (default 10; exclusive of the class attribute, if it is set).
-D [SPARSE1|SPARSE2|GAUSSIAN] The distribution to use for calculating the random matrix. Sparse1 is: sqrt(3)*{-1 with prob(1/6), 0 with prob(2/3), +1 with prob(1/6)} Sparse2 is: {-1 with prob(1/2), +1 with prob(1/2)}
-P <percent> The percentage of dimensions (attributes) the data should be reduced to (exclusive of the class attribute, if it is set). The -N option is ignored if this option is present and is greater than zero.
-M Replace missing values using the ReplaceMissingValues filter instead of just skipping them.
-R <num> The random seed for the random number generator used for calculating the random matrix (default 42).
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class Filter
- Parameters:
options - the list of options as an array of strings
- Throws:
Exception - if an option is not supported
-
getOptions
public String[] getOptions()
Gets the current settings of the filter.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class Filter
- Returns:
- an array of strings suitable for passing to setOptions
-
globalInfo
public String globalInfo()
Returns a string describing this filter.
- Returns:
- a description of the filter suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
numberOfAttributesTipText
public String numberOfAttributesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setNumberOfAttributes
public void setNumberOfAttributes(int newAttNum)
Sets the number of attributes (dimensions) the data should be reduced to.
- Parameters:
newAttNum - the goal for the dimensions
-
getNumberOfAttributes
public int getNumberOfAttributes()
Gets the current number of attributes (dimensionality) to which the data will be reduced.
- Returns:
- the number of dimensions
-
percentTipText
public String percentTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setPercent
public void setPercent(double newPercent)
Sets the percentage of attributes (dimensions) the data should be reduced to.
- Parameters:
newPercent - the percentage of attributes
-
getPercent
public double getPercent()
Gets the percentage of attributes (dimensions) the data will be reduced to.
- Returns:
- the percentage of attributes
-
seedTipText
public String seedTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setSeed
public void setSeed(int seed)
Sets the random seed of the random number generator.
- Specified by:
setSeed in interface Randomizable
- Parameters:
seed - the random seed value
-
getSeed
public int getSeed()
Gets the random seed of the random number generator.
- Specified by:
getSeed in interface Randomizable
- Returns:
- the random seed value
-
distributionTipText
public String distributionTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setDistribution
public void setDistribution(SelectedTag newDstr)
Sets the distribution to use for calculating the random matrix.
- Parameters:
newDstr - the distribution to use
-
getDistribution
public SelectedTag getDistribution()
Returns the current distribution that will be used for calculating the random matrix.
- Returns:
- the current distribution
-
replaceMissingValuesTipText
public String replaceMissingValuesTipText()
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setReplaceMissingValues
public void setReplaceMissingValues(boolean t)
Sets whether to use the ReplaceMissingValues filter.
- Parameters:
t - if true, the ReplaceMissingValues filter is used
-
getReplaceMissingValues
public boolean getReplaceMissingValues()
Gets the current setting for using the ReplaceMissingValues filter.
- Returns:
- true if the replace missing values filter is used
-
getCapabilities
public Capabilities getCapabilities()
Returns the Capabilities of this filter.
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Overrides:
getCapabilities in class Filter
- Returns:
- the capabilities of this object
-
setInputFormat
public boolean setInputFormat(Instances instanceInfo) throws Exception
Sets the format of the input instances.
- Overrides:
setInputFormat in class Filter
- Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored; only the structure is required).
- Returns:
- true if the outputFormat may be collected immediately
- Throws:
Exception - if the input format can't be set successfully
-
input
public boolean input(Instance instance) throws Exception
Input an instance for filtering.
- Overrides:
input in class Filter
- Parameters:
instance - the input instance
- Returns:
- true if the filtered instance may now be collected with output().
- Throws:
IllegalStateException - if no input format has been set
NullPointerException - if the input format has not been defined.
Exception - if the input instance was not of the correct format or if there was a problem with the filtering.
-
batchFinished
public boolean batchFinished() throws Exception
Signify that this batch of input to the filter is finished.
- Overrides:
batchFinished in class Filter
- Returns:
- true if there are instances pending output
- Throws:
NullPointerException - if no input structure has been defined.
Exception - if there was a problem finishing the batch.
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class Filter
- Returns:
- the revision
-
main
public static void main(String[] argv)
Main method for testing this class.
- Parameters:
argv - should contain arguments to the filter: use -h for help
-