Class RandomProjection

java.lang.Object
weka.filters.Filter
weka.filters.unsupervised.attribute.RandomProjection
All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler, UnsupervisedFilter

Reduces the dimensionality of the data by projecting it onto a lower-dimensional subspace using a random matrix with columns of unit length. Like PCA, it reduces the number of attributes while preserving much of the variation in the data, but at a much lower computational cost.
It first applies the NominalToBinary filter to convert all attributes to numeric before reducing the dimension. It preserves the class attribute.
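
The projection described above can be sketched in plain Java, independently of Weka. The class name `ProjectionSketch`, its method names, and the use of Gaussian entries are illustrative assumptions rather than the filter's actual code; the sketch only shows how a d-by-k matrix with unit-length columns maps a d-dimensional row to k values.

```java
import java.util.Arrays;
import java.util.Random;

public class ProjectionSketch {

    /** Builds a d-by-k matrix of Gaussian entries and scales each column to unit length. */
    public static double[][] randomUnitColumnMatrix(int d, int k, long seed) {
        Random rnd = new Random(seed);
        double[][] r = new double[d][k];
        for (int j = 0; j < k; j++) {
            double sumSq = 0.0;
            for (int i = 0; i < d; i++) {
                r[i][j] = rnd.nextGaussian();
                sumSq += r[i][j] * r[i][j];
            }
            double norm = Math.sqrt(sumSq);
            for (int i = 0; i < d; i++) {
                r[i][j] /= norm; // column j now has Euclidean length 1
            }
        }
        return r;
    }

    /** Projects one d-dimensional row onto k dimensions: y[j] = sum_i x[i] * r[i][j]. */
    public static double[] project(double[] x, double[][] r) {
        int k = r[0].length;
        double[] y = new double[k];
        for (int j = 0; j < k; j++) {
            for (int i = 0; i < x.length; i++) {
                y[j] += x[i] * r[i][j];
            }
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] r = randomUnitColumnMatrix(50, 10, 42L);
        double[] x = new double[50];
        Arrays.fill(x, 1.0);
        System.out.println(project(x, r).length); // prints 10
    }
}
```

The cost per instance is one d-by-k matrix-vector product, with no eigendecomposition of the data, which is where the saving over PCA comes from.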

For more information, see:

Dmitriy Fradkin, David Madigan: Experiments with random projections for machine learning. In: KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, New York, NY, USA, 517-522, 2003.

BibTeX:

 @inproceedings{Fradkin2003,
    address = {New York, NY, USA},
    author = {Dmitriy Fradkin and David Madigan},
    booktitle = {KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining},
    pages = {517-522},
    publisher = {ACM Press},
    title = {Experiments with random projections for machine learning},
    year = {2003}
 }
 

Valid options are:

 -N <number>
  The number of dimensions (attributes) the data should be reduced to
  (default 10; exclusive of the class attribute, if it is set).
 -D [SPARSE1|SPARSE2|GAUSSIAN]
  The distribution to use for calculating the random matrix.
  Sparse1 is:
    sqrt(3)*{-1 with prob(1/6), 0 with prob(2/3), +1 with prob(1/6)}
  Sparse2 is:
    {-1 with prob(1/2), +1 with prob(1/2)}
 
 -P <percent>
  The percentage of dimensions (attributes) the data should
  be reduced to (exclusive of the class attribute, if it is set). The -N
  option is ignored if this option is present and is greater
  than zero.
 -M
  Replace missing values using the ReplaceMissingValues filter instead of just skipping them.
 -R <num>
  The random seed for the random number generator used for
  calculating the random matrix (default 42).
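
The distributions named by -D can be sketched as plain Java samplers. `DistributionSketch` and its methods are hypothetical names; treating GAUSSIAN as a standard normal draw and filling the matrix row by row are assumptions, since the option text above only spells out Sparse1 and Sparse2.

```java
import java.util.Arrays;
import java.util.Random;

public class DistributionSketch {

    /** Sparse1: sqrt(3) * {-1 with prob 1/6, 0 with prob 2/3, +1 with prob 1/6}. */
    public static double sparse1(Random rnd) {
        double u = rnd.nextDouble();
        if (u < 1.0 / 6.0) {
            return -Math.sqrt(3.0);
        }
        if (u < 2.0 / 6.0) {
            return Math.sqrt(3.0);
        }
        return 0.0;
    }

    /** Sparse2: {-1 with prob 1/2, +1 with prob 1/2}. */
    public static double sparse2(Random rnd) {
        return rnd.nextBoolean() ? 1.0 : -1.0;
    }

    /** Gaussian: assumed here to be a standard normal draw. */
    public static double gaussian(Random rnd) {
        return rnd.nextGaussian();
    }

    /** Fills a rows-by-cols matrix with Sparse1 entries from a seeded generator. */
    public static double[][] sparse1Matrix(int rows, int cols, long seed) {
        Random rnd = new Random(seed);
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int j = 0; j < cols; j++) {
                row[j] = sparse1(rnd);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        // The same seed yields the same matrix, which is the point of -R.
        double[][] a = sparse1Matrix(4, 3, 42L);
        double[][] b = sparse1Matrix(4, 3, 42L);
        System.out.println(Arrays.deepEquals(a, b)); // prints true
    }
}
```

Both sparse distributions have zero mean and unit variance (for Sparse1, 3 * (1/6 + 1/6) = 1), and about two thirds of Sparse1 entries are zero, so those terms can be skipped when multiplying.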
Version:
$Revision: 14609 $ [1.0 - 22 July 2003 - Initial version (Ashraf M. Kibriya)]
Author:
Ashraf M. Kibriya (amk14@cs.waikato.ac.nz)
  • Field Details

    • SPARSE1

      public static final int SPARSE1
      distribution type: sparse 1
    • SPARSE2

      public static final int SPARSE2
      distribution type: sparse 2
    • GAUSSIAN

      public static final int GAUSSIAN
      distribution type: gaussian
    • TAGS_DSTRS_TYPE

      public static final Tag[] TAGS_DSTRS_TYPE
      The types of distributions that can be used for calculating the random matrix
  • Constructor Details

    • RandomProjection

      public RandomProjection()
  • Method Details

    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Filter
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -N <number>
        The number of dimensions (attributes) the data should be reduced to
        (default 10; exclusive of the class attribute, if it is set).
       -D [SPARSE1|SPARSE2|GAUSSIAN]
        The distribution to use for calculating the random matrix.
        Sparse1 is:
          sqrt(3)*{-1 with prob(1/6), 0 with prob(2/3), +1 with prob(1/6)}
        Sparse2 is:
          {-1 with prob(1/2), +1 with prob(1/2)}
       
       -P <percent>
        The percentage of dimensions (attributes) the data should
        be reduced to (exclusive of the class attribute, if it is set). The -N
        option is ignored if this option is present and is greater
        than zero.
       -M
        Replace missing values using the ReplaceMissingValues filter instead of just skipping them.
       -R <num>
        The random seed for the random number generator used for
        calculating the random matrix (default 42).
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Filter
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the filter.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Filter
      Returns:
      an array of strings suitable for passing to setOptions
    • globalInfo

      public String globalInfo()
      Returns a string describing this filter
      Returns:
      a description of the filter suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • numberOfAttributesTipText

      public String numberOfAttributesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNumberOfAttributes

      public void setNumberOfAttributes(int newAttNum)
      Sets the number of attributes (dimensions) the data should be reduced to
      Parameters:
      newAttNum - the goal for the dimensions
    • getNumberOfAttributes

      public int getNumberOfAttributes()
      Gets the number of attributes (dimensions) to which the data will be reduced.
      Returns:
      the number of dimensions
    • percentTipText

      public String percentTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setPercent

      public void setPercent(double newPercent)
      Sets the percentage of attributes (dimensions) the data should be reduced to
      Parameters:
      newPercent - the percentage of attributes
    • getPercent

      public double getPercent()
      Gets the percentage of attributes (dimensions) the data will be reduced to
      Returns:
      the percentage of attributes
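
How the percentage and the -N value might combine into a target dimension can be sketched as follows. `PercentSketch.targetDimensions` is a hypothetical helper; truncating toward zero and the exact handling of the class attribute are assumptions based only on the option text (the percentage excludes the class attribute, and -N is ignored when the percentage is greater than zero).

```java
public class PercentSketch {

    /**
     * Returns the number of output dimensions, excluding the class attribute.
     *
     * @param numAttributes total attributes in the input, including the class if set
     * @param hasClass      whether a class attribute is set
     * @param n             the -N value, used when percent is not positive
     * @param percent       the -P value, in the range 0 to 100
     */
    public static int targetDimensions(int numAttributes, boolean hasClass,
                                       int n, double percent) {
        if (percent > 0) {
            int nonClass = hasClass ? numAttributes - 1 : numAttributes;
            // Assumption: the result is truncated to an integer.
            return (int) (nonClass * percent / 100.0);
        }
        return n;
    }

    public static void main(String[] args) {
        // 100 non-class attributes reduced to 25 percent; -N is ignored.
        System.out.println(targetDimensions(101, true, 10, 25.0)); // prints 25
    }
}
```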
    • seedTipText

      public String seedTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSeed

      public void setSeed(int seed)
      Sets the random seed of the random number generator
      Specified by:
      setSeed in interface Randomizable
      Parameters:
      seed - the random seed value
    • getSeed

      public int getSeed()
      Gets the random seed of the random number generator
      Specified by:
      getSeed in interface Randomizable
      Returns:
      the random seed value
    • distributionTipText

      public String distributionTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDistribution

      public void setDistribution(SelectedTag newDstr)
      Sets the distribution to use for calculating the random matrix
      Parameters:
      newDstr - the distribution to use
    • getDistribution

      public SelectedTag getDistribution()
      Returns the distribution currently used for calculating the random matrix
      Returns:
      the current distribution
    • replaceMissingValuesTipText

      public String replaceMissingValuesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setReplaceMissingValues

      public void setReplaceMissingValues(boolean t)
      Sets whether the ReplaceMissingValues filter is used
      Parameters:
      t - if true, the ReplaceMissingValues filter is used
    • getReplaceMissingValues

      public boolean getReplaceMissingValues()
      Gets the current setting for using the ReplaceMissingValues filter
      Returns:
      true if the replace missing values filter is used
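
The difference between skipping missing values and replacing them can be sketched in plain Java. `MissingValueSketch` is a hypothetical class; it assumes missing values are encoded as NaN and that replacement substitutes a supplied per-attribute mean, which only approximates what a ReplaceMissingValues-style filter does for numeric attributes.

```java
public class MissingValueSketch {

    /** Dot product of a row with one matrix column, skipping missing (NaN) entries. */
    public static double dotSkippingMissing(double[] x, double[] column) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i])) {
                sum += x[i] * column[i];
            }
        }
        return sum;
    }

    /** Returns a copy of x with each missing (NaN) entry replaced by means[i]. */
    public static double[] replaceMissing(double[] x, double[] means) {
        double[] out = x.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = means[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1.0, Double.NaN, 3.0};
        double[] column = {2.0, 2.0, 2.0};
        System.out.println(dotSkippingMissing(x, column)); // prints 8.0
    }
}
```

Skipping treats a missing entry as contributing nothing to the projected value, while replacement lets the substituted value take part in every output dimension.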
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Filter
      Returns:
      the capabilities of this object
    • setInputFormat

      public boolean setInputFormat(Instances instanceInfo) throws Exception
      Sets the format of the input instances.
      Overrides:
      setInputFormat in class Filter
      Parameters:
      instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
      Returns:
      true if the outputFormat may be collected immediately
      Throws:
      Exception - if the input format can't be set successfully
    • input

      public boolean input(Instance instance) throws Exception
      Input an instance for filtering.
      Overrides:
      input in class Filter
      Parameters:
      instance - the input instance
      Returns:
      true if the filtered instance may now be collected with output().
      Throws:
      IllegalStateException - if no input format has been set
      NullPointerException - if the input format has not been defined.
      Exception - if the input instance was not of the correct format or if there was a problem with the filtering.
    • batchFinished

      public boolean batchFinished() throws Exception
      Signify that this batch of input to the filter is finished.
      Overrides:
      batchFinished in class Filter
      Returns:
      true if there are instances pending output
      Throws:
      NullPointerException - if no input structure has been defined,
      Exception - if there was a problem finishing the batch.
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Filter
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - should contain arguments to the filter: use -h for help