Class PrincipalComponents

All Implemented Interfaces:
Serializable, AttributeEvaluator, AttributeTransformer, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler

public class PrincipalComponents extends UnsupervisedAttributeEvaluator implements AttributeTransformer, OptionHandler
Performs a principal components analysis and transformation of the data. Use in conjunction with a Ranker search. Dimensionality reduction is accomplished by choosing enough eigenvectors to account for a given proportion of the variance in the original data (default 0.95, i.e. 95%). Attribute noise can be filtered by transforming to the PC space, eliminating some of the worst eigenvectors, and then transforming back to the original space.
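
The variance-based cutoff described above can be illustrated with a small standalone sketch. It does not call Weka and is not taken from the Weka source; the class name VarianceCutoffSketch and the method componentsToRetain are hypothetical, and the eigenvalues are made-up numbers. The assumption is that components are taken in order of decreasing eigenvalue until their cumulative share of the total reaches the requested proportion.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the Weka API.
final class VarianceCutoffSketch {

    /**
     * Returns the smallest number of components whose eigenvalues,
     * taken from largest to smallest, cover at least the requested
     * proportion of the total variance.
     */
    static int componentsToRetain(double[] eigenvalues, double varianceCovered) {
        double[] sorted = eigenvalues.clone();
        Arrays.sort(sorted); // ascending order; walked from the end below

        double total = 0.0;
        for (double v : sorted) {
            total += v;
        }

        double cumulative = 0.0;
        int retained = 0;
        for (int i = sorted.length - 1; i >= 0; i--) {
            cumulative += sorted[i];
            retained++;
            if (cumulative / total >= varianceCovered) {
                break; // the component that crosses the threshold is kept
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        // Made-up eigenvalues; cumulative shares are 0.75, 0.9375, 0.96875, 1.0
        double[] eigenvalues = {0.25, 6.0, 0.25, 1.5};
        System.out.println(componentsToRetain(eigenvalues, 0.95)); // prints 3
    }
}
```

With these values, lowering the proportion to 0.9 would retain two components instead of three.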

Valid options are:

 -C
  Center (rather than standardize) the
  data and compute PCA using the covariance (rather
  than the correlation) matrix.
 
 -R
  Retain enough PC attributes to account 
  for this proportion of variance in the original data.
  (default = 0.95)
 
 -O
  Transform through the PC space and 
  back to the original space.
 
 -A
  Maximum number of attributes to include in 
  transformed attribute names. (-1 = include all)
 
Version:
$Revision: 15519 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz), Gabi Schmidberger (gabi@cs.waikato.ac.nz)
  • Constructor Details

    • PrincipalComponents

      public PrincipalComponents()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this attribute transformer
      Returns:
      a description of the evaluator suitable for displaying in the explorer/experimenter gui
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.

      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class ASEvaluation
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -C
        Center (rather than standardize) the
        data and compute PCA using the covariance (rather
        than the correlation) matrix.
       
       -R
        Retain enough PC attributes to account
        for this proportion of variance in the original data.
        (default = 0.95)
       
       -O
        Transform through the PC space and
        back to the original space.
       
       -A
        Maximum number of attributes to include in
        transformed attribute names. (-1 = include all)
       
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class ASEvaluation
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • centerDataTipText

      public String centerDataTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setCenterData

      public void setCenterData(boolean center)
      Set whether to center (rather than standardize) the data. If set to true then PCA is computed from the covariance rather than correlation matrix.
      Parameters:
      center - true if the data is to be centered rather than standardized
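
To show why the choice between centering and standardizing matters, here is a standalone sketch that computes both matrices for a tiny made-up data set. It does not use Weka; the class CenterVersusStandardizeSketch and its methods are hypothetical, and dividing by n - 1 (sample covariance) is an assumption of this sketch, not a statement about Weka's internals.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the Weka API.
final class CenterVersusStandardizeSketch {

    /** Sample covariance matrix of the columns of a row-major data matrix. */
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int d = data[0].length;

        // Column means, used to center the data.
        double[] mean = new double[d];
        for (double[] row : data) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j] / n;
            }
        }

        double[][] cov = new double[d][d];
        for (double[] row : data) {
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]) / (n - 1);
                }
            }
        }
        return cov;
    }

    /** Correlation matrix: the covariance of the standardized columns. */
    static double[][] correlation(double[][] data) {
        double[][] cov = covariance(data);
        int d = cov.length;
        double[][] corr = new double[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                corr[i][j] = cov[i][j] / Math.sqrt(cov[i][i] * cov[j][j]);
            }
        }
        return corr;
    }

    public static void main(String[] args) {
        // Two attributes on very different scales (made-up values).
        double[][] data = {{1, 100}, {2, 300}, {3, 200}};
        System.out.println(Arrays.deepToString(covariance(data)));
        System.out.println(Arrays.deepToString(correlation(data)));
    }
}
```

In the covariance matrix the second attribute has variance 10000 against 1 for the first, so it would dominate the leading component. In the correlation matrix both attributes have unit variance and the off-diagonal entry is 0.5.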
    • getCenterData

      public boolean getCenterData()
      Get whether to center (rather than standardize) the data. If true then PCA is computed from the covariance rather than correlation matrix.
      Returns:
      true if the data is to be centered rather than standardized.
    • varianceCoveredTipText

      public String varianceCoveredTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setVarianceCovered

      public void setVarianceCovered(double vc)
      Sets the amount of variance to account for when retaining principal components
      Parameters:
      vc - the proportion of total variance to account for
    • getVarianceCovered

      public double getVarianceCovered()
      Gets the proportion of total variance to account for when retaining principal components
      Returns:
      the proportion of variance to account for
    • maximumAttributeNamesTipText

      public String maximumAttributeNamesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMaximumAttributeNames

      public void setMaximumAttributeNames(int m)
      Sets maximum number of attributes to include in transformed attribute names.
      Parameters:
      m - the maximum number of attributes
    • getMaximumAttributeNames

      public int getMaximumAttributeNames()
      Gets maximum number of attributes to include in transformed attribute names.
      Returns:
      the maximum number of attributes
    • transformBackToOriginalTipText

      public String transformBackToOriginalTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setTransformBackToOriginal

      public void setTransformBackToOriginal(boolean b)
      Sets whether the data should be transformed back to the original space
      Parameters:
      b - true if the data should be transformed back to the original space
    • getTransformBackToOriginal

      public boolean getTransformBackToOriginal()
      Gets whether the data is to be transformed back to the original space.
      Returns:
      true if the data is to be transformed back to the original space
    • getOptions

      public String[] getOptions()
      Gets the current settings of PrincipalComponents
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class ASEvaluation
      Returns:
      an array of strings suitable for passing to setOptions()
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the capabilities of this evaluator.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class ASEvaluation
      Returns:
      the capabilities of this evaluator
    • buildEvaluator

      public void buildEvaluator(Instances data) throws Exception
      Initializes principal components and performs the analysis
      Specified by:
      buildEvaluator in class ASEvaluation
      Parameters:
      data - the instances to analyse/transform
      Throws:
      Exception - if analysis fails
    • initializeAndComputeMatrix

      public void initializeAndComputeMatrix(Instances data) throws Exception
      Initializes the evaluator, filters the input data and computes the correlation/covariance matrix.
      Parameters:
      data - the instances to analyse
      Throws:
      Exception - if a problem occurs
    • transformedHeader

      public Instances transformedHeader() throws Exception
      Returns just the header for the transformed data (i.e. an empty set of instances). This is so that AttributeSelection can determine the structure of the transformed data without actually having to get all the transformed data through transformedData().
      Specified by:
      transformedHeader in interface AttributeTransformer
      Returns:
      the header of the transformed data.
      Throws:
      Exception - if the header of the transformed data can't be determined.
    • getFilteredInputFormat

      public Instances getFilteredInputFormat()
      Return the header of the training data after all filtering, i.e. after missing values have been replaced and nominal attributes converted to binary.
      Returns:
      the header of the training data after all filtering.
    • getCorrelationMatrix

      public double[][] getCorrelationMatrix()
      Return the correlation/covariance matrix
      Returns:
      the correlation or covariance matrix
    • getUnsortedEigenVectors

      public double[][] getUnsortedEigenVectors()
      Return the unsorted eigenvectors
      Returns:
      the unsorted eigenvectors
    • getEigenValues

      public double[] getEigenValues()
      Return the eigenvalues corresponding to the eigenvectors
      Returns:
      the eigenvalues
    • transformedData

      public Instances transformedData(Instances data) throws Exception
      Gets the transformed training data.
      Specified by:
      transformedData in interface AttributeTransformer
      Returns:
      the transformed training data
      Throws:
      Exception - if transformed data can't be returned
    • evaluateAttribute

      public double evaluateAttribute(int att) throws Exception
      Evaluates the merit of a transformed attribute. This is defined to be 1 minus the cumulative variance explained. Merit can't be meaningfully evaluated if the data is to be transformed back to the original space.
      Specified by:
      evaluateAttribute in interface AttributeEvaluator
      Parameters:
      att - the attribute to be evaluated
      Returns:
      the merit of a transformed attribute
      Throws:
      Exception - if attribute can't be evaluated
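
The merit definition above (1 minus the cumulative variance explained) can be worked through with a standalone sketch. It does not call Weka; the class MeritSketch, the method merit and the eigenvalues are hypothetical, and the sketch assumes the eigenvalues are already sorted from largest to smallest and that rank is a 0-based index into that ordering.

```java
// Hypothetical illustration only; not part of the Weka API.
final class MeritSketch {

    /**
     * Merit of the component at the given 0-based rank: one minus the
     * share of total variance explained by that component and all
     * higher-ranked ones.
     */
    static double merit(double[] eigenvaluesDescending, int rank) {
        double total = 0.0;
        for (double v : eigenvaluesDescending) {
            total += v;
        }
        double cumulative = 0.0;
        for (int i = 0; i <= rank; i++) {
            cumulative += eigenvaluesDescending[i];
        }
        return 1.0 - cumulative / total;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {6.0, 1.5, 0.25, 0.25}; // made-up values
        for (int rank = 0; rank < eigenvalues.length; rank++) {
            System.out.println(rank + ": " + merit(eigenvalues, rank));
        }
        // prints 0: 0.25, 1: 0.0625, 2: 0.03125, 3: 0.0
    }
}
```

Under this definition merit falls as rank increases, so ordering by descending merit lists the components by decreasing eigenvalue.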
    • toString

      public String toString()
      Returns a description of this attribute transformer
      Overrides:
      toString in class Object
      Returns:
      a String describing this attribute transformer
    • matrixToString

      public static String matrixToString(double[][] matrix)
      Return a matrix as a String
      Parameters:
      matrix - the matrix to be described as a string
      Returns:
      a String describing a matrix
    • convertInstance

      public Instance convertInstance(Instance instance) throws Exception
      Transform an instance in original (unnormalized) format. Convert back to the original space if requested.
      Specified by:
      convertInstance in interface AttributeTransformer
      Parameters:
      instance - an instance in the original (unnormalized) format
      Returns:
      a transformed instance
      Throws:
      Exception - if instance can't be transformed
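
The idea of transforming into the PC space and optionally back to the original space can be sketched without Weka as follows. Everything here is hypothetical: the class ProjectAndReconstructSketch, the methods project and reconstruct, the layout of eigenvectors as orthonormal rows, and the use of centering only. The real convertInstance applies the evaluator's own filtering to the instance first.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the Weka API.
final class ProjectAndReconstructSketch {

    /** Scores of the centered vector x on the first k eigenvectors (one per row). */
    static double[] project(double[] x, double[] mean, double[][] eigenvectors, int k) {
        double[] scores = new double[k];
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < x.length; j++) {
                scores[c] += eigenvectors[c][j] * (x[j] - mean[j]);
            }
        }
        return scores;
    }

    /** Maps scores back to the original space using the same eigenvectors. */
    static double[] reconstruct(double[] scores, double[] mean, double[][] eigenvectors) {
        double[] x = mean.clone();
        for (int c = 0; c < scores.length; c++) {
            for (int j = 0; j < x.length; j++) {
                x[j] += scores[c] * eigenvectors[c][j];
            }
        }
        return x;
    }

    public static void main(String[] args) {
        double s = Math.sqrt(0.5);
        double[][] eigenvectors = {{s, s}, {s, -s}}; // orthonormal rows
        double[] mean = {1.0, 1.0};
        double[] x = {4.0, 2.0};

        // Keeping only the first component drops the variation along the second.
        double[] filtered = reconstruct(project(x, mean, eigenvectors, 1), mean, eigenvectors);
        // Keeping both components recovers x up to rounding error.
        double[] full = reconstruct(project(x, mean, eigenvectors, 2), mean, eigenvectors);

        System.out.println(Arrays.toString(filtered)); // approximately [3.0, 3.0]
        System.out.println(Arrays.toString(full));     // approximately [4.0, 2.0]
    }
}
```

Dropping the lower-ranked component and mapping back yields a point in the original attribute space with the variation along that component removed, which is the noise-filtering use mentioned in the class description.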
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class ASEvaluation
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class
      Parameters:
      argv - should contain the command line arguments to the evaluator/transformer (see AttributeSelection)