Class ClusterEvaluation

java.lang.Object
weka.clusterers.ClusterEvaluation
All Implemented Interfaces:
Serializable, RevisionHandler

public class ClusterEvaluation extends Object implements Serializable, RevisionHandler
Class for evaluating clustering models.

Valid options are:

-t name of the training file
Specify the training file.

-T name of the test file
Specify the test file to apply clusterer to.

-force-batch-training
Always train the clusterer in batch mode, never incrementally.

-d name of file to save clustering model to
Specify output file.

-l name of file to load clustering model from
Specify input file.

-p attribute range
Output predictions. Predictions are for the training file if only the training file is specified, otherwise they are for the test file. The range specifies attribute values to be output with the predictions. Use '-p 0' for none.

-x num folds
Set the number of folds for a cross validation of the training data. Cross validation can only be done for distribution clusterers and will be performed if the test file is missing.

-s num
Sets the seed for randomizing the data for cross-validation.

-c class
Set the class attribute. If set, then class based evaluation of clustering is performed.

-g name of graph file
Outputs the graph representation of the clusterer to the file. Only for clusterers that implement the weka.core.Drawable interface.

Version:
$Revision: 15203 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)
  • Constructor Details

    • ClusterEvaluation

      public ClusterEvaluation()
      Constructor. Sets defaults for each member variable. Default Clusterer is EM.
  • Method Details

    • setClusterer

      public void setClusterer(Clusterer clusterer)
      Sets the clusterer.
      Parameters:
      clusterer - the clusterer to use
    • clusterResultsToString

      public String clusterResultsToString()
      Returns the results of clustering.
      Returns:
      a string detailing the results of clustering a data set
    • getNumClusters

      public int getNumClusters()
      Returns the number of clusters found by the most recent call to evaluateClusterer.
      Returns:
      the number of clusters found
    • getClusterAssignments

      public double[] getClusterAssignments()
      Return an array of cluster assignments corresponding to the most recent set of instances clustered.
      Returns:
      an array of cluster assignments
    • getClassesToClusters

      public int[] getClassesToClusters()
      Returns the array (ordered by cluster number) of minimum-error class-to-cluster mappings.
      Returns:
      an array of class to cluster mappings
    • getLogLikelihood

      public double getLogLikelihood()
      Return the log likelihood corresponding to the most recent set of instances clustered.
      Returns:
      a double value
    • evaluateClusterer

      public void evaluateClusterer(Instances test) throws Exception
      Evaluate the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
      Parameters:
      test - the set of instances to cluster
      Throws:
      Exception - if something goes wrong
    • evaluateClusterer

      public void evaluateClusterer(Instances test, String testFileName) throws Exception
      Evaluate the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
      Parameters:
      test - the set of instances to cluster
      testFileName - the name of the test file for incremental testing, if "" or null then not used
      Throws:
      Exception - if something goes wrong
    • evaluateClusterer

      public void evaluateClusterer(Instances test, String testFileName, boolean outputModel) throws Exception
      Evaluate the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
      Parameters:
      test - the set of instances to cluster
      testFileName - the name of the test file for incremental testing, if "" or null then not used
      outputModel - true if the clustering model is to be output as well as the stats
      Throws:
      Exception - if something goes wrong
    • mapClasses

      public static void mapClasses(int numClusters, int lev, int[][] counts, int[] clusterTotals, double[] current, double[] best, int error)
      Finds the minimum error mapping of classes to clusters. Recursively considers all possible class to cluster assignments.
      Parameters:
      numClusters - the number of clusters
      lev - the cluster being processed
      counts - the counts of classes in clusters
      clusterTotals - the total number of examples in each cluster
      current - the current path through the class to cluster assignment tree
      best - the best assignment path seen
      error - accumulates the error for a particular path
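      The recursive search described for mapClasses can be illustrated with a standalone sketch. This is not the Weka source: the class name ClassToClusterMappingSketch, the method minimumErrorMapping, and the choice to return the error in the last array slot are assumptions made for illustration. It tries every assignment of distinct classes to clusters (or no class at all) and keeps the one with the fewest misassigned instances.

```java
import java.util.Arrays;

// Hypothetical standalone sketch; names and return layout are assumptions.
class ClassToClusterMappingSketch {

    private int bestError = Integer.MAX_VALUE; // lowest error seen so far
    private int[] bestMapping;                 // mapping that produced it

    /**
     * counts[c][k] is the number of instances of class k placed in cluster c.
     * Returns an array of length numClusters + 1: entry c is the class mapped
     * to cluster c (or -1 for none), and the last entry is the total error.
     */
    static int[] minimumErrorMapping(int[][] counts) {
        ClassToClusterMappingSketch s = new ClassToClusterMappingSketch();
        s.search(counts, 0, new int[counts.length], 0);
        int[] result = Arrays.copyOf(s.bestMapping, counts.length + 1);
        result[counts.length] = s.bestError;
        return result;
    }

    private void search(int[][] counts, int cluster, int[] current, int error) {
        if (cluster == counts.length) {        // leaf: every cluster decided
            if (error < bestError) {
                bestError = error;
                bestMapping = current.clone();
            }
            return;
        }
        int total = Arrays.stream(counts[cluster]).sum();

        // Option 1: map no class to this cluster; all its instances are errors.
        current[cluster] = -1;
        search(counts, cluster + 1, current, error + total);

        // Option 2: map each class that occurs here and is not already used.
        for (int k = 0; k < counts[cluster].length; k++) {
            if (counts[cluster][k] == 0 || isUsed(current, cluster, k)) {
                continue;
            }
            current[cluster] = k;
            search(counts, cluster + 1, current,
                   error + total - counts[cluster][k]);
        }
    }

    // True if class k is already mapped to one of the clusters before upTo.
    private static boolean isUsed(int[] current, int upTo, int k) {
        for (int j = 0; j < upTo; j++) {
            if (current[j] == k) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[][] counts = {{8, 2}, {1, 9}};
        System.out.println(Arrays.toString(minimumErrorMapping(counts)));
        // prints [0, 1, 3]
    }
}
```

      Because every assignment is enumerated, the cost of a search like this grows quickly with the number of clusters and classes.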
    • evaluateClusterer

      public static String evaluateClusterer(Clusterer clusterer, String[] options) throws Exception
      Evaluates a clusterer with the options given in an array of strings. It takes the string indicated by "-t" as training file, the string indicated by "-T" as test file. If the test file is missing, a stratified ten-fold cross-validation is performed (distribution clusterers only). Using "-x" you can change the number of folds to be used, and using "-s" the random seed. If the "-p" option is present it outputs the classification for each test instance. If you provide the name of an object file using "-l", a clusterer will be loaded from the given file. If you provide the name of an object file using "-d", the clusterer built from the training data will be saved to the given file.
      Parameters:
      clusterer - machine learning clusterer
      options - the array of strings containing the options
      Returns:
      a string describing the results
      Throws:
      Exception - if model could not be evaluated successfully
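      As a rough illustration of how the options array might be assembled, the sketch below builds the flag/value pairs described above. The helper class ClusterEvalOptionsSketch, its buildOptions method, and the file name train.arff are hypothetical and not part of Weka; only the flags -t, -T, -x and -s come from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; only the flag names are taken from the documentation.
class ClusterEvalOptionsSketch {

    /**
     * Builds an options array. If a test file is given, it is passed with -T;
     * otherwise the fold count (-x) and seed (-s) for cross-validation are set.
     */
    static String[] buildOptions(String trainFile, String testFile,
                                 int folds, int seed) {
        List<String> opts = new ArrayList<>();
        opts.add("-t");
        opts.add(trainFile);
        if (testFile != null && !testFile.isEmpty()) {
            opts.add("-T");
            opts.add(testFile);
        } else {
            opts.add("-x");
            opts.add(Integer.toString(folds));
            opts.add("-s");
            opts.add(Integer.toString(seed));
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // "train.arff" is a placeholder file name.
        System.out.println(String.join(" ",
                buildOptions("train.arff", null, 5, 42)));
    }
}
```

      An array built this way would be passed as the options argument of evaluateClusterer(Clusterer, String[]), together with the clusterer to evaluate.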
    • crossValidateModel

      public static double crossValidateModel(DensityBasedClusterer clusterer, Instances data, int numFolds, Random random) throws Exception
      Perform a cross-validation for DensityBasedClusterer on a set of instances.
      Parameters:
      clusterer - the clusterer to use
      data - the training data
      numFolds - number of folds of cross validation to perform
      random - the random number generator used for the cross-validation
      Returns:
      the cross-validated log-likelihood
      Throws:
      Exception - if an error occurs
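      The general idea of a cross-validated log-likelihood can be sketched without Weka. The example below is an assumption-laden stand-in, not the library's implementation: it uses a single one-dimensional Gaussian as the density model, plain double values instead of Instances, and averages the held-out log densities over all data points. The class name CrossValidatedLogLikelihoodSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: a 1-D Gaussian stands in for a density-based clusterer.
class CrossValidatedLogLikelihoodSketch {

    /** Returns the mean held-out log density over numFolds folds. */
    static double crossValidate(double[] data, int numFolds, Random random) {
        if (numFolds < 2 || data.length < numFolds) {
            throw new IllegalArgumentException("need 2 <= numFolds <= data.length");
        }
        // Shuffle a copy so the caller's array is left untouched.
        List<Double> shuffled = new ArrayList<>();
        for (double d : data) {
            shuffled.add(d);
        }
        Collections.shuffle(shuffled, random);

        double total = 0.0;
        for (int fold = 0; fold < numFolds; fold++) {
            List<Double> train = new ArrayList<>();
            List<Double> test = new ArrayList<>();
            for (int i = 0; i < shuffled.size(); i++) {
                if (i % numFolds == fold) {
                    test.add(shuffled.get(i));
                } else {
                    train.add(shuffled.get(i));
                }
            }
            // "Train": estimate mean and variance from the training folds.
            double mean = 0.0;
            for (double x : train) {
                mean += x;
            }
            mean /= train.size();
            double var = 0.0;
            for (double x : train) {
                var += (x - mean) * (x - mean);
            }
            var = Math.max(var / train.size(), 1e-6); // guard against zero variance

            // "Test": accumulate the log density of each held-out value.
            for (double x : test) {
                total += logDensity(x, mean, var);
            }
        }
        return total / data.length;
    }

    /** Log of the Gaussian density with the given mean and variance. */
    static double logDensity(double x, double mean, double var) {
        return -0.5 * (Math.log(2 * Math.PI * var) + (x - mean) * (x - mean) / var);
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(crossValidate(data, 5, new Random(1)));
    }
}
```

      Passing a Random built from a fixed seed makes the fold assignment, and therefore the result, repeatable.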
    • crossValidateModel

      public static String crossValidateModel(String clustererString, Instances data, int numFolds, String[] options, Random random) throws Exception
      Performs a cross-validation for a DensityBasedClusterer on a set of instances.
      Parameters:
      clustererString - a string naming the class of the clusterer
      data - the data on which the cross-validation is to be performed
      numFolds - the number of folds for the cross-validation
      options - the options to the clusterer
      random - a random number generator
      Returns:
      a string containing the cross validated log likelihood
      Throws:
      Exception - if a clusterer could not be generated
    • equals

      public boolean equals(Object obj)
      Tests whether the current evaluation object is equal to another evaluation object.
      Overrides:
      equals in class Object
      Parameters:
      obj - the object to compare against
      Returns:
      true if the two objects are equal
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method for testing this class.
      Parameters:
      args - the options