weka.clusterers.ClusterEvaluation

All Implemented Interfaces: Serializable, RevisionHandler

public class ClusterEvaluation extends Object implements Serializable, RevisionHandler

Class for evaluating clustering models.

Valid options are:

-t name of the training file
Specify the training file.

-T name of the test file
Specify the test file to apply clusterer to.

-force-batch-training
Always train the clusterer in batch mode, never incrementally.

-d name of file to save clustering model to
Specify output file.

-l name of file to load clustering model from
Specify input file.

-p attribute range
Output predictions. Predictions are for the training file if only the training file is specified, otherwise they are for the test file. The range specifies attribute values to be output with the predictions. Use '-p 0' for none.

-x num folds
Set the number of folds for a cross validation of the training data. Cross validation can only be done for distribution clusterers and will be performed if the test file is missing.

-s num
Sets the seed for randomizing the data for cross-validation.

-c class
Set the class attribute. If set, then class based evaluation of clustering is performed.

-g name of graph file
Outputs the graph representation of the clusterer to the file. Only for clusterers that implement the weka.core.Drawable interface.

Version:

$Revision: 15203 $

Author:

Mark Hall (mhall@cs.waikato.ac.nz)

Constructor Summary

| Constructor | Description |
| --- | --- |
| ClusterEvaluation() | Constructor. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| String | clusterResultsToString() | Returns the results of clustering. |
| static String | crossValidateModel(String clustererString, Instances data, int numFolds, String[] options, Random random) | Performs a cross-validation for a DensityBasedClusterer on a set of instances. |
| static double | crossValidateModel(DensityBasedClusterer clusterer, Instances data, int numFolds, Random random) | Performs a cross-validation for a DensityBasedClusterer on a set of instances. |
| boolean | equals(Object obj) | Tests whether the current evaluation object is equal to another evaluation object. |
| static String | evaluateClusterer(Clusterer clusterer, String[] options) | Evaluates a clusterer with the options given in an array of strings. |
| void | evaluateClusterer(Instances test) | Evaluates the clusterer on a set of instances. |
| void | evaluateClusterer(Instances test, String testFileName) | Evaluates the clusterer on a set of instances. |
| void | evaluateClusterer(Instances test, String testFileName, boolean outputModel) | Evaluates the clusterer on a set of instances. |
| int[] | getClassesToClusters() | Returns the array (ordered by cluster number) of minimum-error class-to-cluster mappings. |
| double[] | getClusterAssignments() | Returns an array of cluster assignments corresponding to the most recent set of instances clustered. |
| double | getLogLikelihood() | Returns the log likelihood corresponding to the most recent set of instances clustered. |
| int | getNumClusters() | Returns the number of clusters found by the most recent call to evaluateClusterer. |
| String | getRevision() | Returns the revision string. |
| static void | main(String[] args) | Main method for testing this class. |
| static void | mapClasses(int numClusters, int lev, int[][] counts, int[] clusterTotals, double[] current, double[] best, int error) | Finds the minimum-error mapping of classes to clusters. |
| void | setClusterer(Clusterer clusterer) | Sets the clusterer. |

Methods inherited from class java.lang.Object
getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- ClusterEvaluation
  
  public ClusterEvaluation()
  
  Constructor. Sets defaults for each member variable. The default clusterer is EM.

Method Details
- setClusterer
  
  public void setClusterer(Clusterer clusterer)
  
  Sets the clusterer.
  
  Parameters:
  
  clusterer - the clusterer to use
- clusterResultsToString
  
  public String clusterResultsToString()
  
  Returns the results of clustering.
  
  Returns:
  
  a string detailing the results of clustering a data set
- getNumClusters
  
  public int getNumClusters()
  
  Returns the number of clusters found by the most recent call to evaluateClusterer.
  
  Returns:
  
  the number of clusters found
- getClusterAssignments
  
  public double[] getClusterAssignments()
  
  Return an array of cluster assignments corresponding to the most recent set of instances clustered.
  
  Returns:
  
  an array of cluster assignments
- getClassesToClusters
  
  public int[] getClassesToClusters()
  
  Returns the array (ordered by cluster number) of minimum-error class-to-cluster mappings.
  
  Returns:
  
  an array of class to cluster mappings
- getLogLikelihood
  
  public double getLogLikelihood()
  
  Return the log likelihood corresponding to the most recent set of instances clustered.
  
  Returns:
  
  the log likelihood as a double value
- evaluateClusterer
  
  public void evaluateClusterer(Instances test) throws Exception
  
  Evaluate the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
  
  Parameters:
  
  test - the set of instances to cluster
  
  Throws:
  
  Exception - if something goes wrong
- evaluateClusterer
  
  public void evaluateClusterer(Instances test, String testFileName) throws Exception
  
  Evaluate the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
  
  Parameters:
  
  test - the set of instances to cluster
  
  testFileName - the name of the test file for incremental testing, if "" or null then not used
  
  Throws:
  
  Exception - if something goes wrong
- evaluateClusterer
  
  public void evaluateClusterer(Instances test, String testFileName, boolean outputModel) throws Exception
  
  Evaluate the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
  
  Parameters:
  
  test - the set of instances to cluster
  
  testFileName - the name of the test file for incremental testing, if "" or null then not used
  
  outputModel - true if the clustering model is to be output as well as the stats
  
  Throws:
  
  Exception - if something goes wrong
- mapClasses
  
  public static void mapClasses(int numClusters, int lev, int[][] counts, int[] clusterTotals, double[] current, double[] best, int error)
  
  Finds the minimum error mapping of classes to clusters. Recursively considers all possible class to cluster assignments.
  
  Parameters:
  
  numClusters - the number of clusters
  
  lev - the cluster being processed
  
  counts - the counts of classes in clusters
  
  clusterTotals - the total number of examples in each cluster
  
  current - the current path through the class to cluster assignment tree
  
  best - the best assignment path seen
  
  error - accumulates the error for a particular path
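
The recursive search described for mapClasses can be illustrated without Weka. The following is a minimal, self-contained sketch under stated assumptions, not Weka's implementation: the class MinErrorMappingSketch and its methods are hypothetical names, counts[cluster][class] is assumed to hold the number of instances of each class in each cluster, and -1 is used here to mean that a cluster is assigned no class.

```java
import java.util.Arrays;

// Hypothetical sketch: exhaustive search for the cluster -> class mapping
// with the fewest misassigned instances. Not taken from the Weka source.
class MinErrorMappingSketch {
    private final int[][] counts;      // counts[cluster][class]
    private final int[] clusterTotals; // number of instances in each cluster
    private final int[] best;          // best cluster -> class mapping found so far
    private int bestError = Integer.MAX_VALUE;

    private MinErrorMappingSketch(int[][] counts) {
        this.counts = counts;
        this.clusterTotals = new int[counts.length];
        this.best = new int[counts.length];
        for (int c = 0; c < counts.length; c++) {
            clusterTotals[c] = Arrays.stream(counts[c]).sum();
        }
    }

    // lev is the cluster being processed, current the partial assignment,
    // error the number of misassigned instances accumulated on this path.
    private void search(int lev, int[] current, int error) {
        if (error >= bestError) {
            return; // errors never decrease, so this path cannot improve on the best
        }
        if (lev == counts.length) {
            bestError = error;
            System.arraycopy(current, 0, best, 0, current.length);
            return;
        }
        // Option 1: assign no class; every instance in the cluster is an error.
        current[lev] = -1;
        search(lev + 1, current, error + clusterTotals[lev]);
        // Option 2: try each class that occurs here and is not already taken.
        for (int cls = 0; cls < counts[lev].length; cls++) {
            if (counts[lev][cls] > 0 && !usedBefore(current, lev, cls)) {
                current[lev] = cls;
                search(lev + 1, current, error + clusterTotals[lev] - counts[lev][cls]);
            }
        }
    }

    private static boolean usedBefore(int[] current, int lev, int cls) {
        for (int j = 0; j < lev; j++) {
            if (current[j] == cls) {
                return true;
            }
        }
        return false;
    }

    // Returns the mapping, indexed by cluster number; -1 means "no class".
    static int[] bestMapping(int[][] counts) {
        MinErrorMappingSketch s = new MinErrorMappingSketch(counts);
        s.search(0, new int[counts.length], 0);
        return s.best;
    }

    // Returns the number of misassigned instances under the best mapping.
    static int minError(int[][] counts) {
        MinErrorMappingSketch s = new MinErrorMappingSketch(counts);
        s.search(0, new int[counts.length], 0);
        return s.bestError;
    }
}
```

For counts {{5, 1}, {2, 6}}, this sketch maps cluster 0 to class 0 and cluster 1 to class 1, with 3 misassigned instances. Because every assignment path is considered, the cost grows quickly with the number of clusters and classes.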
- evaluateClusterer
  
  public static String evaluateClusterer(Clusterer clusterer, String[] options) throws Exception
  
  Evaluates a clusterer with the options given in an array of strings. It takes the string indicated by "-t" as the training file and the string indicated by "-T" as the test file. If the test file is missing, a stratified ten-fold cross-validation is performed (distribution clusterers only). Use "-x" to change the number of folds and "-s" to change the random seed. If the "-p" option is present, it outputs the prediction for each test instance. If you provide the name of an object file using "-l", a clusterer is loaded from that file. If you provide the name of an object file using "-d", the clusterer built from the training data is saved to that file.
  
  Parameters:
  
  clusterer - machine learning clusterer
  
  options - the array of string containing the options
  
  Returns:
  
  a string describing the results
  
  Throws:
  
  Exception - if model could not be evaluated successfully
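
As an illustration of the options array described above, the sketch below assembles two typical flag combinations using only the flags documented on this page. The class EvaluationOptionsSketch, its method names, and the file names in the usage note are hypothetical; the Weka call itself appears only in a comment so that the block runs without Weka on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds option arrays for the static evaluateClusterer method.
class EvaluationOptionsSketch {

    // Training file only: cross-validation with the given number of folds and seed.
    static String[] crossValidationOptions(String trainFile, int folds, int seed) {
        List<String> opts = new ArrayList<>();
        opts.add("-t");
        opts.add(trainFile);
        opts.add("-x");
        opts.add(Integer.toString(folds));
        opts.add("-s");
        opts.add(Integer.toString(seed));
        return opts.toArray(new String[0]);
    }

    // Training and test file, with predictions output for the test file.
    // A predictionRange of "0" outputs no attribute values with the predictions.
    static String[] trainTestOptions(String trainFile, String testFile, String predictionRange) {
        List<String> opts = new ArrayList<>();
        opts.add("-t");
        opts.add(trainFile);
        opts.add("-T");
        opts.add(testFile);
        opts.add("-p");
        opts.add(predictionRange);
        return opts.toArray(new String[0]);
    }

    // With Weka on the classpath, the array would be passed along these lines:
    //   String report = ClusterEvaluation.evaluateClusterer(new EM(), options);
}
```

For example, crossValidationOptions("train.arff", 5, 42) yields the array {"-t", "train.arff", "-x", "5", "-s", "42"}.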
- crossValidateModel
  
  public static double crossValidateModel(DensityBasedClusterer clusterer, Instances data, int numFolds, Random random) throws Exception
  
  Perform a cross-validation for DensityBasedClusterer on a set of instances.
  
  Parameters:
  
  clusterer - the clusterer to use
  
  data - the training data
  
  numFolds - number of folds of cross validation to perform
  
  random - random number generator used for cross-validation
  
  Returns:
  
  the cross-validated log-likelihood
  
  Throws:
  
  Exception - if an error occurs
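
To make the idea of a cross-validated log-likelihood concrete, here is a minimal sketch in plain Java. It assumes the score is the held-out log density summed over all folds and divided by the number of instances; this page does not spell out the exact formula, so treat that as an assumption. The class CrossValidatedLogLikelihoodSketch and its nested Gaussian class are hypothetical stand-ins for a density-based clusterer and do not use the Weka API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of k-fold cross-validation of a density model.
class CrossValidatedLogLikelihoodSketch {

    // Stand-in for a density-based clusterer: a single 1-D Gaussian.
    static final class Gaussian {
        final double mean;
        final double variance;

        Gaussian(List<Double> train) {
            double sum = 0;
            for (double x : train) {
                sum += x;
            }
            mean = sum / train.size();
            double sq = 0;
            for (double x : train) {
                sq += (x - mean) * (x - mean);
            }
            // Floor the variance so the log density stays finite on degenerate folds.
            variance = Math.max(sq / train.size(), 1e-6);
        }

        double logDensity(double x) {
            double d = x - mean;
            return -0.5 * Math.log(2 * Math.PI * variance) - d * d / (2 * variance);
        }
    }

    // Returns the average held-out log density per instance.
    static double crossValidate(double[] data, int numFolds, Random random) {
        if (numFolds < 2 || numFolds > data.length) {
            throw new IllegalArgumentException("numFolds must be between 2 and data.length");
        }
        List<Double> shuffled = new ArrayList<>();
        for (double x : data) {
            shuffled.add(x);
        }
        Collections.shuffle(shuffled, random); // randomize before splitting into folds

        double total = 0;
        for (int fold = 0; fold < numFolds; fold++) {
            List<Double> train = new ArrayList<>();
            List<Double> test = new ArrayList<>();
            for (int i = 0; i < shuffled.size(); i++) {
                (i % numFolds == fold ? test : train).add(shuffled.get(i));
            }
            Gaussian model = new Gaussian(train); // fit on the other folds
            for (double x : test) {
                total += model.logDensity(x);     // score the held-out fold
            }
        }
        return total / data.length;
    }
}
```

Passing a Random constructed with a fixed seed makes the fold assignment, and therefore the returned value, reproducible across runs.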
- crossValidateModel
  
  public static String crossValidateModel(String clustererString, Instances data, int numFolds, String[] options, Random random) throws Exception
  
  Performs a cross-validation for a DensityBasedClusterer clusterer on a set of instances.
  
  Parameters:
  
  clustererString - a string naming the class of the clusterer
  
  data - the data on which the cross-validation is to be performed
  
  numFolds - the number of folds for the cross-validation
  
  options - the options to the clusterer
  
  random - a random number generator
  
  Returns:
  
  a string containing the cross validated log likelihood
  
  Throws:
  
  Exception - if a clusterer could not be generated
- equals
  
  public boolean equals(Object obj)
  
  Tests whether the current evaluation object is equal to another evaluation object.
  
  Overrides:
  
  equals in class Object
  
  Parameters:
  
  obj - the object to compare against
  
  Returns:
  
  true if the two objects are equal
- getRevision
  
  public String getRevision()
  
  Returns the revision string.
  
  Specified by:
  
  getRevision in interface RevisionHandler
  
  Returns:
  
  the revision
- main
  
  public static void main(String[] args)
  
  Main method for testing this class.
  
  Parameters:
  
  args - the options