Package weka.clusterers
Class ClusterEvaluation
java.lang.Object
weka.clusterers.ClusterEvaluation
- All Implemented Interfaces:
Serializable, RevisionHandler
Class for evaluating clustering models.
Valid options are:
-t name of the training file
Specify the training file.
-T name of the test file
Specify the test file to apply clusterer to.
-force-batch-training
Always train the clusterer in batch mode, never incrementally.
-d name of file to save clustering model to
Specify output file.
-l name of file to load clustering model from
Specify input file.
-p attribute range
Output predictions. Predictions are for the training file if only the training file is specified, otherwise they are for the test file. The range specifies attribute values to be output with the predictions. Use '-p 0' for none.
-x num folds
Set the number of folds for a cross-validation of the training data. Cross-validation can only be done for distribution clusterers and will be performed if the test file is missing.
-s num
Sets the seed for randomizing the data for cross-validation.
-c class
Set the class attribute. If set, then class-based evaluation of clustering is performed.
-g name of graph file
Outputs the graph representation of the clusterer to the file. Only for clusterers that implement the weka.core.Drawable interface.
- Version:
- $Revision: 15203 $
- Author:
- Mark Hall (mhall@cs.waikato.ac.nz)
-
Constructor Summary
Constructor and Description
ClusterEvaluation()
Constructor.
-
Method Summary
Modifier and Type, Method, Description
String clusterResultsToString()
Return the results of clustering.
static String crossValidateModel(String clustererString, Instances data, int numFolds, String[] options, Random random)
Performs a cross-validation for a DensityBasedClusterer clusterer on a set of instances.
static double crossValidateModel(DensityBasedClusterer clusterer, Instances data, int numFolds, Random random)
Perform a cross-validation for DensityBasedClusterer on a set of instances.
boolean equals(Object obj)
Tests whether the current evaluation object is equal to another evaluation object.
static String evaluateClusterer(Clusterer clusterer, String[] options)
Evaluates a clusterer with the options given in an array of strings.
void evaluateClusterer(Instances test)
Evaluate the clusterer on a set of instances.
void evaluateClusterer(Instances test, String testFileName)
Evaluate the clusterer on a set of instances.
void evaluateClusterer(Instances test, String testFileName, boolean outputModel)
Evaluate the clusterer on a set of instances.
int[] getClassesToClusters()
Return the array (ordered by cluster number) of minimum error class to cluster mappings.
double[] getClusterAssignments()
Return an array of cluster assignments corresponding to the most recent set of instances clustered.
double getLogLikelihood()
Return the log likelihood corresponding to the most recent set of instances clustered.
int getNumClusters()
Return the number of clusters found for the most recent call to evaluateClusterer.
String getRevision()
Returns the revision string.
static void main(String[] args)
Main method for testing this class.
static void mapClasses(int numClusters, int lev, int[][] counts, int[] clusterTotals, double[] current, double[] best, int error)
Finds the minimum error mapping of classes to clusters.
void setClusterer(Clusterer clusterer)
Set the clusterer.
-
Constructor Details
-
ClusterEvaluation
public ClusterEvaluation()
Constructor. Sets defaults for each member variable. Default Clusterer is EM.
-
-
Method Details
-
setClusterer
public void setClusterer(Clusterer clusterer)
Set the clusterer.
- Parameters:
clusterer - the clusterer to use
-
clusterResultsToString
public String clusterResultsToString()
Return the results of clustering.
- Returns:
a string detailing the results of clustering a data set
-
getNumClusters
public int getNumClusters()
Return the number of clusters found for the most recent call to evaluateClusterer.
- Returns:
the number of clusters found
-
getClusterAssignments
public double[] getClusterAssignments()
Return an array of cluster assignments corresponding to the most recent set of instances clustered.
- Returns:
an array of cluster assignments
-
getClassesToClusters
public int[] getClassesToClusters()
Return the array (ordered by cluster number) of minimum error class to cluster mappings.
- Returns:
an array of class to cluster mappings
-
getLogLikelihood
public double getLogLikelihood()
Return the log likelihood corresponding to the most recent set of instances clustered.
- Returns:
a double value
-
evaluateClusterer
public void evaluateClusterer(Instances test) throws Exception
Evaluate the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
- Parameters:
test - the set of instances to cluster
- Throws:
Exception - if something goes wrong
-
evaluateClusterer
public void evaluateClusterer(Instances test, String testFileName) throws Exception
Evaluate the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
- Parameters:
test - the set of instances to cluster
testFileName - the name of the test file for incremental testing; not used if "" or null
- Throws:
Exception - if something goes wrong
-
evaluateClusterer
public void evaluateClusterer(Instances test, String testFileName, boolean outputModel) throws Exception
Evaluate the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.
- Parameters:
test - the set of instances to cluster
testFileName - the name of the test file for incremental testing; not used if "" or null
outputModel - true if the clustering model is to be output as well as the stats
- Throws:
Exception - if something goes wrong
-
mapClasses
public static void mapClasses(int numClusters, int lev, int[][] counts, int[] clusterTotals, double[] current, double[] best, int error)
Finds the minimum error mapping of classes to clusters. Recursively considers all possible class to cluster assignments.
- Parameters:
numClusters - the number of clusters
lev - the cluster being processed
counts - the counts of classes in clusters
clusterTotals - the total number of examples in each cluster
current - the current path through the class to cluster assignment tree
best - the best assignment path seen
error - accumulates the error for a particular path
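The recursive search described for mapClasses can be illustrated with a small self-contained sketch. It is not Weka's implementation: the class MinErrorMappingSketch and its methods are hypothetical, the search state is kept in fields rather than in the current and best arrays, and the sketch assumes that counts is indexed as counts[cluster][class], that every instance has a class label, that each class may be assigned to at most one cluster, and that -1 marks a cluster left without a class.

```java
import java.util.Arrays;

/** Illustrative sketch only; not Weka's implementation of mapClasses. */
class MinErrorMappingSketch {
    private final int[][] counts;       // counts[cluster][classIndex] (assumed layout)
    private final int[] clusterTotals;  // number of instances in each cluster
    private final int numClasses;
    private final int[] current;        // class chosen per cluster on the current path
    private int[] best;                 // best mapping found so far, ordered by cluster
    private int bestError = Integer.MAX_VALUE;

    MinErrorMappingSketch(int[][] counts) {
        this.counts = counts;
        this.numClasses = counts.length == 0 ? 0 : counts[0].length;
        this.clusterTotals = new int[counts.length];
        for (int c = 0; c < counts.length; c++) {
            for (int n : counts[c]) {
                clusterTotals[c] += n;
            }
        }
        this.current = new int[counts.length];
        this.best = new int[counts.length];
        search(0, 0, new boolean[numClasses]);
    }

    /** Tries every assignment for cluster lev, then recurses to the next cluster. */
    private void search(int lev, int error, boolean[] used) {
        if (error >= bestError) {
            return; // the error only grows along a path, so this path cannot win
        }
        if (lev == counts.length) {
            bestError = error;
            best = current.clone();
            return;
        }
        // Option 1: leave this cluster without a class; all its instances count as errors.
        current[lev] = -1;
        search(lev + 1, error + clusterTotals[lev], used);
        // Option 2: assign each class that is not already taken by an earlier cluster.
        for (int k = 0; k < numClasses; k++) {
            if (used[k]) {
                continue;
            }
            used[k] = true;
            current[lev] = k;
            search(lev + 1, error + clusterTotals[lev] - counts[lev][k], used);
            used[k] = false;
        }
    }

    int[] getBestMapping() {
        return best.clone();
    }

    int getBestError() {
        return bestError;
    }

    public static void main(String[] args) {
        int[][] counts = {
            {8, 1, 0},
            {2, 7, 1},
            {0, 2, 9}
        };
        MinErrorMappingSketch sketch = new MinErrorMappingSketch(counts);
        System.out.println("mapping = " + Arrays.toString(sketch.getBestMapping()));
        System.out.println("error   = " + sketch.getBestError());
    }
}
```

For the 3x3 table in main, the search settles on the diagonal mapping (cluster 0 to class 0, and so on) with 6 of the 30 instances counted as errors. Exhaustive search grows quickly with the number of clusters and classes, so the early return on a path that already matches or exceeds the best error is what keeps small problems cheap.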
-
evaluateClusterer
public static String evaluateClusterer(Clusterer clusterer, String[] options) throws Exception
Evaluates a clusterer with the options given in an array of strings. It takes the string indicated by "-t" as the training file and the string indicated by "-T" as the test file. If the test file is missing, a stratified ten-fold cross-validation is performed (distribution clusterers only). Using "-x" you can change the number of folds to be used, and using "-s" the random seed. If the "-p" option is present, it outputs the classification for each test instance. If you provide the name of an object file using "-l", a clusterer will be loaded from the given file. If you provide the name of an object file using "-d", the clusterer built from the training data will be saved to the given file.
- Parameters:
clusterer - machine learning clusterer
options - the array of strings containing the options
- Returns:
a string describing the results
- Throws:
Exception - if model could not be evaluated successfully
-
crossValidateModel
public static double crossValidateModel(DensityBasedClusterer clusterer, Instances data, int numFolds, Random random) throws Exception
Perform a cross-validation for DensityBasedClusterer on a set of instances.
- Parameters:
clusterer - the clusterer to use
data - the training data
numFolds - number of folds of cross validation to perform
random - random number seed for cross-validation
- Returns:
the cross-validated log-likelihood
- Throws:
Exception - if an error occurs
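As a rough illustration of what a cross-validated log-likelihood measures, the sketch below shuffles the data with a seeded Random, fits a density model on the training part of each fold, and averages the log density of the held-out instances. Everything in it is an assumption made for illustration: the class CrossValidatedLogLikelihoodSketch is hypothetical, a one-dimensional Gaussian stands in for a density-based clusterer, folds are assigned by index modulo numFolds, and the variance is floored at 1e-6. Weka's own fold construction and averaging may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; a 1-D Gaussian stands in for a density-based clusterer. */
class CrossValidatedLogLikelihoodSketch {

    /** Log density of x under a normal distribution with the given mean and variance. */
    static double logGaussian(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * (Math.log(2.0 * Math.PI * variance) + diff * diff / variance);
    }

    /** Returns the average held-out log density per instance over all folds. */
    static double crossValidate(double[] data, int numFolds, Random random) {
        if (numFolds < 2 || numFolds > data.length) {
            throw new IllegalArgumentException(
                "numFolds must be between 2 and the number of instances");
        }
        // Shuffle a copy so the caller's array is left untouched.
        List<Double> shuffled = new ArrayList<>();
        for (double value : data) {
            shuffled.add(value);
        }
        Collections.shuffle(shuffled, random);

        double totalLogDensity = 0.0;
        for (int fold = 0; fold < numFolds; fold++) {
            // Fit the mean on every instance that is not in the current fold.
            double sum = 0.0;
            int trainSize = 0;
            for (int i = 0; i < shuffled.size(); i++) {
                if (i % numFolds != fold) {
                    sum += shuffled.get(i);
                    trainSize++;
                }
            }
            double mean = sum / trainSize;
            // Fit the variance on the same training instances.
            double squared = 0.0;
            for (int i = 0; i < shuffled.size(); i++) {
                if (i % numFolds != fold) {
                    double diff = shuffled.get(i) - mean;
                    squared += diff * diff;
                }
            }
            double variance = Math.max(squared / trainSize, 1e-6);
            // Score the held-out instances of this fold.
            for (int i = 0; i < shuffled.size(); i++) {
                if (i % numFolds == fold) {
                    totalLogDensity += logGaussian(shuffled.get(i), mean, variance);
                }
            }
        }
        return totalLogDensity / data.length;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0};
        System.out.println(crossValidate(data, 3, new Random(42)));
    }
}
```

Passing a Random built from a fixed seed makes the shuffle, and therefore the returned value, repeatable, which mirrors the role of the random parameter above and of the -s option.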
-
crossValidateModel
public static String crossValidateModel(String clustererString, Instances data, int numFolds, String[] options, Random random) throws Exception
Performs a cross-validation for a DensityBasedClusterer clusterer on a set of instances.
- Parameters:
clustererString - a string naming the class of the clusterer
data - the data on which the cross-validation is to be performed
numFolds - the number of folds for the cross-validation
options - the options to the clusterer
random - a random number generator
- Returns:
a string containing the cross-validated log likelihood
- Throws:
Exception - if a clusterer could not be generated
-
equals
public boolean equals(Object obj)
Tests whether the current evaluation object is equal to another evaluation object.
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Returns:
the revision
-
main
public static void main(String[] args)
Main method for testing this class.
- Parameters:
args - the options
-