Package weka.experiment
Class ExplicitTestsetResultProducer
java.lang.Object
weka.experiment.ExplicitTestsetResultProducer
- All Implemented Interfaces:
- Serializable, AdditionalMeasureProducer, OptionHandler, RevisionHandler, ResultProducer
public class ExplicitTestsetResultProducer
extends Object
implements ResultProducer, OptionHandler, AdditionalMeasureProducer, RevisionHandler
Loads the external test set and calls the appropriate SplitEvaluator to generate some results.
The filename of the test set is constructed as follows:
<dir> + / + <prefix> + <relation-name> + <suffix>
The relation name can be modified by using the regular expression to replace each matching substring with a specified replacement string. To strip the string that Weka filters append to the end of the relation name, use a regular expression that matches from '-weka' to the end of the name, such as '-weka.*'.
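The filename rule above can be illustrated with a small sketch in plain Java. `TestsetFilenameSketch` and `testsetFilename` are hypothetical names introduced here and are not part of the Weka API. The sketch assumes the find/replace step is applied to the relation name before concatenation and is skipped when the regular expression is empty. The pattern `-weka.*` is an illustrative choice that matches from `-weka` to the end of the name.

```java
/** Hypothetical helper for illustration; not part of the Weka API. */
final class TestsetFilenameSketch {

    /**
     * Builds <dir> + / + <prefix> + <relation-name> + <suffix>.
     * The find/replace step is applied to the relation name first,
     * and is skipped when 'find' is the empty string.
     */
    static String testsetFilename(String dir, String prefix, String relationName,
                                  String suffix, String find, String replace) {
        String name = relationName;
        if (!find.isEmpty()) {
            // Every match of the regular expression is replaced.
            name = name.replaceAll(find, replace);
        }
        return dir + "/" + prefix + name + suffix;
    }

    public static void main(String[] args) {
        // A relation name as it might look after a Weka filter has been applied.
        String relation = "iris-weka.filters.unsupervised.attribute.Remove-R1";
        System.out.println(
            testsetFilename("data", "", relation, "_test.arff", "-weka.*", ""));
    }
}
```

With these inputs the relation name is reduced to `iris`, so the sketch resolves the test set to `data/iris_test.arff`.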
The suffix determines the type of file to load, i.e., one is not restricted to ARFF files. As long as Weka recognizes the extension specified in the suffix, the data will be loaded with one of Weka's converters.
Valid options are:
-D Save raw split evaluator output.
-O <file/directory name/path> The filename where raw output will be stored. If a directory name is specified, the individual outputs will be gzipped; otherwise all output will be zipped to the named file. Use in conjunction with -D. (default: splitEvalutorOut.zip)
-W <class name> The full class name of a SplitEvaluator. eg: weka.experiment.ClassifierSplitEvaluator
-R Set when data is to be randomized.
-dir <directory> The directory containing the test sets. (default: current directory)
-prefix <string> An optional prefix for the test sets (before the relation name). (default: empty string)
-suffix <string> The suffix to append to the test set. (default: _test.arff)
-find <regular expression> The regular expression to search the relation name with. Not used if an empty string. (default: empty string)
-replace <string> The replacement string for all the matches of '-find'. (default: empty string)
Options specific to split evaluator weka.experiment.ClassifierSplitEvaluator:
-W <class name> The full class name of the classifier. eg: weka.classifiers.bayes.NaiveBayes
-C <index> The index of the class for which IR statistics are to be output. (default 1)
-I <index> The index of an attribute to output in the results. This attribute should identify an instance in order to know which instances are in the test set of a cross validation. If 0, no output. (default: 0)
-P Add target and prediction columns to the result for each fold.
Options specific to classifier weka.classifiers.rules.ZeroR:
-D If set, classifier is run in debug mode and may output additional info to the console.
All options after -- will be passed to the split evaluator.
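The role of the `--` separator can be shown with a short sketch. `OptionSplitSketch` and `afterDoubleDash` are hypothetical names introduced here, and how Weka itself partitions the array is not described in this page. The sketch only shows which elements of an option array fall after `--` and would therefore be meant for the split evaluator.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of the Weka API. */
final class OptionSplitSketch {

    /** Returns the options that follow the first "--", or an empty array if there is none. */
    static String[] afterDoubleDash(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("--")) {
                return Arrays.copyOfRange(options, i + 1, options.length);
            }
        }
        return new String[0];
    }

    public static void main(String[] args) {
        // Result producer options first, split evaluator options after "--".
        String[] options = {
            "-dir", "testsets",
            "-suffix", "_test.arff",
            "-W", "weka.experiment.ClassifierSplitEvaluator",
            "--",
            "-W", "weka.classifiers.rules.ZeroR"
        };
        System.out.println(Arrays.toString(afterDoubleDash(options)));
    }
}
```

Note that `-W` appears twice with different meanings: before `--` it names the SplitEvaluator, after `--` it names the classifier.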
- Version:
- $Revision: 10203 $
- Author:
- Len Trigg (trigg@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type, Method
    Description
void doRun(int run)
    Gets the results for a specified run number.
void doRunKeys(int run)
    Gets the keys for a specified run number.
Enumeration enumerateMeasures()
    Returns an enumeration of any additional measure names that might be in the SplitEvaluator.
String getCompatibilityState()
    Gets a description of the internal settings of the result producer, sufficient for distinguishing a ResultProducer instance from another with different settings (ignoring those settings set through this interface).
String[] getKeyNames()
    Gets the names of each of the columns produced for a single run.
Object[] getKeyTypes()
    Gets the data types of each of the columns produced for a single run.
double getMeasure(String additionalMeasureName)
    Returns the value of the named measure.
String[] getOptions()
    Gets the current settings of the result producer.
File getOutputFile()
    Get the value of OutputFile.
boolean getRandomizeData()
    Gets whether the dataset is to be randomized.
boolean getRawOutput()
    Gets whether raw split evaluator output is to be saved.
String getRelationFind()
    Returns the currently set regular expression to use on the relation name.
String getRelationReplace()
    Returns the currently set replacement string to use on the relation name.
String[] getResultNames()
    Gets the names of each of the columns produced for a single run.
Object[] getResultTypes()
    Gets the data types of each of the columns produced for a single run.
String getRevision()
    Returns the revision string.
SplitEvaluator getSplitEvaluator()
    Get the SplitEvaluator.
File getTestsetDir()
    Returns the currently set directory for the test sets.
String getTestsetPrefix()
    Returns the currently set prefix.
String getTestsetSuffix()
    Returns the currently set suffix.
static Double getTimestamp()
    Gets a Double representing the current date and time.
String globalInfo()
    Returns a string describing this result producer.
Enumeration listOptions()
    Returns an enumeration describing the available options.
String outputFileTipText()
    Returns the tip text for this property.
void postProcess()
    Perform any postprocessing.
void preProcess()
    Prepare to generate results.
String randomizeDataTipText()
    Returns the tip text for this property.
String rawOutputTipText()
    Returns the tip text for this property.
String relationFindTipText()
    Returns the tip text for this property.
String relationReplaceTipText()
    Returns the tip text for this property.
void setAdditionalMeasures(String[] additionalMeasures)
    Set a list of method names for additional measures to look for in SplitEvaluators.
void setInstances(Instances instances)
    Sets the dataset that results will be obtained for.
void setOptions(String[] options)
    Parses a given list of options.
void setOutputFile(File value)
    Set the value of OutputFile.
void setRandomizeData(boolean value)
    Set to true if dataset is to be randomized.
void setRawOutput(boolean value)
    Set to true if raw split evaluator output is to be saved.
void setRelationFind(String value)
    Sets the regular expression to use on the relation name.
void setRelationReplace(String value)
    Sets the replacement string to use on the relation name.
void setResultListener(ResultListener listener)
    Sets the object to send results of each run to.
void setSplitEvaluator(SplitEvaluator value)
    Set the SplitEvaluator.
void setTestsetDir(File value)
    Sets the directory to use for the test sets.
void setTestsetPrefix(String value)
    Sets the prefix to use for the test sets.
void setTestsetSuffix(String value)
    Sets the suffix to use for the test sets.
String splitEvaluatorTipText()
    Returns the tip text for this property.
String testsetDirTipText()
    Returns the tip text for this property.
String testsetPrefixTipText()
    Returns the tip text for this property.
String testsetSuffixTipText()
    Returns the tip text for this property.
String toString()
    Gets a text description of the result producer.
-
Field Details
-
DEFAULT_SUFFIX
The default suffix.
-
DATASET_FIELD_NAME
The name of the key field containing the dataset name.
-
RUN_FIELD_NAME
The name of the key field containing the run number.
-
TIMESTAMP_FIELD_NAME
The name of the result field containing the timestamp.
-
-
Constructor Details
-
ExplicitTestsetResultProducer
public ExplicitTestsetResultProducer()
-
-
Method Details
-
globalInfo
Returns a string describing this result producer.
- Returns:
- a description of the result producer suitable for displaying in the explorer/experimenter GUI
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
- listOptions in interface OptionHandler
- Returns:
- an enumeration of all the available options
-
setOptions
Parses a given list of options. Valid options are:
-D Save raw split evaluator output.
-O <file/directory name/path> The filename where raw output will be stored. If a directory name is specified, the individual outputs will be gzipped; otherwise all output will be zipped to the named file. Use in conjunction with -D. (default: splitEvalutorOut.zip)
-W <class name> The full class name of a SplitEvaluator. eg: weka.experiment.ClassifierSplitEvaluator
-R Set when data is to be randomized.
-dir <directory> The directory containing the test sets. (default: current directory)
-prefix <string> An optional prefix for the test sets (before the relation name). (default: empty string)
-suffix <string> The suffix to append to the test set. (default: _test.arff)
-find <regular expression> The regular expression to search the relation name with. Not used if an empty string. (default: empty string)
-replace <string> The replacement string for all the matches of '-find'. (default: empty string)
Options specific to split evaluator weka.experiment.ClassifierSplitEvaluator:
-W <class name> The full class name of the classifier. eg: weka.classifiers.bayes.NaiveBayes
-C <index> The index of the class for which IR statistics are to be output. (default 1)
-I <index> The index of an attribute to output in the results. This attribute should identify an instance in order to know which instances are in the test set of a cross validation. If 0, no output. (default: 0)
-P Add target and prediction columns to the result for each fold.
Options specific to classifier weka.classifiers.rules.ZeroR:
-D If set, classifier is run in debug mode and may output additional info to the console.
All options after -- will be passed to the split evaluator.
- Specified by:
- setOptions in interface OptionHandler
- Parameters:
- options - the list of options as an array of strings
- Throws:
- Exception - if an option is not supported
-
getOptions
Gets the current settings of the result producer.
- Specified by:
- getOptions in interface OptionHandler
- Returns:
- an array of strings suitable for passing to setOptions
-
setInstances
Sets the dataset that results will be obtained for.
- Specified by:
- setInstances in interface ResultProducer
- Parameters:
- instances - a value of type 'Instances'
-
setAdditionalMeasures
Set a list of method names for additional measures to look for in SplitEvaluators. If the experiment is of the type that iterates over a set of properties, this list may contain many measures, of which only a subset may be producible by the current SplitEvaluator.
- Specified by:
- setAdditionalMeasures in interface ResultProducer
- Parameters:
- additionalMeasures - an array of measure names, null if none
-
enumerateMeasures
Returns an enumeration of any additional measure names that might be in the SplitEvaluator.
- Specified by:
- enumerateMeasures in interface AdditionalMeasureProducer
- Returns:
- an enumeration of the measure names
-
getMeasure
Returns the value of the named measure.
- Specified by:
- getMeasure in interface AdditionalMeasureProducer
- Parameters:
- additionalMeasureName - the name of the measure to query for its value
- Returns:
- the value of the named measure
- Throws:
- IllegalArgumentException - if the named measure is not supported
-
setResultListener
Sets the object to send results of each run to.
- Specified by:
- setResultListener in interface ResultProducer
- Parameters:
- listener - a value of type 'ResultListener'
-
getTimestamp
Gets a Double representing the current date and time, e.g., 1:46pm on 20/5/1999 -> 19990520.1346
- Returns:
- a value of type Double
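The encoding in the example (date in the integer part, 24-hour time in the fraction) can be reproduced with a short sketch. `TimestampSketch` and `encode` are hypothetical names, and the arithmetic is inferred from the single example above rather than taken from the Weka source, so details such as the time zone used by the real method are not covered.

```java
import java.time.LocalDateTime;

/** Hypothetical helper for illustration; not part of the Weka API. */
final class TimestampSketch {

    /** Encodes a date and time as yyyymmdd.hhmm. */
    static double encode(LocalDateTime t) {
        return t.getYear() * 10000       // yyyy0000
            + t.getMonthValue() * 100    // mm00
            + t.getDayOfMonth()          // dd
            + t.getHour() / 100.0        // .hh
            + t.getMinute() / 10000.0;   // .00mm
    }

    public static void main(String[] args) {
        // 1:46pm on 20/5/1999, the example from the description above.
        System.out.println(encode(LocalDateTime.of(1999, 5, 20, 13, 46)));
    }
}
```

Because the value is a floating-point number, comparisons against a literal such as 19990520.1346 are best done with a small tolerance.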
-
preProcess
Prepare to generate results.
- Specified by:
- preProcess in interface ResultProducer
- Throws:
- Exception - if an error occurs during preprocessing
-
postProcess
Perform any postprocessing. When this method is called, it indicates that no more requests to generate results for the current experiment will be sent.
- Specified by:
- postProcess in interface ResultProducer
- Throws:
- Exception - if an error occurs
-
doRunKeys
Gets the keys for a specified run number. Different run numbers correspond to different randomizations of the data. Keys produced should be sent to the current ResultListener.
- Specified by:
- doRunKeys in interface ResultProducer
- Parameters:
- run - the run number to get keys for
- Throws:
- Exception - if a problem occurs while getting the keys
-
doRun
Gets the results for a specified run number. Different run numbers correspond to different randomizations of the data. Results produced should be sent to the current ResultListener.
- Specified by:
- doRun in interface ResultProducer
- Parameters:
- run - the run number to get results for
- Throws:
- Exception - if a problem occurs while getting the results
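One common way to make each run number correspond to a distinct but repeatable randomization is to seed the random number generator with the run number. The sketch below shows that idea on a plain list. `RunShuffleSketch` and `shuffledForRun` are hypothetical names, and seeding with the run number is an assumption made for illustration, not something this page states about the implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration; not part of the Weka API. */
final class RunShuffleSketch {

    /** Returns a shuffled copy of the rows; the same run number always gives the same order. */
    static <T> List<T> shuffledForRun(List<T> rows, int run) {
        List<T> copy = new ArrayList<>(rows);
        // Assumption: the run number is used directly as the seed.
        Collections.shuffle(copy, new Random(run));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(shuffledForRun(rows, 1));
        System.out.println(shuffledForRun(rows, 2));
    }
}
```

Repeating a run with the same number reproduces the same ordering, which is what allows results from separate executions of an experiment to be compared.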
-
getKeyNames
Gets the names of each of the columns produced for a single run. This method should really be static.
- Specified by:
- getKeyNames in interface ResultProducer
- Returns:
- an array containing the name of each column
-
getKeyTypes
Gets the data types of each of the columns produced for a single run. This method should really be static.
- Specified by:
- getKeyTypes in interface ResultProducer
- Returns:
- an array containing objects of the type of each column. The objects should be Strings or Doubles.
-
getResultNames
Gets the names of each of the columns produced for a single run. This method should really be static.
- Specified by:
- getResultNames in interface ResultProducer
- Returns:
- an array containing the name of each column
-
getResultTypes
Gets the data types of each of the columns produced for a single run. This method should really be static.
- Specified by:
- getResultTypes in interface ResultProducer
- Returns:
- an array containing objects of the type of each column. The objects should be Strings or Doubles.
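The type arrays returned by getKeyTypes and getResultTypes hold one exemplar object per column, so the class of each object stands for the column's type. The sketch below shows how such parallel name and type arrays might be used to check a row. `ColumnTypesSketch`, `matches`, and the column names are hypothetical and chosen only for illustration; the actual columns depend on the configured SplitEvaluator.

```java
/** Hypothetical helper for illustration; not part of the Weka API. */
final class ColumnTypesSketch {

    // Hypothetical layout: two String key columns and one Double result column.
    static final String[] KEY_NAMES = {"dataset", "run"};
    static final Object[] KEY_TYPES = {"", ""};
    static final String[] RESULT_NAMES = {"timestamp"};
    static final Object[] RESULT_TYPES = {Double.valueOf(0)};

    /** Checks that each value in the row has the class of the exemplar for its column. */
    static boolean matches(Object[] row, Object[] types) {
        if (row.length != types.length) {
            return false;
        }
        for (int i = 0; i < row.length; i++) {
            if (!types[i].getClass().isInstance(row[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches(new Object[] {"iris", "1"}, KEY_TYPES));
        System.out.println(matches(new Object[] {19990520.1346}, RESULT_TYPES));
        System.out.println(matches(new Object[] {"oops"}, RESULT_TYPES));
    }
}
```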
-
getCompatibilityState
Gets a description of the internal settings of the result producer, sufficient for distinguishing a ResultProducer instance from another with different settings (ignoring those settings set through this interface). For example, a cross-validation ResultProducer may have a setting for the number of folds. For a given state, the results produced should be compatible. Typically if a ResultProducer is an OptionHandler, this string will represent the command line arguments required to set the ResultProducer to that state.
- Specified by:
- getCompatibilityState in interface ResultProducer
- Returns:
- the description of the ResultProducer state, or null if no state is defined
-
outputFileTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getOutputFile
Get the value of OutputFile.
- Returns:
- value of OutputFile
-
setOutputFile
Set the value of OutputFile.
- Parameters:
- value - value to assign to OutputFile
-
randomizeDataTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getRandomizeData
public boolean getRandomizeData()
Gets whether the dataset is to be randomized.
- Returns:
- true if dataset is to be randomized
-
setRandomizeData
public void setRandomizeData(boolean value)
Set to true if dataset is to be randomized.
- Parameters:
- value - true if dataset is to be randomized
-
rawOutputTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getRawOutput
public boolean getRawOutput()
Gets whether raw split evaluator output is to be saved.
- Returns:
- true if raw split evaluator output is to be saved
-
setRawOutput
public void setRawOutput(boolean value)
Set to true if raw split evaluator output is to be saved.
- Parameters:
- value - true if output is to be saved
-
splitEvaluatorTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getSplitEvaluator
Get the SplitEvaluator.
- Returns:
- the SplitEvaluator
-
setSplitEvaluator
Set the SplitEvaluator.
- Parameters:
- value - new SplitEvaluator to use
-
testsetDirTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getTestsetDir
Returns the currently set directory for the test sets.
- Returns:
- the directory
-
setTestsetDir
Sets the directory to use for the test sets.
- Parameters:
- value - the directory to use
-
testsetPrefixTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getTestsetPrefix
Returns the currently set prefix.
- Returns:
- the prefix
-
setTestsetPrefix
Sets the prefix to use for the test sets.
- Parameters:
- value - the prefix
-
testsetSuffixTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getTestsetSuffix
Returns the currently set suffix.
- Returns:
- the suffix
-
setTestsetSuffix
Sets the suffix to use for the test sets.
- Parameters:
- value - the suffix
-
relationFindTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getRelationFind
Returns the currently set regular expression to use on the relation name.
- Returns:
- the regular expression
-
setRelationFind
Sets the regular expression to use on the relation name.
- Parameters:
- value - the regular expression
-
relationReplaceTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getRelationReplace
Returns the currently set replacement string to use on the relation name.
- Returns:
- the replacement string
-
setRelationReplace
Sets the replacement string to use on the relation name.
- Parameters:
- value - the replacement string
-
toString
Gets a text description of the result producer.
-
getRevision
Returns the revision string.
- Specified by:
- getRevision in interface RevisionHandler
- Returns:
- the revision
-