Package weka.datagenerators.clusterers
Class SubspaceCluster
java.lang.Object
weka.datagenerators.DataGenerator
weka.datagenerators.ClusterGenerator
weka.datagenerators.clusterers.SubspaceCluster
- All Implemented Interfaces:
Serializable, OptionHandler, Randomizable, RevisionHandler
A data generator that produces data points in
hyperrectangular subspace clusters.
Valid options are:
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 1).
-c Class flag; if set, the cluster is listed in an extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-C <cluster-definition> A cluster definition of class 'SubspaceClusterDefinition' (definition needs to be quoted to be recognized as a single argument).
Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
-A <range> Uses a random uniform distribution for the instances in the cluster.
-U <range> Generates totally uniformly distributed instances in the cluster.
-G <range> Uses a Gaussian distribution for the instances in the cluster.
-D <num>,<num> The attribute min/max (-A and -U) or mean/stddev (-G) for the cluster.
-N <num>..<num> The range of number of instances per cluster (default 1..50).
-I Uses integer instead of continuous values (default continuous).
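The cluster-definition options above (-A, -G, -D and -N) can be pictured with a small standalone sketch. It is only an illustration of sampling inside a hyperrectangle under stated assumptions: it does not use any Weka class, the class and method names are hypothetical, and it does not claim to reproduce how SubspaceCluster draws its values internally.

```java
import java.util.Random;

/** Standalone illustration; hypothetical names, no Weka classes involved. */
class HyperrectangleSketch {

    /** Draws one point uniformly at random between min[i] and max[i] in each dimension. */
    static double[] uniformPoint(double[] min, double[] max, Random rnd) {
        double[] p = new double[min.length];
        for (int i = 0; i < min.length; i++) {
            p[i] = min[i] + rnd.nextDouble() * (max[i] - min[i]);
        }
        return p;
    }

    /** Draws one point from an independent Gaussian (mean[i], stddev[i]) per dimension. */
    static double[] gaussianPoint(double[] mean, double[] stddev, Random rnd) {
        double[] p = new double[mean.length];
        for (int i = 0; i < mean.length; i++) {
            p[i] = mean[i] + rnd.nextGaussian() * stddev[i];
        }
        return p;
    }

    /** Picks a number of instances in the inclusive range minInst..maxInst. */
    static int clusterSize(int minInst, int maxInst, Random rnd) {
        return minInst + rnd.nextInt(maxInst - minInst + 1);
    }

    public static void main(String[] args) {
        Random rnd = new Random(1); // fixed seed so repeated runs give the same points
        double[] min = {0.0, 5.0};  // per-attribute lower bounds (assumed values)
        double[] max = {10.0, 6.0}; // per-attribute upper bounds (assumed values)
        int n = clusterSize(3, 5, rnd);
        for (int k = 0; k < n; k++) {
            double[] p = uniformPoint(min, max, rnd);
            System.out.println(p[0] + ", " + p[1]);
        }
    }
}
```

The min/max pair plays the role of the -D values for a uniform cluster, and the mean/stddev pair plays the same role for a Gaussian one.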
- Version:
- $Revision: 15747 $
- Author:
- Gabi Schmidberger (gabi@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
Field Summary
Modifier and Type, Field, Description
static final int CONTINUOUS - cluster subtype: continuous
static final int GAUSSIAN - cluster type: gaussian
static final int INTEGER - cluster subtype: integer
static final Tag[] TAGS_CLUSTERSUBTYPE - the tags for the cluster subtypes
static final Tag[] TAGS_CLUSTERTYPE - the tags for the cluster types
static final int TOTAL_UNIFORM - cluster type: total uniform
static final int UNIFORM_RANDOM - cluster type: uniform/random
Constructor Summary
Constructor, Description
SubspaceCluster() - Initializes the generator and sets the number of clusters to 0, since the user has to specify them explicitly.
Method Summary
Modifier and Type, Method, Description
String booleanColsTipText() - Returns the tip text for this property.
String clusterDefinitionsTipText() - Returns the tip text for this property.
Instances defineDataFormat() - Initializes the format for the dataset produced.
Instance generateExample() - Generates an example of the dataset.
Instances generateExamples() - Generates all examples of the dataset.
String generateFinished() - Compiles documentation about the data generation after the generation process.
String generateStart() - Compiles documentation about the data generation before the generation process.
Range getBooleanCols() - Returns the range of boolean attributes.
ClusterDefinition[] getClusterDefinitions() - Returns the currently set clusters.
Range getNominalCols() - Returns the range of nominal attributes.
int[] getNumValues() - Returns the array that stores the number of values for a nominal attribute.
String[] getOptions() - Gets the current settings of the data generator.
String getRevision() - Returns the revision string.
boolean getSingleModeFlag() - Gets the single mode flag.
String globalInfo() - Returns a string describing this data generator.
boolean isBoolean(int index) - Returns true if attribute is boolean.
boolean isNominal(int index) - Returns true if attribute is nominal.
Enumeration<Option> listOptions() - Returns an enumeration describing the available options.
static void main(String[] args) - Main method for testing this class.
String nominalColsTipText() - Returns the tip text for this property.
void setBooleanCols(Range value) - Sets which attributes are boolean.
void setBooleanIndices(String rangeList) - Sets which attributes are boolean.
void setClusterDefinitions(ClusterDefinition[] value) - Sets the clusters to use.
void setNominalCols(Range value) - Sets which attributes are nominal.
void setNominalIndices(String rangeList) - Sets which attributes are nominal.
void setOptions(String[] options) - Parses a list of options for this object.
Methods inherited from class weka.datagenerators.ClusterGenerator
classFlagTipText, getClassFlag, getNumAttributes, numAttributesTipText, setClassFlag, setNumAttributes
Methods inherited from class weka.datagenerators.DataGenerator
debugTipText, defaultOutput, enumToVector, formatTipText, getDatasetFormat, getDebug, getEpilogue, getNumExamplesAct, getOutput, getPrologue, getRandom, getRelationName, getSeed, makeData, outputTipText, randomTipText, relationNameTipText, runDataGenerator, seedTipText, setDatasetFormat, setDebug, setOutput, setRandom, setRelationName, setSeed
-
Field Details
-
UNIFORM_RANDOM
public static final int UNIFORM_RANDOM
cluster type: uniform/random
-
TOTAL_UNIFORM
public static final int TOTAL_UNIFORM
cluster type: total uniform
-
GAUSSIAN
public static final int GAUSSIAN
cluster type: gaussian
-
TAGS_CLUSTERTYPE
the tags for the cluster types
-
CONTINUOUS
public static final int CONTINUOUS
cluster subtype: continuous
-
INTEGER
public static final int INTEGER
cluster subtype: integer
-
TAGS_CLUSTERSUBTYPE
the tags for the cluster subtypes
-
-
Constructor Details
-
SubspaceCluster
public SubspaceCluster()
Initializes the generator and sets the number of clusters to 0, since the user has to specify them explicitly.
-
-
Method Details
-
globalInfo
Returns a string describing this data generator.
- Returns:
- a description of the data generator suitable for displaying in the explorer/experimenter gui
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
- listOptions in interface OptionHandler
- Overrides:
- listOptions in class ClusterGenerator
- Returns:
- an enumeration of all the available options
-
setOptions
Parses a list of options for this object. Valid options are:
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 1).
-c Class flag; if set, the cluster is listed in an extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-C <cluster-definition> A cluster definition of class 'SubspaceClusterDefinition' (definition needs to be quoted to be recognized as a single argument).
Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
-A <range> Uses a random uniform distribution for the instances in the cluster.
-U <range> Generates totally uniformly distributed instances in the cluster.
-G <range> Uses a Gaussian distribution for the instances in the cluster.
-D <num>,<num> The attribute min/max (-A and -U) or mean/stddev (-G) for the cluster.
-N <num>..<num> The range of number of instances per cluster (default 1..50).
-I Uses integer instead of continuous values (default continuous).
- Specified by:
- setOptions in interface OptionHandler
- Overrides:
- setOptions in class ClusterGenerator
- Parameters:
- options - the list of options as an array of strings
- Throws:
- Exception - if an option is not supported
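As a rough illustration of the array that setOptions(String[] options) expects, the sketch below assembles one using only flags listed above. It is a standalone example with hypothetical class and method names and no Weka dependency; whether this particular combination of values produces a sensible dataset has not been checked against Weka and should be treated as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration; hypothetical names, no Weka classes involved. */
class SubspaceClusterOptionsSketch {

    /** Builds an argument array from flags documented for this class. */
    static String[] buildOptions() {
        List<String> opts = new ArrayList<>();
        opts.add("-a");
        opts.add("2");  // two attributes
        opts.add("-c"); // list the cluster in an extra attribute
        // Each cluster definition is a single array element, so no quoting is needed here.
        opts.add("-C");
        opts.add("-A 1 -D 0,10 -N 50..100"); // uniform/random on attribute 1, min 0, max 10
        opts.add("-C");
        opts.add("-G 2 -D 5,1 -N 50..100");  // Gaussian on attribute 2, mean 5, stddev 1
        return opts.toArray(new String[0]);
    }

    /** Renders the array as a command line, quoting arguments that contain spaces. */
    static String toCommandLine(String[] opts) {
        StringBuilder sb = new StringBuilder();
        for (String o : opts) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(o.contains(" ") ? "\"" + o + "\"" : o);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCommandLine(buildOptions()));
    }
}
```

With Weka on the classpath, an array like this could be handed to setOptions; on a shell command line, each -C value has to be quoted so that it arrives as a single argument, which is what toCommandLine shows.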
-
setBooleanIndices
Sets which attributes are boolean.
- Parameters:
- rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1, e.g. first-3,5,6-last
- Throws:
- IllegalArgumentException - if an invalid range list is supplied
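The 1-based range-list format described for rangeList can be made concrete with a small parser. This is a minimal standalone sketch with hypothetical names; it is not Weka's Range class and may differ from it in edge cases (for example, it rejects descending ranges, which is an assumption of this sketch).

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration; hypothetical names, not Weka's Range class. */
class RangeListSketch {

    /** Expands a 1-based range list such as "first-3,5,6-last" into 0-based indices. */
    static List<Integer> expand(String rangeList, int numAttributes) {
        List<Integer> result = new ArrayList<>();
        for (String part : rangeList.split(",")) {
            String[] ends = part.trim().split("-");
            if (ends.length < 1 || ends.length > 2) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            int from = parseIndex(ends[0], numAttributes);
            int to = ends.length == 2 ? parseIndex(ends[1], numAttributes) : from;
            if (from > to) {
                throw new IllegalArgumentException("Descending range: " + part);
            }
            for (int i = from; i <= to; i++) {
                result.add(i - 1); // convert the user-facing 1-based index to 0-based
            }
        }
        return result;
    }

    /** Resolves "first", "last" or a number to a 1-based index within 1..numAttributes. */
    static int parseIndex(String token, int numAttributes) {
        String t = token.trim();
        int idx;
        if (t.equals("first")) {
            idx = 1;
        } else if (t.equals("last")) {
            idx = numAttributes;
        } else {
            try {
                idx = Integer.parseInt(t);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Invalid index: " + token);
            }
        }
        if (idx < 1 || idx > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return idx;
    }

    public static void main(String[] args) {
        // With 8 attributes, prints [0, 1, 2, 4, 5, 6, 7]
        System.out.println(expand("first-3,5,6-last", 8));
    }
}
```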
-
setBooleanCols
Sets which attributes are boolean.
- Parameters:
- value - the range to use
-
getBooleanCols
Returns the range of boolean attributes.
- Returns:
- the range of boolean attributes
-
booleanColsTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setNominalIndices
Sets which attributes are nominal.
- Parameters:
- rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1, e.g. first-3,5,6-last
- Throws:
- IllegalArgumentException - if an invalid range list is supplied
-
setNominalCols
Sets which attributes are nominal.
- Parameters:
- value - the range to use
-
getNominalCols
Returns the range of nominal attributes.
- Returns:
- the range of nominal attributes
-
nominalColsTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getOptions
Gets the current settings of the data generator.
- Specified by:
- getOptions in interface OptionHandler
- Overrides:
- getOptions in class ClusterGenerator
- Returns:
- an array of strings suitable for passing to setOptions
- See Also:
- DataGenerator.removeBlacklist(String[])
-
getClusterDefinitions
Returns the currently set clusters.
- Returns:
- the currently set clusters
-
setClusterDefinitions
Sets the clusters to use.
- Parameters:
- value - the clusters to use
- Throws:
- Exception - if clusters are not the correct class
-
clusterDefinitionsTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getSingleModeFlag
public boolean getSingleModeFlag()
Gets the single mode flag.
- Specified by:
- getSingleModeFlag in class DataGenerator
- Returns:
- true if the method generateExample can be used
-
defineDataFormat
Initializes the format for the dataset produced.
- Overrides:
- defineDataFormat in class DataGenerator
- Returns:
- the output data format
- Throws:
- Exception - if the data format could not be defined
- See Also:
- DataGenerator.defaultRelationName()
-
isBoolean
public boolean isBoolean(int index)
Returns true if attribute is boolean.
- Parameters:
- index - index of the attribute
- Returns:
- true if the attribute is boolean
-
isNominal
public boolean isNominal(int index)
Returns true if attribute is nominal.
- Parameters:
- index - index of the attribute
- Returns:
- true if the attribute is nominal
-
getNumValues
public int[] getNumValues()
Returns the array that stores the number of values for a nominal attribute.
- Returns:
- the array that stores the number of values for a nominal attribute
-
generateExample
Generates an example of the dataset.
- Specified by:
- generateExample in class DataGenerator
- Returns:
- the instance generated
- Throws:
- Exception - if format not defined or generating examples one by one is not possible, because voting is chosen
-
generateExamples
Generates all examples of the dataset.
- Specified by:
- generateExamples in class DataGenerator
- Returns:
- the instances generated
- Throws:
- Exception - if format not defined
-
generateFinished
Compiles documentation about the data generation after the generation process.
- Specified by:
- generateFinished in class DataGenerator
- Returns:
- string with additional information about generated dataset
- Throws:
- Exception - no input structure has been defined
-
generateStart
Compiles documentation about the data generation before the generation process.
- Specified by:
- generateStart in class DataGenerator
- Returns:
- string with additional information
-
getRevision
Returns the revision string.
- Returns:
- the revision
-
main
Main method for testing this class.
- Parameters:
- args - should contain arguments for the data producer:
-