Package weka.core
Class Instances
- All Implemented Interfaces:
Serializable, Iterable<Instance>, Collection<Instance>, List<Instance>, RevisionHandler
Class for handling an ordered set of weighted instances.
Typical usage:
    import weka.core.converters.ConverterUtils.DataSource;
    ...
    // Read all the instances in the file (ARFF, CSV, XRFF, ...)
    DataSource source = new DataSource(filename);
    Instances instances = source.getDataSet();

    // Make the last attribute be the class
    instances.setClassIndex(instances.numAttributes() - 1);

    // Print header and instances.
    System.out.println("\nDataset:\n");
    System.out.println(instances);
    ...
All methods that change a set of instances are safe, i.e., a change to one set of instances does not affect any other set of instances. All methods that change a dataset's attribute information clone the dataset before it is changed.
- Version:
- $Revision: 15569 $
- Author:
- Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (trigg@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
Field Summary
Modifier and Type | Field | Description
static final String | ARFF_DATA | The keyword used to denote the start of the ARFF data section.
static final String | ARFF_RELATION | The keyword used to denote the start of an ARFF header.
static final String | FILE_EXTENSION | The filename extension that should be used for ARFF files.
static final String | SERIALIZED_OBJ_FILE_EXTENSION | The filename extension that should be used for binary serialized instances files.
-
Constructor Summary
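Constructor | Description
Instances(Reader reader) | Reads an ARFF file from a reader, and assigns a weight of one to each instance.
Instances(Reader reader, int capacity) | Deprecated.
Instances(String name, ArrayList<Attribute> attInfo, int capacity) | Creates an empty set of instances.
Instances(Instances dataset) | Constructor copying all instances and references to the header information from the given set of instances.
Instances(Instances dataset, int capacity) | Constructor creating an empty set of instances.
Instances(Instances source, int first, int toCopy) | Creates a new set of instances by copying a subset of another set.
-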
Method Summary
Modifier and Type | Method | Description
void | add(int index, Instance instance) | Adds one instance at the given position in the list.
boolean | add(Instance instance) | Adds one instance to the end of the set.
boolean | allAttributeWeightsIdentical() | Returns true if all attribute weights are the same and false otherwise.
boolean | allInstanceWeightsIdentical() | Returns true if all instance weights are the same and false otherwise.
Attribute | attribute(int index) | Returns an attribute.
Attribute | attribute(String name) | Returns an attribute given its name.
AttributeStats | attributeStats(int index) | Calculates summary statistics on the values that appear in this set of instances for a specified attribute.
double[] | attributeToDoubleArray(int index) | Gets the value of all instances in this dataset for a particular attribute.
boolean | checkForAttributeType(int attType) | Checks for attributes of the given type in the dataset.
boolean | checkForStringAttributes() | Checks for string attributes in the dataset.
boolean | checkInstance(Instance instance) | Checks if the given instance is compatible with this dataset.
Attribute | classAttribute() | Returns the class attribute.
int | classIndex() | Returns the class attribute's index.
void | compactify() | Compactifies the set of instances.
void | delete() | Removes all instances from the set.
void | delete(int index) | Removes an instance at the given position from the set.
void | deleteAttributeAt(int position) | Deletes an attribute at the given position (0 to numAttributes() - 1).
void | deleteAttributeType(int attType) | Deletes all attributes of the given type in the dataset.
void | deleteStringAttributes() | Deletes all string attributes in the dataset.
void | deleteWithMissing(int attIndex) | Removes all instances with missing values for a particular attribute from the dataset.
void | deleteWithMissing(Attribute att) | Removes all instances with missing values for a particular attribute from the dataset.
void | deleteWithMissingClass() | Removes all instances with a missing class value from the dataset.
Enumeration<Attribute> | enumerateAttributes() | Returns an enumeration of all the attributes.
Enumeration<Instance> | enumerateInstances() | Returns an enumeration of all instances in the dataset.
boolean | equalHeaders(Instances dataset) | Checks if two headers are equivalent.
String | equalHeadersMsg(Instances dataset) | Checks if two headers are equivalent.
Instance | firstInstance() | Returns the first instance in the set.
Instance | get(int index) | Returns the instance at the given position.
Random | getRandomNumberGenerator(long seed) | Returns a random number generator.
String | getRevision() | Returns the revision string.
void | insertAttributeAt(Attribute att, int position) | Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing.
Instance | instance(int index) | Returns the instance at the given position.
double | kthSmallestValue(int attIndex, int k) | Returns the kth-smallest attribute value of a numeric attribute.
double | kthSmallestValue(Attribute att, int k) | Returns the kth-smallest attribute value of a numeric attribute.
Instance | lastInstance() | Returns the last instance in the set.
static void | main(String[] args) | Main method for this class.
double | meanOrMode(int attIndex) | Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value.
double | meanOrMode(Attribute att) | Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value.
static Instances | mergeInstances(Instances first, Instances second) | Merges two sets of Instances together.
int | numAttributes() | Returns the number of attributes.
int | numClasses() | Returns the number of class labels.
int | numDistinctValues(int attIndex) | Returns the number of distinct values of a given attribute.
int | numDistinctValues(Attribute att) | Returns the number of distinct values of a given attribute.
int | numInstances() | Returns the number of instances in the dataset.
void | randomize(Random random) | Shuffles the instances in the set so that they are ordered randomly.
boolean | readInstance(Reader reader) | Deprecated. Instead of using this method in conjunction with the readInstance(Reader) method, one should use the ArffLoader or DataSource class instead.
String | relationName() | Returns the relation's name.
Instance | remove(int index) | Removes the instance at the given position.
void | renameAttribute(int att, String name) | Renames an attribute.
void | renameAttribute(Attribute att, String name) | Renames an attribute.
void | renameAttributeValue(int att, int val, String name) | Renames the value of a nominal (or string) attribute value.
void | renameAttributeValue(Attribute att, String val, String name) | Renames the value of a nominal (or string) attribute value.
void | replaceAttributeAt(Attribute att, int position) | Replaces the attribute at the given position (0 to numAttributes()) with the given attribute and sets all its values to be missing.
Instances | resample(Random random) | Creates a new dataset of the same size as this dataset using random sampling with replacement.
Instances | resampleWithWeights(Random random) | Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights.
Instances | resampleWithWeights(Random random, boolean representUsingWeights) | Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights.
Instances | resampleWithWeights(Random random, boolean[] sampled) | Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights.
Instances | resampleWithWeights(Random random, boolean[] sampled, boolean representUsingWeights) | Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights.
Instances | resampleWithWeights(Random random, boolean[] sampled, boolean representUsingWeights, double sampleSize) | Creates a new dataset from this dataset using random sampling with replacement according to current instance weights.
Instances | resampleWithWeights(Random random, double[] weights) | Creates a new dataset of the same size as this dataset using random sampling with replacement according to the given weight vector.
Instances | resampleWithWeights(Random random, double[] weights, boolean[] sampled) | Creates a new dataset of the same size as this dataset using random sampling with replacement according to the given weight vector.
Instances | resampleWithWeights(Random random, double[] weights, boolean[] sampled, boolean representUsingWeights) | Creates a new dataset of the same size as this dataset using random sampling with replacement according to the given weight vector.
Instances | resampleWithWeights(Random random, double[] weights, boolean[] sampled, boolean representUsingWeights, double sampleSize) | Creates a new dataset from this dataset using random sampling with replacement according to the given weight vector.
Instance | set(int index, Instance instance) | Replaces the instance at the given position.
void | setAttributeWeight(int att, double weight) | Sets the weight of an attribute.
void | setAttributeWeight(Attribute att, double weight) | Sets the weight of an attribute.
void | setClass(Attribute att) | Sets the class attribute.
void | setClassIndex(int classIndex) | Sets the class index of the set.
void | setRelationName(String newName) | Sets the relation's name.
int | size() | Returns the number of instances in the dataset.
void | sort(int attIndex) | Sorts the instances based on an attribute.
void | sort(Attribute att) | Sorts the instances based on an attribute.
void | stableSort(int attIndex) | Sorts the instances based on an attribute, using a stable sort.
void | stableSort(Attribute att) | Sorts the instances based on an attribute, using a stable sort.
void | stratify(int numFolds) | Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed).
Instances | stringFreeStructure() | Create a copy of the structure.
double | sumOfWeights() | Computes the sum of all the instances' weights.
void | swap(int i, int j) | Swaps two instances in the set.
static void | test(String[] argv) | Method for testing this class.
Instances | testCV(int numFolds, int numFold) | Creates the test set for one fold of a cross-validation on the dataset.
String | toString() | Returns the dataset as a string in ARFF format.
String | toSummaryString() | Generates a string summarizing the set of instances.
Instances | trainCV(int numFolds, int numFold) | Creates the training set for one fold of a cross-validation on the dataset.
Instances | trainCV(int numFolds, int numFold, Random random) | Creates the training set for one fold of a cross-validation on the dataset.
double | variance(int attIndex) | Computes the variance for a numeric attribute.
double | variance(Attribute att) | Computes the variance for a numeric attribute.
double[] | variances() | Computes the variance for all numeric attributes simultaneously.
Methods inherited from class java.util.AbstractList
addAll, clear, equals, hashCode, indexOf, iterator, lastIndexOf, listIterator, listIterator, subList
Methods inherited from class java.util.AbstractCollection
addAll, contains, containsAll, isEmpty, remove, removeAll, retainAll, toArray, toArray
Methods inherited from interface java.util.Collection
parallelStream, removeIf, stream, toArray
Methods inherited from interface java.util.List
addAll, contains, containsAll, isEmpty, remove, removeAll, replaceAll, retainAll, sort, spliterator, toArray, toArray
-
Field Details
-
FILE_EXTENSION
The filename extension that should be used for ARFF files.
-
SERIALIZED_OBJ_FILE_EXTENSION
The filename extension that should be used for binary serialized instances files.
-
ARFF_RELATION
The keyword used to denote the start of an ARFF header.
-
ARFF_DATA
The keyword used to denote the start of the ARFF data section.
-
-
Constructor Details
-
Instances
Reads an ARFF file from a reader, and assigns a weight of one to each instance. Lets the index of the class attribute be undefined (negative).
- Parameters:
reader
- the reader
- Throws:
IOException
- if the ARFF file is not read successfully
-
Instances
Deprecated. Instead of using this method in conjunction with the readInstance(Reader) method, one should use the ArffLoader or DataSource class instead.
Reads the header of an ARFF file from a reader and reserves space for the given number of instances. Lets the class index be undefined (negative).
- Parameters:
reader
- the reader
capacity
- the capacity
- Throws:
IllegalArgumentException
- if the header is not read successfully or the capacity is negative.
IOException
- if there is a problem with the reader.
-
Instances
Constructor copying all instances and references to the header information from the given set of instances.
- Parameters:
dataset
- the set to be copied
-
Instances
Constructor creating an empty set of instances. Copies references to the header information from the given set of instances. Sets the capacity of the set of instances to 0 if it is negative.
- Parameters:
dataset
- the instances from which the header information is to be taken
capacity
- the capacity of the new dataset
-
Instances
Creates a new set of instances by copying a subset of another set.
- Parameters:
source
- the set of instances from which a subset is to be created
first
- the index of the first instance to be copied
toCopy
- the number of instances to be copied
- Throws:
IllegalArgumentException
- if first and toCopy are out of range
-
Instances
Creates an empty set of instances. Uses the given attribute information. Sets the capacity of the set of instances to 0 if it is negative. The given attribute information must not be changed after this constructor has been used.
- Parameters:
name
- the name of the relation
attInfo
- the attribute information
capacity
- the capacity of the set
- Throws:
IllegalArgumentException
- if attribute names are not unique
-
-
Method Details
-
stringFreeStructure
Create a copy of the structure. If the data has string or relational attributes, these are replaced by empty copies. Other attributes are left unmodified, but the underlying list structure holding references to the attributes is shallow-copied, so that other Instances objects with a reference to this list are not affected.
- Returns:
- a copy of the instance structure.
-
add
Adds one instance to the end of the set. Shallow copies the instance before it is added. Increases the size of the dataset if it is not large enough. Does not check if the instance is compatible with the dataset. Note: String or relational values are not transferred.
- Specified by:
add
in interface Collection<Instance>
- Specified by:
add
in interface List<Instance>
- Overrides:
add
in class AbstractList<Instance>
- Parameters:
instance
- the instance to be added
-
add
Adds one instance at the given position in the list. Shallow copies the instance before it is added. Increases the size of the dataset if it is not large enough. Does not check if the instance is compatible with the dataset. Note: String or relational values are not transferred.
-
allAttributeWeightsIdentical
public boolean allAttributeWeightsIdentical()
Returns true if all attribute weights are the same and false otherwise. Returns true if there are no attributes. The class attribute (if set) is skipped when this test is performed.
-
allInstanceWeightsIdentical
public boolean allInstanceWeightsIdentical()
Returns true if all instance weights are the same and false otherwise. Returns true if there are no instances.
-
attribute
Returns an attribute.
- Parameters:
index
- the attribute's index (index starts with 0)
- Returns:
- the attribute at the given position
-
attribute
Returns an attribute given its name. If there is more than one attribute with the same name, it returns the first one. Returns null if the attribute can't be found.
- Parameters:
name
- the attribute's name
- Returns:
- the attribute with the given name, null if the attribute can't be found
-
checkForAttributeType
public boolean checkForAttributeType(int attType)
Checks for attributes of the given type in the dataset.
- Parameters:
attType
- the attribute type to look for
- Returns:
- true if attributes of the given type are present
-
checkForStringAttributes
public boolean checkForStringAttributes()
Checks for string attributes in the dataset.
- Returns:
- true if string attributes are present, false otherwise
-
checkInstance
Checks if the given instance is compatible with this dataset. Only looks at the size of the instance and the ranges of the values for nominal and string attributes.
- Parameters:
instance
- the instance to check
- Returns:
- true if the instance is compatible with the dataset
-
classAttribute
Returns the class attribute.
- Returns:
- the class attribute
- Throws:
UnassignedClassException
- if the class is not set
-
classIndex
public int classIndex()
Returns the class attribute's index. Returns a negative number if it's undefined.
- Returns:
- the class index as an integer
-
compactify
public void compactify()
Compactifies the set of instances. Decreases the capacity of the set so that it matches the number of instances in the set.
-
delete
public void delete()
Removes all instances from the set.
-
delete
public void delete(int index)
Removes an instance at the given position from the set.
- Parameters:
index
- the instance's position (index starts with 0)
-
deleteAttributeAt
public void deleteAttributeAt(int position)
Deletes an attribute at the given position (0 to numAttributes() - 1). Attribute objects after the deletion point are copied so that their indices can be decremented. Creates a fresh list to hold the old and new attribute objects.
- Parameters:
position
- the attribute's position (position starts with 0)
- Throws:
IllegalArgumentException
- if the given index is out of range or the class attribute is being deleted
-
deleteAttributeType
public void deleteAttributeType(int attType)
Deletes all attributes of the given type in the dataset. A deep copy of the attribute information is performed before an attribute is deleted.
- Parameters:
attType
- the attribute type to delete
- Throws:
IllegalArgumentException
- if attribute couldn't be successfully deleted (probably because it is the class attribute).
-
deleteStringAttributes
public void deleteStringAttributes()
Deletes all string attributes in the dataset. A deep copy of the attribute information is performed before an attribute is deleted.
- Throws:
IllegalArgumentException
- if string attribute couldn't be successfully deleted (probably because it is the class attribute).
-
deleteWithMissing
public void deleteWithMissing(int attIndex)
Removes all instances with missing values for a particular attribute from the dataset.
- Parameters:
attIndex
- the attribute's index (index starts with 0)
-
deleteWithMissing
Removes all instances with missing values for a particular attribute from the dataset.
- Parameters:
att
- the attribute
-
deleteWithMissingClass
public void deleteWithMissingClass()
Removes all instances with a missing class value from the dataset.
- Throws:
UnassignedClassException
- if class is not set
-
enumerateAttributes
Returns an enumeration of all the attributes. The class attribute (if set) is skipped by this enumeration.
- Returns:
- enumeration of all the attributes.
-
enumerateInstances
Returns an enumeration of all instances in the dataset.
- Returns:
- enumeration of all instances in the dataset
-
equalHeadersMsg
Checks if two headers are equivalent. If not, returns a message explaining why they differ.
- Parameters:
dataset
- another dataset
- Returns:
- null if the header of the given dataset is equivalent to this header, otherwise a message with details on why they differ
-
equalHeaders
Checks if two headers are equivalent.
- Parameters:
dataset
- another dataset
- Returns:
- true if the header of the given dataset is equivalent to this header
-
firstInstance
Returns the first instance in the set.
- Returns:
- the first instance in the set
-
getRandomNumberGenerator
Returns a random number generator. The initial seed of the random number generator depends on the given seed and the hash code of a string representation of an instance chosen based on the given seed.
- Parameters:
seed
- the given seed
- Returns:
- the random number generator
-
insertAttributeAt
Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing. Shallow copies the attribute before it is inserted. Existing attribute objects at and after the insertion point are also copied so that their indices can be incremented. Creates a fresh list to hold the old and new attribute objects.
- Parameters:
att
- the attribute to be inserted
position
- the attribute's position (position starts with 0)
- Throws:
IllegalArgumentException
- if the given index is out of range
-
instance
Returns the instance at the given position.
- Parameters:
index
- the instance's index (index starts with 0)
- Returns:
- the instance at the given position
-
get
Returns the instance at the given position.
-
kthSmallestValue
Returns the kth-smallest attribute value of a numeric attribute.
- Parameters:
att
- the Attribute object
k
- the value of k
- Returns:
- the kth-smallest value
-
kthSmallestValue
public double kthSmallestValue(int attIndex, int k)
Returns the kth-smallest attribute value of a numeric attribute. NOTE CHANGE: Missing values (NaN values) are now treated as Double.MAX_VALUE. Also, the order of the instances in the data is no longer affected.
- Parameters:
attIndex
- the attribute's index
k
- the value of k
- Returns:
- the kth-smallest value
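The two behaviours noted for kthSmallestValue (missing values count as Double.MAX_VALUE, and the data order is left alone) can be illustrated on a plain array. The sketch below is not the Weka implementation: the class name KthSmallestSketch is hypothetical, k is assumed to be 1-based, and a full sort stands in for whatever selection routine the library uses.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; it does not use or mirror Weka code.
final class KthSmallestSketch {

    /**
     * Returns the k-th smallest value, with k assumed to be 1-based.
     * NaN entries are treated as Double.MAX_VALUE. The work is done on a
     * copy, so the caller's array keeps its original order.
     */
    static double kthSmallest(double[] values, int k) {
        if (k < 1 || k > values.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        double[] copy = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            copy[i] = Double.isNaN(values[i]) ? Double.MAX_VALUE : values[i];
        }
        Arrays.sort(copy); // simple O(n log n) approach; a selection algorithm would also work
        return copy[k - 1];
    }

    public static void main(String[] args) {
        double[] column = {3.0, Double.NaN, 1.0, 2.0};
        System.out.println(kthSmallest(column, 2));
        System.out.println(kthSmallest(column, 4) == Double.MAX_VALUE);
    }
}
```

Because missing values map to the largest double, they only surface when k reaches past the number of non-missing entries.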
-
lastInstance
Returns the last instance in the set.
- Returns:
- the last instance in the set
-
meanOrMode
public double meanOrMode(int attIndex)
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. Returns 0 if the attribute is neither nominal nor numeric. If all values are missing it returns zero.
- Parameters:
attIndex
- the attribute's index (index starts with 0)
- Returns:
- the mean or the mode
-
meanOrMode
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. Returns 0 if the attribute is neither nominal nor numeric. If all values are missing it returns zero.
- Parameters:
att
- the attribute
- Returns:
- the mean or the mode
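As a rough illustration of the mean-or-mode rule described above, the following sketch works on a plain double array in which NaN marks a missing value and nominal values are stored as label indices. It is a simplification under stated assumptions: the class MeanOrModeSketch and its methods are hypothetical, and instance weights, which the real class may take into account, are ignored.

```java
// Hypothetical helper for illustration only; instance weights are deliberately ignored.
final class MeanOrModeSketch {

    /** Mean of the non-missing (non-NaN) values; 0 if every value is missing. */
    static double mean(double[] values) {
        double sum = 0;
        int count = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? 0 : sum / count;
    }

    /** Index of the most frequent label, as a double; 0 if every value is missing. */
    static double mode(double[] labelIndices, int numLabels) {
        int[] counts = new int[numLabels];
        for (double v : labelIndices) {
            if (!Double.isNaN(v)) {
                counts[(int) v]++;
            }
        }
        int best = 0;
        for (int i = 1; i < numLabels; i++) {
            if (counts[i] > counts[best]) {
                best = i; // ties keep the lowest label index
            }
        }
        return best;
    }
}
```

Returning the mode as a label index in floating-point form matches the "as a floating-point value" wording; how ties are broken in the library is not stated in this page, so the lowest-index rule here is only an assumption.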
-
numAttributes
public int numAttributes()
Returns the number of attributes.
- Returns:
- the number of attributes as an integer
-
numClasses
public int numClasses()
Returns the number of class labels.
- Returns:
- the number of class labels as an integer if the class attribute is nominal, 1 otherwise.
- Throws:
UnassignedClassException
- if the class is not set
-
numDistinctValues
public int numDistinctValues(int attIndex)
Returns the number of distinct values of a given attribute. The value 'missing' is not counted.
- Parameters:
attIndex
- the attribute (index starts with 0)
- Returns:
- the number of distinct values of a given attribute
-
numDistinctValues
Returns the number of distinct values of a given attribute. The value 'missing' is not counted.
- Parameters:
att
- the attribute
- Returns:
- the number of distinct values of a given attribute
-
numInstances
public int numInstances()
Returns the number of instances in the dataset.
- Returns:
- the number of instances in the dataset as an integer
-
size
public int size()
Returns the number of instances in the dataset.
- Specified by:
size
in interface Collection<Instance>
- Specified by:
size
in interface List<Instance>
- Specified by:
size
in class AbstractCollection<Instance>
- Returns:
- the number of instances in the dataset as an integer
-
randomize
Shuffles the instances in the set so that they are ordered randomly.
- Parameters:
random
- a random number generator
-
readInstance
Deprecated. Instead of using this method in conjunction with the readInstance(Reader) method, one should use the ArffLoader or DataSource class instead.
Reads a single instance from the reader and appends it to the dataset. Automatically expands the dataset if it is not large enough to hold the instance. This method does not check for carriage return at the end of the line.
- Parameters:
reader
- the reader
- Returns:
- false if end of file has been reached
- Throws:
IOException
- if the information is not read successfully
-
replaceAttributeAt
Replaces the attribute at the given position (0 to numAttributes()) with the given attribute and sets all its values to be missing. Shallow copies the given attribute before it is inserted. Creates a fresh list to hold the old and new attribute objects.
- Parameters:
att
- the attribute to be inserted
position
- the attribute's position (position starts with 0)
- Throws:
IllegalArgumentException
- if the given index is out of range
-
relationName
Returns the relation's name.
- Returns:
- the relation's name as a string
-
remove
Removes the instance at the given position.
-
renameAttribute
Renames an attribute. This change only affects this dataset.
- Parameters:
att
- the attribute's index (index starts with 0)
name
- the new name
-
setAttributeWeight
Sets the weight of an attribute. This change only affects this dataset.
- Parameters:
att
- the attribute
weight
- the new weight
-
setAttributeWeight
public void setAttributeWeight(int att, double weight)
Sets the weight of an attribute. This change only affects this dataset.
- Parameters:
att
- the attribute's index (index starts with 0)
weight
- the new weight
-
renameAttribute
Renames an attribute. This change only affects this dataset.
- Parameters:
att
- the attribute
name
- the new name
-
renameAttributeValue
Renames the value of a nominal (or string) attribute value. This change only affects this dataset.
- Parameters:
att
- the attribute's index (index starts with 0)
val
- the value's index (index starts with 0)
name
- the new name
-
renameAttributeValue
Renames the value of a nominal (or string) attribute value. This change only affects this dataset.
- Parameters:
att
- the attribute
val
- the value
name
- the new name
-
resample
Creates a new dataset of the same size as this dataset using random sampling with replacement.
- Parameters:
random
- a random number generator
- Returns:
- the new dataset
-
resampleWithWeights
Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights. The weights of the instances in the new dataset are set to one. See also resampleWithWeights(Random, double[], boolean[]).
- Parameters:
random
- a random number generator
- Returns:
- the new dataset
-
resampleWithWeights
Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights. The weights of the instances in the new dataset are set to one. See also resampleWithWeights(Random, double[], boolean[]).
- Parameters:
random
- a random number generator
sampled
- an array indicating what has been sampled
- Returns:
- the new dataset
-
resampleWithWeights
Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights. See also resampleWithWeights(Random, double[], boolean[]).
- Parameters:
random
- a random number generator
representUsingWeights
- if true, copies are represented using weights in resampled data
- Returns:
- the new dataset
-
resampleWithWeights
public Instances resampleWithWeights(Random random, boolean[] sampled, boolean representUsingWeights)
Creates a new dataset of the same size as this dataset using random sampling with replacement according to the current instance weights. See also resampleWithWeights(Random, double[], boolean[]).
- Parameters:
random
- a random number generator
sampled
- an array indicating what has been sampled
representUsingWeights
- if true, copies are represented using weights in resampled data
- Returns:
- the new dataset
-
resampleWithWeights
public Instances resampleWithWeights(Random random, boolean[] sampled, boolean representUsingWeights, double sampleSize)
Creates a new dataset from this dataset using random sampling with replacement according to current instance weights. The size of the sample can be specified as a percentage of this dataset. See also resampleWithWeights(Random, double[], boolean[]).
- Parameters:
random
- a random number generator
sampled
- an array indicating what has been sampled, can be null
representUsingWeights
- if true, copies are represented using weights in resampled data
sampleSize
- size of the new dataset as a percentage of the size of this dataset
- Returns:
- the new dataset
- Throws:
IllegalArgumentException
- if the weights array is of the wrong length or contains negative weights.
-
resampleWithWeights
Creates a new dataset of the same size as this dataset using random sampling with replacement according to the given weight vector. The weights of the instances in the new dataset are set to one. The length of the weight vector has to be the same as the number of instances in the dataset, and all weights have to be positive. See also resampleWithWeights(Random, double[], boolean[]).
- Parameters:
random
- a random number generator
weights
- the weight vector
- Returns:
- the new dataset
- Throws:
IllegalArgumentException
- if the weights array is of the wrong length or contains negative weights.
-
resampleWithWeights
Creates a new dataset of the same size as this dataset using random sampling with replacement according to the given weight vector. The weights of the instances in the new dataset are set to one. The length of the weight vector has to be the same as the number of instances in the dataset, and all weights have to be positive. Uses Walker's method, see pp. 232 of "Stochastic Simulation" by B.D. Ripley (1987).
- Parameters:
random
- a random number generator
weights
- the weight vector
sampled
- an array indicating what has been sampled, can be null
- Returns:
- the new dataset
- Throws:
IllegalArgumentException
- if the weights array is of the wrong length or contains negative weights.
-
resampleWithWeights
public Instances resampleWithWeights(Random random, double[] weights, boolean[] sampled, boolean representUsingWeights)
Creates a new dataset of the same size as this dataset using random sampling with replacement according to the given weight vector. The length of the weight vector has to be the same as the number of instances in the dataset, and all weights have to be positive. Uses Walker's method, see pp. 232 of "Stochastic Simulation" by B.D. Ripley (1987).
- Parameters:
random
- a random number generator
weights
- the weight vector
sampled
- an array indicating what has been sampled, can be null
representUsingWeights
- if true, copies are represented using weights in resampled data
- Returns:
- the new dataset
- Throws:
IllegalArgumentException
- if the weights array is of the wrong length or contains negative weights.
-
resampleWithWeights
public Instances resampleWithWeights(Random random, double[] weights, boolean[] sampled, boolean representUsingWeights, double sampleSize)
Creates a new dataset from this dataset using random sampling with replacement according to the given weight vector. The length of the weight vector has to be the same as the number of instances in the dataset, and all weights have to be positive. Uses Walker's method, see pp. 232 of "Stochastic Simulation" by B.D. Ripley (1987). The size of the sample can be specified as a percentage of this dataset.
- Parameters:
random
- a random number generator
weights
- the weight vector
sampled
- an array indicating what has been sampled, can be null
representUsingWeights
- if true, copies are represented using weights in resampled data
sampleSize
- size of the new dataset as a percentage of the size of this dataset
- Returns:
- the new dataset
- Throws:
IllegalArgumentException
- if the weights array is of the wrong length or contains negative weights.
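The weighted overloads above refer to Walker's method for drawing indices in proportion to a weight vector. The sketch below shows the general alias-table idea in the common Vose formulation; it is an illustration under stated assumptions and not the code used by this class. The class name AliasSamplerSketch and its methods are hypothetical, and the sampled array and representUsingWeights options are not modelled.

```java
import java.util.ArrayDeque;
import java.util.Random;

// Hypothetical illustration of alias-table sampling (Walker/Vose); not Weka code.
final class AliasSamplerSketch {
    private final double[] prob; // chance of keeping column i when it is picked
    private final int[] alias;   // index returned instead when column i is not kept

    AliasSamplerSketch(double[] weights) {
        int n = weights.length;
        double sum = 0;
        for (double w : weights) {
            if (w < 0) {
                throw new IllegalArgumentException("negative weight");
            }
            sum += w;
        }
        if (n == 0 || sum <= 0) {
            throw new IllegalArgumentException("weights must have a positive sum");
        }
        prob = new double[n];
        alias = new int[n];
        double[] scaled = new double[n];
        ArrayDeque<Integer> small = new ArrayDeque<>();
        ArrayDeque<Integer> large = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            scaled[i] = weights[i] * n / sum; // average column height becomes 1
            if (scaled[i] < 1.0) small.push(i); else large.push(i);
        }
        // Pair each under-full column with an over-full one that tops it up.
        while (!small.isEmpty() && !large.isEmpty()) {
            int s = small.pop();
            int l = large.pop();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] = scaled[l] + scaled[s] - 1.0;
            if (scaled[l] < 1.0) small.push(l); else large.push(l);
        }
        // Leftovers are full columns (up to rounding error).
        while (!large.isEmpty()) { int i = large.pop(); prob[i] = 1.0; alias[i] = i; }
        while (!small.isEmpty()) { int i = small.pop(); prob[i] = 1.0; alias[i] = i; }
    }

    /** Draws one index with probability proportional to its weight. */
    int next(Random random) {
        int column = random.nextInt(prob.length);
        return random.nextDouble() < prob[column] ? column : alias[column];
    }

    /** Draws sampleSize indices with replacement. */
    static int[] sampleIndices(double[] weights, int sampleSize, Random random) {
        AliasSamplerSketch sampler = new AliasSamplerSketch(weights);
        int[] result = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            result[i] = sampler.next(random);
        }
        return result;
    }
}
```

After the one-time table construction, each draw costs one integer and one double from the generator, which is why this family of methods suits drawing a sample as large as the dataset itself.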
-
set
Replaces the instance at the given position. Shallow copies the instance before it is added. Does not check if the instance is compatible with the dataset. Note: String or relational values are not transferred.
-
setClass
Sets the class attribute.
- Parameters:
att
- attribute to be the class
-
setClassIndex
public void setClassIndex(int classIndex)
Sets the class index of the set. If the class index is negative, there is assumed to be no class (i.e., it is undefined).
- Parameters:
classIndex
- the new class index (index starts with 0)
- Throws:
IllegalArgumentException
- if the class index is too big or < 0
-
setRelationName
Sets the relation's name.
- Parameters:
newName
- the new relation name.
-
sort
public void sort(int attIndex) Sorts the instances based on an attribute. For numeric attributes, instances are sorted in ascending order. For nominal attributes, instances are sorted based on the attribute label ordering specified in the header. Instances with missing values for the attribute are placed at the end of the dataset.- Parameters:
attIndex
- the attribute's index (index starts with 0)
-
sort
Sorts the instances based on an attribute. For numeric attributes, instances are sorted into ascending order. For nominal attributes, instances are sorted based on the attribute label ordering specified in the header. Instances with missing values for the attribute are placed at the end of the dataset.- Parameters:
att
- the attribute
-
stableSort
public void stableSort(int attIndex) Sorts the instances based on an attribute, using a stable sort. For numeric attributes, instances are sorted in ascending order. For nominal attributes, instances are sorted based on the attribute label ordering specified in the header. Instances with missing values for the attribute are placed at the end of the dataset.
- Parameters:
attIndex
- the attribute's index (index starts with 0)
-
stableSort
Sorts the instances based on an attribute, using a stable sort. For numeric attributes, instances are sorted in ascending order. For nominal attributes, instances are sorted based on the attribute label ordering specified in the header. Instances with missing values for the attribute are placed at the end of the dataset.
- Parameters:
att
- the attribute
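The ordering described for sort and stableSort can be illustrated without Weka. The sketch below is an assumption-laden stand-in: rows are plain `double[]` arrays, a missing value is modelled as `Double.NaN`, and the class `StableSortSketch` is hypothetical. It relies on two JDK guarantees: `Arrays.sort` on object arrays is stable, and `Double.compare` orders `NaN` after every other value.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper, not part of Weka: stable sort of rows by one column. */
final class StableSortSketch {

    /** Returns a sorted copy of rows; rows with NaN in the column end up last. */
    static double[][] sortByColumn(double[][] rows, int attIndex) {
        double[][] sorted = rows.clone(); // shallow copy: row arrays are shared
        // Arrays.sort on objects is stable, so rows with equal keys keep
        // their original relative order.
        Arrays.sort(sorted, Comparator.comparingDouble((double[] row) -> row[attIndex]));
        return sorted;
    }

    public static void main(String[] args) {
        double[][] rows = {
            {3, 0}, {Double.NaN, 1}, {1, 2}, {3, 3}, {1, 4}
        };
        // Second column is just an id so the stable ordering is visible.
        System.out.println(Arrays.deepToString(sortByColumn(rows, 0)));
    }
}
```

A stable sort matters when a dataset is sorted on several attributes in succession: sorting on the secondary key first and the primary key second then preserves the secondary order within ties.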
-
stratify
public void stratify(int numFolds) Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed).
- Parameters:
numFolds
- the number of folds in the cross-validation
- Throws:
UnassignedClassException
- if the class is not set
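One way to obtain an ordering suitable for stratified cross-validation is to group the instances by class and then deal them out with a stride equal to the number of folds. The sketch below shows that idea on plain class labels; it is an illustration under that assumption, not a copy of Weka's implementation, and `StratifySketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Weka: stratified reordering of indices. */
final class StratifySketch {

    /** Returns a permutation of 0..labels.length-1 that spreads classes across numFolds blocks. */
    static int[] stratifiedOrder(int[] labels, int numFolds) {
        if (numFolds < 1) {
            throw new IllegalArgumentException("numFolds must be at least 1");
        }
        // Step 1: group instance indices by class label, in first-seen order.
        Map<Integer, List<Integer>> byClass = new LinkedHashMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(i);
        }
        List<Integer> grouped = new ArrayList<>();
        for (List<Integer> group : byClass.values()) {
            grouped.addAll(group);
        }
        // Step 2: walk the grouped list with stride numFolds, once per start offset,
        // so that consecutive blocks each receive a share of every class.
        int[] order = new int[labels.length];
        int next = 0;
        for (int start = 0; start < numFolds; start++) {
            for (int j = start; j < grouped.size(); j += numFolds) {
                order[next++] = grouped.get(j);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 1, 1, 1, 1};
        System.out.println(Arrays.toString(stratifiedOrder(labels, 2)));
    }
}
```

After such a reordering, cutting the data into contiguous blocks gives each block roughly the class proportions of the whole set.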
-
sumOfWeights
public double sumOfWeights() Computes the sum of all the instances' weights.
- Returns:
- the sum of all the instances' weights as a double
-
testCV
Creates the test set for one fold of a cross-validation on the dataset.
- Parameters:
numFolds
- the number of folds in the cross-validation. Must be greater than 1.
numFold
- 0 for the first fold, 1 for the second, ...
- Returns:
- the test set as a set of weighted instances
- Throws:
IllegalArgumentException
- if the number of folds is less than 2 or greater than the number of instances.
-
toString
Returns the dataset as a string in ARFF format. Strings are quoted if they contain whitespace characters, or if they are a question mark.
- Overrides:
toString
in class AbstractCollection<Instance>
- Returns:
- the dataset in ARFF format as a string
-
trainCV
Creates the training set for one fold of a cross-validation on the dataset.
- Parameters:
numFolds
- the number of folds in the cross-validation. Must be greater than 1.
numFold
- 0 for the first fold, 1 for the second, ...
- Returns:
- the training set
- Throws:
IllegalArgumentException
- if the number of folds is less than 2 or greater than the number of instances.
-
trainCV
Creates the training set for one fold of a cross-validation on the dataset. The data is subsequently randomized based on the given random number generator.
- Parameters:
numFolds
- the number of folds in the cross-validation. Must be greater than 1.
numFold
- 0 for the first fold, 1 for the second, ...
random
- the random number generator
- Returns:
- the training set
- Throws:
IllegalArgumentException
- if the number of folds is less than 2 or greater than the number of instances.
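The relationship between testCV and trainCV can be pictured as index arithmetic: fold numFold takes one contiguous block as its test set, and the training set is everything outside that block. The sketch below assumes contiguous blocks whose sizes differ by at most one; `CvFoldSketch`, `testRange` and the exception messages are hypothetical and only mirror the argument checks described above.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Weka: index ranges for cross-validation folds. */
final class CvFoldSketch {

    /** Returns {first, length} of the test block for the given 0-based fold. */
    static int[] testRange(int numInstances, int numFolds, int numFold) {
        if (numFolds < 2) {
            throw new IllegalArgumentException("Number of folds must be at least 2");
        }
        if (numFolds > numInstances) {
            throw new IllegalArgumentException("More folds than instances");
        }
        int base = numInstances / numFolds;       // minimum block size
        int remainder = numInstances % numFolds;  // first 'remainder' folds get one extra
        int length = base + (numFold < remainder ? 1 : 0);
        int first = numFold * base + Math.min(numFold, remainder);
        return new int[] {first, length};
    }

    public static void main(String[] args) {
        // 10 instances, 3 folds: the training set of a fold is every index
        // outside the printed test range.
        for (int fold = 0; fold < 3; fold++) {
            System.out.println(Arrays.toString(testRange(10, 3, fold)));
        }
    }
}
```

Because the blocks are contiguous, the instance order at the time of the call decides what each fold contains, which is why randomizing (and, for nominal classes, stratifying) beforehand is the usual practice.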
-
variances
public double[] variances() Computes the variance for all numeric attributes simultaneously. This is faster than calling variance() for each attribute. The resulting array has as many elements as there are attributes. Array elements corresponding to non-numeric attributes are set to 0.
- Returns:
- the array containing the variance values
-
variance
public double variance(int attIndex) Computes the variance for a numeric attribute.
- Parameters:
attIndex
- the numeric attribute (index starts with 0)
- Returns:
- the variance if the attribute is numeric
- Throws:
IllegalArgumentException
- if the attribute is not numeric
-
variance
Computes the variance for a numeric attribute.
- Parameters:
att
- the numeric attribute
- Returns:
- the variance if the attribute is numeric
- Throws:
IllegalArgumentException
- if the attribute is not numeric
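As a point of reference for what a variance over one numeric column looks like, here is a small standalone sketch. It makes several assumptions that the text above does not state: every instance has weight 1, missing values are modelled as `Double.NaN` and skipped, and the sample variance with an n - 1 denominator is wanted. `VarianceSketch` is a hypothetical name and the code is not Weka's implementation.

```java
/** Hypothetical helper, not part of Weka: unweighted sample variance. */
final class VarianceSketch {

    /** Sample variance (n - 1 denominator) of the non-NaN values; NaN if fewer than two. */
    static double sampleVariance(double[] values) {
        long n = 0;
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the mean
        // Welford's single-pass update: numerically steadier than
        // accumulating sum and sum of squares separately.
        for (double v : values) {
            if (Double.isNaN(v)) {
                continue; // assumption: NaN marks a missing value
            }
            n++;
            double delta = v - mean;
            mean += delta / n;
            m2 += delta * (v - mean);
        }
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }

    public static void main(String[] args) {
        System.out.println(sampleVariance(new double[] {2, 4, 4, 4, 5, 5, 7, 9}));
    }
}
```

Computing all columns in one pass over the data, with one set of running statistics per column, is the kind of saving that makes a bulk method cheaper than one call per attribute.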
-
attributeStats
Calculates summary statistics on the values that appear in this set of instances for a specified attribute.
- Parameters:
index
- the index of the attribute to summarize (index starts with 0)
- Returns:
- an AttributeStats object with its fields calculated.
-
attributeToDoubleArray
public double[] attributeToDoubleArray(int index) Gets the value of all instances in this dataset for a particular attribute. Useful in conjunction with Utils.sort to allow iterating through the dataset in sorted order for some attribute.
- Parameters:
index
- the index of the attribute.
- Returns:
- an array containing the value of the desired attribute for each instance in the dataset.
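The pattern hinted at here is an index sort: extract one column as a `double[]`, compute the permutation that would sort it, and visit the instances through that permutation without reordering the dataset. The sketch below shows the permutation step with the JDK only; `ArgsortSketch` and `sortedIndices` are hypothetical stand-ins, not the Weka utility itself.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper, not part of Weka: indices that sort a double array. */
final class ArgsortSketch {

    /** Returns indices i0, i1, ... such that values[i0] <= values[i1] <= ... */
    static int[] sortedIndices(double[] values) {
        Integer[] boxed = new Integer[values.length];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        // Sort the indices by the value they point at; the values array is untouched.
        Arrays.sort(boxed, Comparator.comparingDouble((Integer i) -> values[i]));
        int[] result = new int[boxed.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = boxed[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] column = {3.0, 1.0, 2.0};
        for (int i : sortedIndices(column)) {
            // In a real dataset, i would be used to look up the i-th instance.
            System.out.println(i + " -> " + column[i]);
        }
    }
}
```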
-
toSummaryString
Generates a string summarizing the set of instances. Gives a breakdown for each attribute indicating the number of missing/discrete/unique values and other information.
- Returns:
- a string summarizing the dataset
-
swap
public void swap(int i, int j) Swaps two instances in the set.
- Parameters:
i
- the first instance's index (index starts with 0)
j
- the second instance's index (index starts with 0)
-
mergeInstances
Merges two sets of Instances together. The resulting set will have all the attributes of the first set plus all the attributes of the second set. The number of instances in both sets must be the same.
- Parameters:
first
- the first set of Instances
second
- the second set of Instances
- Returns:
- the merged set of Instances
- Throws:
IllegalArgumentException
- if the datasets are not the same size
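The merge described here is a column-wise join: row i of the result is row i of the first set followed by row i of the second. A minimal sketch on plain arrays, with the hypothetical class `MergeSketch` standing in for the real data structures and ignoring header information entirely:

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Weka: column-wise merge of two row sets. */
final class MergeSketch {

    /** Concatenates the i-th rows of first and second; both must have the same row count. */
    static double[][] mergeColumns(double[][] first, double[][] second) {
        if (first.length != second.length) {
            throw new IllegalArgumentException("Instance sets must be of the same size");
        }
        double[][] merged = new double[first.length][];
        for (int i = 0; i < first.length; i++) {
            // Copy the first row into a longer array, then append the second row.
            double[] row = Arrays.copyOf(first[i], first[i].length + second[i].length);
            System.arraycopy(second[i], 0, row, first[i].length, second[i].length);
            merged[i] = row;
        }
        return merged;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5}, {6}};
        System.out.println(Arrays.deepToString(mergeColumns(a, b)));
    }
}
```

The rows are matched purely by position, so both inputs need to be in the same instance order before merging.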
-
test
Method for testing this class.
- Parameters:
argv
- should contain one element: the name of an ARFF file
-
main
Main method for this class. The following calls are possible:
- weka.core.Instances help: prints a short list of possible commands.
- weka.core.Instances <filename>: prints a summary of a set of instances.
- weka.core.Instances merge <filename1> <filename2>: merges the two datasets (which must have the same number of instances) and outputs the result on stdout.
- weka.core.Instances append <filename1> <filename2>: appends the second dataset to the first one (the headers must match) and outputs the result on stdout.
- weka.core.Instances headers <filename1> <filename2>: compares the headers of the two datasets and prints whether they match or not.
- weka.core.Instances randomize <seed> <filename>: randomizes the dataset with the given seed and outputs the result on stdout.
- Parameters:
args
- the commandline parameters
-
-
getRevision
Returns the revision string.
- Specified by:
getRevision
in interface RevisionHandler
- Returns:
- the revision
-
readInstance(Reader) method, one should use the ArffLoader or DataSource class instead.