Package weka.core.converters
Class ArffLoader.ArffReader
java.lang.Object
weka.core.converters.ArffLoader.ArffReader
- All Implemented Interfaces:
RevisionHandler
- Enclosing class:
ArffLoader
Reads data from an ARFF file, either in incremental or batch mode.
Typical code for batch usage:

    BufferedReader reader = new BufferedReader(new FileReader("/some/where/file.arff"));
    ArffReader arff = new ArffReader(reader);
    Instances data = arff.getData();
    data.setClassIndex(data.numAttributes() - 1);

Typical code for incremental usage:

    BufferedReader reader = new BufferedReader(new FileReader("/some/where/file.arff"));
    ArffReader arff = new ArffReader(reader, 1000);
    Instances data = arff.getStructure();
    data.setClassIndex(data.numAttributes() - 1);
    Instance inst;
    while ((inst = arff.readInstance(data)) != null) {
      data.add(inst);
    }
- Version:
$Revision: 14509 $
- Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (trigg@cs.waikato.ac.nz), fracpete (fracpete at waikato dot ac dot nz)
-
Constructor Summary
Constructor and Description

ArffReader(Reader reader)
    Reads the data completely from the reader.
ArffReader(Reader reader, int capacity)
ArffReader(Reader reader, int capacity, boolean batch)
    Reads only the header and reserves the specified space for instances.
ArffReader(Reader reader, Instances template, int lines, int capacity, boolean batch, String... fieldSepAndEnclosures)
    Initializes the reader without reading the header according to the specified template.
ArffReader(Reader reader, Instances template, int lines, int capacity, String... fieldSepAndEnclosures)
    Initializes the reader without reading the header according to the specified template.
ArffReader(Reader reader, Instances template, int lines, String... fieldSepAndEnclosures)
    Reads the data without header according to the specified template.
Method Summary
Modifier and Type, Method, and Description

Instances getData()
    Returns the data that was read.
int getLineNo()
    Returns the current line number.
boolean getRetainStringValues()
    Get whether to retain the values of string attributes in memory (in the header) when reading incrementally.
String getRevision()
    Returns the revision string.
Instances getStructure()
    Returns the header format.
Instance readInstance(Instances structure)
    Reads a single instance using the tokenizer and returns it.
Instance readInstance(Instances structure, boolean flag)
    Reads a single instance using the tokenizer and returns it.
void setRetainStringValues(boolean retain)
    Set whether to retain the values of string attributes in memory (in the header) when reading incrementally.
-
Constructor Details
-
ArffReader
public ArffReader(Reader reader) throws IOException
Reads the data completely from the reader. The data can be accessed via the getData() method.
- Parameters:
reader - the reader to use
- Throws:
IOException - if something goes wrong
-
ArffReader
public ArffReader(Reader reader, int capacity) throws IOException
- Throws:
IOException
-
ArffReader
public ArffReader(Reader reader, int capacity, boolean batch) throws IOException
Reads only the header and reserves the specified space for instances. Further instances can be read via readInstance().
- Parameters:
reader - the reader to use
capacity - the capacity of the new dataset
batch - true if reading in batch mode
- Throws:
IOException - if something goes wrong
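
As an illustration of the header-only constructor, the following is a minimal sketch rather than code from the Weka distribution. The class name HeaderOnlySketch, the helper readHeader, and the in-memory ARFF text are all assumptions made for the example. It only shows that, after construction, getStructure() exposes the attributes while no data rows have been consumed yet.

```java
import java.io.IOException;
import java.io.StringReader;

import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

public class HeaderOnlySketch {

    // Hypothetical ARFF content, kept in memory so the sketch needs no file.
    static final String ARFF =
        "@relation demo\n"
        + "@attribute x numeric\n"
        + "@attribute label {yes,no}\n"
        + "@data\n"
        + "1.5,yes\n"
        + "2.5,no\n";

    /** Parses only the header and returns the (still empty) dataset structure. */
    static Instances readHeader(String arff) throws IOException {
        // capacity = 10 reserves room for instances; batch = false asks for incremental mode.
        ArffReader reader = new ArffReader(new StringReader(arff), 10, false);
        return reader.getStructure();
    }

    public static void main(String[] args) throws IOException {
        Instances structure = readHeader(ARFF);
        System.out.println("attributes: " + structure.numAttributes());
        System.out.println("instances:  " + structure.numInstances());
    }
}
```

The data rows stay unread until readInstance() is called on the same reader.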
-
ArffReader
public ArffReader(Reader reader, Instances template, int lines, String... fieldSepAndEnclosures) throws IOException
Reads the data without header according to the specified template. The data can be accessed via the getData() method.
- Parameters:
reader - the reader to use
template - the template header
lines - the lines read so far
fieldSepAndEnclosures - an optional array of Strings containing the field separator and enclosures to use instead of the defaults. The first entry in the array is expected to be the single character field separator to use; the remaining entries (if any) are enclosure characters to use.
- Throws:
IOException - if something goes wrong
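
A template-based read might look like the sketch below. It is an assumption-laden illustration, not the canonical usage. The class TemplateReadSketch, the helper readRows, and the HEADER and ROWS strings are hypothetical. The header is parsed once to obtain a template, and a second reader then consumes header-less rows. Passing 0 for lines assumes nothing has been read from the second stream yet.

```java
import java.io.IOException;
import java.io.StringReader;

import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

public class TemplateReadSketch {

    // Hypothetical header and header-less rows, split apart for illustration.
    static final String HEADER =
        "@relation demo\n"
        + "@attribute x numeric\n"
        + "@attribute label {yes,no}\n"
        + "@data\n";
    static final String ROWS = "1.5,yes\n2.5,no\n3.5,yes\n";

    /** Reads header-less rows against a template obtained from a separate header. */
    static Instances readRows(String header, String rows) throws IOException {
        // Parse the header alone to get the attribute definitions.
        Instances template = new ArffReader(new StringReader(header), 0).getStructure();
        // This constructor reads all rows immediately; lines = 0 is an assumed starting offset.
        ArffReader body = new ArffReader(new StringReader(rows), template, 0);
        return body.getData();
    }

    public static void main(String[] args) throws IOException {
        Instances data = readRows(HEADER, ROWS);
        System.out.println("rows read: " + data.numInstances());
    }
}
```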
-
ArffReader
public ArffReader(Reader reader, Instances template, int lines, int capacity, String... fieldSepAndEnclosures) throws IOException
Initializes the reader without reading the header according to the specified template. The data must be read via the readInstance() method.
- Parameters:
reader - the reader to use
template - the template header
lines - the lines read so far
capacity - the capacity of the new dataset
fieldSepAndEnclosures - an optional array of Strings containing the field separator and enclosures to use instead of the defaults. The first entry in the array is expected to be the single character field separator to use; the remaining entries (if any) are enclosure characters to use.
- Throws:
IOException - if something goes wrong
-
ArffReader
public ArffReader(Reader reader, Instances template, int lines, int capacity, boolean batch, String... fieldSepAndEnclosures) throws IOException
Initializes the reader without reading the header according to the specified template. The data must be read via the readInstance() method.
- Parameters:
reader - the reader to use
template - the template header
lines - the lines read so far
capacity - the capacity of the new dataset
batch - true if the data is going to be read in batch mode
fieldSepAndEnclosures - an optional array of Strings containing the field separator and enclosures to use instead of the defaults. The first entry in the array is expected to be the single character field separator to use; the remaining entries (if any) are enclosure characters to use.
- Throws:
IOException - if something goes wrong
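
The fieldSepAndEnclosures argument could be exercised roughly as follows. This is a sketch under stated assumptions: the class SeparatorSketch, the helper readSemicolonRows, and the semicolon-separated sample rows are invented for illustration. Only the separator entry is passed, and the optional enclosure entries are left out. Because this constructor does not read any data itself, the rows are pulled in with readInstance().

```java
import java.io.IOException;
import java.io.StringReader;

import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

public class SeparatorSketch {

    // Hypothetical header (standard ARFF) and rows that use ';' instead of ','.
    static final String HEADER =
        "@relation demo\n"
        + "@attribute x numeric\n"
        + "@attribute label {yes,no}\n"
        + "@data\n";
    static final String ROWS = "1.5;yes\n2.5;no\n";

    /** Reads semicolon-separated rows against a template header. */
    static Instances readSemicolonRows(String header, String rows) throws IOException {
        Instances template = new ArffReader(new StringReader(header), 0).getStructure();
        // First vararg entry: the single-character field separator.
        ArffReader body =
            new ArffReader(new StringReader(rows), template, 0, 10, false, ";");
        Instances data = new Instances(template, 10);
        Instance inst;
        while ((inst = body.readInstance(data)) != null) {
            data.add(inst);
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        Instances data = readSemicolonRows(HEADER, ROWS);
        System.out.println("rows read: " + data.numInstances());
    }
}
```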
-
-
Method Details
-
getLineNo
public int getLineNo()
Returns the current line number.
- Returns:
the current line number
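
One plausible use of getLineNo() is reporting where a malformed row was hit. The sketch below assumes a hypothetical class LineNoSketch, a helper lineOfFirstError, and a deliberately broken sample whose second data row holds text where a number is expected. The exact line number reported is not asserted here.

```java
import java.io.IOException;
import java.io.StringReader;

import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

public class LineNoSketch {

    // Hypothetical ARFF text; "oops" is not a valid value for a numeric attribute.
    static final String BAD_ARFF =
        "@relation demo\n"
        + "@attribute x numeric\n"
        + "@data\n"
        + "1.5\n"
        + "oops\n";

    /** Returns the reader's line number at the first parse error, or -1 if none occurs. */
    static int lineOfFirstError(String arff) throws IOException {
        ArffReader reader = new ArffReader(new StringReader(arff), 0, false);
        Instances structure = reader.getStructure();
        try {
            while (reader.readInstance(structure) != null) {
                // Rows are discarded; only parse errors are of interest here.
            }
            return -1;
        } catch (IOException e) {
            return reader.getLineNo();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("error near line: " + lineOfFirstError(BAD_ARFF));
    }
}
```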
-
readInstance
public Instance readInstance(Instances structure) throws IOException
Reads a single instance using the tokenizer and returns it.
- Parameters:
structure - the dataset header information, will get updated in case of string or relational attributes
- Returns:
the instance that was read, or null if end of file has been reached
- Throws:
IOException - if the information is not read successfully
-
readInstance
public Instance readInstance(Instances structure, boolean flag) throws IOException
Reads a single instance using the tokenizer and returns it.
- Parameters:
structure - the dataset header information, will get updated in case of string or relational attributes
flag - if method should test for carriage return after each instance
- Returns:
the instance that was read, or null if end of file has been reached
- Throws:
IOException - if the information is not read successfully
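
Because readInstance() returns one row at a time and null at end of input, it can be used to process data without keeping it in memory. The following is a minimal sketch with hypothetical names (StreamingSumSketch, sumFirstAttribute) and a made-up in-memory ARFF sample. It sums the first attribute and never adds rows to a dataset.

```java
import java.io.IOException;
import java.io.StringReader;

import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

public class StreamingSumSketch {

    // Hypothetical ARFF content used only for this illustration.
    static final String ARFF =
        "@relation demo\n"
        + "@attribute x numeric\n"
        + "@attribute label {yes,no}\n"
        + "@data\n"
        + "1.5,yes\n"
        + "2.5,no\n";

    /** Streams over all rows and sums the first (numeric) attribute. */
    static double sumFirstAttribute(String arff) throws IOException {
        ArffReader reader = new ArffReader(new StringReader(arff), 0, false);
        Instances structure = reader.getStructure();
        double sum = 0.0;
        Instance inst;
        // readInstance returns null once the end of the input is reached.
        while ((inst = reader.readInstance(structure)) != null) {
            sum += inst.value(0);
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("sum of x: " + sumFirstAttribute(ARFF));
    }
}
```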
-
getStructure
public Instances getStructure()
Returns the header format.
- Returns:
the header format
-
getData
public Instances getData()
Returns the data that was read.
- Returns:
the data
-
setRetainStringValues
public void setRetainStringValues(boolean retain)
Set whether to retain the values of string attributes in memory (in the header) when reading incrementally.
- Parameters:
retain - true if string values are to be retained in memory when reading incrementally
-
getRetainStringValues
public boolean getRetainStringValues()
Get whether to retain the values of string attributes in memory (in the header) when reading incrementally.
- Returns:
true if string values are to be retained in memory when reading incrementally
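
The effect of the retain flag can be observed on the header passed to readInstance(). The sketch below is an illustration under assumptions, not a statement of the implementation. The class RetainStringsSketch, the helper stringValuesInHeader, and the sample data are hypothetical. With retention enabled, every distinct string read should end up stored in the header's string attribute. With it disabled in incremental mode, the header is expected to hold far fewer values (presumably only the most recent one), which is what keeps memory use bounded.

```java
import java.io.IOException;
import java.io.StringReader;

import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

public class RetainStringsSketch {

    // Hypothetical ARFF content with one string attribute and three distinct values.
    static final String ARFF =
        "@relation demo\n"
        + "@attribute text string\n"
        + "@data\n"
        + "alpha\n"
        + "beta\n"
        + "gamma\n";

    /** Reads all rows incrementally and reports how many string values the header holds. */
    static int stringValuesInHeader(String arff, boolean retain) throws IOException {
        ArffReader reader = new ArffReader(new StringReader(arff), 0, false);
        reader.setRetainStringValues(retain);
        Instances structure = reader.getStructure();
        while (reader.readInstance(structure) != null) {
            // Rows are discarded; the header is updated as a side effect.
        }
        return structure.attribute(0).numValues();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("retain=true:  " + stringValuesInHeader(ARFF, true));
        System.out.println("retain=false: " + stringValuesInHeader(ARFF, false));
    }
}
```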
-
getRevision
public String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Returns:
the revision
-