Class ArffLoader.ArffReader

java.lang.Object
weka.core.converters.ArffLoader.ArffReader
All Implemented Interfaces:
RevisionHandler
Enclosing class:
ArffLoader

public static class ArffLoader.ArffReader extends Object implements RevisionHandler
Reads data from an ARFF file, either in incremental or batch mode.

Typical code for batch usage:

 BufferedReader reader =
   new BufferedReader(new FileReader("/some/where/file.arff"));
 ArffReader arff = new ArffReader(reader);
 Instances data = arff.getData();
 data.setClassIndex(data.numAttributes() - 1);
 
Typical code for incremental usage:

 BufferedReader reader =
   new BufferedReader(new FileReader("/some/where/file.arff"));
 ArffReader arff = new ArffReader(reader, 1000);
 Instances data = arff.getStructure();
 data.setClassIndex(data.numAttributes() - 1);
 Instance inst;
 while ((inst = arff.readInstance(data)) != null) {
   data.add(inst);
 }
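A variation on the incremental pattern above, as a minimal sketch that assumes weka.jar is on the classpath: each instance is processed and then discarded instead of being added to the dataset, so the dataset itself does not grow. The class name IncrementalArffSketch, the method sumFirstAttribute, and the inline ARFF text are illustrative and not part of Weka.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader.ArffReader;

public class IncrementalArffSketch {

  /** Streams instances from the reader and sums the first attribute without storing them. */
  static double sumFirstAttribute(Reader source) throws IOException {
    // Reads the header only; 100 is an arbitrary capacity chosen for this sketch.
    ArffReader arff = new ArffReader(source, 100);
    Instances structure = arff.getStructure();
    structure.setClassIndex(structure.numAttributes() - 1);

    double sum = 0;
    Instance inst;
    while ((inst = arff.readInstance(structure)) != null) {
      sum += inst.value(0); // process and discard; nothing is added to the dataset
    }
    return sum;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical in-memory ARFF content, used here instead of a file path.
    String arffText = "@relation demo\n"
        + "@attribute x numeric\n"
        + "@attribute y {a,b}\n"
        + "@data\n"
        + "1.5,a\n"
        + "2.5,b\n";
    try (Reader reader = new StringReader(arffText)) {
      System.out.println(sumFirstAttribute(reader));
    }
  }
}
```

The try-with-resources block closes the reader once reading is finished, which the shorter snippets above leave to the caller.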
 
Version:
$Revision: 14509 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (trigg@cs.waikato.ac.nz), fracpete (fracpete at waikato dot ac dot nz)
  • Constructor Details

    • ArffReader

      public ArffReader(Reader reader) throws IOException
      Reads the data completely from the reader. The data can be accessed via the getData() method.
      Parameters:
      reader - the reader to use
      Throws:
      IOException - if something goes wrong
    • ArffReader

      public ArffReader(Reader reader, int capacity) throws IOException
      Throws:
      IOException
    • ArffReader

      public ArffReader(Reader reader, int capacity, boolean batch) throws IOException
      Reads only the header and reserves the specified space for instances. Further instances can be read via readInstance().
      Parameters:
      reader - the reader to use
      capacity - the capacity of the new dataset
      batch - true if reading in batch mode
      Throws:
      IOException - if something goes wrong
    • ArffReader

      public ArffReader(Reader reader, Instances template, int lines, String... fieldSepAndEnclosures) throws IOException
      Reads headerless data, using the specified template as the header. The data can be accessed via the getData() method.
      Parameters:
      reader - the reader to use
      template - the template header
      lines - the number of lines read so far
      fieldSepAndEnclosures - an optional array of Strings containing the field separator and enclosures to use instead of the defaults. The first entry in the array is expected to be the single character field separator to use; the remaining entries (if any) are enclosure characters to use.
      Throws:
      IOException - if something goes wrong
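To make the layout of fieldSepAndEnclosures concrete, here is a minimal standard-library sketch of how such an array could be unpacked and applied to a java.io.StreamTokenizer. It is not Weka's implementation: the class FieldSepSketch, the helper tokenize, and the fallback defaults (comma separator, double and single quotes as enclosures) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class FieldSepSketch {

  /** Hypothetical helper: splits one line using the separator/enclosure convention. */
  static List<String> tokenize(String line, String... fieldSepAndEnclosures) throws IOException {
    StreamTokenizer st = new StreamTokenizer(new StringReader(line));
    st.resetSyntax();
    st.whitespaceChars(0, ' ');       // control characters and space separate tokens
    st.wordChars(' ' + 1, '\u00FF');  // everything else is part of a field

    // First entry: the single-character field separator (comma assumed as the default).
    char sep = ',';
    if (fieldSepAndEnclosures.length > 0 && !fieldSepAndEnclosures[0].isEmpty()) {
      sep = fieldSepAndEnclosures[0].charAt(0);
    }
    st.whitespaceChars(sep, sep);

    // Remaining entries: enclosure characters (quotes assumed as the defaults).
    if (fieldSepAndEnclosures.length > 1) {
      for (int i = 1; i < fieldSepAndEnclosures.length; i++) {
        st.quoteChar(fieldSepAndEnclosures[i].charAt(0));
      }
    } else {
      st.quoteChar('"');
      st.quoteChar('\'');
    }

    List<String> fields = new ArrayList<>();
    while (st.nextToken() != StreamTokenizer.TT_EOF) {
      fields.add(st.sval); // sval holds both plain words and the body of enclosed fields
    }
    return fields;
  }
}
```

With this sketch, tokenize("a;'b;c';d", ";", "'") treats the semicolon as the separator and the single quote as the only enclosure, so the separator inside the enclosed field is kept as data.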
    • ArffReader

      public ArffReader(Reader reader, Instances template, int lines, int capacity, String... fieldSepAndEnclosures) throws IOException
      Initializes the reader from the specified template instead of reading a header. The data must be read via the readInstance() method.
      Parameters:
      reader - the reader to use
      template - the template header
      lines - the number of lines read so far
      capacity - the capacity of the new dataset
      fieldSepAndEnclosures - an optional array of Strings containing the field separator and enclosures to use instead of the defaults. The first entry in the array is expected to be the single character field separator to use; the remaining entries (if any) are enclosure characters to use.
      Throws:
      IOException - if something goes wrong
    • ArffReader

      public ArffReader(Reader reader, Instances template, int lines, int capacity, boolean batch, String... fieldSepAndEnclosures) throws IOException
      Initializes the reader from the specified template instead of reading a header. The data must be read via the readInstance() method.
      Parameters:
      reader - the reader to use
      template - the template header
      lines - the number of lines read so far
      capacity - the capacity of the new dataset
      batch - true if the data is going to be read in batch mode
      fieldSepAndEnclosures - an optional array of Strings containing the field separator and enclosures to use instead of the defaults. The first entry in the array is expected to be the single character field separator to use; the remaining entries (if any) are enclosure characters to use.
      Throws:
      IOException - if something goes wrong
  • Method Details

    • getLineNo

      public int getLineNo()
      Returns the current line number.
      Returns:
      the current line number
    • readInstance

      public Instance readInstance(Instances structure) throws IOException
      Reads a single instance using the tokenizer and returns it.
      Parameters:
      structure - the dataset header information; it is updated when the data contains string or relational attributes
      Returns:
      the instance read, or null if the end of the file has been reached
      Throws:
      IOException - if the information is not read successfully
    • readInstance

      public Instance readInstance(Instances structure, boolean flag) throws IOException
      Reads a single instance using the tokenizer and returns it.
      Parameters:
      structure - the dataset header information; it is updated when the data contains string or relational attributes
      flag - true if the method should test for a carriage return after each instance
      Returns:
      the instance read, or null if the end of the file has been reached
      Throws:
      IOException - if the information is not read successfully
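The read-until-null contract of readInstance can be illustrated with a small standard-library sketch: a tokenizer-backed reader that returns one row per call and null once the input is exhausted. This is not Weka's code; RowReaderSketch and readRow are hypothetical names, rows are plain string lists rather than Instance objects, and the comma separator is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;

public class RowReaderSketch {
  private final StreamTokenizer tokenizer;

  RowReaderSketch(Reader reader) {
    tokenizer = new StreamTokenizer(reader);
    tokenizer.resetSyntax();
    tokenizer.whitespaceChars(0, ' ');
    tokenizer.wordChars(' ' + 1, '\u00FF');
    tokenizer.whitespaceChars(',', ',');  // comma separates fields
    tokenizer.eolIsSignificant(true);     // line ends are reported as tokens
  }

  /** Returns the next non-empty row, or null once the end of the input has been reached. */
  List<String> readRow() throws IOException {
    List<String> row = new ArrayList<>();
    int type;
    while ((type = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
      if (type == StreamTokenizer.TT_EOL) {
        if (!row.isEmpty()) {
          return row;                     // a complete row ends at the line break
        }
      } else {
        row.add(tokenizer.sval);
      }
    }
    return row.isEmpty() ? null : row;    // last row may lack a trailing newline
  }

  /** Returns the tokenizer's current line number. */
  int getLineNo() {
    return tokenizer.lineno();
  }
}
```

A caller loops in the same shape as the incremental usage example: keep calling readRow() until it returns null.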
    • getStructure

      public Instances getStructure()
      Returns the header format.
      Returns:
      the header format
    • getData

      public Instances getData()
      Returns the data that was read.
      Returns:
      the data
    • setRetainStringValues

      public void setRetainStringValues(boolean retain)
      Set whether to retain the values of string attributes in memory (in the header) when reading incrementally.
      Parameters:
      retain - true if string values are to be retained in memory when reading incrementally
    • getRetainStringValues

      public boolean getRetainStringValues()
      Get whether to retain the values of string attributes in memory (in the header) when reading incrementally.
      Returns:
      true if string values are to be retained in memory when reading incrementally
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision