Class ConverterUtils.DataSource

java.lang.Object
weka.core.converters.ConverterUtils.DataSource
All Implemented Interfaces:
Serializable, RevisionHandler
Enclosing class:
ConverterUtils

public static class ConverterUtils.DataSource extends Object implements Serializable, RevisionHandler
Helper class for loading data from files and URLs. Via the ConverterUtils class it determines which converter to use for loading the data into memory. If the chosen converter is incremental, the data is loaded incrementally; otherwise it is loaded in batch mode. In both cases the same interface is used (hasMoreElements, nextElement). Before the data can be read again, the reset method must be called. The data source can also be initialized with an Instances object, which provides a unified interface to files and already loaded datasets.
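As a rough illustration of the unified interface described above, the following sketch wraps an already loaded Instances object in a DataSource, walks it with hasMoreElements/nextElement, and calls reset before reading it a second time. The class DataSourceIterationSketch, its helper methods, and the tiny ARFF text are made up for this example; it assumes Weka is on the classpath.

```java
import java.io.StringReader;

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Hypothetical demo class; not part of Weka.
class DataSourceIterationSketch {

  // Tiny illustrative dataset in ARFF format (made up for this example).
  static final String ARFF =
      "@relation demo\n"
    + "@attribute x numeric\n"
    + "@attribute y numeric\n"
    + "@data\n"
    + "1,2\n"
    + "3,4\n"
    + "5,6\n";

  // Counts the instances delivered by one pass over the source.
  static int countInstances(DataSource source) throws Exception {
    Instances structure = source.getStructure();
    int count = 0;
    while (source.hasMoreElements(structure)) {
      source.nextElement(structure);
      count++;
    }
    return count;
  }

  // Reads the same source twice, calling reset() between the two passes.
  static int[] readTwice() throws Exception {
    Instances data = new Instances(new StringReader(ARFF));
    DataSource source = new DataSource(data);
    int first = countInstances(source);
    source.reset();
    int second = countInstances(source);
    return new int[] {first, second};
  }

  public static void main(String[] args) throws Exception {
    int[] counts = readTwice();
    System.out.println(counts[0] + " / " + counts[1]);
  }
}
```

The same loop should work unchanged when the DataSource is created from a file location or a stream, which is the point of the shared interface.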
Version:
$Revision: 15656 $
Author:
FracPete (fracpete at waikato dot ac dot nz)
  • Constructor Summary

    Constructors
    Constructor
    Description
    DataSource(InputStream stream)
    Initializes the datasource with the given input stream.
    DataSource(String location)
    Tries to load the data from the file.
    DataSource(Loader loader)
    Initializes the datasource with the given Loader.
    DataSource(Instances inst)
    Initializes the datasource with the given dataset.
  • Method Summary

    Modifier and Type
    Method
    Description
    Instances
    getDataSet()
    returns the full dataset, can be null in case of an error.
    Instances
    getDataSet(int classIndex)
    returns the full dataset with the specified class index set, can be null in case of an error.
    Loader
    getLoader()
    returns the determined loader, null if the DataSource was initialized with data alone and not a file/URL.
    String
    getRevision()
    Returns the revision string.
    Instances
    getStructure()
    returns the structure of the data.
    Instances
    getStructure(int classIndex)
    returns the structure of the data, with the defined class index.
    boolean
    hasMoreElements(Instances structure)
    returns whether there are more Instance objects in the data.
    static boolean
    isArff(String location)
    returns whether the extension of the location is likely to be of ARFF format, i.e., ending in ".arff" or ".arff.gz" (case-insensitive).
    boolean
    isIncremental()
    returns whether the loader is an incremental one.
    static void
    main(String[] args)
    for testing only - takes a data file as input.
    Instance
    nextElement(Instances dataset)
    returns the next element and sets the specified dataset, null if none available.
    static Instances
    read(InputStream stream)
    convenience method for loading a dataset in batch mode from a stream.
    static Instances
    read(String location)
    convenience method for loading a dataset in batch mode.
    static Instances
    read(Loader loader)
    convenience method for loading a dataset in batch mode.
    void
    reset()
    resets the loader.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • DataSource

      public DataSource(String location) throws Exception
      Tries to load the data from the file. Can be either a regular file or a web location (http://, https://, ftp:// or file://).
      Parameters:
      location - the name of the file to load
      Throws:
      Exception - if initialization fails
    • DataSource

      public DataSource(Instances inst)
      Initializes the datasource with the given dataset.
      Parameters:
      inst - the dataset to use
    • DataSource

      public DataSource(Loader loader)
      Initializes the datasource with the given Loader.
      Parameters:
      loader - the Loader to use
    • DataSource

      public DataSource(InputStream stream)
      Initializes the datasource with the given input stream. This stream is always interpreted as ARFF.
      Parameters:
      stream - the stream to use
  • Method Details

    • isArff

      public static boolean isArff(String location)
      returns whether the extension of the location is likely to be of ARFF format, i.e., ending in ".arff" or ".arff.gz" (case-insensitive).
      Parameters:
      location - the file location to check
      Returns:
      true if the location seems to be of ARFF format
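The documented rule is a check on the name only, which is why the result is described as "likely". A minimal stand-in that mirrors that rule without depending on Weka might look like the sketch below. It is written for illustration and is not Weka's actual implementation; ArffExtensionSketch and looksLikeArff are hypothetical names.

```java
import java.util.Locale;

// Hypothetical stand-in that mirrors the documented rule; not Weka's code.
class ArffExtensionSketch {

  // True if the location ends in ".arff" or ".arff.gz", ignoring case.
  static boolean looksLikeArff(String location) {
    String lower = location.toLowerCase(Locale.ROOT);
    return lower.endsWith(".arff") || lower.endsWith(".arff.gz");
  }

  public static void main(String[] args) {
    System.out.println(looksLikeArff("iris.ARFF"));         // true
    System.out.println(looksLikeArff("data/iris.arff.gz")); // true
    System.out.println(looksLikeArff("iris.csv"));          // false
  }
}
```

With Weka available, the equivalent call would simply be DataSource.isArff(location).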
    • isIncremental

      public boolean isIncremental()
      returns whether the loader is an incremental one.
      Returns:
      true if the loader is a true incremental one
    • getLoader

      public Loader getLoader()
      returns the determined loader, null if the DataSource was initialized with data alone and not a file/URL.
      Returns:
      the loader used for retrieving the data
    • getDataSet

      public Instances getDataSet() throws Exception
      returns the full dataset, can be null in case of an error.
      Returns:
      the full dataset
      Throws:
      Exception - if resetting of loader fails
    • getDataSet

      public Instances getDataSet(int classIndex) throws Exception
      returns the full dataset with the specified class index set, can be null in case of an error.
      Parameters:
      classIndex - the class index for the dataset
      Returns:
      the full dataset
      Throws:
      Exception - if resetting of loader fails
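A common pattern is to use the last attribute as the class. The sketch below shows one way to do that with getDataSet(int), including a check for the null result mentioned above. The class ClassIndexSketch, its method name, and the small ARFF text are invented for this example, and the DataSource is backed by an in-memory Instances object only to keep the example self-contained; it assumes Weka is on the classpath.

```java
import java.io.StringReader;

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Hypothetical demo class; not part of Weka.
class ClassIndexSketch {

  // Small illustrative dataset (made up): two numeric attributes and a nominal label.
  static final String ARFF =
      "@relation demo\n"
    + "@attribute a numeric\n"
    + "@attribute b numeric\n"
    + "@attribute label {yes,no}\n"
    + "@data\n"
    + "1,2,yes\n"
    + "3,4,no\n";

  // Loads the full dataset and marks the last attribute as the class.
  static Instances loadWithLastAttributeAsClass() throws Exception {
    DataSource source = new DataSource(new Instances(new StringReader(ARFF)));
    // The structure (header) is enough to find the index of the last attribute.
    int last = source.getStructure().numAttributes() - 1;
    Instances data = source.getDataSet(last);
    // The documentation notes that the result can be null in case of an error.
    if (data == null) {
      throw new IllegalStateException("dataset could not be loaded");
    }
    return data;
  }

  public static void main(String[] args) throws Exception {
    Instances data = loadWithLastAttributeAsClass();
    System.out.println("class attribute: " + data.classAttribute().name());
  }
}
```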
    • reset

      public void reset() throws Exception
      resets the loader.
      Throws:
      Exception - if resetting fails
    • getStructure

      public Instances getStructure() throws Exception
      returns the structure of the data.
      Returns:
      the structure of the data
      Throws:
      Exception - if something goes wrong
    • getStructure

      public Instances getStructure(int classIndex) throws Exception
      returns the structure of the data, with the defined class index.
      Parameters:
      classIndex - the class index for the dataset
      Returns:
      the structure of the data
      Throws:
      Exception - if something goes wrong
    • hasMoreElements

      public boolean hasMoreElements(Instances structure)
      returns whether there are more Instance objects in the data.
      Parameters:
      structure - the structure of the dataset
      Returns:
      true if there are more Instance objects available
    • nextElement

      public Instance nextElement(Instances dataset)
      returns the next element and sets the specified dataset, null if none available.
      Parameters:
      dataset - the dataset to set for the instance
      Returns:
      the next Instance
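hasMoreElements and nextElement are meant to be used together in a loop, with the structure obtained from getStructure passed to both. The sketch below reads an ARFF stream one Instance at a time and sums the first attribute, so the full dataset never has to be held by the caller. IncrementalReadSketch, its methods, and the sample ARFF text are hypothetical names and data chosen for this example; it assumes Weka is on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Hypothetical demo class; not part of Weka.
class IncrementalReadSketch {

  // Tiny illustrative dataset in ARFF format (made up for this example).
  static final String ARFF =
      "@relation demo\n"
    + "@attribute x numeric\n"
    + "@attribute y numeric\n"
    + "@data\n"
    + "1,2\n"
    + "3,4\n"
    + "5,6\n";

  // Sums the first attribute, pulling one Instance at a time from the source.
  static double sumFirstAttribute(InputStream stream) throws Exception {
    DataSource source = new DataSource(stream);   // a stream is always interpreted as ARFF
    Instances structure = source.getStructure();  // header information only
    double sum = 0.0;
    while (source.hasMoreElements(structure)) {
      Instance inst = source.nextElement(structure);
      sum += inst.value(0);
    }
    return sum;
  }

  // Runs the loop over the in-memory sample data.
  static double demo() throws Exception {
    InputStream in = new ByteArrayInputStream(ARFF.getBytes(StandardCharsets.UTF_8));
    return sumFirstAttribute(in);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(demo());
  }
}
```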
    • read

      public static Instances read(String location) throws Exception
      convenience method for loading a dataset in batch mode.
      Parameters:
      location - the dataset to load
      Returns:
      the dataset
      Throws:
      Exception - if loading fails
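For the common case of loading a whole file in one call, read(String) is usually enough. The sketch below writes a small ARFF file to a temporary location and loads it back, so that it can run without any external data. ReadFromFileSketch, writeAndRead, the temporary file prefix, and the sample data are all invented for this example; it assumes Weka is on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Hypothetical demo class; not part of Weka.
class ReadFromFileSketch {

  // Tiny illustrative dataset in ARFF format (made up for this example).
  static final String ARFF =
      "@relation demo\n"
    + "@attribute x numeric\n"
    + "@attribute y numeric\n"
    + "@data\n"
    + "1,2\n"
    + "3,4\n"
    + "5,6\n";

  // Writes the sample data to a temporary .arff file and loads it in batch mode.
  static Instances writeAndRead() throws Exception {
    Path file = Files.createTempFile("datasource-demo", ".arff");
    file.toFile().deleteOnExit();  // clean up when the JVM exits
    Files.write(file, ARFF.getBytes(StandardCharsets.UTF_8));
    return DataSource.read(file.toString());
  }

  public static void main(String[] args) throws Exception {
    Instances data = writeAndRead();
    System.out.println(data.numInstances() + " instances, "
        + data.numAttributes() + " attributes");
  }
}
```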
    • read

      public static Instances read(InputStream stream) throws Exception
      convenience method for loading a dataset in batch mode from a stream.
      Parameters:
      stream - the stream to load the dataset from
      Returns:
      the dataset
      Throws:
      Exception - if loading fails
    • read

      public static Instances read(Loader loader) throws Exception
      convenience method for loading a dataset in batch mode.
      Parameters:
      loader - the loader to get the dataset from
      Returns:
      the dataset
      Throws:
      Exception - if loading fails
    • main

      public static void main(String[] args) throws Exception
      for testing only - takes a data file as input.
      Parameters:
      args - the commandline arguments
      Throws:
      Exception - if something goes wrong
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision