Class TextDirectoryLoader

java.lang.Object
weka.core.converters.AbstractLoader
weka.core.converters.TextDirectoryLoader
All Implemented Interfaces:
Serializable, CommandlineRunnable, BatchConverter, IncrementalConverter, Loader, OptionHandler, RevisionHandler

public class TextDirectoryLoader extends AbstractLoader implements BatchConverter, IncrementalConverter, OptionHandler, CommandlineRunnable
Loads all text files in a directory and uses the subdirectory names as class labels. The content of each text file is stored in a String attribute; the filename can optionally be stored in an additional attribute.

Valid options are:

 -D
  Enables debug output.
  (default: off)
 
 -F
  Stores the filename in an additional attribute.
  (default: off)
 
 -dir <directory>
  The directory to work on.
  (default: current directory)
 
 -charset <charset name>
  The character set to use, e.g. UTF-8.
  (default: use the default character set)
 
 -R
  Retain all string attribute values when reading incrementally.
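The directory convention described above can be illustrated with a minimal plain-Java sketch. It is not the loader's implementation: the class `DirectoryLayoutSketch`, its `Doc` record type, and the sample folder names `ham` and `spam` are hypothetical and exist only for illustration. The sketch creates a small corpus in a temporary directory and walks it, treating each subdirectory name as the class label of the files it contains. With Weka on the classpath, the same directory would instead be handed to `setDirectory` and read with `getDataSet`, both documented below.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirectoryLayoutSketch {

    /** One labelled document: class label, file name, and file content. */
    public static final class Doc {
        public final String label;
        public final String filename;
        public final String text;

        Doc(String label, String filename, String text) {
            this.label = label;
            this.filename = filename;
            this.text = text;
        }
    }

    /** Creates a small sample corpus laid out as root/label/file.txt. */
    public static Path createSampleCorpus() throws IOException {
        Path root = Files.createTempDirectory("corpus");
        write(root.resolve("ham").resolve("a.txt"), "see you at lunch");
        write(root.resolve("spam").resolve("b.txt"), "win a prize now");
        write(root.resolve("spam").resolve("c.txt"), "cheap offers inside");
        return root;
    }

    private static void write(Path file, String content) throws IOException {
        Files.createDirectories(file.getParent());
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    /** Walks the corpus; each subdirectory name becomes the label of its files. */
    public static List<Doc> readCorpus(Path root) throws IOException {
        List<Doc> docs = new ArrayList<>();
        for (Path classDir : sortedChildren(root)) {
            if (!Files.isDirectory(classDir)) {
                continue; // only subdirectories define class labels
            }
            for (Path file : sortedChildren(classDir)) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                docs.add(new Doc(classDir.getFileName().toString(),
                                 file.getFileName().toString(), text));
            }
        }
        return docs;
    }

    /** Lists the entries of a directory in sorted order for a stable result. */
    private static List<Path> sortedChildren(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        for (Doc d : readCorpus(createSampleCorpus())) {
            System.out.println(d.label + "\t" + d.filename + "\t" + d.text);
        }
    }
}
```

Each `Doc` corresponds roughly to one row of the loaded data set: the text goes into the String attribute, the label into the class attribute, and the filename into the extra attribute when -F is given.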
 
Based on code from the TextDirectoryToArff tool. See the Wiki article.
  • Version:
    $Revision: 15257 $
    Author:
    Ashraf M. Kibriya (amk14 at cs.waikato.ac.nz), Richard Kirkby (rkirkby at cs.waikato.ac.nz), fracpete (fracpete at waikato dot ac dot nz)
    See Also:
    • Constructor Details

      • TextDirectoryLoader

        public TextDirectoryLoader()
        Default constructor.
    • Method Details

      • globalInfo

        public String globalInfo()
        Returns a string describing this loader
        Returns:
        a description of the loader suitable for displaying in the Explorer/Experimenter GUI
      • listOptions

        public Enumeration<Option> listOptions()
        Lists the available options
        Specified by:
        listOptions in interface OptionHandler
        Returns:
        an enumeration of the available options
      • setOptions

        public void setOptions(String[] options) throws Exception
        Parses a given list of options.

        Valid options are:

         -D
          Enables debug output.
          (default: off)
         
         -F
          Stores the filename in an additional attribute.
          (default: off)
         
         -dir <directory>
          The directory to work on.
          (default: current directory)
         
         -charset <charset name>
          The character set to use, e.g. UTF-8.
          (default: use the default character set)
         
        Specified by:
        setOptions in interface OptionHandler
        Parameters:
        options - the options
        Throws:
        Exception - if options cannot be set
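As a rough sketch of the array that `setOptions` expects, the following plain-Java helper assembles the documented flags into a `String[]`. The class `LoaderOptionsSketch` and the method `buildOptions` are hypothetical names introduced here and are not part of Weka. The point to note is that flags such as -D and -F are single array elements, while -dir and -charset are each followed by their value as a separate element.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LoaderOptionsSketch {

    /**
     * Builds an option array in the form listed above.
     * Hypothetical helper for illustration; not part of Weka.
     */
    public static String[] buildOptions(File dir, String charset,
                                        boolean storeFilename, boolean debug) {
        List<String> opts = new ArrayList<>();
        if (debug) {
            opts.add("-D"); // flag: no value follows
        }
        if (storeFilename) {
            opts.add("-F"); // flag: no value follows
        }
        opts.add("-dir");   // option name and value are separate elements
        opts.add(dir.getPath());
        if (charset != null && !charset.isEmpty()) {
            opts.add("-charset");
            opts.add(charset);
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions(new File("corpus"), "UTF-8", true, false);
        // prints: -F -dir corpus -charset UTF-8
        System.out.println(String.join(" ", options));
    }
}
```

With Weka on the classpath, the resulting array could then be passed to `setOptions` on a `TextDirectoryLoader` instance.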
      • getOptions

        public String[] getOptions()
        Gets the current option settings.
        Specified by:
        getOptions in interface OptionHandler
        Returns:
        the current settings as an array of option strings
      • charSetTipText

        public String charSetTipText()
        Returns the tip text for this property.
        Returns:
        the tip text
      • setCharSet

        public void setCharSet(String charSet)
        Set the character set to use when reading text files (an empty string indicates that the default character set will be used).
        Parameters:
        charSet - the character set to use.
      • getCharSet

        public String getCharSet()
        Get the character set to use when reading text files. An empty string indicates that the default character set will be used.
        Returns:
        the character set name to use (or empty string to indicate that the default character set will be used).
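The empty-string convention for the character set can be sketched in plain Java as follows. This only illustrates the documented behaviour and is not the loader's own code; `CharsetSketch`, `resolve`, and `decode` are hypothetical names. The example also shows why the setting matters: the same bytes decode to different strings under different character sets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {

    /** An empty (or null) name means "use the default character set". */
    public static Charset resolve(String name) {
        if (name == null || name.isEmpty()) {
            return Charset.defaultCharset();
        }
        return Charset.forName(name); // throws for unknown or illegal names
    }

    /** Decodes raw file bytes with the resolved character set. */
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, resolve(charsetName));
    }

    public static void main(String[] args) {
        // "caf\u00e9" is four characters but five bytes in UTF-8.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, "UTF-8").length());      // prints: 4
        System.out.println(decode(utf8, "ISO-8859-1").length()); // prints: 5
    }
}
```

Setting the character set explicitly, for example to UTF-8, avoids depending on the platform default when the corpus is moved between machines.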
      • setDebug

        public void setDebug(boolean value)
        Sets whether to print some debug information.
        Parameters:
        value - if true additional debug information will be printed.
      • getDebug

        public boolean getDebug()
        Gets whether additional debug information is printed.
        Returns:
        true if additional debug information is printed
      • debugTipText

        public String debugTipText()
        Returns the tip text for this property.
        Returns:
        the tip text
      • setOutputFilename

        public void setOutputFilename(boolean value)
        Sets whether the filename will be stored as an extra attribute.
        Parameters:
        value - if true the filename will be stored in an extra attribute
      • getOutputFilename

        public boolean getOutputFilename()
        Gets whether the filename will be stored as an extra attribute.
        Returns:
        true if the filename is stored in an extra attribute
      • outputFilenameTipText

        public String outputFilenameTipText()
        Returns the tip text for this property.
        Returns:
        the tip text
      • getFileDescription

        public String getFileDescription()
        Returns a description of the file type. For this loader the source is a directory rather than a single file.
        Returns:
        a short file description
      • getDirectory

        public File getDirectory()
        Gets the directory specified as the source.
        Returns:
        the source directory
      • setDirectory

        public void setDirectory(File dir) throws IOException
        Sets the source directory.
        Parameters:
        dir - the source directory
        Throws:
        IOException - if an error occurs
      • reset

        public void reset()
        Resets the loader so that it is ready to read a new data set.
        Specified by:
        reset in interface Loader
        Overrides:
        reset in class AbstractLoader
      • setSource

        public void setSource(File dir) throws IOException
        Resets the Loader object and sets the source of the data set to be the supplied File object.
        Specified by:
        setSource in interface Loader
        Overrides:
        setSource in class AbstractLoader
        Parameters:
        dir - the source directory.
        Throws:
        IOException - if an error occurs
      • getStructure

        public Instances getStructure() throws IOException
        Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.
        Specified by:
        getStructure in interface Loader
        Specified by:
        getStructure in class AbstractLoader
        Returns:
        the structure of the data set as an empty set of Instances
        Throws:
        IOException - if an error occurs
      • getDataSet

        public Instances getDataSet() throws IOException
        Returns the full data set. If the structure has not yet been determined by a call to getStructure, this method determines it before processing the rest of the data set.
        Specified by:
        getDataSet in interface Loader
        Specified by:
        getDataSet in class AbstractLoader
        Returns:
        the full data set as a set of Instances
        Throws:
        IOException - if there is no source or parsing fails
      • getNextInstance

        public Instance getNextInstance(Instances structure) throws IOException
        Process input directories/files incrementally.
        Specified by:
        getNextInstance in interface Loader
        Specified by:
        getNextInstance in class AbstractLoader
        Parameters:
        structure - ignored
        Returns:
        the next instance in the data set, or null if there are no more instances to be read
        Throws:
        IOException - if a problem occurs
      • getRevision

        public String getRevision()
        Returns the revision string.
        Specified by:
        getRevision in interface RevisionHandler
        Returns:
        the revision
      • main

        public static void main(String[] args)
        Main method.
        Parameters:
        args - the command-line options, e.g. -dir <directory> to name the input directory.
      • preExecution

        public void preExecution() throws Exception
        Performs any setup that needs to happen before command-line execution. Subclasses should override this method if they need to do something here.
        Specified by:
        preExecution in interface CommandlineRunnable
        Throws:
        Exception - if a problem occurs during setup
      • postExecution

        public void postExecution() throws Exception
        Performs any teardown that needs to happen after execution. Subclasses should override this method if they need to do something here.
        Specified by:
        postExecution in interface CommandlineRunnable
        Throws:
        Exception - if a problem occurs during teardown
      • run

        public void run(Object toRun, String[] args) throws IllegalArgumentException
        Description copied from interface: CommandlineRunnable
        Execute the supplied object.
        Specified by:
        run in interface CommandlineRunnable
        Parameters:
        toRun - the object to execute
        args - any options to pass to the object
        Throws:
        IllegalArgumentException