Class CharacterDelimitedTokenizer

java.lang.Object
weka.core.tokenizers.Tokenizer
weka.core.tokenizers.CharacterDelimitedTokenizer
All Implemented Interfaces:
Serializable, Enumeration<String>, OptionHandler, RevisionHandler
Direct Known Subclasses:
NGramTokenizer, WordTokenizer

public abstract class CharacterDelimitedTokenizer extends Tokenizer
Abstract superclass for tokenizers that take characters as delimiters.
Version:
$Revision: 10203 $
Author:
fracpete (fracpete at waikato dot ac dot nz)
  • Constructor Details

    • CharacterDelimitedTokenizer

      public CharacterDelimitedTokenizer()
  • Method Details

    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration of all the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Tokenizer
      Returns:
      an enumeration of all available options.
    • getOptions

      public String[] getOptions()
      Gets the current option settings for the OptionHandler.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Tokenizer
      Returns:
      the list of current option settings as an array of strings
    • setOptions

      public void setOptions(String[] options) throws Exception
      Sets the OptionHandler's options using the given list. All options will be set (or reset) during this call (i.e. incremental setting of options is not possible).
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Tokenizer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
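
The "set (or reset)" behaviour described above can be illustrated with a minimal standalone sketch. It does not use Weka: the class ResetOptionsSketch, the flag names "-delimiters" and "-lowercase", and the default values are all assumptions made up for illustration. The point is only that every call starts again from the defaults, so an option omitted from the array does not keep its earlier value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an option handler; not part of Weka.
public class ResetOptionsSketch {
    // Assumed defaults, chosen only for this illustration.
    static final String DEFAULT_DELIMITERS = " \r\n\t";
    static final boolean DEFAULT_LOWERCASE = false;

    private String delimiters = DEFAULT_DELIMITERS;
    private boolean lowercase = DEFAULT_LOWERCASE;

    public void setOptions(String[] options) {
        // Reset everything first: settings are never applied incrementally.
        delimiters = DEFAULT_DELIMITERS;
        lowercase = DEFAULT_LOWERCASE;
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("-delimiters") && i + 1 < options.length) {
                delimiters = options[++i];   // flag followed by its value
            } else if (options[i].equals("-lowercase")) {
                lowercase = true;            // flag without a value
            } else {
                throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
    }

    public String[] getOptions() {
        // Report the current settings in the same flag/value form.
        List<String> result = new ArrayList<>();
        result.add("-delimiters");
        result.add(delimiters);
        if (lowercase) {
            result.add("-lowercase");
        }
        return result.toArray(new String[0]);
    }

    public String getDelimiters() {
        return delimiters;
    }

    public static void main(String[] args) {
        ResetOptionsSketch handler = new ResetOptionsSketch();
        handler.setOptions(new String[] {"-delimiters", ",;", "-lowercase"});
        System.out.println(Arrays.toString(handler.getOptions()));

        // Second call omits -delimiters, so the delimiters fall back to the default.
        handler.setOptions(new String[] {"-lowercase"});
        System.out.println(handler.getDelimiters().equals(DEFAULT_DELIMITERS)); // prints true
    }
}
```

To change a single option while keeping the others, a caller would read the current array with getOptions(), modify it, and pass the complete array back to setOptions.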
    • getDelimiters

      public String getDelimiters()
      Get the value of delimiters (not backquoted).
      Returns:
      Value of delimiters.
    • setDelimiters

      public void setDelimiters(String value)
      Set the value of delimiters. For convenience, the strings "\r", "\n", "\t", "\'", "\\" are automatically translated into their character representations '\r', '\n', '\t', '\'', '\\'. This means one can use either setDelimiters("\r\n\t\\"); or setDelimiters("\\r\\n\\t\\\\");.
      Parameters:
      value - Value to assign to delimiters.
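
The equivalence of the two call forms can be checked with a small standalone sketch. It does not call Weka: the class DelimiterSketch and its unescape and tokenize helpers are hypothetical, and splitting with java.util.StringTokenizer is an assumption about how a subclass might use the delimiters, not a statement about the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration of the escape translation; not Weka's code.
public class DelimiterSketch {

    // Translates the two-character sequences \r, \n, \t, \' and \\
    // into the single characters they name; other input is copied as-is.
    static String unescape(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length()) {
                char next = value.charAt(i + 1);
                switch (next) {
                    case 'r':  out.append('\r'); i++; continue;
                    case 'n':  out.append('\n'); i++; continue;
                    case 't':  out.append('\t'); i++; continue;
                    case '\'': out.append('\''); i++; continue;
                    case '\\': out.append('\\'); i++; continue;
                    default:   break; // unknown sequence: keep the backslash
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    // Splits text on any of the (unescaped) delimiter characters.
    static List<String> tokenize(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, unescape(delimiters));
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "a\tb\nc";
        // Delimiters given as real control characters.
        System.out.println(tokenize(text, "\r\n\t\\"));        // prints [a, b, c]
        // Delimiters given as backslash sequences, translated by unescape.
        System.out.println(tokenize(text, "\\r\\n\\t\\\\"));   // prints [a, b, c]
    }
}
```

The second form is the one that survives a trip through a command line or a GUI text field, where a literal tab or newline is hard to type.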
    • delimitersTipText

      public String delimitersTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui