Class WordTokenizer

All Implemented Interfaces:
Serializable, Enumeration<String>, OptionHandler, RevisionHandler

public class WordTokenizer extends CharacterDelimitedTokenizer
A simple tokenizer that uses the java.util.StringTokenizer class to tokenize strings.

Valid options are:

 -delimiters <value>
  The delimiters to use
  (default ' \r\n\t.,;:'"()?!').
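
To illustrate what the default delimiter set does, here is a minimal sketch that feeds the same characters directly to java.util.StringTokenizer. It does not use WordTokenizer itself; the class name DelimiterSketch and the helper split are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration class; not part of the documented API.
public class DelimiterSketch {

    // The default delimiters listed above, written as a Java string literal:
    // space, \r, \n, \t, and the characters . , ; : ' " ( ) ? !
    static final String DEFAULT_DELIMITERS = " \r\n\t.,;:'\"()?!";

    // Splits text on any of the delimiter characters and collects the tokens.
    static List<String> split(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Hello, world, It, s, here]
        System.out.println(split("Hello, world! (It's here.)", DEFAULT_DELIMITERS));
    }
}
```

Because the apostrophe is one of the default delimiters, a contraction such as "It's" is split into two tokens. Passing a different delimiter string (the value given to -delimiters) changes where the splits occur.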
 
Version:
$Revision: 10203 $
Author:
FracPete (fracpete at waikato dot ac dot nz)
  • Constructor Details

    • WordTokenizer

      public WordTokenizer()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing the tokenizer.
      Specified by:
      globalInfo in class Tokenizer
      Returns:
      a description suitable for displaying in the Explorer/Experimenter GUI
    • hasMoreElements

      public boolean hasMoreElements()
      Tests if this enumeration contains more elements.
      Specified by:
      hasMoreElements in interface Enumeration<String>
      Specified by:
      hasMoreElements in class Tokenizer
      Returns:
      true if and only if this enumeration object contains at least one more element to provide; false otherwise.
    • nextElement

      public String nextElement()
      Returns the next element of this enumeration if this enumeration object has at least one more element to provide.
      Specified by:
      nextElement in interface Enumeration<String>
      Specified by:
      nextElement in class Tokenizer
      Returns:
      the next element of this enumeration.
    • tokenize

      public void tokenize(String s)
      Sets the string to tokenize. Tokenization happens immediately.
      Specified by:
      tokenize in class Tokenizer
      Parameters:
      s - the string to tokenize
    • getRevision

      public String getRevision()
      Returns the revision string.
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Runs the tokenizer with the given options and strings to tokenize. The tokens are printed to stdout.
      Parameters:
      args - the commandline options and strings to tokenize
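
The tokenize, hasMoreElements and nextElement methods described above follow the usual Enumeration calling pattern: set the string, then loop while more elements remain. The sketch below shows that pattern with a small stand-in class built on java.util.StringTokenizer. EnumerationSketch and its collect helper are hypothetical and are not the real WordTokenizer; in particular, the stand-in splits lazily, and nothing here is meant to describe how the real class is implemented internally.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical stand-in that mimics the documented method shape only.
public class EnumerationSketch implements Enumeration<String> {

    private final String delimiters;
    private StringTokenizer inner; // stays null until tokenize() is called

    public EnumerationSketch(String delimiters) {
        this.delimiters = delimiters;
    }

    // Sets the string to tokenize; any previous string is discarded.
    public void tokenize(String s) {
        inner = new StringTokenizer(s, delimiters);
    }

    @Override
    public boolean hasMoreElements() {
        return inner != null && inner.hasMoreTokens();
    }

    @Override
    public String nextElement() {
        if (inner == null) {
            throw new NoSuchElementException("tokenize() has not been called");
        }
        return inner.nextToken();
    }

    // The calling pattern: tokenize once, then drain the enumeration.
    static List<String> collect(EnumerationSketch tokenizer, String text) {
        List<String> tokens = new ArrayList<>();
        tokenizer.tokenize(text);
        while (tokenizer.hasMoreElements()) {
            tokens.add(tokenizer.nextElement());
        }
        return tokens;
    }

    public static void main(String[] args) {
        EnumerationSketch tokenizer = new EnumerationSketch(" ,.");
        // Prints [one, two, three]
        System.out.println(collect(tokenizer, "one, two. three"));
    }
}
```

With the real class, the same loop would apply after calling tokenize(String) on a WordTokenizer instance; the same instance can then be given another string by calling tokenize again.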