Package weka.core
Class Stopwords
java.lang.Object
weka.core.Stopwords
- All Implemented Interfaces:
RevisionHandler
Class that can test whether a given string is a stop word. Lowercases all
words before the test.
The format for reading and writing is one word per line. Lines starting with
'#' are interpreted as comments and are therefore skipped.
The default stopwords are based on Rainbow.
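The file format described above can be illustrated with a small, self-contained parser. This is only a sketch and not Weka's own code: the class name `StopwordFileSketch` and its `parse` method are hypothetical, and it assumes that blank lines are ignored and that a line is trimmed before the '#' check.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Weka: parses "one word per line" input.
class StopwordFileSketch {

  // Reads all words from the source and closes it afterwards.
  static Set<String> parse(Reader source) {
    Set<String> words = new TreeSet<>();
    try (BufferedReader reader = new BufferedReader(source)) {
      String line;
      while ((line = reader.readLine()) != null) {
        line = line.trim();
        // Assumption: blank lines are skipped along with '#' comment lines.
        if (line.isEmpty() || line.startsWith("#")) {
          continue;
        }
        words.add(line.toLowerCase(Locale.ROOT));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return words;
  }

  public static void main(String[] args) {
    String sample = "# custom list\nThe\n  and \n\nof\n";
    System.out.println(parse(new StringReader(sample))); // prints [and, of, the]
  }
}
```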
Accepts the following parameters:
-i file
loads the stopwords from the given file
-o file
saves the stopwords to the given file
-p
outputs the current stopwords on stdout
Any additional parameters are interpreted as words to test as stopwords.
- Version:
- $Revision: 10203 $
- Author:
- Eibe Frank (eibe@cs.waikato.ac.nz), Ashraf M. Kibriya (amk14@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
-
Constructor Summary
Constructor | Description
--- | ---
Stopwords() | initializes the stopwords (based on Rainbow).
-
Method Summary
Modifier and Type | Method | Description
--- | --- | ---
void | add(String word) | adds the given word to the stopword list (is automatically converted to lower case and trimmed)
void | clear() | removes all stopwords
Enumeration<String> | elements() | Returns a sorted enumeration over all stored stopwords
String | getRevision() | Returns the revision string.
boolean | is(String word) | Returns true if the given string is a stop word.
static boolean | isStopword(String str) | Returns true if the given string is a stop word.
static void | main(String[] args) | Accepts the following parameters: -i file, -o file, -p.
void | read(BufferedReader reader) | Generates a new Stopwords object from the reader.
void | read(File file) | Generates a new Stopwords object from the given file
void | read(String filename) | Generates a new Stopwords object from the given file
boolean | remove(String word) | removes the word from the stopword list
String | toString() | returns the current stopwords in a string
void | write(BufferedWriter writer) | Writes the current stopwords to the given writer.
void | write(File file) | Writes the current stopwords to the given file
void | write(String filename) | Writes the current stopwords to the given file
-
Constructor Details
-
Stopwords
public Stopwords()
initializes the stopwords (based on Rainbow).
-
-
Method Details
-
clear
public void clear()
removes all stopwords
-
add
adds the given word to the stopword list (is automatically converted to lower case and trimmed)
Parameters:
word - the word to add
-
remove
removes the word from the stopword list
Parameters:
word - the word to remove
Returns:
true if the word was found in the list and then removed
-
is
Returns true if the given string is a stop word.
Parameters:
word - the word to test
Returns:
true if the word is a stopword
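The normalization described for add, remove and is can be sketched with a plain hash set. `StopwordSetSketch` is a hypothetical stand-in, not Weka's implementation. It assumes that add trims and lowercases, that is lowercases before the lookup (per the class description), and that remove matches the stored word exactly, since the documentation does not say whether remove normalizes its argument.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration only, not Weka's implementation.
class StopwordSetSketch {
  private final Set<String> words = new HashSet<>();

  // Stores the word trimmed and in lower case; empty input is ignored (assumption).
  void add(String word) {
    String normalized = word.trim().toLowerCase(Locale.ROOT);
    if (!normalized.isEmpty()) {
      words.add(normalized);
    }
  }

  // Assumption: exact match against the stored (already normalized) form.
  boolean remove(String word) {
    return words.remove(word);
  }

  // Lowercases the candidate before testing membership.
  boolean is(String word) {
    return words.contains(word.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    StopwordSetSketch stopwords = new StopwordSetSketch();
    stopwords.add("  The ");
    System.out.println(stopwords.is("THE"));     // prints true
    System.out.println(stopwords.remove("the")); // prints true
    System.out.println(stopwords.is("the"));     // prints false
  }
}
```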
-
elements
Returns a sorted enumeration over all stored stopwords
Returns:
the enumeration over all stopwords
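One way to obtain a sorted enumeration from an unordered set is to copy it into a list, sort the list, and wrap it with `Collections.enumeration`. The sketch below only illustrates that idea. The class and method names are hypothetical, and Weka may build its enumeration differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Weka.
class SortedEnumerationSketch {

  // Returns the words in natural (lexicographic) String order.
  static Enumeration<String> sortedElements(Set<String> words) {
    List<String> sorted = new ArrayList<>(words);
    Collections.sort(sorted);
    return Collections.enumeration(sorted);
  }

  public static void main(String[] args) {
    Set<String> words = new HashSet<>(Arrays.asList("the", "a", "of"));
    Enumeration<String> e = sortedElements(words);
    while (e.hasMoreElements()) {
      System.out.println(e.nextElement()); // prints a, of, the (one per line)
    }
  }
}
```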
-
read
Generates a new Stopwords object from the given file
Parameters:
filename - the file to read the stopwords from
Throws:
Exception - if reading fails
-
read
Generates a new Stopwords object from the given file
Parameters:
file - the file to read the stopwords from
Throws:
Exception - if reading fails
-
read
Generates a new Stopwords object from the reader. The reader is closed automatically.
Parameters:
reader - the reader to get the stopwords from
Throws:
Exception - if reading fails
-
write
Writes the current stopwords to the given file
Parameters:
filename - the file to write the stopwords to
Throws:
Exception - if writing fails
-
write
Writes the current stopwords to the given file
Parameters:
file - the file to write the stopwords to
Throws:
Exception - if writing fails
-
write
Writes the current stopwords to the given writer. The writer is closed automatically.
Parameters:
writer - the writer to write the stopwords to
Throws:
Exception - if writing fails
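Writing in the one-word-per-line format, with the writer closed automatically, can be sketched as follows. `StopwordWriterSketch` is hypothetical and not Weka's code. The leading '#' comment line and the sorted output order are assumptions made for illustration; the documentation only states that '#' lines are skipped on reading.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical helper, not part of Weka.
class StopwordWriterSketch {

  // Writes one word per line and closes the target when done.
  static void write(Collection<String> words, Writer target) {
    try (BufferedWriter writer = new BufferedWriter(target)) {
      // Assumption: a header comment; readers of this format skip '#' lines.
      writer.write("# hypothetical header comment");
      writer.newLine();
      for (String word : new TreeSet<>(words)) { // sorted order is an assumption
        writer.write(word);
        writer.newLine();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    StringWriter out = new StringWriter();
    write(Arrays.asList("the", "a"), out);
    System.out.print(out);
  }
}
```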
-
toString
returns the current stopwords in a string
-
isStopword
Returns true if the given string is a stop word.
Parameters:
str - the word to test
Returns:
true if the word is a stopword
-
getRevision
Returns the revision string.
Specified by:
getRevision in interface RevisionHandler
Returns:
the revision
-
main
Accepts the following parameters:
-i file
loads the stopwords from the given file
-o file
saves the stopwords to the given file
-p
outputs the current stopwords on stdout
Any additional parameters are interpreted as words to test as stopwords.
Parameters:
args - commandline parameters
Throws:
Exception - if something goes wrong
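The option layout above (two options that take a file name, one flag, and free-standing words) could be parsed along the following lines. This is a sketch under stated assumptions, not how Weka's main is implemented: `StopwordArgsSketch`, its fields, and the error raised for a missing file name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical argument holder and parser, not part of Weka.
class StopwordArgsSketch {
  String inputFile;   // value of -i, or null if absent
  String outputFile;  // value of -o, or null if absent
  boolean print;      // true if -p was given
  final List<String> wordsToTest = new ArrayList<>();

  static StopwordArgsSketch parse(String[] args) {
    StopwordArgsSketch parsed = new StopwordArgsSketch();
    for (int i = 0; i < args.length; i++) {
      switch (args[i]) {
        case "-i":
          parsed.inputFile = requireValue(args, ++i, "-i");
          break;
        case "-o":
          parsed.outputFile = requireValue(args, ++i, "-o");
          break;
        case "-p":
          parsed.print = true;
          break;
        default:
          // Anything else is treated as a word to test.
          parsed.wordsToTest.add(args[i]);
      }
    }
    return parsed;
  }

  // Assumption: a missing file name is reported as an IllegalArgumentException.
  private static String requireValue(String[] args, int index, String option) {
    if (index >= args.length) {
      throw new IllegalArgumentException("Missing file name after " + option);
    }
    return args[index];
  }

  public static void main(String[] args) {
    StopwordArgsSketch parsed = parse(new String[] {"-i", "in.txt", "-p", "the", "cat"});
    System.out.println(parsed.inputFile + " " + parsed.print + " " + parsed.wordsToTest);
  }
}
```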
-