Interface | Description |
---|---|
Tokenizer | Interface for taking the text of a document and breaking it up into features. |

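
As a rough illustration of "breaking text up into features", the sketch below defines a hypothetical `SimpleTokenizer` interface and one implementation. The names `TokenizerSketch`, `SimpleTokenizer`, and `SplitTokenizer` are assumptions made for this example; they are not the signatures of the `Tokenizer` interface listed above, which may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizerSketch {
    /** Hypothetical interface: turns document text into a list of features (tokens). */
    interface SimpleTokenizer {
        List<String> tokenize(String text);
    }

    /** Assumed behavior: lowercase the text, then split on any run of non-letter, non-digit characters. */
    static class SplitTokenizer implements SimpleTokenizer {
        @Override
        public List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                // split() can yield a leading empty string when the text starts with a delimiter
                if (!piece.isEmpty()) {
                    tokens.add(piece);
                }
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        // prints [the, quick, brown, fox]
        System.out.println(new SplitTokenizer().tokenize("The quick, brown fox!"));
    }
}
```
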
Class | Description |
---|---|
NaiveTokenizer | A simple tokenizer. |
NGramTokenizer | This tokenizer creates n-grams, which are sequences of tokens combined into their own larger token. |
StemmingTokenizer | |
StopWordTokenizer | This tokenizer wraps another such that any stop words that would have been returned by the base tokenizer are removed. |

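
The n-gram and stop-word descriptions above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of `NGramTokenizer` or `StopWordTokenizer`: the class `WrappingTokenizerSketch` and the helpers `ngrams` and `withoutStopWords` are hypothetical, a tokenizer is modeled as a plain `Function<String, List<String>>`, and joining n-gram parts with a single space is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

public class WrappingTokenizerSketch {
    /** Joins every run of n consecutive tokens into one larger token (space-separated by assumption). */
    static List<String> ngrams(List<String> tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            result.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return result;
    }

    /** Wraps a base tokenizer so that any token contained in stopWords is dropped from its output. */
    static Function<String, List<String>> withoutStopWords(
            Function<String, List<String>> base, Set<String> stopWords) {
        return text -> {
            List<String> kept = new ArrayList<>();
            for (String token : base.apply(text)) {
                if (!stopWords.contains(token)) {
                    kept.add(token);
                }
            }
            return kept;
        };
    }

    public static void main(String[] args) {
        // Hypothetical base tokenizer: lowercase and split on whitespace
        Function<String, List<String>> base =
                text -> Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "on"));

        List<String> filtered = withoutStopWords(base, stopWords).apply("The cat sat on the mat");
        System.out.println(filtered);            // prints [cat, sat, mat]
        System.out.println(ngrams(filtered, 2)); // prints [cat sat, sat mat]
    }
}
```

Wrapping rather than subclassing keeps the stop-word filter independent of how the base tokenizer splits text, which matches the "wraps another" wording in the table.
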
Copyright © 2017. All rights reserved.