public class NGramTokenizer extends Object implements Tokenizer
Constructor and Description |
---|
NGramTokenizer(int n, Tokenizer base, boolean allSubN) Creates a new n-gram tokenizer. |
Modifier and Type | Method and Description |
---|---|
List<String> | tokenize(String input) Breaks the input string into a series of tokens that may be used as features for a classifier. |
void | tokenize(String input, StringBuilder workSpace, List<String> storageSpace) Breaks the input string into a series of tokens that may be used as features for a classifier. |
public NGramTokenizer(int n, Tokenizer base, boolean allSubN)
n - the length of the n-grams. While it should be greater than 1, 1 is still a valid input.
base - the base tokenizer to create n-grams from
allSubN - true to generate all sub n-grams, false to only return the n-grams specified
public List<String> tokenize(String input)
Specified by: tokenize in interface Tokenizer
public void tokenize(String input, StringBuilder workSpace, List<String> storageSpace)
Specified by: tokenize in interface Tokenizer
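The effect of the n and allSubN constructor parameters can be illustrated with a small stand-alone sketch. It is not the library's implementation: the class NGramSketch and its ngrams helper are hypothetical, the input is assumed to be already split by some base tokenizer, and joining tokens with a single space and emitting shorter n-grams first are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not the library's implementation.
class NGramSketch {

    /**
     * Builds n-grams from already-tokenized input.
     * Assumption: tokens inside an n-gram are joined with a single space.
     */
    static List<String> ngrams(List<String> tokens, int n, boolean allSubN) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> out = new ArrayList<>();
        // With allSubN, emit every length from 1 up to n; otherwise only length n.
        int start = allSubN ? 1 : n;
        for (int len = start; len <= n; len++) {
            // Slide a window of 'len' tokens across the input.
            for (int i = 0; i + len <= tokens.size(); i++) {
                out.add(String.join(" ", tokens.subList(i, i + len)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");
        // Only the 2-grams:
        System.out.println(ngrams(tokens, 2, false));
        // prints [the quick, quick brown, brown fox]
        // All sub n-grams (1-grams and 2-grams):
        System.out.println(ngrams(tokens, 2, true));
        // prints [the, quick, brown, fox, the quick, quick brown, brown fox]
    }
}
```

With n = 1 the sketch simply returns the base tokens, which matches the note that 1 is still a valid input.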
Copyright © 2017. All rights reserved.