Packages that use Tokenizer

Package | Description |
---|---|
jsat.text | |
jsat.text.tokenizer | |
Fields in jsat.text declared as Tokenizer

Modifier and Type | Field and Description |
---|---|
protected Tokenizer | TextDataLoader.tokenizer: the Tokenizer to apply to input strings |
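Because the field is protected, it is intended for subclasses of TextDataLoader. The sketch below is a minimal, assumption-laden illustration of that: the TextDataLoader(Tokenizer, WordWeighting) constructor appears further down this page, while the initialLoad() hook and the tokenize(String) call are assumptions taken from the wider JSAT API, not from this table.

```java
import java.util.List;
import jsat.text.TextDataLoader;
import jsat.text.tokenizer.Tokenizer;
import jsat.text.wordweighting.WordWeighting;

// Minimal sketch of a loader over an in-memory array of documents.
public class InMemoryTextLoader extends TextDataLoader
{
    private final String[] documents;

    public InMemoryTextLoader(String[] documents, Tokenizer tokenizer, WordWeighting weighting)
    {
        super(tokenizer, weighting); // constructor listed later on this page
        this.documents = documents;
    }

    @Override
    public void initialLoad() // assumed abstract hook for reading the raw corpus
    {
        for (String doc : documents)
        {
            // The inherited protected field holds the tokenizer passed to super()
            List<String> tokens = tokenizer.tokenize(doc); // tokenize(String) assumed
            System.out.println(tokens.size() + " tokens in: " + doc);
        }
    }
}
```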
Constructors in jsat.text with parameters of type Tokenizer

Constructor and Description |
---|
BasicTextVectorCreator(Tokenizer tokenizer, Map<String,Integer> wordIndex, WordWeighting weighting): Creates a new basic text vector creator |
ClassificationHashedTextDataLoader(int dimensionSize, Tokenizer tokenizer, WordWeighting weighting): Creates a new hashed text data loader for classification problems |
ClassificationHashedTextDataLoader(Tokenizer tokenizer, WordWeighting weighting): Creates a new hashed text data loader for classification problems; it uses a relatively large default size of 2^22 for the dimension of the space |
ClassificationTextDataLoader(Tokenizer tokenizer, WordWeighting weighting): Creates a new text data loader |
HashedTextDataLoader(int dimensionSize, Tokenizer tokenizer, WordWeighting weighting) |
HashedTextDataLoader(Tokenizer tokenizer, WordWeighting weighting) |
HashedTextVectorCreator(int dimensionSize, Tokenizer tokenizer, WordWeighting weighting): Creates a new text vector creator that works with hash-trick features |
TextDataLoader(Tokenizer tokenizer, WordWeighting weighting): Creates a new loader for text datasets |
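As a hedged sketch of how these constructors are used, the example below wires up a HashedTextVectorCreator from the signature listed above. The NaiveTokenizer no-argument constructor, the WordCount weighting, newText(String), and Vec.nnz() are assumptions drawn from the wider JSAT API rather than from this table.

```java
import jsat.linear.Vec;
import jsat.text.HashedTextVectorCreator;
import jsat.text.tokenizer.NaiveTokenizer;
import jsat.text.tokenizer.Tokenizer;
import jsat.text.wordweighting.WordCount;

public class HashedVectorSketch
{
    public static void main(String[] args)
    {
        // NaiveTokenizer's no-argument constructor and the WordCount weighting
        // are assumptions taken from other parts of JSAT, not from this page.
        Tokenizer tokenizer = new NaiveTokenizer();

        // Constructor signature as listed above: the dimension of the hashed
        // feature space, the tokenizer to apply, and the word weighting to use.
        HashedTextVectorCreator creator =
                new HashedTextVectorCreator(1 << 20, tokenizer, new WordCount());

        // newText(String) is assumed from the TextVectorCreator interface; it
        // turns a raw string into a (sparse) feature vector.
        Vec features = creator.newText("the quick brown fox jumps over the lazy dog");
        System.out.println("non-zero features: " + features.nnz());
    }
}
```

Note how the hashed creator needs only a dimension up front, rather than the explicit wordIndex map that BasicTextVectorCreator takes.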
Classes in jsat.text.tokenizer that implement Tokenizer

Modifier and Type | Class and Description |
---|---|
class | NaiveTokenizer: A simple tokenizer |
class | NGramTokenizer: This tokenizer creates n-grams, which are sequences of tokens combined into their own larger token |
class | StemmingTokenizer |
class | StopWordTokenizer: This tokenizer wraps another such that any stop words that would have been returned by the base tokenizer are removed |
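Since the wrapping classes both take and implement Tokenizer, they compose directly. The sketch below filters stop words out of a plain tokenizer using the StopWordTokenizer(Tokenizer, String...) constructor listed in the table that follows; the NaiveTokenizer no-argument constructor and the tokenize(String) method are assumptions from the wider Tokenizer API, not from this page.

```java
import java.util.List;
import jsat.text.tokenizer.NaiveTokenizer;
import jsat.text.tokenizer.StopWordTokenizer;
import jsat.text.tokenizer.Tokenizer;

public class StopWordSketch
{
    public static void main(String[] args)
    {
        // NaiveTokenizer's no-argument constructor is an assumption beyond this page.
        Tokenizer base = new NaiveTokenizer();

        // Wrap the base tokenizer so the given stop words are filtered out,
        // using the StopWordTokenizer(Tokenizer, String...) constructor
        // listed in the table that follows.
        Tokenizer filtered = new StopWordTokenizer(base, "the", "a", "an", "of");

        // tokenize(String) is assumed from the Tokenizer interface.
        List<String> tokens = filtered.tokenize("the quick brown fox");
        System.out.println(tokens); // stop words such as "the" should be absent
    }
}
```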
Constructors in jsat.text.tokenizer with parameters of type Tokenizer

Constructor and Description |
---|
NGramTokenizer(int n, Tokenizer base, boolean allSubN): Creates a new n-gram tokenizer |
StemmingTokenizer(Stemmer stemmer, Tokenizer baseTokenizer) |
StopWordTokenizer(Tokenizer base, Collection<String> stopWords): Creates a new stop word tokenizer |
StopWordTokenizer(Tokenizer base, String... stopWords): Creates a new stop word tokenizer |