public class HashedTextVectorCreator extends Object implements TextVectorCreator
Converts a text string into a Vec using feature hashing. The tokenization and word weighting methods must be provided and already set up. When constructing this object, the user should make sure that WordWeighting.setWeight(java.util.List, java.util.List) has already been called on the weighting, or that the weighting is stateless (such as BinaryWordPresent).
Constructor and Description |
---|
HashedTextVectorCreator(int dimensionSize, Tokenizer tokenizer, WordWeighting weighting): Creates a new text vector creator that works with hash-trick features. |
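To illustrate what "hash-trick features" means, the following is a minimal stand-alone sketch rather than the library's actual implementation. It assumes each token is mapped to an index by taking its `String.hashCode()` modulo the dimension size, and that every occurrence contributes a weight of 1.0. The class name `FeatureHashingSketch` and the method `hashTokens` are hypothetical and are not part of this API. The real class delegates tokenization and weighting to the supplied Tokenizer and WordWeighting.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-alone sketch of the hashing trick; not the library's code. */
class FeatureHashingSketch {

    /**
     * Maps each token to an index in [0, dimensionSize) via its hash code and
     * accumulates a weight of 1.0 there (an assumed, simple count weighting).
     * Distinct tokens may collide on the same index; that is inherent to the trick.
     */
    static double[] hashTokens(List<String> tokens, int dimensionSize) {
        double[] vec = new double[dimensionSize];
        for (String token : tokens) {
            // floorMod keeps the index non-negative even for negative hash codes
            int index = Math.floorMod(token.hashCode(), dimensionSize);
            vec[index] += 1.0;
        }
        return vec;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "cat", "saw", "the", "dog");
        double[] vec = hashTokens(tokens, 16);
        System.out.println(Arrays.toString(vec));
    }
}
```

The vector length is fixed by `dimensionSize` regardless of vocabulary size, which is why no dictionary of known words has to be built before vectors can be created.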
Modifier and Type | Method and Description |
---|---|
Vec | newText(String input): Converts the given input text into a vector representation. |
Vec | newText(String input, StringBuilder workSpace, List<String> storageSpace): Converts the given input text into a vector representation. |
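The three-argument overload exists so that callers can reuse one StringBuilder and one List across many documents instead of allocating new ones on every call. The sketch below shows that buffer-reuse pattern in isolation. The class `WorkspaceReuseSketch` and its `tokenize` helper are hypothetical stand-ins and are not part of this API. The sketch empties both buffers before each call, on the assumption that the callee expects them to be empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the buffer-reuse pattern; not part of the library. */
class WorkspaceReuseSketch {

    /**
     * Splits the input on non-letter characters and lower-cases each token.
     * Both buffers must be empty on entry; tokens are appended to storageSpace.
     */
    static void tokenize(String input, StringBuilder workSpace, List<String> storageSpace) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetter(c)) {
                workSpace.append(Character.toLowerCase(c));
            } else if (workSpace.length() > 0) {
                // a non-letter ends the current token
                storageSpace.add(workSpace.toString());
                workSpace.setLength(0);
            }
        }
        if (workSpace.length() > 0) {
            // flush the final token
            storageSpace.add(workSpace.toString());
            workSpace.setLength(0);
        }
    }

    /** Counts tokens across many documents while allocating the buffers only once. */
    static int countTokens(List<String> documents) {
        StringBuilder workSpace = new StringBuilder();
        List<String> storageSpace = new ArrayList<>();
        int total = 0;
        for (String doc : documents) {
            // empty the buffers before each call, as the parameters require
            workSpace.setLength(0);
            storageSpace.clear();
            tokenize(doc, workSpace, storageSpace);
            total += storageSpace.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Hello, world!", "one two  three");
        System.out.println(countTokens(docs));
    }
}
```

With the real class, the same loop shape would apply: allocate the builder and list once, clear them, and pass them to newText(input, workSpace, storageSpace) for each document.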
public HashedTextVectorCreator(int dimensionSize, Tokenizer tokenizer, WordWeighting weighting)
dimensionSize - the dimension size of the feature space
tokenizer - the tokenizer to apply to incoming strings
weighting - the weighting process to apply to each loaded document
public Vec newText(String input)
Specified by: newText in interface TextVectorCreator
input - the input string
public Vec newText(String input, StringBuilder workSpace, List<String> storageSpace)
Specified by: newText in interface TextVectorCreator
input - the input string
workSpace - an already allocated (but empty) string builder that can be used as a temporary work space
storageSpace - an already allocated (but empty) list to place the tokens into
Copyright © 2017. All rights reserved.