public abstract class ClassificationHashedTextDataLoader extends HashedTextDataLoader
A hashed text data loader for classification problems. Subclasses should use
addOriginalDocument(java.lang.String, int)
instead of addOriginalDocument(java.lang.String), so that the
original documents have a class label associated with them.
getDataSet()
then returns a classification data set, where the
class label for each data point is the label provided when
addOriginalDocument
was called. New strings passed to
HashedTextDataLoader.newText(java.lang.String)
are inherently
not part of the original data set, so they do not need or receive a class label.

Modifier and Type | Field and Description |
---|---|
protected List<Integer> | classLabels: The list of the true class labels for the data that was loaded before HashedTextDataLoader.finishAdding() was called. |
protected CategoricalData | labelInfo: The information about the class label that would be predicted for a classification data set. |

Fields inherited from class HashedTextDataLoader: noMoreAdding, storageSpace, vectors, wordCounts, workSpace

Constructor and Description |
---|
ClassificationHashedTextDataLoader(int dimensionSize, Tokenizer tokenizer, WordWeighting weighting): Creates a new hashed text data loader for classification problems. |
ClassificationHashedTextDataLoader(Tokenizer tokenizer, WordWeighting weighting): Creates a new hashed text data loader for classification problems. It uses a relatively large default size of 2^22 for the dimension of the space. |
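
To make the role of dimensionSize more concrete, the following is a minimal sketch of feature hashing in plain Java. It is not JSAT code: the class HashedBagOfWordsSketch, the method hashCounts, the whitespace tokenizer, and the use of String.hashCode() are all assumptions made for illustration, and the library's own tokenizer, hash function, and word weighting may well differ.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical stand-in, not part of JSAT: illustrates feature hashing only.
final class HashedBagOfWordsSketch {

    /** Maps each whitespace-separated token to a slot in a fixed-size count vector. */
    static int[] hashCounts(String text, int dimensionSize) {
        int[] counts = new int[dimensionSize];
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields a single empty token; skip it
            }
            // floorMod keeps the index non-negative even when hashCode() is negative
            int index = Math.floorMod(token.hashCode(), dimensionSize);
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // A tiny dimension is used here so the vector is easy to inspect.
        System.out.println(Arrays.toString(hashCounts("the cat saw the dog", 16)));
    }
}
```

A larger dimension lowers the chance that two different tokens collide in the same slot, at the cost of a longer (though typically sparse) vector.
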
Modifier and Type | Method and Description |
---|---|
protected int | addOriginalDocument(String text): Should use addOriginalDocument(java.lang.String, int) instead. |
protected int | addOriginalDocument(String text, int label): To be called by the HashedTextDataLoader.initialLoad() method. |
ClassificationDataSet | getDataSet(): Returns a new data set containing the original data points that were loaded with this loader. |
protected abstract void | setLabelInfo(): The classification label data stored in labelInfo must be set if the text loader is to return a classification data set. |

Methods inherited from class HashedTextDataLoader: finishAdding, getTextVectorCreator, initialLoad, newText, newText
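
The relationship between setLabelInfo(), initialLoad(), and getDataSet() resembles a template method with abstract hooks. The sketch below shows that pattern in isolation, assuming the label hook runs before the load hook. LoaderHookSketch, SpamLoaderSketch, labelNames, and callLog are hypothetical names invented for this example and are not JSAT classes or fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the loader hierarchy; none of this is JSAT source.
abstract class LoaderHookSketch {
    protected final List<String> callLog = new ArrayList<>();
    protected String[] labelNames; // plays the role of labelInfo

    /** Abstract so that a subclass cannot forget to describe the class labels. */
    protected abstract void setLabelInfo();

    /** Subclasses feed their labeled documents in here. */
    protected abstract void initialLoad();

    /** Assumed ordering: label information first, then the document load. */
    public final List<String> getDataSet() {
        setLabelInfo();
        if (labelNames == null || labelNames.length == 0) {
            throw new IllegalStateException("setLabelInfo() did not set any labels");
        }
        initialLoad();
        return callLog;
    }
}

// Example subclass for a two-class problem.
final class SpamLoaderSketch extends LoaderHookSketch {
    @Override
    protected void setLabelInfo() {
        labelNames = new String[] {"ham", "spam"};
        callLog.add("setLabelInfo");
    }

    @Override
    protected void initialLoad() {
        callLog.add("initialLoad");
    }
}
```

Because both hooks are abstract, a subclass that omits either one fails to compile, so the omission is caught before any data is loaded.
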
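protected List<Integer> classLabels
The list of the true class labels for the data that was loaded before HashedTextDataLoader.finishAdding() was called.

protected CategoricalData labelInfo
The information about the class label that would be predicted for a classification data set.
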
public ClassificationHashedTextDataLoader(Tokenizer tokenizer, WordWeighting weighting)
Creates a new hashed text data loader for classification problems, using the default dimension size.
tokenizer - the tokenization method to break up strings with
weighting - the scheme to set the weights for feature vectors

public ClassificationHashedTextDataLoader(int dimensionSize, Tokenizer tokenizer, WordWeighting weighting)
Creates a new hashed text data loader for classification problems.
dimensionSize - the size of the hashed space to use
tokenizer - the tokenization method to break up strings with
weighting - the scheme to set the weights for feature vectors

protected abstract void setLabelInfo()
The classification label data stored in labelInfo must be set if the text loader is to return a classification data set. As such, this abstract method exists to force the user to set it, so that they cannot forget. It is called from getDataSet() just before HashedTextDataLoader.initialLoad() is called.

protected int addOriginalDocument(String text)
Should use addOriginalDocument(java.lang.String, int) instead.
Overrides: addOriginalDocument in class HashedTextDataLoader
text - the text of the data to add

protected int addOriginalDocument(String text, int label)
To be called by the HashedTextDataLoader.initialLoad() method. It takes in the text and adds a new document vector to the data set. Once all text documents have been loaded, this method should never be called again.
text - the text of the document to add
label - the classification label for this document

public ClassificationDataSet getDataSet()
Description copied from class HashedTextDataLoader: Returns a new data set containing the original data points that were loaded with this loader.
getDataSet in class HashedTextDataLoader
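
As a rough illustration of how labels can stay aligned with the loaded documents, the sketch below keeps a list of labels in parallel with a list of hashed vectors. It is a plain-Java approximation under stated assumptions, not the JSAT implementation: LabeledHashedLoaderSketch, numClasses, size(), and labelOf() are hypothetical, returning the document index from addOriginalDocument is an assumption, and throwing an exception after finishAdding() is just one possible way to enforce the "never call again" rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the JSAT implementation.
final class LabeledHashedLoaderSketch {
    private final int dimensionSize;
    private final int numClasses;
    private final List<int[]> vectors = new ArrayList<>();
    private final List<Integer> classLabels = new ArrayList<>(); // parallel to vectors
    private boolean noMoreAdding = false;

    LabeledHashedLoaderSketch(int dimensionSize, int numClasses) {
        this.dimensionSize = dimensionSize;
        this.numClasses = numClasses;
    }

    /** Hashes whitespace-separated tokens into a fixed-size count vector. */
    private int[] hash(String text) {
        int[] counts = new int[dimensionSize];
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts[Math.floorMod(token.hashCode(), dimensionSize)]++;
            }
        }
        return counts;
    }

    /** Adds a labeled document; returns its index (an assumption of this sketch). */
    int addOriginalDocument(String text, int label) {
        if (noMoreAdding) {
            throw new IllegalStateException("loading has already finished");
        }
        if (label < 0 || label >= numClasses) {
            throw new IllegalArgumentException("invalid label " + label);
        }
        vectors.add(hash(text));
        classLabels.add(label);
        return vectors.size() - 1;
    }

    void finishAdding() {
        noMoreAdding = true;
    }

    /** New text is vectorized the same way but is never stored or labeled. */
    int[] newText(String text) {
        return hash(text);
    }

    int size() {
        return vectors.size();
    }

    int labelOf(int index) {
        return classLabels.get(index);
    }
}
```

In this sketch, a document added with label 1 can later be looked up with labelOf using the index that addOriginalDocument returned, while vectors produced by newText leave the stored data untouched.
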
Copyright © 2017. All rights reserved.