public abstract class ClassificationTextDataLoader extends TextDataLoader
Extending classes should use addOriginalDocument(java.lang.String, int)
instead of the single-argument addOriginalDocument(String), so that the
original documents have a class label associated with them.
getDataSet() then returns a classification data set, where the
class label for each data point is the label provided when
addOriginalDocument was called.
New text converted with TextDataLoader.newText(java.lang.String) is inherently
not part of the original data set, so it does not need or receive a class label.

Modifier and Type | Field and Description |
---|---|
protected List<Integer> | classLabels: The list of the true class labels for the data that was loaded before TextDataLoader.finishAdding() was called. |
protected CategoricalData | labelInfo: The information about the class label that would be predicted for a classification data set. |
Fields inherited from class TextDataLoader: allWords, noMoreAdding, storageSpace, termDocumentFrequencys, tokenizer, vectors, wordCounts, wordIndex, workSpace

Constructor and Description |
---|
ClassificationTextDataLoader(Tokenizer tokenizer, WordWeighting weighting): Creates a new text data loader. |

Modifier and Type | Method and Description |
---|---|
protected int | addOriginalDocument(String text): Should use addOriginalDocument(java.lang.String, int) instead. |
protected int | addOriginalDocument(String text, int label): To be called by the TextDataLoader.initialLoad() method. |
ClassificationDataSet | getDataSet(): Returns a new data set containing the original data points that were loaded with this loader. |
protected abstract void | setLabelInfo(): The classification label data stored in labelInfo must be set if the text loader is to return a classification data set. |
Methods inherited from class TextDataLoader: finishAdding, getMinimumOccurrenceDTF, getTermFrequency, getTextVectorCreator, getWordForIndex, initialLoad, newText, newText

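The requirement that labelInfo be set before a classification data set is returned follows the template-method pattern, and a small sketch may make the call order easier to see. The classes below are simplified stand-ins written for illustration, not the library's own code: HookOrderSketch, SketchLoader, SpamHamLoader, labelCategories, calls and runDemo are all hypothetical names, and getDataSet() here returns a plain int rather than a ClassificationDataSet. The sketch assumes, as the setLabelInfo() description on this page suggests, that getDataSet() invokes setLabelInfo() just before initialLoad().

```java
import java.util.ArrayList;
import java.util.List;

public class HookOrderSketch {

    /** Simplified stand-in for the abstract loader; not the library's class. */
    abstract static class SketchLoader {
        /** Number of class-label categories; 0 means "not set yet" (hypothetical field). */
        protected int labelCategories = 0;
        /** Records the order in which the hooks run, for illustration only. */
        protected final List<String> calls = new ArrayList<>();

        /** Subclasses must describe the class label here. */
        protected abstract void setLabelInfo();

        /** Subclasses load their original documents here. */
        protected abstract void initialLoad();

        /** Runs the label hook first, then the load, then checks the label was set. */
        public final int getDataSet() {
            setLabelInfo();
            initialLoad();
            if (labelCategories <= 0) {
                throw new IllegalStateException("label info was never set");
            }
            return labelCategories;
        }
    }

    /** A concrete loader cannot compile without implementing setLabelInfo(). */
    static class SpamHamLoader extends SketchLoader {
        @Override
        protected void setLabelInfo() {
            calls.add("setLabelInfo");
            labelCategories = 2; // two classes, e.g. ham = 0 and spam = 1
        }

        @Override
        protected void initialLoad() {
            calls.add("initialLoad");
        }
    }

    /** Builds the data set once and returns the recorded hook order. */
    public static List<String> runDemo() {
        SpamHamLoader loader = new SpamHamLoader();
        loader.getDataSet();
        return loader.calls;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because setLabelInfo() is abstract, leaving it out is a compile-time error rather than a runtime surprise, which is the design choice the method description points at.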
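protected final List<Integer> classLabels
The list of the true class labels for the data that was loaded before TextDataLoader.finishAdding() was called.

protected CategoricalData labelInfo
The information about the class label that would be predicted for a classification data set.

public ClassificationTextDataLoader(Tokenizer tokenizer, WordWeighting weighting)
Creates a new text data loader.
Parameters:
tokenizer - the string tokenizer to use on each input
weighting - the weighting scheme to apply to each vector in the collection

protected abstract void setLabelInfo()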
The classification label data stored in labelInfo must be set
if the text loader is to return a classification data set. This
abstract method exists to force the implementer to set it, so that it
cannot be forgotten. It is called by getDataSet() just before
TextDataLoader.initialLoad() is called.

protected int addOriginalDocument(String text)
Should use addOriginalDocument(java.lang.String, int) instead.
Overrides:
addOriginalDocument in class TextDataLoader
Parameters:
text - the text of the data to add

protected int addOriginalDocument(String text, int label)
To be called by the TextDataLoader.initialLoad() method.
It takes the text and adds a new document
vector to the data set. Once all text documents
have been loaded, this method should never be
called again.
Parameters:
text - the text of the document to add
label - the classification label for this document

public ClassificationDataSet getDataSet()
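Description copied from class: TextDataLoader
Returns a new data set containing the original data points that were loaded with this loader.
getDataSet in class TextDataLoader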
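To illustrate how labels stay attached to the original documents, here is a minimal sketch of the bookkeeping described above. It is a simplified stand-in, not the library's implementation: LabeledDocumentsSketch, SketchLoader and runDemo are hypothetical names, raw strings take the place of document vectors, and getDataSet() returns "label:text" strings instead of a ClassificationDataSet. Two behaviours are assumptions that this page does not confirm: that the returned int is the index of the added document, and that calling addOriginalDocument after loading has finished raises an exception.

```java
import java.util.ArrayList;
import java.util.List;

public class LabeledDocumentsSketch {

    /** Simplified stand-in: stores raw text where a real loader would store vectors. */
    static class SketchLoader {
        private final List<String> documents = new ArrayList<>();
        private final List<Integer> classLabels = new ArrayList<>();
        private boolean noMoreAdding = false;

        /** Adds a labeled document; returning its index is an assumption. */
        int addOriginalDocument(String text, int label) {
            if (noMoreAdding) {
                // Assumption: the sketch enforces "never called again" with an exception.
                throw new IllegalStateException("all documents have already been loaded");
            }
            documents.add(text);
            classLabels.add(label); // label i always belongs to document i
            return documents.size() - 1;
        }

        /** Marks the end of the initial load. */
        void finishAdding() {
            noMoreAdding = true;
        }

        /** Text seen after loading is converted but never receives a label. */
        String newText(String text) {
            return text.toLowerCase(); // placeholder for real vectorization
        }

        /** Pairs each original document with the label given when it was added. */
        List<String> getDataSet() {
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < documents.size(); i++) {
                rows.add(classLabels.get(i) + ":" + documents.get(i));
            }
            return rows;
        }
    }

    /** Loads two labeled documents and returns the resulting rows. */
    public static List<String> runDemo() {
        SketchLoader loader = new SketchLoader();
        loader.addOriginalDocument("cheap pills now", 1);
        loader.addOriginalDocument("meeting at noon", 0);
        loader.finishAdding();
        loader.newText("Unseen Message"); // converted, but not added to the data set
        return loader.getDataSet();
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Keeping the labels in a list parallel to the documents means the data set can be assembled by index, while text passed to newText never touches either list.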
Copyright © 2017. All rights reserved.