ClassificationTextDataLoader (Java Statistical Analysis Tool 0.0.8 API)

java.lang.Object
- jsat.text.TextDataLoader
- - jsat.text.ClassificationTextDataLoader

All Implemented Interfaces:

Serializable, TextVectorCreator
```
public abstract class ClassificationTextDataLoader
extends TextDataLoader
```
This class provides a framework for loading classification data sets made of text documents as vectors. Subclasses call addOriginalDocument(java.lang.String, int) instead of the single-argument addOriginalDocument(java.lang.String), so that each original document has a class label associated with it. getDataSet() then returns a classification data set in which the class label for each data point is the label provided when addOriginalDocument was called.
New vectors created with TextDataLoader.newText(java.lang.String) are inherently not part of the original data set, so do not need or receive a class label.
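
The contract described above can be illustrated with a small stand-alone sketch. It does not use the JSAT classes: `LabeledLoaderSketch`, `SpamHamLoaderSketch`, the `labelNames` array (standing in for `labelInfo`), and the string rows returned by `getDataSet()` are all hypothetical stand-ins, chosen only to show how labels supplied to `addOriginalDocument(text, label)` stay attached to their documents and how `setLabelInfo()` runs just before `initialLoad()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the loader contract; NOT the JSAT implementation.
abstract class LabeledLoaderSketch {
    protected final List<String> documents = new ArrayList<>();
    protected final List<Integer> classLabels = new ArrayList<>();
    protected String[] labelNames;   // plays the role of labelInfo
    private boolean loaded = false;  // guards against loading twice

    // Subclasses must describe the labels before any document is loaded.
    protected abstract void setLabelInfo();

    // Subclasses load every original document here.
    protected abstract void initialLoad();

    // Stores the document and its label at the same index and returns that index.
    protected int addOriginalDocument(String text, int label) {
        documents.add(text);
        classLabels.add(label);
        return documents.size() - 1;
    }

    // Mirrors the documented order: setLabelInfo() runs just before initialLoad().
    public List<String> getDataSet() {
        if (!loaded) {
            setLabelInfo();
            initialLoad();
            loaded = true;
        }
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            rows.add(labelNames[classLabels.get(i)] + ": " + documents.get(i));
        }
        return rows;
    }
}

// Hypothetical concrete loader with two classes, "ham" (0) and "spam" (1).
class SpamHamLoaderSketch extends LabeledLoaderSketch {
    @Override
    protected void setLabelInfo() {
        labelNames = new String[] {"ham", "spam"};
    }

    @Override
    protected void initialLoad() {
        addOriginalDocument("meeting moved to noon", 0);
        addOriginalDocument("win a free prize now", 1);
    }
}
```

Calling `new SpamHamLoaderSketch().getDataSet()` yields one row per original document, each paired with the label passed at load time. A real subclass would instead populate the `labelInfo` field and rely on the tokenizer and weighting passed to the constructor.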

Author:

Edward Raff

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`protected List<Integer>`	`classLabels` The list of the true class labels for the data that was loaded before `TextDataLoader.finishAdding()` was called.
`protected CategoricalData`	`labelInfo` The information about the class label that would be predicted for a classification data set.

Fields inherited from class jsat.text.TextDataLoader
allWords, noMoreAdding, storageSpace, termDocumentFrequencys, tokenizer, vectors, wordCounts, wordIndex, workSpace

Constructor Summary

Constructors
Constructor and Description
`ClassificationTextDataLoader(Tokenizer tokenizer, WordWeighting weighting)` Creates a new text data loader.

Method Summary

Modifier and Type	Method and Description
`protected int`	`addOriginalDocument(String text)` Do not use this overload; call `addOriginalDocument(java.lang.String, int)` instead so that the document receives a class label.
`protected int`	`addOriginalDocument(String text, int label)` To be called by the `TextDataLoader.initialLoad()` method.
`ClassificationDataSet`	`getDataSet()` Returns a new data set containing the original data points that were loaded with this loader.
`protected abstract void`	`setLabelInfo()` The classification label data stored in `labelInfo` must be set if the text loader is to return a classification data set.

Methods inherited from class jsat.text.TextDataLoader
finishAdding, getMinimumOccurrenceDTF, getTermFrequency, getTextVectorCreator, getWordForIndex, initialLoad, newText, newText

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - classLabels
```
protected final List<Integer> classLabels
```
    The list of the true class labels for the data that was loaded before TextDataLoader.finishAdding() was called.
  - labelInfo
```
protected CategoricalData labelInfo
```
    The information about the class label that would be predicted for a classification data set.
- Constructor Detail
  - ClassificationTextDataLoader
```
public ClassificationTextDataLoader(Tokenizer tokenizer,
                                    WordWeighting weighting)
```
    Creates a new text data loader.
    
    Parameters:
    
    tokenizer - the string tokenizer to use on each input
    
    weighting - the weighting scheme to apply to each vector in the collection
- Method Detail
  - setLabelInfo
```
protected abstract void setLabelInfo()
```
    The classification label data stored in labelInfo must be set if the text loader is to return a classification data set. This abstract method exists to force subclasses to set it, so that it cannot be forgotten.
    This will be called in getDataSet() just before TextDataLoader.initialLoad() is called.
  - addOriginalDocument
```
protected int addOriginalDocument(String text)
```
    Do not use this overload; call addOriginalDocument(java.lang.String, int) instead so that the document receives a class label.
    
    Overrides:
    
    addOriginalDocument in class TextDataLoader
    
    Parameters:
    
    text - the text of the data to add
    
    Returns:
    
    the index of the created document for the given text. Starts from zero and counts up.
  - addOriginalDocument
```
protected int addOriginalDocument(String text,
                                  int label)
```
    To be called by the TextDataLoader.initialLoad() method. It will take in the text and add a new document vector to the data set. Once all text documents have been loaded, this method should never be called again.
    This method is thread safe.
    
    Parameters:
    
    text - the text of the document to add
    
    label - the classification label for this document
    
    Returns:
    
    the index of the created document for the given text. Starts from zero and counts up.
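    
    The documentation states only that this method is thread safe; it does not say how. As one possible illustration, the sketch below uses a synchronized method so that each document and its label are appended as a single step and always share the same index. `SynchronizedLabelStoreSketch` and all of its members are hypothetical and are not part of JSAT.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a thread-safe labeled add; NOT the JSAT implementation.
class SynchronizedLabelStoreSketch {
    private final List<String> documents = new ArrayList<>();
    private final List<Integer> classLabels = new ArrayList<>();

    // Both lists are updated under one lock, so index i always refers to
    // the same document in both lists.
    public synchronized int add(String text, int label) {
        documents.add(text);
        classLabels.add(label);
        return documents.size() - 1;
    }

    public synchronized String textOf(int index) { return documents.get(index); }

    public synchronized int labelOf(int index) { return classLabels.get(index); }

    public synchronized int size() { return documents.size(); }

    // Adds documents from several threads at once, then checks that every
    // stored label still matches the thread id embedded in its text.
    static boolean labelsStayAligned(int threads, int perThread) {
        SynchronizedLabelStoreSketch store = new SynchronizedLabelStoreSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int label = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    store.add("doc-from-thread-" + label, label);
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (store.size() != threads * perThread) {
            return false;
        }
        for (int i = 0; i < store.size(); i++) {
            if (!store.textOf(i).equals("doc-from-thread-" + store.labelOf(i))) {
                return false;
            }
        }
        return true;
    }
}
```

    The order in which indices are handed out depends on thread scheduling, so callers that load concurrently should rely on the returned index rather than on insertion order.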
  - getDataSet
```
public ClassificationDataSet getDataSet()
```
    Description copied from class: TextDataLoader
    
    Returns a new data set containing the original data points that were loaded with this loader.
    
    Overrides:
    
    getDataSet in class TextDataLoader
    
    Returns:
    
    an appropriate data set for this loader
