ClassificationHashedTextDataLoader (Java Statistical Analysis Tool 0.0.8 API)

java.lang.Object
- jsat.text.HashedTextDataLoader
- - jsat.text.ClassificationHashedTextDataLoader

All Implemented Interfaces:

Serializable, TextVectorCreator
```
public abstract class ClassificationHashedTextDataLoader
extends HashedTextDataLoader
```
This class provides a framework for loading classification datasets made of text documents as hashed feature vectors. This extension uses addOriginalDocument(java.lang.String, int) instead of addOriginalDocument(java.lang.String), so that each original document has a class label associated with it. getDataSet() then returns a classification data set in which the class label of each data point is the label supplied when addOriginalDocument was called.
New vectors created with HashedTextDataLoader.newText(java.lang.String) are not part of the original data set, so they neither need nor receive a class label.
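The split between labeled original documents and unlabeled new text can be illustrated with a small stand-alone sketch. The class `LabeledHashedLoaderSketch` and its methods are hypothetical and are not JSAT code: vectors are plain `Map<Integer, Double>` term counts, tokens are split on whitespace, and `String.hashCode()` is used as the hash function. What it shows is only the contract described above: original documents and their labels are stored in parallel lists, while `newText` vectorizes text without storing it or labeling it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a labeled hashed text loader; not JSAT code.
class LabeledHashedLoaderSketch {
    private final int dimensionSize;
    // Original documents and their labels live in parallel lists:
    // vectors.get(i) is labeled by classLabels.get(i).
    private final List<Map<Integer, Double>> vectors = new ArrayList<>();
    private final List<Integer> classLabels = new ArrayList<>();

    LabeledHashedLoaderSketch(int dimensionSize) {
        this.dimensionSize = dimensionSize;
    }

    // Hash each whitespace-separated token into [0, dimensionSize) and count occurrences.
    private Map<Integer, Double> hashText(String text) {
        Map<Integer, Double> vec = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            int index = Math.floorMod(token.hashCode(), dimensionSize);
            vec.merge(index, 1.0, Double::sum);
        }
        return vec;
    }

    // Original documents always arrive with a class label; returns the 0-based index.
    synchronized int addOriginalDocument(String text, int label) {
        vectors.add(hashText(text));
        classLabels.add(label);
        return vectors.size() - 1;
    }

    // New text is vectorized the same way but is neither stored nor labeled.
    Map<Integer, Double> newText(String text) {
        return hashText(text);
    }

    synchronized int labelOf(int index) { return classLabels.get(index); }

    synchronized int size() { return vectors.size(); }
}
```

Calling `newText` leaves `size()` unchanged, which mirrors the statement that new vectors are not part of the original data set.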

Author:

Edward Raff

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`protected List<Integer>`	`classLabels` The list of the true class labels for the data that was loaded before `HashedTextDataLoader.finishAdding()` was called.
`protected CategoricalData`	`labelInfo` The information about the class label that would be predicted for a classification data set.

Fields inherited from class jsat.text.HashedTextDataLoader
noMoreAdding, storageSpace, vectors, wordCounts, workSpace

Constructor Summary

Constructors
Constructor and Description
`ClassificationHashedTextDataLoader(int dimensionSize, Tokenizer tokenizer, WordWeighting weighting)` Creates a new hashed text data loader for classification problems.
`ClassificationHashedTextDataLoader(Tokenizer tokenizer, WordWeighting weighting)` Creates a new hashed text data loader for classification problems. It uses a relatively large default size of 2²² for the dimension of the space.
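The dimension size bounds the column index that a hashed token can occupy. The sketch below is an assumption about the general technique (feature hashing), not a description of JSAT's internals: this page does not state which hash function the loader uses, so `String.hashCode()` reduced with `Math.floorMod` stands in for it. `FeatureHashingSketch` and `hashIndex` are hypothetical names.

```java
// Hypothetical helper illustrating feature hashing; JSAT's actual hash function is not documented here.
final class FeatureHashingSketch {
    // 2^22 = 4,194,304, the default dimension quoted for the two-argument constructor.
    static final int DEFAULT_DIMENSION = 1 << 22;

    private FeatureHashingSketch() {}

    // Map a token to a column in [0, dimensionSize).
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int hashIndex(String token, int dimensionSize) {
        if (dimensionSize <= 0) {
            throw new IllegalArgumentException("dimensionSize must be positive");
        }
        return Math.floorMod(token.hashCode(), dimensionSize);
    }
}
```

A smaller `dimensionSize` saves memory but makes it more likely that distinct tokens share a column. In the extreme case of a dimension of 1, every token maps to index 0.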

Method Summary

Modifier and Type	Method and Description
`protected int`	`addOriginalDocument(String text)` Should use `addOriginalDocument(java.lang.String, int)` instead.
`protected int`	`addOriginalDocument(String text, int label)` To be called by the `HashedTextDataLoader.initialLoad()` method.
`ClassificationDataSet`	`getDataSet()` Returns a new data set containing the original data points that were loaded with this loader.
`protected abstract void`	`setLabelInfo()` The classification label data stored in `labelInfo` must be set if the text loader is to return a classification data set.

Methods inherited from class jsat.text.HashedTextDataLoader
finishAdding, getTextVectorCreator, initialLoad, newText, newText

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
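Together, `setLabelInfo()`, the inherited `initialLoad()`, and `getDataSet()` form a template-method structure. The skeleton below is a hypothetical simplification, not JSAT code. `labelInfo` is a `String[]` of class names standing in for `CategoricalData`, and `getDataSet()` returns only the list of labels rather than a `ClassificationDataSet`. It follows the order documented under setLabelInfo, where `setLabelInfo()` runs just before `initialLoad()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical skeleton of the template-method structure; names mirror this page but this is not JSAT code.
abstract class LabeledLoaderTemplate {
    protected final List<String> documents = new ArrayList<>();
    protected final List<Integer> classLabels = new ArrayList<>();
    protected String[] labelInfo; // stand-in for CategoricalData: one name per class
    private boolean loaded = false;

    // Subclasses must supply label metadata; the compiler rejects a concrete subclass that omits it.
    protected abstract void setLabelInfo();

    // Subclasses feed their documents through addOriginalDocument(text, label).
    protected abstract void initialLoad();

    protected synchronized int addOriginalDocument(String text, int label) {
        documents.add(text);
        classLabels.add(label);
        return documents.size() - 1;
    }

    // Follows the documented order: setLabelInfo() runs just before initialLoad().
    public synchronized List<Integer> getDataSet() {
        if (!loaded) {
            setLabelInfo();
            initialLoad();
            loaded = true;
        }
        if (labelInfo == null) {
            throw new IllegalStateException("setLabelInfo() did not set labelInfo");
        }
        return new ArrayList<>(classLabels);
    }
}

// Example subclass with two hard-coded documents.
class TinySentimentLoader extends LabeledLoaderTemplate {
    @Override
    protected void setLabelInfo() {
        labelInfo = new String[] {"negative", "positive"};
    }

    @Override
    protected void initialLoad() {
        addOriginalDocument("great product", 1);
        addOriginalDocument("terrible service", 0);
    }
}
```

Because `setLabelInfo()` is abstract, the compiler forces every concrete subclass to provide it. A subclass cannot compile without deciding what its class labels mean.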

- Field Detail
  - classLabels
```
protected List<Integer> classLabels
```
    The list of the true class labels for the data that was loaded before HashedTextDataLoader.finishAdding() was called.
  - labelInfo
```
protected CategoricalData labelInfo
```
    The information about the class label that would be predicted for a classification data set.
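The two fields are meant to agree: `labelInfo` describes the possible classes, and each entry in `classLabels` names one of them. The sketch below assumes labels are 0-based category indices, which is JSAT's usual convention for categorical data, but this page does not state it. The hypothetical parameter `numCategories` stands in for the category count that `labelInfo` would hold. `LabelConsistencySketch` is likewise a hypothetical name.

```java
import java.util.List;

// Hypothetical consistency check between classLabels and the category count described by labelInfo.
final class LabelConsistencySketch {
    private LabelConsistencySketch() {}

    // Returns the position of the first label outside [0, numCategories), or -1 if all are valid.
    static int firstInvalidLabel(List<Integer> classLabels, int numCategories) {
        for (int i = 0; i < classLabels.size(); i++) {
            int label = classLabels.get(i);
            if (label < 0 || label >= numCategories) {
                return i;
            }
        }
        return -1;
    }

    // Counts how many documents carry each label; assumes firstInvalidLabel(...) returned -1.
    static int[] labelCounts(List<Integer> classLabels, int numCategories) {
        int[] counts = new int[numCategories];
        for (int label : classLabels) {
            counts[label]++;
        }
        return counts;
    }
}
```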
- Constructor Detail
  - ClassificationHashedTextDataLoader
```
public ClassificationHashedTextDataLoader(Tokenizer tokenizer,
                                          WordWeighting weighting)
```
    Creates a new hashed text data loader for classification problems. It uses a relatively large default size of 2²² for the dimension of the space.
    
    Parameters:
    
    tokenizer - the tokenization method to break up strings with
    
    weighting - the scheme to set the weights for feature vectors.
  - ClassificationHashedTextDataLoader
```
public ClassificationHashedTextDataLoader(int dimensionSize,
                                          Tokenizer tokenizer,
                                          WordWeighting weighting)
```
    Creates a new hashed text data loader for classification problems.
    
    Parameters:
    
    dimensionSize - the size of the hashed space to use.
    
    tokenizer - the tokenization method to break up strings with
    
    weighting - the scheme to set the weights for feature vectors.
- Method Detail
  - setLabelInfo
```
protected abstract void setLabelInfo()
```
    The classification label data stored in labelInfo must be set if the text loader is to return a classification data set. This method is abstract so that subclasses are forced to set it and cannot forget to.
    This will be called in getDataSet() just before HashedTextDataLoader.initialLoad() is called.
  - addOriginalDocument
```
protected int addOriginalDocument(String text)
```
    Should use addOriginalDocument(java.lang.String, int) instead.
    
    Overrides:
    
    addOriginalDocument in class HashedTextDataLoader
    
    Parameters:
    
    text - the text of the data to add
    
    Returns:
    
    the index of the created document for the given text. Starts from zero and counts up.
  - addOriginalDocument
```
protected int addOriginalDocument(String text,
                                  int label)
```
    To be called by the HashedTextDataLoader.initialLoad() method. It will take in the text and add a new document vector to the data set. Once all text documents have been loaded, this method should never be called again.
    This method is thread safe.
    
    Parameters:
    
    text - the text of the document to add
    
    label - the classification label for this document
    
    Returns:
    
    the index of the created document for the given text. Starts from zero and counts up.
  - getDataSet
```
public ClassificationDataSet getDataSet()
```
    Description copied from class: HashedTextDataLoader
    
    Returns a new data set containing the original data points that were loaded with this loader.
    
    Overrides:
    
    getDataSet in class HashedTextDataLoader
    
    Returns:
    
    an appropriate data set for this loader
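addOriginalDocument(String, int) is documented as thread safe and as returning indices that start from zero and count up. That means an implementation of initialLoad() may feed documents from several threads. The sketch below shows one way such a guarantee can hold, using a single `synchronized` method that appends to parallel lists. It is an assumption for illustration only: this page does not describe JSAT's actual synchronization strategy, and `ConcurrentLabeledStore` and `loadInParallel` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical thread-safe labeled store; not JSAT's implementation.
class ConcurrentLabeledStore {
    private final List<String> texts = new ArrayList<>();
    private final List<Integer> labels = new ArrayList<>();

    // Appending to both lists and computing the index inside one synchronized method
    // keeps text i and label i aligned even when many threads call this at once.
    synchronized int addOriginalDocument(String text, int label) {
        texts.add(text);
        labels.add(label);
        return texts.size() - 1;
    }

    synchronized String textAt(int i) { return texts.get(i); }

    synchronized int labelAt(int i) { return labels.get(i); }

    synchronized int size() { return texts.size(); }

    // Loads documents from a thread pool and returns the index each document received.
    static int[] loadInParallel(ConcurrentLabeledStore store, String[] docs, int[] docLabels, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int k = 0; k < docs.length; k++) {
                final int j = k;
                futures.add(pool.submit(() -> store.addOriginalDocument(docs[j], docLabels[j])));
            }
            int[] indices = new int[docs.length];
            for (int k = 0; k < docs.length; k++) {
                indices[k] = futures.get(k).get(); // waits for each task to finish
            }
            return indices;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The order in which documents receive their indices depends on thread scheduling. What the lock guarantees is that every index in 0..n-1 is handed out exactly once and that each text stays paired with its label.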
