public class RandomForest extends Object implements Classifier, Regressor, Parameterized

Random Forest is an extension of Bagging that is applied only to DecisionTrees. It works in a similar manner, but also uses only a random subset of the features for each tree trained. This provides increased accuracy of predictions and reduced training time over Bagging alone.

See Also: Bagging, Serialized Form

Constructor and Description |
---|
RandomForest() |
RandomForest(int maxForestSize) |
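The two sources of randomness described above (bootstrap sampling of the data from Bagging, plus a random feature subset per tree) can be sketched with the standard library alone. All names below are illustrative stand-ins, not JSAT internals:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of the per-tree sampling a Random Forest performs: each tree sees
// a bootstrap sample of the rows (Bagging) AND a random subset of the
// feature columns. Class and method names here are hypothetical.
public class ForestSampling {
    // Draw n row indices with replacement from [0, n): one bootstrap sample.
    static int[] bootstrapSample(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++)
            idx[i] = rng.nextInt(n);
        return idx;
    }

    // Pick k distinct feature indices out of d for one tree.
    static List<Integer> sampleFeatures(int d, int k, Random rng) {
        List<Integer> all = new ArrayList<>();
        for (int f = 0; f < d; f++)
            all.add(f);
        Collections.shuffle(all, rng);
        return new ArrayList<>(all.subList(0, k));
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int[] rows = bootstrapSample(100, rng);           // data for one tree
        List<Integer> feats = sampleFeatures(20, 5, rng); // its feature subset
        System.out.println(rows.length + " " + feats.size());
    }
}
```

Rows not drawn into a tree's bootstrap sample are that tree's "out-of-bag" samples, which the error and importance estimates below rely on.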
Modifier and Type | Method and Description |
---|---|
void | autoFeatureSample() Tells the class to automatically select the number of features to use. |
CategoricalResults | classify(DataPoint data) Performs classification on the given data point. |
RandomForest | clone() |
int | getExtraSamples() |
OnLineStatistics[] | getFeatureImportance() Random Forest can obtain an unbiased estimate of feature importance using a TreeFeatureImportanceInference method on the out-of-bag samples during training. |
int | getMaxForestSize() Returns the number of trees that will be created for the forest, which is also the number of base learners that will be trained. |
double | getOutOfBagError() If isUseOutOfBagError() is false, then this method will return 0 after training. |
Parameter | getParameter(String paramName) Returns the parameter with the given name. |
List<Parameter> | getParameters() Returns the list of parameters that can be altered for this learner. |
boolean | isAutoFeatureSample() Returns true if heuristics are currently in use for the number of features, or false if the number has been specified. |
boolean | isUseOutOfBagError() Indicates whether the out-of-bag error rate will be computed during training. |
boolean | isUseOutOfBagImportance() Indicates whether the out-of-bag feature importance will be computed during training. |
double | regress(DataPoint data) |
void | setExtraSamples(int i) Sets the number of extra samples to take when performing Bagging. |
void | setFeatureSamples(int featureSamples) Instead of using a heuristic, the exact number of features to sample is provided. |
void | setMaxForestSize(int maxForestSize) Sets the maximum number of trees to create for the forest. |
void | setUseOutOfBagError(boolean useOutOfBagError) Sets whether or not to compute the out-of-bag error during training. |
void | setUseOutOfBagImportance(boolean useOutOfBagImportance) Sets whether or not to compute the out-of-bag importance of each feature during training. |
boolean | supportsWeightedData() Indicates whether the model knows how to train using weighted data points. |
void | train(RegressionDataSet dataSet) |
void | train(RegressionDataSet dataSet, ExecutorService threadPool) |
void | trainC(ClassificationDataSet dataSet) Trains the classifier and constructs a model for classification using the given data set. |
void | trainC(ClassificationDataSet dataSet, ExecutorService threadPool) Trains the classifier and constructs a model for classification using the given data set. |
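classify(DataPoint) combines the predictions of all trees in the forest into a single CategoricalResults. How the votes are combined is not spelled out in this page; the usual tally-and-normalize scheme, with illustrative names only, looks like this:

```java
// Hypothetical sketch of forest-level classification: one vote per tree,
// normalized into a probability-like distribution over the classes, in the
// spirit of the CategoricalResults that classify(...) returns. This is an
// assumption about the aggregation scheme, not JSAT's actual internals.
public class MajorityVote {
    // votes[t] is the class index predicted by tree t.
    static double[] aggregate(int[] votes, int numClasses) {
        double[] dist = new double[numClasses];
        for (int v : votes)
            dist[v] += 1.0 / votes.length;
        return dist;
    }

    // Index of the most probable class in the aggregated distribution.
    static int argMax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++)
            if (dist[i] > dist[best])
                best = i;
        return best;
    }

    public static void main(String[] args) {
        int[] votes = {0, 1, 1, 2, 1};     // five trees, three classes
        double[] dist = aggregate(votes, 3);
        System.out.println(argMax(dist));  // class 1 wins with 3/5 of the votes
    }
}
```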
public RandomForest()
public RandomForest(int maxForestSize)
public void setExtraSamples(int i)
Parameters: i - how many extra samples to take

public int getExtraSamples()
public void setFeatureSamples(int featureSamples)
Instead of using a heuristic, the exact number of features to sample is provided. If set to the total number of features or more, the algorithm reduces to Bagging performed on DecisionTree.
Parameters: featureSamples - the number of features to randomly select for each tree in the forest.
Throws: ArithmeticException - if the number given is less than or equal to zero
See Also: autoFeatureSample(), Bagging

public void autoFeatureSample()
Tells the class to automatically select the number of features to use.

public boolean isAutoFeatureSample()
Returns true if heuristics are currently in use for the number of features, or false if the number has been specified.
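This page does not state which heuristic autoFeatureSample() applies. The classic defaults from the random-forest literature are sqrt(d) features for classification and d/3 for regression; the sketch below uses those values purely as an assumption:

```java
// Hypothetical feature-count heuristics for a d-dimensional data set.
// These are the textbook random-forest defaults, assumed here for
// illustration; JSAT's actual heuristic may differ.
public class FeatureHeuristic {
    // Common classification default: round(sqrt(d)), at least 1.
    static int forClassification(int d) {
        return Math.max(1, (int) Math.round(Math.sqrt(d)));
    }

    // Common regression default: d / 3, at least 1.
    static int forRegression(int d) {
        return Math.max(1, d / 3);
    }

    public static void main(String[] args) {
        System.out.println(forClassification(100)); // 10
        System.out.println(forRegression(100));     // 33
    }
}
```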
public void setMaxForestSize(int maxForestSize)
Sets the maximum number of trees to create for the forest.
Parameters: maxForestSize - the number of base learners to train
Throws: ArithmeticException - if the number specified is not a positive value

public int getMaxForestSize()

public void setUseOutOfBagError(boolean useOutOfBagError)
Sets whether or not to compute the out-of-bag error during training.
Parameters: useOutOfBagError - true to compute the out of bag error, false to skip it

public boolean isUseOutOfBagError()
public OnLineStatistics[] getFeatureImportance()
Random Forest can obtain an unbiased estimate of feature importance using a TreeFeatureImportanceInference method on the out-of-bag samples during training. Since each tree will produce a different importance score, we also get a set of statistics for each feature rather than just a single score value. These are only computed if setUseOutOfBagImportance(boolean) is set to true.
Returns: an array of OnLineStatistics describing the statistics for the importance of each feature. Numeric features start from index 0, and categorical features start from the index equal to the number of numeric features.

public void setUseOutOfBagImportance(boolean useOutOfBagImportance)
Parameters: useOutOfBagImportance - true to compute the out of bag feature importance, false to skip it

public boolean isUseOutOfBagImportance()
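Because each tree contributes its own importance score, getFeatureImportance() returns per-feature statistics rather than a single number. A minimal running-statistics sketch in the spirit of OnLineStatistics, using Welford's online algorithm (the class here is a stand-in, not JSAT's implementation):

```java
// Running mean and variance of a stream of values, one instance per
// feature. Mirrors the role of OnLineStatistics in getFeatureImportance(),
// but is a hypothetical stand-in written against the stdlib only.
public class RunningStats {
    private long n = 0;
    private double mean = 0, m2 = 0;

    // Welford's update: numerically stable single-pass mean/variance.
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    double mean() { return mean; }

    double variance() { return n > 1 ? m2 / (n - 1) : 0; }

    public static void main(String[] args) {
        // Importance scores for one feature from four different trees.
        RunningStats s = new RunningStats();
        for (double score : new double[]{0.2, 0.4, 0.3, 0.3})
            s.add(score);
        System.out.printf("%.2f%n", s.mean()); // 0.30
    }
}
```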
public double getOutOfBagError()
If isUseOutOfBagError() is false, then this method will return 0 after training. Otherwise, it will return the out-of-bag error estimate after training has completed. For classification problems, this is the 0/1 loss error rate. Regression problems return the mean squared error.

public CategoricalResults classify(DataPoint data)
Performs classification on the given data point.
Specified by: classify in interface Classifier
Parameters: data - the data point to classify

public void trainC(ClassificationDataSet dataSet, ExecutorService threadPool)
Trains the classifier and constructs a model for classification using the given data set.
Specified by: trainC in interface Classifier
Parameters: dataSet - the data set to train on
threadPool - the source of threads to use.

public void trainC(ClassificationDataSet dataSet)
Trains the classifier and constructs a model for classification using the given data set.
Specified by: trainC in interface Classifier
Parameters: dataSet - the data set to train on

public boolean supportsWeightedData()
Indicates whether the model knows how to train using weighted data points.
Specified by: supportsWeightedData in interface Classifier
Specified by: supportsWeightedData in interface Regressor
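The two losses that getOutOfBagError() reports (0/1 loss for classification, mean squared error for regression) are simple to state in code. This sketch shows only the loss computations over out-of-bag predictions; how those predictions are gathered per tree is omitted, and the names are illustrative:

```java
// The two error measures getOutOfBagError() can report, computed over the
// samples each tree did NOT see during training. Stand-in code, not JSAT's.
public class OobError {
    // Classification: fraction of predictions that disagree with the label.
    static double zeroOneLoss(int[] predicted, int[] truth) {
        int wrong = 0;
        for (int i = 0; i < truth.length; i++)
            if (predicted[i] != truth[i])
                wrong++;
        return (double) wrong / truth.length;
    }

    // Regression: mean squared error of the predictions.
    static double meanSquaredError(double[] predicted, double[] truth) {
        double sum = 0;
        for (int i = 0; i < truth.length; i++) {
            double diff = predicted[i] - truth[i];
            sum += diff * diff;
        }
        return sum / truth.length;
    }

    public static void main(String[] args) {
        System.out.println(zeroOneLoss(new int[]{0, 1, 1, 0},
                                       new int[]{0, 1, 0, 0}));        // 0.25
        System.out.println(meanSquaredError(new double[]{1.0, 2.0},
                                            new double[]{1.0, 4.0}));  // 2.0
    }
}
```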
public void train(RegressionDataSet dataSet, ExecutorService threadPool)

public void train(RegressionDataSet dataSet)

public RandomForest clone()

public List<Parameter> getParameters()
Returns the list of parameters that can be altered for this learner.
Specified by: getParameters in interface Parameterized

public Parameter getParameter(String paramName)
Returns the parameter with the given name.
Specified by: getParameter in interface Parameterized
Parameters: paramName - the name of the parameter to obtain

Copyright © 2017. All rights reserved.