public class RandomForest extends Object implements Classifier, Regressor, Parameterized

Random Forest is an extension of Bagging that is applied only to DecisionTrees. It works in a similar manner, but also uses only a random subset of the features for each tree trained. This provides increased accuracy of predictions and reduced training time over Bagging alone.

See Also: Bagging, Serialized Form

Constructor and Description |
---|
RandomForest() |
RandomForest(int maxForestSize) |
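The two sources of randomness described above (bootstrap sampling of the data from Bagging, plus a random feature subset per tree) can be sketched with the standard library alone. All names below are illustrative stand-ins, not JSAT internals:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of the per-tree sampling a Random Forest performs: each tree sees
// a bootstrap sample of the rows (Bagging) AND a random subset of the
// feature columns. Class and method names here are hypothetical.
public class ForestSampling {
    // Draw n row indices with replacement from [0, n): one bootstrap sample.
    static int[] bootstrapSample(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++)
            idx[i] = rng.nextInt(n);
        return idx;
    }

    // Pick k distinct feature indices out of d for one tree.
    static List<Integer> sampleFeatures(int d, int k, Random rng) {
        List<Integer> all = new ArrayList<>();
        for (int f = 0; f < d; f++)
            all.add(f);
        Collections.shuffle(all, rng);
        return new ArrayList<>(all.subList(0, k));
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int[] rows = bootstrapSample(100, rng);           // data for one tree
        List<Integer> feats = sampleFeatures(20, 5, rng); // its feature subset
        System.out.println(rows.length + " " + feats.size());
    }
}
```

Rows not drawn into a tree's bootstrap sample are that tree's "out-of-bag" samples, which the error and importance estimates below rely on.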
Modifier and Type | Method and Description |
---|---|
void | autoFeatureSample() Tells the class to automatically select the number of features to use. |
CategoricalResults | classify(DataPoint data) Performs classification on the given data point. |
RandomForest | clone() |
int | getExtraSamples() |
OnLineStatistics[] | getFeatureImportance() Random Forest can obtain an unbiased estimate of feature importance using a TreeFeatureImportanceInference method on the out-of-bag samples during training. |
int | getMaxForestSize() Returns the number of trees that will be created for the forest, which is also the number of base learners that will be trained. |
double | getOutOfBagError() If isUseOutOfBagError() is false, then this method will return 0 after training. |
Parameter | getParameter(String paramName) Returns the parameter with the given name. |
List<Parameter> | getParameters() Returns the list of parameters that can be altered for this learner. |
boolean | isAutoFeatureSample() Returns true if heuristics are currently in use for the number of features, or false if the number has been specified. |
boolean | isUseOutOfBagError() Indicates whether the out-of-bag error rate will be computed during training. |
boolean | isUseOutOfBagImportance() Indicates whether the out-of-bag feature importance will be computed during training. |
double | regress(DataPoint data) |
void | setExtraSamples(int i) Sets the number of extra samples to take when performing Bagging. |
void | setFeatureSamples(int featureSamples) Instead of using a heuristic, the exact number of features to sample is provided. |
void | setMaxForestSize(int maxForestSize) Sets the maximum number of trees to create for the forest. |
void | setUseOutOfBagError(boolean useOutOfBagError) Sets whether or not to compute the out-of-bag error during training. |
void | setUseOutOfBagImportance(boolean useOutOfBagImportance) Sets whether or not to compute the out-of-bag importance of each feature during training. |
boolean | supportsWeightedData() Indicates whether the model knows how to train using weighted data points. |
void | train(RegressionDataSet dataSet) |
void | train(RegressionDataSet dataSet, ExecutorService threadPool) |
void | trainC(ClassificationDataSet dataSet) Trains the classifier and constructs a model for classification using the given data set. |
void | trainC(ClassificationDataSet dataSet, ExecutorService threadPool) Trains the classifier and constructs a model for classification using the given data set. |
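classify(DataPoint) combines the predictions of all trees in the forest into a single CategoricalResults. How the votes are combined is not spelled out in this page; the usual tally-and-normalize scheme, with illustrative names only, looks like this:

```java
// Hypothetical sketch of forest-level classification: one vote per tree,
// normalized into a probability-like distribution over the classes, in the
// spirit of the CategoricalResults that classify(...) returns. This is an
// assumption about the aggregation scheme, not JSAT's actual internals.
public class MajorityVote {
    // votes[t] is the class index predicted by tree t.
    static double[] aggregate(int[] votes, int numClasses) {
        double[] dist = new double[numClasses];
        for (int v : votes)
            dist[v] += 1.0 / votes.length;
        return dist;
    }

    // Index of the most probable class in the aggregated distribution.
    static int argMax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++)
            if (dist[i] > dist[best])
                best = i;
        return best;
    }

    public static void main(String[] args) {
        int[] votes = {0, 1, 1, 2, 1};     // five trees, three classes
        double[] dist = aggregate(votes, 3);
        System.out.println(argMax(dist));  // class 1 wins with 3/5 of the votes
    }
}
```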
public RandomForest()
public RandomForest(int maxForestSize)
public void setExtraSamples(int i)
Parameters: i - how many extra samples to take

public int getExtraSamples()
public void setFeatureSamples(int featureSamples)
Instead of using a heuristic, the exact number of features to sample is provided. If set to the total number of features or more, the algorithm reduces to Bagging performed on DecisionTree.
Parameters: featureSamples - the number of features to randomly select for each tree in the forest.
Throws: ArithmeticException - if the number given is less than or equal to zero
See Also: autoFeatureSample(), Bagging

public void autoFeatureSample()
Tells the class to automatically select the number of features to use.

public boolean isAutoFeatureSample()
Returns true if heuristics are currently in use for the number of features, or false if the number has been specified.
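This page does not state which heuristic autoFeatureSample() applies. The classic defaults from the random-forest literature are sqrt(d) features for classification and d/3 for regression; the sketch below uses those values purely as an assumption:

```java
// Hypothetical feature-count heuristics for a d-dimensional data set.
// These are the textbook random-forest defaults, assumed here for
// illustration; JSAT's actual heuristic may differ.
public class FeatureHeuristic {
    // Common classification default: round(sqrt(d)), at least 1.
    static int forClassification(int d) {
        return Math.max(1, (int) Math.round(Math.sqrt(d)));
    }

    // Common regression default: d / 3, at least 1.
    static int forRegression(int d) {
        return Math.max(1, d / 3);
    }

    public static void main(String[] args) {
        System.out.println(forClassification(100)); // 10
        System.out.println(forRegression(100));     // 33
    }
}
```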
public void setMaxForestSize(int maxForestSize)
Sets the maximum number of trees to create for the forest.
Parameters: maxForestSize - the number of base learners to train
Throws: ArithmeticException - if the number specified is not a positive value

public int getMaxForestSize()

public void setUseOutOfBagError(boolean useOutOfBagError)
Sets whether or not to compute the out-of-bag error during training.
Parameters: useOutOfBagError - true to compute the out of bag error, false to skip it

public boolean isUseOutOfBagError()
public OnLineStatistics[] getFeatureImportance()
Random Forest can obtain an unbiased estimate of feature importance using a TreeFeatureImportanceInference method on the out-of-bag samples during training. Since each tree will produce a different importance score, we also get a set of statistics for each feature rather than just a single score value. These are only computed if setUseOutOfBagImportance(boolean) is set to true.
Returns: an array of OnLineStatistics describing the statistics for the importance of each feature. Numeric features start from index 0, and categorical features start from the index equal to the number of numeric features.

public void setUseOutOfBagImportance(boolean useOutOfBagImportance)
Parameters: useOutOfBagImportance - true to compute the out of bag feature importance, false to skip it

public boolean isUseOutOfBagImportance()
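Because each tree contributes its own importance score, getFeatureImportance() returns per-feature statistics rather than a single number. A minimal running-statistics sketch in the spirit of OnLineStatistics, using Welford's online algorithm (the class here is a stand-in, not JSAT's implementation):

```java
// Running mean and variance of a stream of values, one instance per
// feature. Mirrors the role of OnLineStatistics in getFeatureImportance(),
// but is a hypothetical stand-in written against the stdlib only.
public class RunningStats {
    private long n = 0;
    private double mean = 0, m2 = 0;

    // Welford's update: numerically stable single-pass mean/variance.
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    double mean() { return mean; }

    double variance() { return n > 1 ? m2 / (n - 1) : 0; }

    public static void main(String[] args) {
        // Importance scores for one feature from four different trees.
        RunningStats s = new RunningStats();
        for (double score : new double[]{0.2, 0.4, 0.3, 0.3})
            s.add(score);
        System.out.printf("%.2f%n", s.mean()); // 0.30
    }
}
```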
public double getOutOfBagError()
If isUseOutOfBagError() is false, then this method will return 0 after training. Otherwise, it will return the out-of-bag error estimate after training has completed. For classification problems, this is the 0/1 loss error rate. Regression problems return the mean squared error.

public CategoricalResults classify(DataPoint data)
Performs classification on the given data point.
Specified by: classify in interface Classifier
Parameters: data - the data point to classify

public void trainC(ClassificationDataSet dataSet, ExecutorService threadPool)
Trains the classifier and constructs a model for classification using the given data set.
Specified by: trainC in interface Classifier
Parameters: dataSet - the data set to train on
threadPool - the source of threads to use.

public void trainC(ClassificationDataSet dataSet)
Trains the classifier and constructs a model for classification using the given data set.
Specified by: trainC in interface Classifier
Parameters: dataSet - the data set to train on

public boolean supportsWeightedData()
Indicates whether the model knows how to train using weighted data points.
Specified by: supportsWeightedData in interface Classifier
Specified by: supportsWeightedData in interface Regressor
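The two losses that getOutOfBagError() reports (0/1 loss for classification, mean squared error for regression) are simple to state in code. This sketch shows only the loss computations over out-of-bag predictions; how those predictions are gathered per tree is omitted, and the names are illustrative:

```java
// The two error measures getOutOfBagError() can report, computed over the
// samples each tree did NOT see during training. Stand-in code, not JSAT's.
public class OobError {
    // Classification: fraction of predictions that disagree with the label.
    static double zeroOneLoss(int[] predicted, int[] truth) {
        int wrong = 0;
        for (int i = 0; i < truth.length; i++)
            if (predicted[i] != truth[i])
                wrong++;
        return (double) wrong / truth.length;
    }

    // Regression: mean squared error of the predictions.
    static double meanSquaredError(double[] predicted, double[] truth) {
        double sum = 0;
        for (int i = 0; i < truth.length; i++) {
            double diff = predicted[i] - truth[i];
            sum += diff * diff;
        }
        return sum / truth.length;
    }

    public static void main(String[] args) {
        System.out.println(zeroOneLoss(new int[]{0, 1, 1, 0},
                                       new int[]{0, 1, 0, 0}));        // 0.25
        System.out.println(meanSquaredError(new double[]{1.0, 2.0},
                                            new double[]{1.0, 4.0}));  // 2.0
    }
}
```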
public void train(RegressionDataSet dataSet, ExecutorService threadPool)

public void train(RegressionDataSet dataSet)

public RandomForest clone()

public List<Parameter> getParameters()
Returns the list of parameters that can be altered for this learner.
Specified by: getParameters in interface Parameterized

public Parameter getParameter(String paramName)
Returns the parameter with the given name.
Specified by: getParameter in interface Parameterized
Parameters: paramName - the name of the parameter to obtain

Copyright © 2017. All rights reserved.