public class DecisionStump extends Object implements Classifier, Regressor, Parameterized
Modifier and Type | Field and Description |
---|---|
protected double[] | pathRatio - How much of the data went to each path |
Constructor and Description |
---|
DecisionStump() - Creates a new decision stump |
Modifier and Type | Method and Description |
---|---|
CategoricalResults | classify(DataPoint data) - Performs classification on the given data point. |
DecisionStump | clone() |
protected static <T> void | distributMissing(List<List<DataPointPair<T>>> splits, double[] fracs, List<DataPointPair<T>> hadMissing) - Distributes a list of datapoints that had missing values to each split, re-weighted by the indicated fractions |
protected static <T> void | distributMissing(List<List<DataPointPair<T>>> splits, List<DataPointPair<T>> hadMissing) - Distributes a list of datapoints that had missing values to each split, re-weighted by the indicated fractions |
protected double | getGain(ImpurityScore origScore, List<List<DataPointPair<Integer>>> aSplit) - From the score for the original set that is being split, this computes the gain as the improvement in classification from the original split. |
ImpurityScore.ImpurityMeasure | getGainMethod() |
int | getMinResultSplitSize() - Returns the minimum result split size that may be considered for use as the attribute to split on. |
int | getNumberOfPaths() - Returns the number of paths that this decision stump leads to. |
Parameter | getParameter(String paramName) - Returns the parameter with the given name. |
List<Parameter> | getParameters() - Returns the list of parameters that can be altered for this learner. |
int | getSplittingAttribute() - Returns the attribute that this stump has decided to use to compute results. |
protected int | numCategorical() |
protected int | numNumeric() |
double | regress(DataPoint data) |
CategoricalResults | result(int i) - Returns the categorical result of the i'th path. |
void | setGainMethod(ImpurityScore.ImpurityMeasure gainMethod) |
void | setMinResultSplitSize(int minResultSplitSize) - When a split is made, it may be that outliers cause the split to segregate a minority of points from the majority. |
void | setPredicting(CategoricalData predicting) - Sets the DecisionStump's predicting information. |
void | setRemoveContinuousAttributes(boolean removeContinuousAttributes) - Unlike categorical values, when a continuous attribute is selected to split on, not all values of the attribute become the same. |
boolean | supportsWeightedData() - Indicates whether the model knows how to train using weighted data points. |
void | train(RegressionDataSet dataSet) |
void | train(RegressionDataSet dataSet, ExecutorService threadPool) |
void | trainC(ClassificationDataSet dataSet) - Trains the classifier and constructs a model for classification using the given data set. |
void | trainC(ClassificationDataSet dataSet, ExecutorService threadPool) - Trains the classifier and constructs a model for classification using the given data set. |
List<List<DataPointPair<Integer>>> | trainC(List<DataPointPair<Integer>> dataPoints, Set<Integer> options) - This is a helper function that does the work of training this stump. |
List<List<DataPointPair<Integer>>> | trainC(List<DataPointPair<Integer>> dataPoints, Set<Integer> options, ExecutorService ex) |
List<List<DataPointPair<Double>>> | trainR(List<DataPointPair<Double>> dataPoints, Set<Integer> options) |
List<List<DataPointPair<Double>>> | trainR(List<DataPointPair<Double>> dataPoints, Set<Integer> options, ExecutorService ex) |
int | whichPath(DataPoint data) - Determines which split path this data point would follow from this decision stump. |
public void setRemoveContinuousAttributes(boolean removeContinuousAttributes)
removeContinuousAttributes - whether or not to remove continuous attributes on a call to trainC(java.util.List, java.util.Set)
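The effect of this flag on the option set passed to trainC(java.util.List, java.util.Set) can be sketched as follows. Because a continuous attribute can still separate points after one split, it may be worth keeping as a candidate. This is a minimal illustration, not JSAT's code: the class OptionSetSketch, the method consume, and the convention that indices below numCategorical are categorical are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not part of JSAT.
final class OptionSetSketch {
    /**
     * Removes the chosen attribute from the candidate set when it is categorical,
     * or when it is continuous and removeContinuous is true.
     * Assumes indices [0, numCategorical) are categorical and the rest are numeric.
     */
    static void consume(Set<Integer> options, int chosen, int numCategorical,
                        boolean removeContinuous) {
        boolean isCategorical = chosen < numCategorical;
        if (isCategorical || removeContinuous) {
            options.remove(chosen); // boxes to Integer, so this removes the element
        }
    }

    public static void main(String[] args) {
        Set<Integer> options = new HashSet<>(Arrays.asList(0, 1, 2, 3));
        consume(options, 3, 2, false); // numeric attribute stays available
        System.out.println("attribute 3 still an option: " + options.contains(3));
        consume(options, 0, 2, false); // categorical attribute is always removed
        System.out.println("attribute 0 still an option: " + options.contains(0));
    }
}
```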
public void setGainMethod(ImpurityScore.ImpurityMeasure gainMethod)
public ImpurityScore.ImpurityMeasure getGainMethod()
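As a rough illustration of what a gain computation looks like, the sketch below scores a split as the impurity of the unsplit set minus the size-weighted impurity of the resulting splits. It assumes Shannon entropy as the impurity; the measure actually used is whichever ImpurityScore.ImpurityMeasure has been set. The class GainSketch and its methods are hypothetical and work on plain class counts rather than JSAT types.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not part of JSAT. Entropy is an assumed impurity measure.
final class GainSketch {
    /** Shannon entropy (in bits) of a vector of class counts. */
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) {
                continue; // 0 * log(0) is treated as 0
            }
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Impurity of the unsplit set minus the size-weighted impurity of the splits. */
    static double gain(int[] original, List<int[]> splits) {
        int total = Arrays.stream(original).sum();
        if (total == 0) {
            return 0.0;
        }
        double weighted = 0.0;
        for (int[] split : splits) {
            int n = Arrays.stream(split).sum();
            weighted += ((double) n / total) * entropy(split);
        }
        return entropy(original) - weighted;
    }

    public static void main(String[] args) {
        int[] original = {5, 5};
        // A split that sends each class down its own path.
        System.out.println(gain(original, Arrays.asList(new int[]{5, 0}, new int[]{0, 5})));
        // A split that leaves the class mix unchanged on both paths.
        System.out.println(gain(original, Arrays.asList(new int[]{2, 2}, new int[]{3, 3})));
    }
}
```

A larger value means the split separates the classes better than the unsplit set did; a split that leaves every path with the original class mix scores zero.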
protected int numNumeric()
protected int numCategorical()
public void setMinResultSplitSize(int minResultSplitSize)
minResultSplitSize - the minimum result split size to use
public int getMinResultSplitSize()
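One way to picture the minimum result split size is as an admissibility check on a candidate split. The exact rule is not spelled out here, so the sketch below assumes the simplest reading: every path of the split must receive at least the minimum number of points. SplitSizeSketch and admissible are hypothetical names, not JSAT API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not part of JSAT.
final class SplitSizeSketch {
    /** Returns true when every path of the candidate split holds at least minSize points. */
    static boolean admissible(List<? extends List<?>> splits, int minSize) {
        for (List<?> path : splits) {
            if (path.size() < minSize) {
                return false; // a small minority was carved off, possibly by outliers
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> lopsided = Arrays.asList(
                Arrays.asList(1),                       // a single outlier
                Arrays.asList(2, 3, 4, 5, 6, 7, 8, 9)); // everything else
        System.out.println("lopsided split admissible: " + admissible(lopsided, 3));
    }
}
```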
public int getSplittingAttribute()
public void setPredicting(CategoricalData predicting)
Sets the DecisionStump's predicting information. This is set automatically by trainC(jsat.classifiers.ClassificationDataSet) or trainC(jsat.classifiers.ClassificationDataSet, java.util.concurrent.ExecutorService), but it must be called before using trainC(java.util.List, java.util.Set).
predicting - the information about the attribute that will be predicted by this classifier
public void train(RegressionDataSet dataSet)
public void train(RegressionDataSet dataSet, ExecutorService threadPool)
protected double getGain(ImpurityScore origScore, List<List<DataPointPair<Integer>>> aSplit)
origScore - the score of the unsplit set
aSplit - the splitting of the data points
public int whichPath(DataPoint data)
data - the data point in question
public int getNumberOfPaths()
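To make the notion of a path concrete, here is a minimal sketch of a two-path stump over a single numeric attribute. The threshold rule, the use of NaN for a missing value, the -1 return for that case, and the class NumericStumpSketch itself are assumptions for illustration only, not documented JSAT behavior.

```java
// Hypothetical sketch; not part of JSAT.
final class NumericStumpSketch {
    private final int attribute;    // index of the attribute the stump splits on
    private final double threshold; // assumed rule: values <= threshold take path 0

    NumericStumpSketch(int attribute, double threshold) {
        this.attribute = attribute;
        this.threshold = threshold;
    }

    /** A single threshold yields exactly two paths. */
    int getNumberOfPaths() {
        return 2;
    }

    /** Returns the path index for the given values, or -1 (assumed convention) if the value is missing. */
    int whichPath(double[] values) {
        double v = values[attribute];
        if (Double.isNaN(v)) {
            return -1; // NaN stands in for a missing value in this sketch
        }
        return v <= threshold ? 0 : 1;
    }

    public static void main(String[] args) {
        NumericStumpSketch stump = new NumericStumpSketch(1, 2.5);
        System.out.println(stump.whichPath(new double[]{9.0, 1.0}));
        System.out.println(stump.whichPath(new double[]{9.0, 4.0}));
    }
}
```

Every non-negative index returned by whichPath in this sketch is smaller than getNumberOfPaths(), so it can be used directly to look up a per-path result.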
public CategoricalResults classify(DataPoint data)
Specified by: classify in interface Classifier
data - the data point to classify
public CategoricalResults result(int i)
i - the path to get the result for
Throws: IndexOutOfBoundsException - if an invalid path is given
Throws: NullPointerException - if the stump has not been trained for classification
public void trainC(ClassificationDataSet dataSet, ExecutorService threadPool)
Specified by: trainC in interface Classifier
dataSet - the data set to train on
threadPool - the source of threads to use.
public void trainC(ClassificationDataSet dataSet)
Specified by: trainC in interface Classifier
dataSet - the data set to train on
public List<List<DataPointPair<Integer>>> trainC(List<DataPointPair<Integer>> dataPoints, Set<Integer> options)
dataPoints - the list of data points to train on, each paired with the true category of that training point
options - the set of attributes that this classifier may choose from. The attribute it does choose will be removed from the set.
public List<List<DataPointPair<Integer>>> trainC(List<DataPointPair<Integer>> dataPoints, Set<Integer> options, ExecutorService ex)
protected static <T> void distributMissing(List<List<DataPointPair<T>>> splits, List<DataPointPair<T>> hadMissing)
splits - a list of lists, where each inner list is a split
hadMissing - the list of datapoints that had missing values
protected static <T> void distributMissing(List<List<DataPointPair<T>>> splits, double[] fracs, List<DataPointPair<T>> hadMissing)
splits - a list of lists, where each inner list is a split
fracs - the fraction of weight to assign to each split; should sum to one
hadMissing - the list of datapoints that had missing values
public List<List<DataPointPair<Double>>> trainR(List<DataPointPair<Double>> dataPoints, Set<Integer> options)
public List<List<DataPointPair<Double>>> trainR(List<DataPointPair<Double>> dataPoints, Set<Integer> options, ExecutorService ex)
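The re-weighting described for distributMissing(splits, fracs, hadMissing) can be sketched as follows: each point that had a missing value is copied into every split, with its weight scaled by that split's fraction, so the total weight is preserved when the fractions sum to one. WeightedPoint and distribute are hypothetical stand-ins written for this example; they are not JSAT's DataPointPair or its implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of JSAT.
final class MissingSketch {
    /** Minimal stand-in for a weighted, labeled data point. */
    static final class WeightedPoint<T> {
        final T label;
        final double weight;

        WeightedPoint(T label, double weight) {
            this.label = label;
            this.weight = weight;
        }
    }

    /**
     * Adds a copy of every point in hadMissing to each split, scaling its
     * weight by that split's fraction. fracs should sum to one.
     */
    static <T> void distribute(List<List<WeightedPoint<T>>> splits, double[] fracs,
                               List<WeightedPoint<T>> hadMissing) {
        for (WeightedPoint<T> p : hadMissing) {
            for (int i = 0; i < splits.size(); i++) {
                splits.get(i).add(new WeightedPoint<>(p.label, p.weight * fracs[i]));
            }
        }
    }

    public static void main(String[] args) {
        List<List<WeightedPoint<Integer>>> splits = new ArrayList<>();
        splits.add(new ArrayList<>());
        splits.add(new ArrayList<>());
        List<WeightedPoint<Integer>> hadMissing = new ArrayList<>();
        hadMissing.add(new WeightedPoint<>(1, 2.0));
        distribute(splits, new double[]{0.75, 0.25}, hadMissing);
        System.out.println(splits.get(0).get(0).weight + " " + splits.get(1).get(0).weight);
    }
}
```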
public boolean supportsWeightedData()
Specified by: supportsWeightedData in interface Classifier
Specified by: supportsWeightedData in interface Regressor
public DecisionStump clone()
public List<Parameter> getParameters()
Specified by: getParameters in interface Parameterized
public Parameter getParameter(String paramName)
Specified by: getParameter in interface Parameterized
paramName - the name of the parameter to obtain
Copyright © 2017. All rights reserved.