DecisionStump (Java Statistical Analysis Tool 0.0.8 API)

java.lang.Object
- jsat.classifiers.trees.DecisionStump

All Implemented Interfaces:

Serializable, Cloneable, Classifier, Parameterized, Regressor
```
public class DecisionStump
extends Object
implements Classifier, Regressor, Parameterized
```
This class is a 1-rule. It creates one rule that is used to classify all inputs, making it a decision tree with only one node. It can be used as a weak learner for ensemble learners, or as the nodes in a true decision tree.

Categorical values are handled similarly under all circumstances.
During classification, numeric attributes are separated into their classes based on the most likely probability.
During regression, numeric attributes are handled with binary splits only, choosing the split that minimizes the total squared error.
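
The regression rule can be illustrated with a minimal, self-contained sketch. It does not use JSAT classes and is not the library's implementation: the class `RegressionSplitSketch` and the method `bestThreshold` are hypothetical names, and placing the threshold at the midpoint between neighboring values is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

final class RegressionSplitSketch {

    /**
     * Returns a threshold t such that splitting into (x <= t) and (x > t)
     * minimizes the summed squared error around each side's mean.
     * Returns NaN when all x values are equal and no split exists.
     */
    static double bestThreshold(double[] x, double[] y) {
        // Visit the points in increasing order of the numeric attribute.
        Integer[] order = new Integer[x.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> x[i]));

        double totalSum = 0, totalSq = 0;
        for (double v : y) { totalSum += v; totalSq += v * v; }

        double bestErr = Double.POSITIVE_INFINITY;
        double bestT = Double.NaN;
        double leftSum = 0, leftSq = 0;
        for (int k = 0; k < order.length - 1; k++) {
            double yk = y[order[k]];
            leftSum += yk;
            leftSq += yk * yk;
            double xCur = x[order[k]], xNext = x[order[k + 1]];
            if (xCur == xNext) continue; // cannot split between equal values

            int nL = k + 1, nR = order.length - nL;
            double rightSum = totalSum - leftSum, rightSq = totalSq - leftSq;
            // Squared error around the mean: sum(y^2) - (sum(y))^2 / n
            double err = (leftSq - leftSum * leftSum / nL)
                       + (rightSq - rightSum * rightSum / nR);
            if (err < bestErr) {
                bestErr = err;
                bestT = (xCur + xNext) / 2; // assumed midpoint convention
            }
        }
        return bestT;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 10, 11, 12};
        double[] y = {1, 1, 1, 5, 5, 5};
        System.out.println(bestThreshold(x, y)); // prints 6.5
    }
}
```

Running sums keep the scan linear after the initial sort, so each candidate split is scored in constant time.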

The Decision Stump supports missing values in training and prediction.

Author:

Edward Raff

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`protected double[]`	`pathRatio` How much of the data went to each path

Constructor Summary

Constructors
Constructor and Description
`DecisionStump()` Creates a new decision stump

Method Summary

Modifier and Type	Method and Description
`CategoricalResults`	`classify(DataPoint data)` Performs classification on the given data point.
`DecisionStump`	`clone()`
`protected static <T> void`	`distributMissing(List<List<DataPointPair<T>>> splits, double[] fracs, List<DataPointPair<T>> hadMissing)` Distributes a list of datapoints that had missing values to each split, re-weighted by the indicated fractions
`protected static <T> void`	`distributMissing(List<List<DataPointPair<T>>> splits, List<DataPointPair<T>> hadMissing)` Distributes a list of datapoints that had missing values to each split, re-weighted by the indicated fractions
`protected double`	`getGain(ImpurityScore origScore, List<List<DataPointPair<Integer>>> aSplit)` From the score for the original set that is being split, this computes the gain as the improvement in classification from the original split.
`ImpurityScore.ImpurityMeasure`	`getGainMethod()`
`int`	`getMinResultSplitSize()` Returns the minimum result split size that may be considered for use as the attribute to split on.
`int`	`getNumberOfPaths()` Returns the number of paths that this decision stump leads to.
`Parameter`	`getParameter(String paramName)` Returns the parameter with the given name.
`List<Parameter>`	`getParameters()` Returns the list of parameters that can be altered for this learner.
`int`	`getSplittingAttribute()` Returns the attribute that this stump has decided to use to compute results.
`protected int`	`numCategorical()`
`protected int`	`numNumeric()`
`double`	`regress(DataPoint data)`
`CategoricalResults`	`result(int i)` Returns the categorical result of the i'th path.
`void`	`setGainMethod(ImpurityScore.ImpurityMeasure gainMethod)`
`void`	`setMinResultSplitSize(int minResultSplitSize)` When a split is made, it may be that outliers cause the split to segregate a minority of points from the majority.
`void`	`setPredicting(CategoricalData predicting)` Sets the DecisionStump's predicting information.
`void`	`setRemoveContinuousAttributes(boolean removeContinuousAttributes)` Unlike categorical values, when a continuous attribute is selected to split on, not all values of the attribute become the same.
`boolean`	`supportsWeightedData()` Indicates whether the model knows how to train using weighted data points.
`void`	`train(RegressionDataSet dataSet)`
`void`	`train(RegressionDataSet dataSet, ExecutorService threadPool)`
`void`	`trainC(ClassificationDataSet dataSet)` Trains the classifier and constructs a model for classification using the given data set.
`void`	`trainC(ClassificationDataSet dataSet, ExecutorService threadPool)` Trains the classifier and constructs a model for classification using the given data set.
`List<List<DataPointPair<Integer>>>`	`trainC(List<DataPointPair<Integer>> dataPoints, Set<Integer> options)` This is a helper function that does the work of training this stump.
`List<List<DataPointPair<Integer>>>`	`trainC(List<DataPointPair<Integer>> dataPoints, Set<Integer> options, ExecutorService ex)`
`List<List<DataPointPair<Double>>>`	`trainR(List<DataPointPair<Double>> dataPoints, Set<Integer> options)`
`List<List<DataPointPair<Double>>>`	`trainR(List<DataPointPair<Double>> dataPoints, Set<Integer> options, ExecutorService ex)`
`int`	`whichPath(DataPoint data)` Determines which split path this data point would follow from this decision stump.

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - pathRatio
```
protected double[] pathRatio
```
    How much of the data went to each path
- Constructor Detail
  - DecisionStump
```
public DecisionStump()
```
    Creates a new decision stump
- Method Detail
  - setRemoveContinuousAttributes
```
public void setRemoveContinuousAttributes(boolean removeContinuousAttributes)
```
    Unlike categorical values, when a continuous attribute is selected to split on, not all values of the attribute become the same. It can be useful to split on the same attribute multiple times. If set true, continuous attributes will be removed from the options list. Else, they will be left in the options list.
    
    Parameters:
    
    removeContinuousAttributes - whether or not to remove continuous attributes on a call to trainC(java.util.List, java.util.Set)
  - setGainMethod
```
public void setGainMethod(ImpurityScore.ImpurityMeasure gainMethod)
```
  - getGainMethod
```
public ImpurityScore.ImpurityMeasure getGainMethod()
```
  - numNumeric
```
protected int numNumeric()
```
    Returns:
    
    the number of numeric features in the dataset that this Stump was trained from.
  - numCategorical
```
protected int numCategorical()
```
    Returns:
    
    the number of categorical features in the dataset that this Stump was trained from.
  - setMinResultSplitSize
```
public void setMinResultSplitSize(int minResultSplitSize)
```
    When a split is made, it may be that outliers cause the split to segregate a minority of points from the majority. The minimum result split size specifies the fewest points that may end up in one of the splits for that split to be admissible for consideration.
    
    Parameters:
    
    minResultSplitSize - the minimum result split size to use
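    
    The admissibility rule can be sketched without JSAT types. One reading of the rule, assumed here, is that every resulting partition must contain at least `minResultSplitSize` points. The class `SplitSizeSketch` and the method `isAdmissible` are hypothetical names, not part of the library.

```java
import java.util.Arrays;
import java.util.List;

final class SplitSizeSketch {

    /**
     * Returns true when every partition produced by a candidate split
     * holds at least minResultSplitSize points (assumed interpretation).
     */
    static boolean isAdmissible(List<? extends List<?>> splits, int minResultSplitSize) {
        for (List<?> part : splits) {
            if (part.size() < minResultSplitSize) {
                return false; // a partition is too small, reject the split
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // One partition of three points and one of a single point.
        List<List<Integer>> splits = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4));
        System.out.println(isAdmissible(splits, 2));
        System.out.println(isAdmissible(splits, 1));
    }
}
```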
  - getMinResultSplitSize
```
public int getMinResultSplitSize()
```
    Returns the minimum result split size that may be considered for use as the attribute to split on.
    
    Returns:
    
    the minimum result split size in use
  - getSplittingAttribute
```
public int getSplittingAttribute()
```
    Returns the attribute that this stump has decided to use to compute results. Numeric features start from 0, and categorical features start from the number of numeric features.
    
    Returns:
    
    the attribute that this stump has decided to use to compute results.
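    
    The index convention can be decoded as in the sketch below. It is illustrative only: `AttributeIndexSketch` and its methods are hypothetical helpers, and the count of numeric features is passed in explicitly rather than read from a trained stump.

```java
final class AttributeIndexSketch {

    /** True when the combined index refers to a numeric feature. */
    static boolean isNumeric(int attribute, int numNumeric) {
        return attribute >= 0 && attribute < numNumeric;
    }

    /**
     * Converts a combined index into a position among the categorical
     * features, since categorical indices start after the numeric ones.
     */
    static int categoricalIndex(int attribute, int numNumeric) {
        if (attribute < numNumeric) {
            throw new IllegalArgumentException("attribute " + attribute + " is numeric");
        }
        return attribute - numNumeric;
    }

    public static void main(String[] args) {
        int numNumeric = 3; // assumed: the data set has 3 numeric features
        System.out.println(isNumeric(2, numNumeric));
        System.out.println(categoricalIndex(4, numNumeric));
    }
}
```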
  - setPredicting
```
public void setPredicting(CategoricalData predicting)
```
    Sets the DecisionStump's predicting information. This will be set automatically by calling trainC(jsat.classifiers.ClassificationDataSet) or trainC(jsat.classifiers.ClassificationDataSet, java.util.concurrent.ExecutorService), but it must be called before using trainC(java.util.List, java.util.Set).
    
    Parameters:
    
    predicting - the information about the attribute that will be predicted by this classifier
  - regress
```
public double regress(DataPoint data)
```
    Specified by:
    
    regress in interface Regressor
  - train
```
public void train(RegressionDataSet dataSet)
```
    Specified by:
    
    train in interface Regressor
  - train
```
public void train(RegressionDataSet dataSet,
                  ExecutorService threadPool)
```
    Specified by:
    
    train in interface Regressor
  - getGain
```
protected double getGain(ImpurityScore origScore,
                         List<List<DataPointPair<Integer>>> aSplit)
```
    From the score for the original set that is being split, this computes the gain as the improvement in classification from the original split.
    
    Parameters:
    
    origScore - the score of the unsplit set
    
    aSplit - the splitting of the data points
    
    Returns:
    
    the gain score for this split
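    
    As a rough illustration of gain as an improvement over the unsplit set, the sketch below computes entropy-based information gain from per-class counts. This is an assumption for illustration: the actual score depends on the configured `ImpurityScore.ImpurityMeasure`, and `GainSketch`, `entropy`, and `gain` are hypothetical names that do not use JSAT types.

```java
final class GainSketch {

    /** Shannon entropy in bits of a class-count histogram. */
    static double entropy(int[] counts) {
        int n = 0;
        for (int c : counts) n += c;
        if (n == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /**
     * Gain = impurity of the unsplit set minus the size-weighted
     * impurity of the partitions produced by the split.
     */
    static double gain(int[] parent, int[][] children) {
        int n = 0;
        for (int c : parent) n += c;
        double remainder = 0.0;
        for (int[] child : children) {
            int m = 0;
            for (int c : child) m += c;
            remainder += (double) m / n * entropy(child);
        }
        return entropy(parent) - remainder;
    }

    public static void main(String[] args) {
        int[] parent = {5, 5};
        // A split that separates the two classes perfectly.
        System.out.println(gain(parent, new int[][]{{5, 0}, {0, 5}}));
        // A split that leaves the class mix unchanged.
        System.out.println(gain(parent, new int[][]{{3, 3}, {2, 2}}));
    }
}
```

    A split that separates the classes perfectly recovers the full entropy of the parent, while a split that preserves the class proportions yields a gain of approximately zero.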
  - whichPath
```
public int whichPath(DataPoint data)
```
    Determines which split path this data point would follow from this decision stump. Works for both classification and regression.
    
    Parameters:
    
    data - the data point in question
    
    Returns:
    
    the integer indicating which path to take. -1 returned if stump is not trained
  - getNumberOfPaths
```
public int getNumberOfPaths()
```
    Returns the number of paths that this decision stump leads to. The stump may never direct a data point down some of the paths. A result of 1 path means that all data points will be given the same decision, and is generated when the entropy of a set is 0.0.
    
    -1 is returned for an untrained stump
    
    Returns:
    
    the number of paths this decision stump has stored
  - classify
```
public CategoricalResults classify(DataPoint data)
```
    Description copied from interface: Classifier
    
    Performs classification on the given data point.
    
    Specified by:
    
    classify in interface Classifier
    
    Parameters:
    
    data - the data point to classify
    
    Returns:
    
    the results of the classification.
  - result
```
public CategoricalResults result(int i)
```
    Returns the categorical result of the i'th path.
    
    Parameters:
    
    i - the path to get the result for
    
    Returns:
    
    the result that would be returned if a data point went down the given path
    
    Throws:
    
    IndexOutOfBoundsException - if an invalid path is given
    
    NullPointerException - if the stump has not been trained for classification
  - trainC
```
public void trainC(ClassificationDataSet dataSet,
                   ExecutorService threadPool)
```
    Description copied from interface: Classifier
    
    Trains the classifier and constructs a model for classification using the given data set. If the training method knows how, it will use the threadPool to conduct training in parallel. This method will block until the training has completed.
    
    Specified by:
    
    trainC in interface Classifier
    
    Parameters:
    
    dataSet - the data set to train on
    
    threadPool - the source of threads to use.
  - trainC
```
public void trainC(ClassificationDataSet dataSet)
```
    Description copied from interface: Classifier
    
    Trains the classifier and constructs a model for classification using the given data set.
    
    Specified by:
    
    trainC in interface Classifier
    
    Parameters:
    
    dataSet - the data set to train on
  - trainC
```
public List<List<DataPointPair<Integer>>> trainC(List<DataPointPair<Integer>> dataPoints,
                                                 Set<Integer> options)
```
    This is a helper function that does the work of training this stump. It may be called directly by other classes that are creating decision trees to avoid redundant repackaging of lists.
    
    Parameters:
    
    dataPoints - the list of datapoints to train on, paired with the true category of each training point
    
    options - the set of attributes that this classifier may choose from. The attribute it does choose will be removed from the set.
    
    Returns:
    
    a list of lists, containing all the datapoints that would have followed each path. Useful for training a decision tree
  - trainC
```
public List<List<DataPointPair<Integer>>> trainC(List<DataPointPair<Integer>> dataPoints,
                                                 Set<Integer> options,
                                                 ExecutorService ex)
```
  - distributMissing
```
protected static <T> void distributMissing(List<List<DataPointPair<T>>> splits,
                                           List<DataPointPair<T>> hadMissing)
```
    Distributes a list of datapoints that had missing values to each split, re-weighted by the indicated fractions
    
    Parameters:
    
    splits - a list of lists, where each inner list is a split
    
    hadMissing - the list of datapoints that had missing values
  - distributMissing
```
protected static <T> void distributMissing(List<List<DataPointPair<T>>> splits,
                                           double[] fracs,
                                           List<DataPointPair<T>> hadMissing)
```
    Distributes a list of datapoints that had missing values to each split, re-weighted by the indicated fractions
    
    Parameters:
    
    splits - a list of lists, where each inner list is a split
    
    fracs - the fraction of weight to each split, should sum to one
    
    hadMissing - the list of datapoints that had missing values
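    
    The re-weighting idea can be sketched as follows: each point with a missing value is copied into every split, with its weight scaled by that split's fraction. This is a simplified illustration under stated assumptions, not the library's code. `MissingValueSketch`, `WeightedPoint`, and `distribute` are hypothetical names standing in for JSAT's `DataPointPair` handling.

```java
import java.util.ArrayList;
import java.util.List;

final class MissingValueSketch {

    /** Hypothetical stand-in for a weighted data point. */
    static final class WeightedPoint {
        final String id;
        final double weight;

        WeightedPoint(String id, double weight) {
            this.id = id;
            this.weight = weight;
        }
    }

    /**
     * Adds a re-weighted copy of every point in hadMissing to each split.
     * fracs[i] is the share of weight sent to split i and should sum to one.
     */
    static void distribute(List<List<WeightedPoint>> splits,
                           double[] fracs,
                           List<WeightedPoint> hadMissing) {
        for (WeightedPoint p : hadMissing) {
            for (int i = 0; i < splits.size(); i++) {
                splits.get(i).add(new WeightedPoint(p.id, p.weight * fracs[i]));
            }
        }
    }

    public static void main(String[] args) {
        List<List<WeightedPoint>> splits = new ArrayList<>();
        splits.add(new ArrayList<>());
        splits.add(new ArrayList<>());

        List<WeightedPoint> hadMissing = new ArrayList<>();
        hadMissing.add(new WeightedPoint("p", 2.0));

        distribute(splits, new double[]{0.25, 0.75}, hadMissing);
        for (List<WeightedPoint> split : splits) {
            System.out.println(split.get(0).id + " weight=" + split.get(0).weight);
        }
    }
}
```

    Because the fractions sum to one, the total weight of each point is preserved across the splits.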
  - trainR
```
public List<List<DataPointPair<Double>>> trainR(List<DataPointPair<Double>> dataPoints,
                                                Set<Integer> options)
```
  - trainR
```
public List<List<DataPointPair<Double>>> trainR(List<DataPointPair<Double>> dataPoints,
                                                Set<Integer> options,
                                                ExecutorService ex)
```
  - supportsWeightedData
```
public boolean supportsWeightedData()
```
    Description copied from interface: Classifier
    
    Indicates whether the model knows how to train using weighted data points. If it does, the model will train assuming the weights. The values returned by this method may change depending on the parameters set for the model.
    
    Specified by:
    
    supportsWeightedData in interface Classifier
    
    Specified by:
    
    supportsWeightedData in interface Regressor
    
    Returns:
    
    true if the model supports weighted data, false otherwise
  - clone
```
public DecisionStump clone()
```
    Specified by:
    
    clone in interface Classifier
    
    Specified by:
    
    clone in interface Regressor
    
    Overrides:
    
    clone in class Object
  - getParameters
```
public List<Parameter> getParameters()
```
    Description copied from interface: Parameterized
    
    Returns the list of parameters that can be altered for this learner.
    
    Specified by:
    
    getParameters in interface Parameterized
    
    Returns:
    
    the list of parameters that can be altered for this learner.
  - getParameter
```
public Parameter getParameter(String paramName)
```
    Description copied from interface: Parameterized
    
    Returns the parameter with the given name. Two different strings may map to a single Parameter object: an ASCII-only string and a Unicode-style string.
    
    Specified by:
    
    getParameter in interface Parameterized
    
    Parameters:
    
    paramName - the name of the parameter to obtain
    
    Returns:
    
    the Parameter in question, or null if no such named Parameter exists.
