NewGLMNET (Java Statistical Analysis Tool 0.0.8 API)

java.lang.Object
- jsat.classifiers.linear.NewGLMNET

All Implemented Interfaces:

Serializable, Cloneable, Classifier, WarmClassifier, Parameterized, SimpleWeightVectorModel, SingleWeightVectorModel
```
public class NewGLMNET
extends Object
implements WarmClassifier, Parameterized, SingleWeightVectorModel
```
NewGLMNET is a batch method for solving Elastic Net regularized Logistic Regression problems of the form
0.5 * (1-α) * ||w||₂ + α * ||w||₁ + C * ∑_{i=1}^{N} ℓ(wᵀ x_i + b, y_i).

For α = 1, this becomes pure Lasso (L₁ regularized) Logistic Regression. For α = 0, this becomes pure Ridge (L₂ regularized) Logistic Regression; in that case, dedicated solvers such as LogisticRegressionDCD are faster.
By default, α = 1 and the bias term is included. Including the bias term can take longer to train, but can also increase sparsity for some problems.
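
To make the roles of α and C concrete, the objective can be evaluated directly. The following is a minimal sketch in plain Java, not JSAT code: the class name `ElasticNetObjective`, its method names, and the label encoding y ∈ {-1, +1} are assumptions made for illustration, and the L₂ term is computed as written in the formula in the class description.

```java
final class ElasticNetObjective {
    /** Logistic loss log(1 + exp(-y * score)) for a label y in {-1, +1} (assumed encoding). */
    static double logisticLoss(double score, int y) {
        double z = -y * score;
        // Numerically stable evaluation of log(1 + exp(z))
        return z > 0 ? z + Math.log1p(Math.exp(-z)) : Math.log1p(Math.exp(z));
    }

    /** Evaluates 0.5*(1-alpha)*||w||_2 + alpha*||w||_1 + C * sum_i loss(w^T x_i + b, y_i). */
    static double objective(double[] w, double b, double[][] x, int[] y, double C, double alpha) {
        double l1 = 0.0;
        double sumSq = 0.0;
        for (double wj : w) {
            l1 += Math.abs(wj);
            sumSq += wj * wj;
        }
        double l2 = Math.sqrt(sumSq);

        double loss = 0.0;
        for (int i = 0; i < x.length; i++) {
            double score = b;
            for (int j = 0; j < w.length; j++) {
                score += w[j] * x[i][j];
            }
            loss += logisticLoss(score, y[i]);
        }
        return 0.5 * (1 - alpha) * l2 + alpha * l1 + C * loss;
    }

    public static void main(String[] args) {
        double[] w = {3, -4};
        double[][] x = {{1, 0}, {0, 1}};
        int[] y = {1, -1};
        // alpha = 1 keeps only the L1 penalty, alpha = 0 keeps only the L2 penalty
        System.out.println("alpha=1: " + objective(w, 0.0, x, y, 1.0, 1.0));
        System.out.println("alpha=0: " + objective(w, 0.0, x, y, 1.0, 0.0));
    }
}
```

Values of α strictly between 0 and 1 blend the two penalties, which is the Elastic Net case.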

This algorithm can be warm started from any classifier implementing the SingleWeightVectorModel interface.
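
The benefit of warm starting can be illustrated without JSAT. The sketch below is an assumption-laden toy, not NewGLMNET's solver: it runs plain gradient descent on a one-dimensional, squared-L₂ regularized logistic loss, and the class `WarmStartSketch`, the data, and the step size are all made up for illustration. It solves the problem for C = 1, then solves it for C = 1.1 both from zero and from the previous solution.

```java
final class WarmStartSketch {
    // Toy one-dimensional data set with labels in {-1, +1} (illustrative values)
    static final double[] X = {1, 2, -1, -2};
    static final int[] Y = {1, 1, -1, -1};

    /** Gradient of 0.5*w^2 + C * sum_i log(1 + exp(-y_i * w * x_i)). */
    static double gradient(double w, double C) {
        double g = w;
        for (int i = 0; i < X.length; i++) {
            double margin = Y[i] * w * X[i];
            g += C * (-Y[i] * X[i] / (1.0 + Math.exp(margin)));
        }
        return g;
    }

    /** Gradient descent from w0; returns {final weight, iterations used}. */
    static double[] solve(double C, double w0, double tol, int maxIters) {
        double w = w0;
        int iters = 0;
        while (iters < maxIters && Math.abs(gradient(w, C)) >= tol) {
            w -= 0.1 * gradient(w, C); // fixed step size, small enough for this toy problem
            iters++;
        }
        return new double[]{w, iters};
    }

    public static void main(String[] args) {
        double[] previous = solve(1.0, 0.0, 1e-6, 10000);
        double[] cold = solve(1.1, 0.0, 1e-6, 10000);          // start from zero
        double[] warm = solve(1.1, previous[0], 1e-6, 10000);  // start from the C = 1 solution
        System.out.println("cold iterations: " + (int) cold[1]);
        System.out.println("warm iterations: " + (int) warm[1]);
    }
}
```

Both runs reach the same solution; the warm-started run simply begins closer to it. The same idea applies when fitting a sequence of models with nearby regularization values.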

See:
- Yuan, G.-X., Ho, C.-H., & Lin, C.-J. (2012). An improved GLMNET for L1-regularized logistic regression. Journal of Machine Learning Research, 13, 1999–2030. doi:10.1145/2020408.2020421
- Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.
- Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(2), 301–320. doi:10.1111/j.1467-9868.2005.00503.x
Author:

Edward Raff

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`static double`	`DEFAULT_EPS` The default tolerance for training is 0.01.
`static int`	`DEFAULT_MAX_OUTER_ITER` The default number of outer iterations of the training algorithm is 100.

Constructor Summary

Constructors
Modifier	Constructor and Description
	`NewGLMNET()` Creates a new L₁ regularized Logistic Regression solver with C = 1.
	`NewGLMNET(double C)` Creates a new L₁ regularized Logistic Regression solver
	`NewGLMNET(double C, double alpha)` Creates a new Elastic Net regularized Logistic Regression solver
`protected`	`NewGLMNET(NewGLMNET toCopy)` Copy constructor

Method Summary

Modifier and Type	Method and Description
`CategoricalResults`	`classify(DataPoint data)` Performs classification on the given data point.
`NewGLMNET`	`clone()`
`double`	`getAlpha()`
`double`	`getBias()` Returns the bias term used for the model, or 0 if the model does not support or was not trained with a bias term.
`double`	`getBias(int index)` Returns the bias term used with the weight vector for the given class index.
`double`	`getC()`
`int`	`getMaxIters()`
`Parameter`	`getParameter(String paramName)` Returns the parameter with the given name.
`List<Parameter>`	`getParameters()` Returns the list of parameters that can be altered for this learner.
`Vec`	`getRawWeight()` Returns the only weight vector used for the model
`Vec`	`getRawWeight(int index)` Returns the raw weight vector associated with the given class index.
`double`	`getTolerance()`
`static Distribution`	`guessAlpha(DataSet d)` Guess the distribution to use for the trade-off term α in Elastic Net regularization.
`static Distribution`	`guessC(DataSet d)` Guess the distribution to use for the regularization term `C` in Logistic Regression.
`boolean`	`isUseBias()`
`int`	`numWeightsVecs()` Returns the number of weight vectors that can be returned.
`void`	`setAlpha(double alpha)` Using α = 1 corresponds to pure L₁ regularization, and α = 0 corresponds to pure L₂ regularization.
`void`	`setC(double C)` Sets the regularization term, where smaller values indicate a larger regularization penalty.
`void`	`setMaxIters(int maxOuterIters)` Sets the maximum number of training iterations for the algorithm, specifically the outer loop as mentioned in the original paper.
`void`	`setTolerance(double e_out)` Sets the tolerance parameter for convergence.
`void`	`setUseBias(boolean useBias)` Controls whether or not an un-regularized bias term is added to the model.
`boolean`	`supportsWeightedData()` Indicates whether the model knows how to train using weighted data points.
`void`	`trainC(ClassificationDataSet dataSet)` Trains the classifier and constructs a model for classification using the given data set.
`void`	`trainC(ClassificationDataSet dataSet, Classifier warmSolution)` Trains the classifier and constructs a model for classification using the given data set.
`void`	`trainC(ClassificationDataSet dataSet, Classifier warmSolution, ExecutorService threadPool)` Trains the classifier and constructs a model for classification using the given data set.
`void`	`trainC(ClassificationDataSet dataSet, ExecutorService threadPool)` Trains the classifier and constructs a model for classification using the given data set.
`boolean`	`warmFromSameDataOnly()` Some models can only be warm started from a solution trained on the exact same data set as the model it is warm starting from.

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULT_EPS
```
public static final double DEFAULT_EPS
```
    The default tolerance for training is 0.01.
    
    See Also:
    
    Constant Field Values
  - DEFAULT_MAX_OUTER_ITER
```
public static final int DEFAULT_MAX_OUTER_ITER
```
    The default number of outer iterations of the training algorithm is 100.
    
    See Also:
    
    Constant Field Values
- Constructor Detail
  - NewGLMNET
```
public NewGLMNET()
```
    Creates a new L₁ regularized Logistic Regression solver with C = 1.
  - NewGLMNET
```
public NewGLMNET(double C)
```
    Creates a new L₁ regularized Logistic Regression solver
    
    Parameters:
    
    C - the regularization term
  - NewGLMNET
```
public NewGLMNET(double C,
                 double alpha)
```
    Creates a new Elastic Net regularized Logistic Regression solver
    
    Parameters:
    
    C - the regularization term
    
    alpha - the fraction of weight (in [0, 1]) to apply to L₁ regularization instead of L₂ regularization.
  - NewGLMNET
```
protected NewGLMNET(NewGLMNET toCopy)
```
    Copy constructor
    
    Parameters:
    
    toCopy - the object to copy
- Method Detail
  - setC
```
public void setC(double C)
```
    Sets the regularization term, where smaller values indicate a larger regularization penalty.
    
    Parameters:
    
    C - the positive regularization term
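
    The inverse relationship between C and the penalty strength can be checked on a toy problem. This is a hedged sketch in plain Java, not JSAT code: the class `RegularizationStrengthSketch`, the data, and the brute-force grid search are assumptions chosen for illustration. It minimizes the one-dimensional L₁ objective |w| + C * ∑ log(1 + exp(-y_i w x_i)) for a small and a large C.

```java
final class RegularizationStrengthSketch {
    // Toy one-dimensional data set with labels in {-1, +1} (illustrative values)
    static final double[] X = {1, 2, -1, -2};
    static final int[] Y = {1, 1, -1, -1};

    /** One-dimensional L1 regularized logistic objective (alpha = 1, no bias). */
    static double objective(double w, double C) {
        double loss = 0.0;
        for (int i = 0; i < X.length; i++) {
            loss += Math.log1p(Math.exp(-Y[i] * w * X[i]));
        }
        return Math.abs(w) + C * loss;
    }

    /** Brute-force minimizer over the grid w = -5.00, -4.99, ..., 5.00. */
    static double minimizer(double C) {
        double bestW = 0.0;
        double bestF = Double.POSITIVE_INFINITY;
        for (int i = -500; i <= 500; i++) {
            double w = i / 100.0;
            double f = objective(w, C);
            if (f < bestF) {
                bestF = f;
                bestW = w;
            }
        }
        return bestW;
    }

    public static void main(String[] args) {
        // Small C: the penalty dominates and the weight is driven to zero
        System.out.println("C = 0.1 -> w = " + minimizer(0.1));
        // Large C: the data term dominates and the weight moves away from zero
        System.out.println("C = 10  -> w = " + minimizer(10.0));
    }
}
```

    With a small C the L₁ penalty outweighs the loss and the best weight on the grid is exactly zero, while a large C lets the data pull the weight away from zero.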
  - getC
```
public double getC()
```
    Returns:
    
    the regularization term
  - setAlpha
```
public void setAlpha(double alpha)
```
    Using α = 1 corresponds to pure L₁ regularization, and α = 0 corresponds to pure L₂ regularization. Any value in-between is then an Elastic Net regularization.
    
    Parameters:
    
    alpha - the value in [0, 1] for determining the regularization penalty's interpolation between pure L₂ and L₁ regularization.
  - getAlpha
```
public double getAlpha()
```
    Returns:
    
    the fraction of weight (in [0, 1]) to apply to L₁ regularization instead of L₂ regularization.
  - setMaxIters
```
public void setMaxIters(int maxOuterIters)
```
    Sets the maximum number of training iterations for the algorithm, specifically the outer loop as mentioned in the original paper. 100 is the default value used, and may need to be increased for more difficult problems.
    
    Parameters:
    
    maxOuterIters - the maximum number of outer iterations
  - getMaxIters
```
public int getMaxIters()
```
    Returns:
    
    the maximum number of training iterations
  - setTolerance
```
public void setTolerance(double e_out)
```
    Sets the tolerance parameter for convergence. Smaller values will be more exact, but larger values will converge faster. The default value of 0.01 is fairly exact; increasing it by an order of magnitude can often be done without hurting accuracy.
    
    Parameters:
    
    e_out - the tolerance parameter.
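
    The trade-off between exactness and speed can be seen with any iterative solver. The sketch below is a stand-in only: it uses plain gradient descent with an absolute gradient threshold on a toy one-dimensional problem, which is an assumption and not necessarily how NewGLMNET defines its stopping rule. The class `ToleranceSketch` and its data are hypothetical.

```java
final class ToleranceSketch {
    // Toy one-dimensional data set with labels in {-1, +1} (illustrative values)
    static final double[] X = {1, 2, -1, -2};
    static final int[] Y = {1, 1, -1, -1};

    /** Gradient of 0.5*w^2 + sum_i log(1 + exp(-y_i * w * x_i)). */
    static double gradient(double w) {
        double g = w;
        for (int i = 0; i < X.length; i++) {
            g += -Y[i] * X[i] / (1.0 + Math.exp(Y[i] * w * X[i]));
        }
        return g;
    }

    /** Number of gradient descent steps from w = 0 until |gradient| drops below tol. */
    static int iterations(double tol, int maxIters) {
        double w = 0.0;
        int iters = 0;
        while (iters < maxIters && Math.abs(gradient(w)) >= tol) {
            w -= 0.1 * gradient(w); // fixed step size, small enough for this toy problem
            iters++;
        }
        return iters;
    }

    public static void main(String[] args) {
        System.out.println("tol = 0.1   -> " + iterations(0.1, 1000) + " iterations");
        System.out.println("tol = 0.001 -> " + iterations(0.001, 1000) + " iterations");
    }
}
```

    The looser tolerance stops earlier because every point that satisfies the tight threshold also satisfies the loose one.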
  - getTolerance
```
public double getTolerance()
```
    Returns:
    
    the convergence tolerance parameter
  - setUseBias
```
public void setUseBias(boolean useBias)
```
    Controls whether or not an un-regularized bias term is added to the model. Using a bias term can increase runtime, especially on sparse data sets, as each data point will have work done for the implicit bias term. However, the bias term is usually needed for low-dimensional problems, and can improve the sparsity of the solution for higher-dimensional problems.
    
    Parameters:
    
    useBias - true if an un-regularized bias term should be used or false to not use any bias term.
  - isUseBias
```
public boolean isUseBias()
```
    Returns:
    
    true if an un-regularized bias term will be used or false to not use any bias term.
  - classify
```
public CategoricalResults classify(DataPoint data)
```
    Description copied from interface: Classifier
    
    Performs classification on the given data point.
    
    Specified by:
    
    classify in interface Classifier
    
    Parameters:
    
    data - the data point to classify
    
    Returns:
    
    the results of the classification.
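
    For a binary logistic model with a single weight vector and bias, a prediction is conventionally derived from the sigmoid of the linear score. The following is a minimal sketch of that convention in plain Java; the class `LogisticPrediction`, its method names, and the mapping of the positive side to class index 1 are assumptions, and it does not reproduce JSAT's `CategoricalResults` type.

```java
final class LogisticPrediction {
    /** Probability of the positive class: sigmoid(w^T x + b). */
    static double positiveClassProbability(double[] w, double b, double[] x) {
        double score = b;
        for (int j = 0; j < w.length; j++) {
            score += w[j] * x[j];
        }
        return 1.0 / (1.0 + Math.exp(-score));
    }

    /** Returns 1 when the positive class is at least as likely as the negative one, else 0. */
    static int predict(double[] w, double b, double[] x) {
        return positiveClassProbability(w, b, x) >= 0.5 ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] w = {2.0, -1.0};
        double b = 0.5;
        double[] x = {1.0, 0.0};
        System.out.println("P(class 1) = " + positiveClassProbability(w, b, x));
        System.out.println("predicted class index = " + predict(w, b, x));
    }
}
```

    Thresholding the probability at 0.5 is equivalent to checking the sign of the score w^T x + b.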
  - trainC
```
public void trainC(ClassificationDataSet dataSet,
                   ExecutorService threadPool)
```
    Description copied from interface: Classifier
    
    Trains the classifier and constructs a model for classification using the given data set. If the training method knows how, it will use the threadPool to conduct training in parallel. This method will block until the training has completed.
    
    Specified by:
    
    trainC in interface Classifier
    
    Parameters:
    
    dataSet - the data set to train on
    
    threadPool - the source of threads to use.
  - trainC
```
public void trainC(ClassificationDataSet dataSet,
                   Classifier warmSolution,
                   ExecutorService threadPool)
```
    Description copied from interface: WarmClassifier
    
    Trains the classifier and constructs a model for classification using the given data set. If the training method knows how, it will use the threadPool to conduct training in parallel. This method will block until the training has completed.
    
    Specified by:
    
    trainC in interface WarmClassifier
    
    Parameters:
    
    dataSet - the data set to train on
    
    warmSolution - the solution to use to warm start this model
    
    threadPool - the source of threads to use.
  - trainC
```
public void trainC(ClassificationDataSet dataSet,
                   Classifier warmSolution)
```
    Description copied from interface: WarmClassifier
    
    Trains the classifier and constructs a model for classification using the given data set.
    
    Specified by:
    
    trainC in interface WarmClassifier
    
    Parameters:
    
    dataSet - the data set to train on
    
    warmSolution - the solution to use to warm start this model
  - trainC
```
public void trainC(ClassificationDataSet dataSet)
```
    Description copied from interface: Classifier
    
    Trains the classifier and constructs a model for classification using the given data set.
    
    Specified by:
    
    trainC in interface Classifier
    
    Parameters:
    
    dataSet - the data set to train on
  - supportsWeightedData
```
public boolean supportsWeightedData()
```
    Description copied from interface: Classifier
    
    Indicates whether the model knows how to train using weighted data points. If it does, the model will train assuming the weights. The values returned by this method may change depending on the parameters set for the model.
    
    Specified by:
    
    supportsWeightedData in interface Classifier
    
    Returns:
    
    true if the model supports weighted data, false otherwise
  - clone
```
public NewGLMNET clone()
```
    Specified by:
    
    clone in interface Classifier
    
    Overrides:
    
    clone in class Object
  - getParameters
```
public List<Parameter> getParameters()
```
    Description copied from interface: Parameterized
    
    Returns the list of parameters that can be altered for this learner.
    
    Specified by:
    
    getParameters in interface Parameterized
    
    Returns:
    
    the list of parameters that can be altered for this learner.
  - getParameter
```
public Parameter getParameter(String paramName)
```
    Description copied from interface: Parameterized
    
    Returns the parameter with the given name. Two different strings may map to a single Parameter object: an ASCII-only string and a Unicode-style string.
    
    Specified by:
    
    getParameter in interface Parameterized
    
    Parameters:
    
    paramName - the name of the parameter to obtain
    
    Returns:
    
    the Parameter in question, or null if no such named Parameter exists.
  - getRawWeight
```
public Vec getRawWeight()
```
    Description copied from interface: SingleWeightVectorModel
    
    Returns the only weight vector used for the model
    
    Specified by:
    
    getRawWeight in interface SingleWeightVectorModel
    
    Returns:
    
    the only weight vector used for the model
  - getBias
```
public double getBias()
```
    Description copied from interface: SingleWeightVectorModel
    
    Returns the bias term used for the model, or 0 if the model does not support or was not trained with a bias term.
    
    Specified by:
    
    getBias in interface SingleWeightVectorModel
    
    Returns:
    
    the bias term for the model
  - getRawWeight
```
public Vec getRawWeight(int index)
```
    Description copied from interface: SimpleWeightVectorModel
    
    Returns the raw weight vector associated with the given class index. If the given class is an implicit zero vector, a ConstantVector object may be returned.
    Do not alter the returned weight vector, as it will change the model's values.
    
    If a regression problem, only index = 0 should be used
    
    Specified by:
    
    getRawWeight in interface SimpleWeightVectorModel
    
    Parameters:
    
    index - the class index to get the weight vector for
    
    Returns:
    
    the weight vector used for the specified class
  - getBias
```
public double getBias(int index)
```
    Description copied from interface: SimpleWeightVectorModel
    
    Returns the bias term used with the weight vector for the given class index. If the model does not support or was not trained with bias weights, 0 will be returned.
    
    If a regression problem, only index = 0 should be used
    
    Specified by:
    
    getBias in interface SimpleWeightVectorModel
    
    Parameters:
    
    index - the class index to get the weight vector for
    
    Returns:
    
    the bias term for the specified class
  - numWeightsVecs
```
public int numWeightsVecs()
```
    Description copied from interface: SimpleWeightVectorModel
    
    Returns the number of weight vectors that can be returned. For binary classification problems the value may be 1 if only a single weight vector's sign is used to determine the class. For multi-class problems, the weight vector count includes the implicit zero vector (if one is being used).
    
    Specified by:
    
    numWeightsVecs in interface SimpleWeightVectorModel
    
    Returns:
    
    the number of weight vectors for which SimpleWeightVectorModel.getRawWeight(int) can be called.
  - warmFromSameDataOnly
```
public boolean warmFromSameDataOnly()
```
    Description copied from interface: WarmClassifier
    
    Some models can only be warm started from a solution trained on the exact same data set as the model it is warm starting from. If this is the case, true will be returned. When this method returns true, the behavior for training on a different data set is undefined: it may cause an error, or it may cause the algorithm to take longer or reach a worse solution.
    When true, it is important that the data set be unaltered - this includes mutating the values stored or re-arranging the data points within the data set.
    
    Specified by:
    
    warmFromSameDataOnly in interface WarmClassifier
    
    Returns:
    
    true if the algorithm can only be warm started from the model trained on the exact same data set.
  - guessAlpha
```
public static Distribution guessAlpha(DataSet d)
```
    Guess the distribution to use for the trade-off term α in Elastic Net regularization.
    
    Parameters:
    
    d - the data set to get the guess for
    
    Returns:
    
    the guess for the α parameter
  - guessC
```
public static Distribution guessC(DataSet d)
```
    Guess the distribution to use for the regularization term C in Logistic Regression.
    
    Parameters:
    
    d - the data set to get the guess for
    
    Returns:
    
    the guess for the C parameter
