public class Adam extends Object implements GradientUpdater

Adam is a first-order gradient updater related to RMSProp and AdaGrad, where the former can
be seen as a special case of Adam. Adam has been shown to work well in
training neural networks, and still converges well with sparse gradients.

Modifier and Type | Field and Description |
---|---|
static double | DEFAULT_ALPHA |
static double | DEFAULT_BETA_1 |
static double | DEFAULT_BETA_2 |
static double | DEFAULT_EPS |
static double | DEFAULT_LAMBDA |
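
For context on the constants above (alpha, beta_1, beta_2, and eps), the standard Adam rule from the original paper, which this updater is named for, maintains per-coordinate first and second moment estimates of the gradient. A sketch of that rule, not necessarily the exact implementation of this class, is:

$$
\begin{aligned}
m_t &= \beta_1\, m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat m_t &= m_t/(1-\beta_1^t), \qquad \hat v_t = v_t/(1-\beta_2^t) \\
x_t &= x_{t-1} - \alpha\, \hat m_t / (\sqrt{\hat v_t} + \epsilon)
\end{aligned}
$$

The lambda hyperparameter is not described on this page and is left out of the sketch.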
Constructor and Description |
---|
Adam() |
Adam(Adam toCopy) Copy constructor |
Adam(double alpha, double beta_1, double beta_2, double eps, double lambda) |
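
A minimal construction sketch using only the constructors listed above. The explicit hyperparameter values follow the common settings from the Adam paper and are illustrative only, not a statement of this class's DEFAULT_* values; the package in the import is assumed from JSAT's layout.

```java
import jsat.math.optimization.stochastic.Adam; // assumed JSAT package for this class

public class AdamConstructionSketch
{
    public static void main(String[] args)
    {
        // Default hyperparameters (the DEFAULT_* constants listed above)
        Adam byDefaults = new Adam();

        // Explicit hyperparameters: alpha, beta_1, beta_2, eps, lambda.
        // 0.001 / 0.9 / 0.999 / 1e-8 are the usual Adam-paper settings; the lambda
        // value is arbitrary here, since its role is not described on this page.
        Adam custom = new Adam(0.001, 0.9, 0.999, 1e-8, 1.0 - 1e-8);

        // Copy constructor: an independent updater with the same configuration
        Adam copy = new Adam(custom);
    }
}
```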
Modifier and Type | Method and Description |
---|---|
Adam | clone() |
void | setup(int d) Sets up this updater to update a weight vector of dimension d by a gradient of the same dimension |
void | update(Vec x, Vec grad, double eta) Updates the weight vector x such that x = x - η f(grad), where f(grad) is some function on the gradient that effectively returns a new vector. |
double | update(Vec x, Vec grad, double eta, double bias, double biasGrad) Updates the weight vector x such that x = x - η f(grad), where f(grad) is some function on the gradient that effectively returns a new vector. |
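
To show how the methods in the summary fit together, here is a rough training-loop sketch. The setup and update calls follow this page; jsat.linear.DenseVector and the toy gradient function are assumptions used only for illustration.

```java
import jsat.linear.DenseVector;
import jsat.linear.Vec;
import jsat.math.optimization.stochastic.Adam;

public class AdamTrainingLoopSketch
{
    public static void main(String[] args)
    {
        int d = 10;                  // dimension of the weight vector
        Vec w = new DenseVector(d);  // weights, mutated in place by update(...)

        Adam adam = new Adam();
        adam.setup(d);               // must be called with the dimension of w first

        double eta = 0.01;           // learning rate passed to each update
        for (int step = 0; step < 1000; step++)
        {
            Vec grad = gradientAt(w);    // hypothetical gradient of the loss at w
            adam.update(w, grad, eta);   // w = w - eta * f(grad), applied in place
        }
        System.out.println(w);
    }

    /** Toy gradient of ||w||^2, standing in for a real loss gradient. */
    private static Vec gradientAt(Vec w)
    {
        Vec g = new DenseVector(w.length());
        for (int i = 0; i < w.length(); i++)
            g.set(i, 2 * w.get(i));
        return g;
    }
}
```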
public static final double DEFAULT_ALPHA
public static final double DEFAULT_BETA_1
public static final double DEFAULT_BETA_2
public static final double DEFAULT_EPS
public static final double DEFAULT_LAMBDA
public Adam()
public Adam(double alpha, double beta_1, double beta_2, double eps, double lambda)
public Adam(Adam toCopy)
Parameters:
toCopy - the object to copy
public void update(Vec x, Vec grad, double eta)
Description copied from interface: GradientUpdater
Updates the weight vector x such that x = x - η f(grad), where f(grad) is some function on the gradient that effectively returns a new vector. It is not necessary for the internal implementation to ever explicitly form any of these objects, so long as x is mutated to have the correct result.
Specified by:
update in interface GradientUpdater
Parameters:
x - the vector to mutate such that it has been updated by the gradient
grad - the gradient to update the weight vector x from
eta - the learning rate to apply
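
As a small (assumed) illustration of the in-place contract described above: the call mutates x directly and returns nothing, so the caller never builds f(grad) itself. The vector values are arbitrary.

```java
import jsat.linear.DenseVector;
import jsat.linear.Vec;
import jsat.math.optimization.stochastic.Adam;

public class AdamInPlaceSketch
{
    public static void main(String[] args)
    {
        Vec x = new DenseVector(3);     // current weights
        x.set(0, 1.0); x.set(1, -2.0); x.set(2, 0.5);

        Vec grad = new DenseVector(3);  // gradient at x (illustrative values)
        grad.set(0, 0.1); grad.set(1, 0.3); grad.set(2, -0.2);

        Adam adam = new Adam();
        adam.setup(x.length());

        adam.update(x, grad, 0.1);      // void: x now holds x - 0.1 * f(grad)
        System.out.println(x);          // inspect the mutated weights
    }
}
```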
public double update(Vec x, Vec grad, double eta, double bias, double biasGrad)
Description copied from interface: GradientUpdater
Updates the weight vector x such that x = x - η f(grad), where f(grad) is some function on the gradient that effectively returns a new vector. It is not necessary for the internal implementation to ever explicitly form any of these objects, so long as x is mutated to have the correct result.
Specified by:
update in interface GradientUpdater
Parameters:
x - the vector to mutate such that it has been updated by the gradient
grad - the gradient to update the weight vector x from
eta - the learning rate to apply
bias - the bias term of the vector
biasGrad - the gradient for the bias term
Returns:
the value to update the bias term by, such that bias = bias - returnValue
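
A sketch of how the return value is meant to be consumed, following the bias = bias - returnValue contract above; the bias, its gradient, and the vector values are illustrative assumptions.

```java
import jsat.linear.DenseVector;
import jsat.linear.Vec;
import jsat.math.optimization.stochastic.Adam;

public class AdamBiasSketch
{
    public static void main(String[] args)
    {
        int d = 3;
        Vec w = new DenseVector(d);     // weights, updated in place
        Vec grad = new DenseVector(d);  // gradient at w (illustrative values)
        grad.set(0, 0.5); grad.set(1, -0.25); grad.set(2, 1.0);

        double bias = 0.0;      // scalar bias kept outside the weight vector
        double biasGrad = -0.5; // gradient of the loss with respect to the bias

        Adam adam = new Adam();
        adam.setup(d);

        // w is mutated in place; the bias is updated by the caller from the return value
        double delta = adam.update(w, grad, 0.1, bias, biasGrad);
        bias = bias - delta;
    }
}
```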
public Adam clone()
Specified by:
clone in interface GradientUpdater
Overrides:
clone in class Object
public void setup(int d)
Description copied from interface: GradientUpdater
Sets up this updater to update a weight vector of dimension d by a gradient of the same dimension
Specified by:
setup in interface GradientUpdater
Parameters:
d - the dimension of the weight vector that will be updated