public class OnlineLDAsvi extends Object implements Parameterized
A common choice is α = η = 1/K, where K is the number of topics to be learned. Note that η here is a prior, not a learning rate parameter as the symbol is usually used.
Example combinations of batch size, κ, and τ0 (one combination per column) are:
batch size | 256 | 1024 | 4096 |
---|---|---|---|
κ | 0.6 | 0.5 | 0.5 |
τ0 | 1024 | 256 | 64 |
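To see how κ and τ0 interact, the sketch below evaluates the step-size schedule ρ_t = (τ0 + t)^(-κ) that is commonly used for online LDA with stochastic variational inference. This page does not state the formula, so treat the schedule as an assumption about this class; the names `LearningRateSketch` and `rho` are hypothetical and not part of the library.

```java
public class LearningRateSketch {
    /**
     * Step size for update t (0-based), ASSUMING the common schedule
     * rho_t = (tau0 + t)^(-kappa). This is not taken from the library source.
     */
    static double rho(double tau0, double kappa, long t) {
        return Math.pow(tau0 + t, -kappa);
    }

    public static void main(String[] args) {
        // The three columns of the table above: {batch size, kappa, tau0}
        double[][] combos = {
            {256, 0.6, 1024},
            {1024, 0.5, 256},
            {4096, 0.5, 64}
        };
        for (double[] c : combos) {
            System.out.printf("batch=%d kappa=%.1f tau0=%d rho_0=%.6f rho_1000=%.6f%n",
                    (int) c[0], c[1], (int) c[2],
                    rho(c[2], c[1], 0), rho(c[2], c[1], 1000));
        }
    }
}
```

Under this assumed schedule, a larger τ0 gives smaller early steps: the first column starts at 1024^(-0.6) ≈ 0.0156, while the third starts at 64^(-0.5) = 0.125.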
Constructor and Description |
---|
OnlineLDAsvi()
Creates a new Online LDA learner.
|
OnlineLDAsvi(int K,
int D,
int W)
Creates a new Online LDA learner that is ready for online updates.
|
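The vocabulary size W is the number of dimensions of each input vector. Assuming the usual bag-of-words encoding for LDA, a document becomes one count per vocabulary word. The sketch below shows that encoding with plain arrays instead of the library's Vec type; `BagOfWordsSketch`, `toCounts`, and the toy vocabulary are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BagOfWordsSketch {
    /**
     * Counts how often each vocabulary word occurs in the token list.
     * Tokens that are not in the vocabulary are skipped.
     */
    static double[] toCounts(List<String> tokens, Map<String, Integer> vocab) {
        double[] counts = new double[vocab.size()]; // length W
        for (String token : tokens) {
            Integer index = vocab.get(token);
            if (index != null) {
                counts[index]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary with W = 3 words
        Map<String, Integer> vocab = new HashMap<>();
        vocab.put("topic", 0);
        vocab.put("model", 1);
        vocab.put("word", 2);

        double[] doc = toCounts(Arrays.asList("topic", "model", "topic", "unseen"), vocab);
        System.out.println(Arrays.toString(doc));
    }
}
```

In this sketch the length of every count array equals the vocabulary size, which is the value that would be passed as W.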
Modifier and Type | Method and Description |
---|---|
double |
getAlpha() |
int |
getD()
Returns the approximate number of documents that will be observed, or
-1 if this object is not ready to learn |
int |
getEpochs()
Returns the number of training iterations over the data set that will be
used
|
double |
getEta() |
int |
getK()
Returns the number of topics to learn, or
-1 if this
object is not ready to learn |
double |
getKappa() |
Parameter |
getParameter(String paramName)
Returns the parameter with the given name.
|
List<Parameter> |
getParameters()
Returns the list of parameters that can be altered for this learner.
|
Vec |
getTopics(Vec doc)
Computes the topic distribution for the given document.
Note that the returned vector will be dense, but many of the values may be very nearly zero. |
Vec |
getTopicVec(int k)
Returns the topic vector for a given topic.
|
int |
getVocabSize()
Returns the size of the vocabulary for LDA, or
-1 if this
object is not ready to learn |
void |
model(DataSet dataSet,
int topics)
Fits the LDA model against the given data set.
|
void |
model(DataSet dataSet,
int topics,
ExecutorService ex)
Fits the LDA model against the given data set, using the given executor for parallel execution.
|
void |
setAlpha(double alpha)
Sets the prior on the weight vector theta.
|
void |
setD(int D)
Sets the approximate number of documents that will be observed.
|
void |
setEpochs(int epochs)
Sets the number of training epochs when learning in a "batch" setting.
|
void |
setEta(double eta)
Sets the prior on topics.
|
void |
setK(int K)
Sets the number of topics that LDA will try to learn.
|
void |
setKappa(double kappa)
Sets the "forgetfulness" factor in the learning rate.
|
void |
setMiniBatchSize(int miniBatchSize)
Sets the number of data points used at a time to perform one update of
the model parameters
|
void |
setTau0(double tau0)
Sets a learning rate constant that controls the influence of early iterations on
the solution.
|
void |
setVocabSize(int W)
Sets the vocabulary size for LDA, which is the number of dimensions in
the input feature vectors.
|
void |
update(List<Vec> docs)
Performs an update of the LDA topic distributions based on the given
mini-batch of documents.
|
void |
update(List<Vec> docs,
ExecutorService ex)
Performs an update of the LDA topic distribution based on the given
mini-batch of documents.
|
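The batch-style model methods and the online update methods are connected through the mini-batch size and the number of epochs. As a rough illustration, and not the library's actual implementation, the sketch below splits a collection into consecutive mini-batches and counts how many updates a given number of epochs would produce, assuming the last, smaller batch is still used. `MiniBatchSketch`, `partition`, and `countUpdates` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniBatchSketch {
    /** Splits docs into consecutive batches of at most miniBatchSize items. */
    static <T> List<List<T>> partition(List<T> docs, int miniBatchSize) {
        if (miniBatchSize < 1) {
            throw new IllegalArgumentException("miniBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += miniBatchSize) {
            int end = Math.min(start + miniBatchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    /** Number of updates if every epoch visits every batch exactly once. */
    static int countUpdates(int numDocs, int miniBatchSize, int epochs) {
        int batchesPerEpoch = (numDocs + miniBatchSize - 1) / miniBatchSize; // ceiling division
        return batchesPerEpoch * epochs;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add(i); // stand-ins for document vectors
        }
        for (List<Integer> batch : partition(docs, 4)) {
            System.out.println("update with batch " + batch);
        }
        System.out.println("updates over 3 epochs: " + countUpdates(docs.size(), 4, 3));
    }
}
```

In an online setting, each batch produced this way would correspond to one call to update(List<Vec> docs).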
public OnlineLDAsvi()
Creates a new Online LDA learner. The number of topics, the expected number of documents, and the vocabulary size must be set before it can be used.
public OnlineLDAsvi(int K, int D, int W)
K - the number of topics to learn
D - the expected number of documents to see
W - the vocabulary size
public void setK(int K)
K - the number of topics to learn
public int getK()
Returns the number of topics to learn, or -1 if this object is not ready to learn
public void setD(int D)
D - the number of documents that will be observed
public int getD()
Returns the approximate number of documents that will be observed, or -1 if this object is not ready to learn
public void setVocabSize(int W)
W - the vocabulary size for LDA
public int getVocabSize()
Returns the size of the vocabulary for LDA, or -1 if this object is not ready to learn
public void setAlpha(double alpha)
Sets the prior on the weight vector theta. 1/K is a common choice.
alpha - the positive prior value
public double getAlpha()
public void setEta(double eta)
Sets the prior on topics. 1/K is a common choice.
eta - the positive prior for topics
public double getEta()
public void setTau0(double tau0)
tau0 - a learning rate parameter that must be greater than 0 (usually at least 1)
public void setEpochs(int epochs)
epochs - the number of iterations to go over the data set
public int getEpochs()
public void setKappa(double kappa)
kappa - the forgetfulness factor in [0.5, 1]
public double getKappa()
public void setMiniBatchSize(int miniBatchSize)
miniBatchSize - the batch size to use
public Vec getTopicVec(int k)
k - the topic to get the vector for
public void update(List<Vec> docs)
docs - the list of document vectors to update from
public void update(List<Vec> docs, ExecutorService ex)
docs - the list of document vectors to update from
ex - the source of threads for parallel execution
public void model(DataSet dataSet, int topics)
dataSet - the data set to learn a topic model for
topics - the number of topics to learn
public void model(DataSet dataSet, int topics, ExecutorService ex)
dataSet - the data set to learn a topic model for
topics - the number of topics to learn
ex - the source of threads for parallel execution
public Vec getTopics(Vec doc)
doc - the document to find the topics for
public List<Parameter> getParameters()
Specified by: getParameters in interface Parameterized
public Parameter getParameter(String paramName)
Specified by: getParameter in interface Parameterized
paramName - the name of the parameter to obtain
Copyright © 2017. All rights reserved.