public class EMGaussianMixture extends KClustererBase implements MultivariateDistribution
Modifier and Type | Field and Description |
---|---|
protected int | MaxIterLimit: Control the maximum number of iterations to perform. |
Constructor and Description |
---|
EMGaussianMixture() |
EMGaussianMixture(EMGaussianMixture gm): Copy constructor. |
EMGaussianMixture(SeedSelectionMethods.SeedSelection seedSelection) |
Modifier and Type | Method and Description |
---|---|
EMGaussianMixture | clone() |
int[] | cluster(DataSet dataSet, ExecutorService threadpool, int[] designations): Performs clustering on the given data set. |
int[] | cluster(DataSet dataSet, int[] designations): Performs clustering on the given data set. |
int[] | cluster(DataSet dataSet, int clusters, ExecutorService threadpool, int[] designations) |
int[] | cluster(DataSet dataSet, int clusters, int[] designations) |
int[] | cluster(DataSet dataSet, int lowK, int highK, ExecutorService threadpool, int[] designations) |
int[] | cluster(DataSet dataSet, int lowK, int highK, int[] designations) |
protected double | cluster(DataSet dataSet, List<Double> accelCache, int K, List<Vec> means, int[] assignment, boolean exactTotal, ExecutorService threadpool, boolean returnError) |
protected double | clusterCompute(int K, DataSet dataSet, int[] assignment, List<Vec> means, List<Matrix> covs, ExecutorService execServ) |
int | getIterationLimit(): Returns the maximum number of iterations of the ElkanKMeans algorithm that will be performed. |
SeedSelectionMethods.SeedSelection | getSeedSelection() |
double | logPdf(double... x): Computes the log of the probability density function. |
double | logPdf(Vec x): Computes the log of the probability density function. |
double | pdf(double... x): Returns the probability of a given vector from this distribution. |
double | pdf(Vec x): Returns the probability of a given vector from this distribution. |
List<Vec> | sample(int count, Random rand): Performs sampling on the current distribution. |
void | setIterationLimit(int iterLimit): Sets the maximum number of iterations allowed. |
void | setSeedSelection(SeedSelectionMethods.SeedSelection seedSelection): Sets the method of seed selection to use for this algorithm. |
boolean | setUsingData(DataSet dataSet): Sets the parameters of the distribution to attempt to fit the given list of data points. |
boolean | setUsingData(DataSet dataSet, ExecutorService threadpool): Sets the parameters of the distribution to attempt to fit the given list of data points. |
<V extends Vec> boolean | setUsingData(List<V> dataSet): Sets the parameters of the distribution to attempt to fit the given list of vectors. |
<V extends Vec> boolean | setUsingData(List<V> dataSet, ExecutorService threadpool): Sets the parameters of the distribution to attempt to fit the given list of vectors. |
boolean | setUsingDataList(List<DataPoint> dataPoint): Sets the parameters of the distribution to attempt to fit the given list of data points. |
boolean | setUsingDataList(List<DataPoint> dataPoints, ExecutorService threadpool): Sets the parameters of the distribution to attempt to fit the given list of data points. |
cluster, cluster, cluster, cluster
cluster, cluster, createClusterListFromAssignmentArray, getDatapointsFromCluster, supportsWeightedData
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
cluster, cluster, supportsWeightedData
protected int MaxIterLimit
public EMGaussianMixture(SeedSelectionMethods.SeedSelection seedSelection)
public EMGaussianMixture()
public EMGaussianMixture(EMGaussianMixture gm)
Copy constructor.
Parameters:
gm - the Gaussian Mixture to duplicate
public void setSeedSelection(SeedSelectionMethods.SeedSelection seedSelection)
Sets the method of seed selection to use for this algorithm. SeedSelectionMethods.SeedSelection.KPP is recommended for this algorithm in particular.
Parameters:
seedSelection - the method of seed selection to use
public SeedSelectionMethods.SeedSelection getSeedSelection()
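The KPP recommendation above can be illustrated with a short sketch, assuming KPP refers to k-means++ style seeding. The class and method names here (KppSeedingSketch, chooseSeeds) are hypothetical and work on plain arrays; this is not the code behind SeedSelectionMethods. The first seed is drawn uniformly, and each later seed is drawn with probability proportional to its squared distance from the nearest seed already chosen.

```java
import java.util.Random;

/** Illustrative k-means++ style seeding on plain arrays; not library code. */
class KppSeedingSketch {

    /** Squared Euclidean distance between two points of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indices of k distinct seed points. */
    static int[] chooseSeeds(double[][] points, int k, Random rand) {
        int n = points.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        int[] seeds = new int[k];
        double[] nearest = new double[n]; // squared distance to the closest seed so far
        seeds[0] = rand.nextInt(n);       // first seed: uniform choice
        for (int i = 0; i < n; i++) {
            nearest[i] = squaredDistance(points[i], points[seeds[0]]);
        }
        for (int s = 1; s < k; s++) {
            double total = 0.0;
            for (double d : nearest) {
                total += d;
            }
            // Roulette-wheel selection weighted by squared distance.
            double target = rand.nextDouble() * total;
            double running = 0.0;
            int chosen = -1;
            for (int i = 0; i < n; i++) {
                if (nearest[i] <= 0.0) {
                    continue; // coincides with an existing seed
                }
                running += nearest[i];
                chosen = i; // fallback: last candidate with positive weight
                if (running > target) {
                    break;
                }
            }
            if (chosen < 0) {
                throw new IllegalArgumentException("fewer than k distinct points");
            }
            seeds[s] = chosen;
            for (int i = 0; i < n; i++) {
                nearest[i] = Math.min(nearest[i], squaredDistance(points[i], points[chosen]));
            }
        }
        return seeds;
    }
}
```

Spreading the initial seeds apart in this way tends to give the later fitting iterations a better starting point than uniform random seeds.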
public void setIterationLimit(int iterLimit)
Parameters:
iterLimit - the maximum number of iterations of the ElkanKMeans algorithm
public int getIterationLimit()
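As a rough illustration of what an iteration limit does in an iterative fit, the sketch below runs expectation-maximization for a one-dimensional Gaussian mixture and stops at either a tolerance or a cap. It assumes the limit bounds the main fitting loop; how this class applies it internally is not stated here. All names (IterationLimitSketch, Fit, fit, maxIter, tol) are hypothetical.

```java
import java.util.Arrays;

/** Illustrative 1-D EM loop with an iteration cap; not library code. */
class IterationLimitSketch {

    /** Fitted parameters plus the number of iterations actually run. */
    static final class Fit {
        final double[] weights;
        final double[] means;
        final double[] variances;
        final int iterations;

        Fit(double[] weights, double[] means, double[] variances, int iterations) {
            this.weights = weights;
            this.means = means;
            this.variances = variances;
            this.iterations = iterations;
        }
    }

    static double gaussian(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-d * d / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    static Fit fit(double[] data, double[] initialMeans, int maxIter, double tol) {
        int n = data.length;
        int k = initialMeans.length;
        double[] w = new double[k];
        double[] mu = initialMeans.clone();
        double[] var = new double[k];
        Arrays.fill(w, 1.0 / k);
        Arrays.fill(var, 1.0);
        double[][] resp = new double[n][k];
        double prevLogLik = Double.NEGATIVE_INFINITY;
        int iter = 0;
        while (iter < maxIter) { // the cap: never run more than maxIter passes
            iter++;
            // E-step: responsibilities of each component for each point.
            double logLik = 0.0;
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = w[j] * gaussian(data[i], mu[j], var[j]);
                    sum += resp[i][j];
                }
                sum = Math.max(sum, Double.MIN_VALUE); // guard against division by zero
                for (int j = 0; j < k; j++) {
                    resp[i][j] /= sum;
                }
                logLik += Math.log(sum);
            }
            // M-step: re-estimate weights, means, and variances.
            for (int j = 0; j < k; j++) {
                double nj = 0.0;
                double weightedSum = 0.0;
                for (int i = 0; i < n; i++) {
                    nj += resp[i][j];
                    weightedSum += resp[i][j] * data[i];
                }
                nj = Math.max(nj, 1e-12);
                mu[j] = weightedSum / nj;
                double v = 0.0;
                for (int i = 0; i < n; i++) {
                    double d = data[i] - mu[j];
                    v += resp[i][j] * d * d;
                }
                var[j] = Math.max(v / nj, 1e-6); // variance floor
                w[j] = nj / n;
            }
            if (Math.abs(logLik - prevLogLik) < tol) {
                break; // converged before reaching the cap
            }
            prevLogLik = logLik;
        }
        return new Fit(w, mu, var, iter);
    }
}
```

With a small cap the fit returns early with whatever estimate it has; with a generous cap the tolerance check usually ends the loop first.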
protected double cluster(DataSet dataSet, List<Double> accelCache, int K, List<Vec> means, int[] assignment, boolean exactTotal, ExecutorService threadpool, boolean returnError)
protected double clusterCompute(int K, DataSet dataSet, int[] assignment, List<Vec> means, List<Matrix> covs, ExecutorService execServ)
public double logPdf(double... x)
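Description copied from interface: MultivariateDistribution
Computes the log of the probability density function. Where the result would be Double.NEGATIVE_INFINITY, -Double.MAX_VALUE is returned instead.
Specified by:
logPdf in interface MultivariateDistribution
Parameters:
x - the array for the vector to get the log probability of
public double logPdf(Vec x)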
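Description copied from interface: MultivariateDistribution
Computes the log of the probability density function. Where the result would be Double.NEGATIVE_INFINITY, -Double.MAX_VALUE is returned instead.
Specified by:
logPdf in interface MultivariateDistribution
Parameters:
x - the vector to get the log probability of
public double pdf(double... x)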
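Description copied from interface: MultivariateDistribution
Returns the probability of a given vector from this distribution.
Specified by:
pdf in interface MultivariateDistribution
Parameters:
x - the array of the vector to get the probability of
public double pdf(Vec x)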
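Description copied from interface: MultivariateDistribution
Returns the probability of a given vector from this distribution.
Specified by:
pdf in interface MultivariateDistribution
Parameters:
x - the vector to get the probability of
public <V extends Vec> boolean setUsingData(List<V> dataSet)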
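Description copied from interface: MultivariateDistribution
Sets the parameters of the distribution to attempt to fit the given list of vectors.
Specified by:
setUsingData in interface MultivariateDistribution
Type Parameters:
V - the vector type
Parameters:
dataSet - the list of data points
public boolean setUsingDataList(List<DataPoint> dataPoint)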
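Description copied from interface: MultivariateDistribution
Sets the parameters of the distribution to attempt to fit the given list of data points. The weights of the data points will be used.
Specified by:
setUsingDataList in interface MultivariateDistribution
Parameters:
dataPoint - the list of data points to use
public boolean setUsingData(DataSet dataSet)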
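Description copied from interface: MultivariateDistribution
Sets the parameters of the distribution to attempt to fit the given list of data points. The weights of the data points will be used.
Specified by:
setUsingData in interface MultivariateDistribution
Parameters:
dataSet - the data set to use
public boolean setUsingData(DataSet dataSet, ExecutorService threadpool)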
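Description copied from interface: MultivariateDistribution
Sets the parameters of the distribution to attempt to fit the given list of data points. The weights of the data points will be used.
Specified by:
setUsingData in interface MultivariateDistribution
Parameters:
dataSet - the data set to use
threadpool - the source of threads for computation
public <V extends Vec> boolean setUsingData(List<V> dataSet, ExecutorService threadpool)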
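Description copied from interface: MultivariateDistribution
Sets the parameters of the distribution to attempt to fit the given list of vectors.
Specified by:
setUsingData in interface MultivariateDistribution
Type Parameters:
V - the vector type
Parameters:
dataSet - the list of data points
threadpool - the source of threads for computation
public boolean setUsingDataList(List<DataPoint> dataPoints, ExecutorService threadpool)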
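Description copied from interface: MultivariateDistribution
Sets the parameters of the distribution to attempt to fit the given list of data points. The weights of the data points will be used.
Specified by:
setUsingDataList in interface MultivariateDistribution
Parameters:
dataPoints - the list of data points to use
threadpool - the source of threads for computation
public EMGaussianMixture clone()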
Specified by:
clone in interface Clusterer
Specified by:
clone in interface KClusterer
Specified by:
clone in interface MultivariateDistribution
Overrides:
clone in class KClustererBase
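The logPdf entries above state that -Double.MAX_VALUE is returned in place of Double.NEGATIVE_INFINITY. A minimal sketch of that convention for a one-dimensional mixture follows; the class name ClampedLogPdfSketch and the array-based parameters are assumptions made for illustration and do not reflect how this class stores its components.

```java
/** Illustrative clamped log-density for a 1-D Gaussian mixture; not library code. */
class ClampedLogPdfSketch {

    /** Density of a 1-D Gaussian with the given mean and variance at x. */
    static double gaussianPdf(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-d * d / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    /** Mixture density: weighted sum of the component densities. */
    static double pdf(double x, double[] weights, double[] means, double[] variances) {
        double sum = 0.0;
        for (int k = 0; k < weights.length; k++) {
            sum += weights[k] * gaussianPdf(x, means[k], variances[k]);
        }
        return sum;
    }

    /**
     * Log of the mixture density. When the density underflows to zero,
     * Math.log yields negative infinity; the finite value -Double.MAX_VALUE
     * is returned in its place.
     */
    static double logPdf(double x, double[] weights, double[] means, double[] variances) {
        double log = Math.log(pdf(x, weights, means, variances));
        return log == Double.NEGATIVE_INFINITY ? -Double.MAX_VALUE : log;
    }
}
```

Returning a finite value keeps sums of log densities, such as a log-likelihood over many points, from collapsing to negative infinity because of a single outlier.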
public List<Vec> sample(int count, Random rand)
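Description copied from interface: MultivariateDistribution
Performs sampling on the current distribution.
Specified by:
sample in interface MultivariateDistribution
Parameters:
count - the number of iid samples to draw
rand - the source of randomness
public int[] cluster(DataSet dataSet, int[] designations)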
Description copied from interface: Clusterer
Performs clustering on the given data set.
Specified by:
cluster in interface Clusterer
Parameters:
dataSet - the data set to perform clustering on
designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
public int[] cluster(DataSet dataSet, ExecutorService threadpool, int[] designations)
Description copied from interface: Clusterer
Performs clustering on the given data set.
Specified by:
cluster in interface Clusterer
Parameters:
dataSet - the data set to perform clustering on
threadpool - a source of threads to run tasks
designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
public int[] cluster(DataSet dataSet, int clusters, ExecutorService threadpool, int[] designations)
Specified by:
cluster in interface KClusterer
public int[] cluster(DataSet dataSet, int clusters, int[] designations)
Specified by:
cluster in interface KClusterer
public int[] cluster(DataSet dataSet, int lowK, int highK, ExecutorService threadpool, int[] designations)
Specified by:
cluster in interface KClusterer
public int[] cluster(DataSet dataSet, int lowK, int highK, int[] designations)
Specified by:
cluster in interface KClusterer
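The designations contract described for the cluster methods (the given array is altered and returned, and a new array is created when null is passed) can be sketched as follows. The example uses a nearest-center assignment on plain doubles in place of the mixture model; the class DesignationsSketch, the method assign, and the length check are assumptions made for illustration.

```java
/** Illustrative "fill or allocate" designations contract; not library code. */
class DesignationsSketch {

    /**
     * Writes the index of the nearest center for each data value into
     * designations and returns that same array. A new array is allocated
     * only when null is passed.
     */
    static int[] assign(double[] data, double[] centers, int[] designations) {
        if (designations == null) {
            designations = new int[data.length];
        } else if (designations.length < data.length) {
            // Assumed behavior for this sketch: reject arrays that are too short.
            throw new IllegalArgumentException("designations array is too short");
        }
        for (int i = 0; i < data.length; i++) {
            int best = 0;
            double bestDist = Math.abs(data[i] - centers[0]);
            for (int c = 1; c < centers.length; c++) {
                double dist = Math.abs(data[i] - centers[c]);
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            designations[i] = best;
        }
        return designations;
    }
}
```

Passing a preallocated array lets a caller reuse one buffer across repeated runs, while passing null leaves allocation to the method.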
Copyright © 2017. All rights reserved.