EMGaussianMixture (Java Statistical Analysis Tool 0.0.8 API)

java.lang.Object
- jsat.clustering.ClustererBase
  - jsat.clustering.KClustererBase
    - jsat.clustering.EMGaussianMixture

All Implemented Interfaces:

Serializable, Cloneable, Clusterer, KClusterer, MultivariateDistribution
```
public class EMGaussianMixture
extends KClustererBase
implements MultivariateDistribution
```
An implementation of Gaussian mixture models that learns the specified number of Gaussians using the Expectation Maximization (EM) algorithm.
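
The EM procedure named above can be sketched in one dimension. This is a minimal illustration under stated assumptions, not JSAT's implementation: the class `Em1dSketch`, its fields, the unit starting variance, and the variance floor are all hypothetical, and the real class works on multivariate data with covariance matrices.

```java
import java.util.Arrays;

/** Hypothetical 1-D illustration of EM for a Gaussian mixture; not JSAT code. */
final class Em1dSketch {
    double[] weights, means, variances;

    Em1dSketch(double[] initialMeans) {
        int k = initialMeans.length;
        means = initialMeans.clone();
        weights = new double[k];
        variances = new double[k];
        Arrays.fill(weights, 1.0 / k);  // start with equal mixing weights
        Arrays.fill(variances, 1.0);    // assumed starting variance
    }

    static double gaussian(double x, double mean, double var) {
        double d = x - mean;
        return Math.exp(-d * d / (2 * var)) / Math.sqrt(2 * Math.PI * var);
    }

    /** Runs exactly maxIter EM iterations (no convergence check in this sketch). */
    void fit(double[] data, int maxIter) {
        int k = means.length, n = data.length;
        double[][] resp = new double[n][k];
        for (int iter = 0; iter < maxIter; iter++) {
            // E-step: responsibility of component j for point i
            for (int i = 0; i < n; i++) {
                double total = 0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = weights[j] * gaussian(data[i], means[j], variances[j]);
                    total += resp[i][j];
                }
                for (int j = 0; j < k; j++)
                    resp[i][j] = total > 0 ? resp[i][j] / total : 1.0 / k;
            }
            // M-step: re-estimate weight, mean, and variance of each component
            for (int j = 0; j < k; j++) {
                double nj = 0, sum = 0;
                for (int i = 0; i < n; i++) {
                    nj += resp[i][j];
                    sum += resp[i][j] * data[i];
                }
                if (nj <= 0) continue; // empty component: leave it unchanged
                double mean = sum / nj, sq = 0;
                for (int i = 0; i < n; i++) {
                    double d = data[i] - mean;
                    sq += resp[i][j] * d * d;
                }
                weights[j] = nj / n;
                means[j] = mean;
                variances[j] = Math.max(sq / nj, 1e-6); // floor avoids a zero variance
            }
        }
    }
}
```

The `maxIter` argument plays the role that the iteration limit plays for this class: it bounds the number of E-step/M-step passes.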

Author:

Edward Raff

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`protected int`	`MaxIterLimit` Control the maximum number of iterations to perform.

Constructor Summary

Constructors
Constructor and Description
`EMGaussianMixture()`
`EMGaussianMixture(EMGaussianMixture gm)` Copy constructor.
`EMGaussianMixture(SeedSelectionMethods.SeedSelection seedSelection)`

Method Summary

Modifier and Type	Method and Description
`EMGaussianMixture`	`clone()`
`int[]`	`cluster(DataSet dataSet, ExecutorService threadpool, int[] designations)` Performs clustering on the given data set.
`int[]`	`cluster(DataSet dataSet, int[] designations)` Performs clustering on the given data set.
`int[]`	`cluster(DataSet dataSet, int clusters, ExecutorService threadpool, int[] designations)`
`int[]`	`cluster(DataSet dataSet, int clusters, int[] designations)`
`int[]`	`cluster(DataSet dataSet, int lowK, int highK, ExecutorService threadpool, int[] designations)`
`int[]`	`cluster(DataSet dataSet, int lowK, int highK, int[] designations)`
`protected double`	`cluster(DataSet dataSet, List<Double> accelCache, int K, List<Vec> means, int[] assignment, boolean exactTotal, ExecutorService threadpool, boolean returnError)`
`protected double`	`clusterCompute(int K, DataSet dataSet, int[] assignment, List<Vec> means, List<Matrix> covs, ExecutorService execServ)`
`int`	`getIterationLimit()` Returns the maximum number of iterations of the EM algorithm that will be performed.
`SeedSelectionMethods.SeedSelection`	`getSeedSelection()`
`double`	`logPdf(double... x)` Computes the log of the probability density function.
`double`	`logPdf(Vec x)` Computes the log of the probability density function.
`double`	`pdf(double... x)` Returns the probability of a given vector from this distribution.
`double`	`pdf(Vec x)` Returns the probability of a given vector from this distribution.
`List<Vec>`	`sample(int count, Random rand)` Performs sampling on the current distribution.
`void`	`setIterationLimit(int iterLimit)` Sets the maximum number of iterations allowed.
`void`	`setSeedSelection(SeedSelectionMethods.SeedSelection seedSelection)` Sets the method of seed selection to use for this algorithm.
`boolean`	`setUsingData(DataSet dataSet)` Sets the parameters of the distribution to attempt to fit the given list of data points.
`boolean`	`setUsingData(DataSet dataSet, ExecutorService threadpool)` Sets the parameters of the distribution to attempt to fit the given list of data points.
`<V extends Vec> boolean`	`setUsingData(List<V> dataSet)` Sets the parameters of the distribution to attempt to fit the given list of vectors.
`<V extends Vec> boolean`	`setUsingData(List<V> dataSet, ExecutorService threadpool)` Sets the parameters of the distribution to attempt to fit the given list of vectors.
`boolean`	`setUsingDataList(List<DataPoint> dataPoint)` Sets the parameters of the distribution to attempt to fit the given list of data points.
`boolean`	`setUsingDataList(List<DataPoint> dataPoints, ExecutorService threadpool)` Sets the parameters of the distribution to attempt to fit the given list of data points.

Methods inherited from class jsat.clustering.KClustererBase
cluster, cluster, cluster, cluster

Methods inherited from class jsat.clustering.ClustererBase
cluster, cluster, createClusterListFromAssignmentArray, getDatapointsFromCluster, supportsWeightedData

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface jsat.clustering.Clusterer
cluster, cluster, supportsWeightedData

- Field Detail
  - MaxIterLimit
```
protected int MaxIterLimit
```
    Control the maximum number of iterations to perform.
- Constructor Detail
  - EMGaussianMixture
```
public EMGaussianMixture(SeedSelectionMethods.SeedSelection seedSelection)
```
  - EMGaussianMixture
```
public EMGaussianMixture()
```
  - EMGaussianMixture
```
public EMGaussianMixture(EMGaussianMixture gm)
```
    Copy constructor. The new Gaussian mixture can be altered without affecting gm.
    
    Parameters:
    
    gm - the Gaussian mixture to duplicate
- Method Detail
  - setSeedSelection
```
public void setSeedSelection(SeedSelectionMethods.SeedSelection seedSelection)
```
    Sets the method of seed selection to use for this algorithm. SeedSelectionMethods.SeedSelection.KPP is recommended for this algorithm in particular.
    
    Parameters:
    
    seedSelection - the method of seed selection to use
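    
    Assuming KPP refers to k-means++ seeding, the idea can be sketched as follows: the first seed is drawn uniformly, and each later seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far. The class `KppSeedSketch` and its method are hypothetical and work on 1-D data only; this is not JSAT's implementation.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical 1-D sketch of k-means++ style seeding; not JSAT code. */
final class KppSeedSketch {
    static double[] selectSeeds(double[] data, int k, Random rand) {
        double[] seeds = new double[k];
        double[] minSqDist = new double[data.length];
        Arrays.fill(minSqDist, Double.POSITIVE_INFINITY);

        // First seed: uniform over the data points
        seeds[0] = data[rand.nextInt(data.length)];

        for (int s = 1; s < k; s++) {
            // Update each point's squared distance to its nearest seed
            double total = 0;
            for (int i = 0; i < data.length; i++) {
                double d = data[i] - seeds[s - 1];
                minSqDist[i] = Math.min(minSqDist[i], d * d);
                total += minSqDist[i];
            }
            // Draw the next seed with probability proportional to minSqDist
            double target = rand.nextDouble() * total;
            int chosen = data.length - 1; // fallback if all distances are zero
            double cumulative = 0;
            for (int i = 0; i < data.length; i++) {
                cumulative += minSqDist[i];
                if (cumulative > target) {
                    chosen = i;
                    break;
                }
            }
            seeds[s] = data[chosen];
        }
        return seeds;
    }
}
```

    Seeds spread out this way tend to start in different dense regions of the data, which gives an iterative procedure such as EM a better starting point than purely uniform seeding.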
  - getSeedSelection
```
public SeedSelectionMethods.SeedSelection getSeedSelection()
```
    Returns:
    
    the method of seed selection used
  - setIterationLimit
```
public void setIterationLimit(int iterLimit)
```
    Sets the maximum number of iterations allowed.
    
    Parameters:
    
    iterLimit - the maximum number of iterations of the EM algorithm
  - getIterationLimit
```
public int getIterationLimit()
```
    Returns the maximum number of iterations of the EM algorithm that will be performed.
    
    Returns:
    
    the maximum number of iterations of the EM algorithm that will be performed
  - cluster
```
protected double cluster(DataSet dataSet,
                         List<Double> accelCache,
                         int K,
                         List<Vec> means,
                         int[] assignment,
                         boolean exactTotal,
                         ExecutorService threadpool,
                         boolean returnError)
```
  - clusterCompute
```
protected double clusterCompute(int K,
                                DataSet dataSet,
                                int[] assignment,
                                List<Vec> means,
                                List<Matrix> covs,
                                ExecutorService execServ)
```
  - logPdf
```
public double logPdf(double... x)
```
    Description copied from interface: MultivariateDistribution
    
    Computes the log of the probability density function. If the probability of the input is zero, the log of zero would be Double.NEGATIVE_INFINITY. Instead, -Double.MAX_VALUE is returned.
    
    Specified by:
    
    logPdf in interface MultivariateDistribution
    
    Parameters:
    
    x - the array for the vector to get the log probability of
    
    Returns:
    
    the log of the probability.
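    
    The zero-probability rule described above can be sketched in a few lines. The helper `clampedLog` is hypothetical and only mirrors the documented contract; it is not taken from the JSAT source.

```java
/** Hypothetical sketch of the documented log-of-zero rule; not JSAT code. */
final class LogPdfClampSketch {
    /** Returns log(density), substituting -Double.MAX_VALUE for negative infinity. */
    static double clampedLog(double density) {
        double logValue = Math.log(density);
        if (logValue == Double.NEGATIVE_INFINITY) {
            return -Double.MAX_VALUE; // finite stand-in for log(0)
        }
        return logValue;
    }
}
```

    One consequence of a finite stand-in: the difference of two such values is 0 rather than NaN, which is what subtracting two negative infinities would give.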
  - logPdf
```
public double logPdf(Vec x)
```
    Description copied from interface: MultivariateDistribution
    
    Computes the log of the probability density function. If the probability of the input is zero, the log of zero would be Double.NEGATIVE_INFINITY. Instead, -Double.MAX_VALUE is returned.
    
    Specified by:
    
    logPdf in interface MultivariateDistribution
    
    Parameters:
    
    x - the vector to get the log probability of
    
    Returns:
    
    the log of the probability.
  - pdf
```
public double pdf(double... x)
```
    Description copied from interface: MultivariateDistribution
    
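    Returns the probability density of a given vector under this distribution. The value is always non-negative; because it is a density rather than a probability mass, it can exceed 1 for a continuous distribution such as a Gaussian mixture.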
    
    Specified by:
    
    pdf in interface MultivariateDistribution
    
    Parameters:
    
    x - the array of the vector to get the probability of
    
    Returns:
    
    the probability
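    
    For a Gaussian mixture, the density at a point is the weighted sum of the component densities. The 1-D sketch below illustrates that formula only; the class `MixturePdfSketch` and its parameters are hypothetical, and the real class evaluates multivariate Gaussians.

```java
/** Hypothetical 1-D sketch of a Gaussian mixture density; not JSAT code. */
final class MixturePdfSketch {
    /** Density of a single 1-D Gaussian with the given mean and variance. */
    static double gaussianPdf(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-d * d / (2 * variance)) / Math.sqrt(2 * Math.PI * variance);
    }

    /** Mixture density: sum over components of weight times component density. */
    static double mixturePdf(double x, double[] weights, double[] means, double[] variances) {
        double sum = 0;
        for (int j = 0; j < weights.length; j++) {
            sum += weights[j] * gaussianPdf(x, means[j], variances[j]);
        }
        return sum;
    }
}
```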
  - pdf
```
public double pdf(Vec x)
```
    Description copied from interface: MultivariateDistribution
    
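    Returns the probability density of a given vector under this distribution. The value is always non-negative; because it is a density rather than a probability mass, it can exceed 1 for a continuous distribution such as a Gaussian mixture.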
    
    Specified by:
    
    pdf in interface MultivariateDistribution
    
    Parameters:
    
    x - the vector to get the probability of
    
    Returns:
    
    the probability
  - setUsingData
```
public <V extends Vec> boolean setUsingData(List<V> dataSet)
```
    Description copied from interface: MultivariateDistribution
    
    Sets the parameters of the distribution to attempt to fit the given list of vectors. All vectors are assumed to have the same weight.
    
    Specified by:
    
    setUsingData in interface MultivariateDistribution
    
    Type Parameters:
    
    V - the vector type
    
    Parameters:
    
    dataSet - the list of data points
    
    Returns:
    
    true if the distribution was fit to the data, or false if the distribution could not be fit to the data set.
  - setUsingDataList
```
public boolean setUsingDataList(List<DataPoint> dataPoint)
```
    Description copied from interface: MultivariateDistribution
    
    Sets the parameters of the distribution to attempt to fit the given list of data points. The weights of the data points will be used.
    
    Specified by:
    
    setUsingDataList in interface MultivariateDistribution
    
    Parameters:
    
    dataPoint - the list of data points to use
    
    Returns:
    
    true if the distribution was fit to the data, or false if the distribution could not be fit to the data set.
  - setUsingData
```
public boolean setUsingData(DataSet dataSet)
```
    Description copied from interface: MultivariateDistribution
    
    Sets the parameters of the distribution to attempt to fit the given list of data points. The weights of the data points will be used.
    
    Specified by:
    
    setUsingData in interface MultivariateDistribution
    
    Parameters:
    
    dataSet - the data set to use
    
    Returns:
    
    true if the distribution was fit to the data, or false if the distribution could not be fit to the data set.
  - setUsingData
```
public boolean setUsingData(DataSet dataSet,
                            ExecutorService threadpool)
```
    Description copied from interface: MultivariateDistribution
    
    Sets the parameters of the distribution to attempt to fit the given list of data points. The weights of the data points will be used.
    
    Specified by:
    
    setUsingData in interface MultivariateDistribution
    
    Parameters:
    
    dataSet - the data set to use
    
    threadpool - the source of threads for computation
    
    Returns:
    
    true if the distribution was fit to the data, or false if the distribution could not be fit to the data set.
  - setUsingData
```
public <V extends Vec> boolean setUsingData(List<V> dataSet,
                                            ExecutorService threadpool)
```
    Description copied from interface: MultivariateDistribution
    
    Sets the parameters of the distribution to attempt to fit the given list of vectors. All vectors are assumed to have the same weight.
    
    Specified by:
    
    setUsingData in interface MultivariateDistribution
    
    Type Parameters:
    
    V - the vector type
    
    Parameters:
    
    dataSet - the list of data points
    
    threadpool - the source of threads for computation
    
    Returns:
    
    true if the distribution was fit to the data, or false if the distribution could not be fit to the data set.
  - setUsingDataList
```
public boolean setUsingDataList(List<DataPoint> dataPoints,
                                ExecutorService threadpool)
```
    Description copied from interface: MultivariateDistribution
    
    Sets the parameters of the distribution to attempt to fit the given list of data points. The weights of the data points will be used.
    
    Specified by:
    
    setUsingDataList in interface MultivariateDistribution
    
    Parameters:
    
    dataPoints - the list of data points to use
    
    threadpool - the source of threads for computation
    
    Returns:
    
    true if the distribution was fit to the data, or false if the distribution could not be fit to the data set.
  - clone
```
public EMGaussianMixture clone()
```
    Specified by:
    
    clone in interface Clusterer
    
    Specified by:
    
    clone in interface KClusterer
    
    Specified by:
    
    clone in interface MultivariateDistribution
    
    Specified by:
    
    clone in class KClustererBase
  - sample
```
public List<Vec> sample(int count,
                        Random rand)
```
    Description copied from interface: MultivariateDistribution
    
    Performs sampling on the current distribution.
    
    Specified by:
    
    sample in interface MultivariateDistribution
    
    Parameters:
    
    count - the number of iid samples to draw
    
    rand - the source of randomness
    
    Returns:
    
    a list of sample vectors from this distribution
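    
    Sampling from a mixture is commonly done in two stages: pick a component according to the mixing weights, then draw from that component's Gaussian. The 1-D sketch below shows that approach under the assumption that the weights sum to 1; the class `MixtureSampleSketch` and its parameters are hypothetical, and it is not a description of how JSAT implements `sample`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical 1-D sketch of two-stage mixture sampling; not JSAT code. */
final class MixtureSampleSketch {
    static List<Double> sample(int count, Random rand,
                               double[] weights, double[] means, double[] stdDevs) {
        List<Double> out = new ArrayList<>(count);
        for (int n = 0; n < count; n++) {
            // Stage 1: choose component j with probability weights[j]
            double u = rand.nextDouble();
            int j = 0;
            double cumulative = weights[0];
            while (j < weights.length - 1 && u >= cumulative) {
                j++;
                cumulative += weights[j];
            }
            // Stage 2: draw from the chosen Gaussian
            out.add(means[j] + stdDevs[j] * rand.nextGaussian());
        }
        return out;
    }
}
```

    Passing a `Random` with a fixed seed makes the draws reproducible, which is the usual reason for taking the source of randomness as a parameter.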
  - cluster
```
public int[] cluster(DataSet dataSet,
                     int[] designations)
```
    Description copied from interface: Clusterer
    
    Performs clustering on the given data set. Parameters may be estimated by the method, or other heuristics performed.
    
    Specified by:
    
    cluster in interface Clusterer
    
    Parameters:
    
    dataSet - the data set to perform clustering on
    
    designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
    
    Returns:
    
    an array indicating the cluster designation of each data point. This is the same array as designations, or a new one if the input array was null
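    
    The designations contract (reuse the caller's array, or allocate one when null is passed) can be sketched as below. The class `DesignationsSketch` is hypothetical, and nearest-center assignment on 1-D data merely stands in for the real clustering step.

```java
/** Hypothetical sketch of the designations-array contract; not JSAT code. */
final class DesignationsSketch {
    static int[] assign(double[] data, double[] centers, int[] designations) {
        // Allocate only when the caller did not supply an array
        if (designations == null) {
            designations = new int[data.length];
        }
        for (int i = 0; i < data.length; i++) {
            int best = 0;
            for (int c = 1; c < centers.length; c++) {
                if (Math.abs(data[i] - centers[c]) < Math.abs(data[i] - centers[best])) {
                    best = c;
                }
            }
            designations[i] = best; // overwrite the slot for point i
        }
        return designations; // same array the caller passed, or the new one
    }
}
```

    Reusing a caller-supplied array avoids a fresh allocation on every call when clustering is run repeatedly on data sets of the same size.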
  - cluster
```
public int[] cluster(DataSet dataSet,
                     ExecutorService threadpool,
                     int[] designations)
```
    Description copied from interface: Clusterer
    
    Performs clustering on the given data set. Parameters may be estimated by the method, or other heuristics performed.
    
    Specified by:
    
    cluster in interface Clusterer
    
    Parameters:
    
    dataSet - the data set to perform clustering on
    
    threadpool - a source of threads to run tasks
    
    designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
    
    Returns:
    
    an array indicating the cluster designation of each data point. This is the same array as designations, or a new one if the input array was null
  - cluster
```
public int[] cluster(DataSet dataSet,
                     int clusters,
                     ExecutorService threadpool,
                     int[] designations)
```
    Specified by:
    
    cluster in interface KClusterer
  - cluster
```
public int[] cluster(DataSet dataSet,
                     int clusters,
                     int[] designations)
```
    Specified by:
    
    cluster in interface KClusterer
  - cluster
```
public int[] cluster(DataSet dataSet,
                     int lowK,
                     int highK,
                     ExecutorService threadpool,
                     int[] designations)
```
    Specified by:
    
    cluster in interface KClusterer
  - cluster
```
public int[] cluster(DataSet dataSet,
                     int lowK,
                     int highK,
                     int[] designations)
```
    Specified by:
    
    cluster in interface KClusterer
