GMeans (Java Statistical Analysis Tool 0.0.8 API)

java.lang.Object
- jsat.clustering.ClustererBase
  - jsat.clustering.KClustererBase
    - jsat.clustering.kmeans.KMeans
      - jsat.clustering.kmeans.GMeans

All Implemented Interfaces:

Serializable, Clusterer, KClusterer, Parameterized
```
public class GMeans
extends KMeans
```
This class provides a method of performing KMeans clustering when the value of K is not known. It works by recursively splitting means up to some specified maximum value.

When the value of K is specified, the implementation will simply call the regular KMeans object it was constructed with.

See: Hamerly, G., & Elkan, C. (2003). Learning the K in K-Means. In Seventeenth Annual Conference on Neural Information Processing Systems (NIPS) (pp. 281–288).
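The split decision at the heart of G-means can be sketched in plain Java. Following the procedure described by Hamerly and Elkan:

1. A cluster's points are projected onto the vector joining two candidate child centers.
2. The projections are standardized.
3. An Anderson-Darling normality test decides whether to split.

This is a minimal illustration, not JSAT's code:

- The class and method names are hypothetical.
- The normal CDF uses the Abramowitz-Stegun approximation of erf.
- The critical value is a caller-supplied parameter rather than a value taken from JSAT.

```java
import java.util.Arrays;

/** Sketch of the G-means split test; class and method names are hypothetical, not JSAT's. */
public final class GMeansSplitTest {

    /** Projects each point onto direction v (e.g. the difference of two child centers): <x, v> / ||v||^2. */
    static double[] project(double[][] points, double[] v) {
        double norm2 = 0;
        for (double vi : v) norm2 += vi * vi;
        double[] out = new double[points.length];
        for (int i = 0; i < points.length; i++) {
            double dot = 0;
            for (int j = 0; j < v.length; j++) dot += points[i][j] * v[j];
            out[i] = dot / norm2;
        }
        return out;
    }

    /** Standard normal CDF using the Abramowitz-Stegun 7.1.26 approximation of erf. */
    static double normalCdf(double z) {
        double x = Math.abs(z) / Math.sqrt(2.0);
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-x * x);
        return z >= 0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** Anderson-Darling statistic with the small-sample correction A*^2 = A^2 (1 + 4/n - 25/n^2). */
    static double andersonDarling(double[] sample) {
        int n = sample.length;
        if (n < 2) return 0.0;
        double mean = Arrays.stream(sample).average().orElse(0.0);
        double ss = 0;
        for (double s : sample) ss += (s - mean) * (s - mean);
        double sd = Math.sqrt(ss / (n - 1));
        if (sd == 0) return 0.0; // all projections identical: nothing to split
        double[] z = new double[n];
        for (int i = 0; i < n; i++) {
            double p = normalCdf((sample[i] - mean) / sd); // standardize to mean 0, variance 1
            z[i] = Math.min(Math.max(p, 1e-15), 1 - 1e-15); // keep log() finite
        }
        Arrays.sort(z);
        double sum = 0;
        for (int i = 0; i < n; i++) {
            // 0-based form of (2i - 1) [ln z_i + ln(1 - z_{n+1-i})]
            sum += (2 * i + 1) * (Math.log(z[i]) + Math.log(1 - z[n - 1 - i]));
        }
        double a2 = -n - sum / n;
        return a2 * (1 + 4.0 / n - 25.0 / ((double) n * n));
    }

    /** Rejects H0 ("this cluster is Gaussian") and requests a split when A*^2 exceeds the critical value. */
    static boolean shouldSplit(double[] projected, double criticalValue) {
        return andersonDarling(projected) > criticalValue;
    }
}
```

A caller would run 2-means on the cluster and pass the difference of the two child centers as v to project. It would keep the two children only if shouldSplit returns true.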

Author:

Edward Raff

See Also:

Serialized Form

Field Summary
- Fields inherited from class jsat.clustering.kmeans.KMeans
  DEFAULT_SEED_SELECTION, dm, MaxIterLimit, means, nearestCentroidDist, rand, saveCentroidDistance, seedSelection, storeMeans

Constructor Summary

Constructors
Constructor and Description

GMeans()

GMeans(GMeans toCopy)

GMeans(KMeans kmeans)


Method Summary

Modifier and Type	Method and Description
`GMeans`	`clone()`
`int[]`	`cluster(DataSet dataSet, ExecutorService threadpool, int[] designations)` Performs clustering on the given data set.
`int[]`	`cluster(DataSet dataSet, int[] designations)` Performs clustering on the given data set.
`int[]`	`cluster(DataSet dataSet, int lowK, int highK, ExecutorService threadpool, int[] designations)`
`int[]`	`cluster(DataSet dataSet, int lowK, int highK, int[] designations)`
`protected double`	`cluster(DataSet dataSet, List<Double> accelCache, int k, List<Vec> means, int[] assignment, boolean exactTotal, ExecutorService threadpool, boolean returnError, Vec dataPointWeights)` This is a helper method where the actual cluster is performed.
`int`	`getIterationLimit()` Returns the maximum number of iterations of the ElkanKMeans algorithm that will be performed.
`boolean`	`getIterativeRefine()`
`int`	`getMinClusterSize()`
`SeedSelectionMethods.SeedSelection`	`getSeedSelection()`
`boolean`	`getTrustH0()`
`void`	`setIterationLimit(int iterLimit)` Sets the maximum number of iterations allowed.
`void`	`setIterativeRefine(boolean refineCenters)` Sets whether or not the set of all cluster centers should be refined at every iteration.
`void`	`setMinClusterSize(int minClusterSize)` Sets the minimum size for splitting a cluster.
`void`	`setSeedSelection(SeedSelectionMethods.SeedSelection seedSelection)` Sets the method of seed selection to use for this algorithm.
`void`	`setTrustH0(boolean trustH0)` Each new cluster will be tested for normality, with the null hypothesis H0 being that the cluster is normal.

Methods inherited from class jsat.clustering.kmeans.KMeans
cluster, cluster, getDistanceMetric, getListOfLists, getMeans, getParameter, getParameters, setStoreMeans, supportsWeightedData

Methods inherited from class jsat.clustering.KClustererBase
cluster, cluster, cluster, cluster

Methods inherited from class jsat.clustering.ClustererBase
cluster, cluster, createClusterListFromAssignmentArray, getDatapointsFromCluster

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface jsat.clustering.Clusterer
cluster, cluster

- Constructor Detail
  - GMeans
```
public GMeans()
```
  - GMeans
```
public GMeans(KMeans kmeans)
```
  - GMeans
```
public GMeans(GMeans toCopy)
```
- Method Detail
  - setTrustH0
```
public void setTrustH0(boolean trustH0)
```
    Each new cluster will be tested for normality, with the null hypothesis H0 being that the cluster is normal. If this is set to true, an optimization is applied: once the test fails to reject the null hypothesis for a center, that center will never be tested again. This is a safe assumption when setIterativeRefine(boolean) is set to false, but otherwise it may not hold.
    
    When trustH0 is true (the default option), G-Means will make at most O(k) runs of k-means for the final value of k chosen. When false, at most O(k²) runs of k-means will occur.
    
    Parameters:
    
    trustH0 - true if a centroid shouldn't be re-tested once it fails to split.
  - getTrustH0
```
public boolean getTrustH0()
```
    Returns:
    
    true if clusters that fail to split won't be re-tested, false if they will.
  - setMinClusterSize
```
public void setMinClusterSize(int minClusterSize)
```
    Sets the minimum size for splitting a cluster.
    
    Parameters:
    
    minClusterSize - the minimum number of data points that must be present in a cluster to consider splitting it
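    In other words, cluster size acts as a filter before any normality test is run. The hypothetical helper below (not a JSAT method) shows that filter over a designations array. It assumes that a cluster with exactly minClusterSize points still qualifies for splitting.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the minClusterSize rule; not part of JSAT. */
public final class SplitCandidates {
    /**
     * Returns the ids of clusters holding at least minClusterSize points,
     * i.e. the only clusters a G-means style search would consider splitting.
     * Assumption: the boundary is inclusive (size >= minClusterSize).
     */
    static List<Integer> candidates(int[] designations, int k, int minClusterSize) {
        int[] sizes = new int[k];
        for (int d : designations) sizes[d]++; // count points per cluster
        List<Integer> out = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            if (sizes[c] >= minClusterSize) out.add(c);
        }
        return out;
    }
}
```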
  - getMinClusterSize
```
public int getMinClusterSize()
```
    Returns:
    
    the minimum number of data points that must be present in a cluster to consider splitting it
  - setIterativeRefine
```
public void setIterativeRefine(boolean refineCenters)
```
    Sets whether or not the set of all cluster centers should be refined at every iteration. By default this is true, which matches how the G-means algorithm is originally described. Setting this to false can result in large speedups at the potential cost of clustering quality.
    
    Parameters:
    
    refineCenters - true to refine the cluster centers at every step, false to skip this step of the algorithm.
  - getIterativeRefine
```
public boolean getIterativeRefine()
```
    Returns:
    
    true if the cluster centers are refined at every step, false if skipping this step of the algorithm.
  - cluster
```
public int[] cluster(DataSet dataSet,
                     int[] designations)
```
    Description copied from interface: Clusterer
    
    Performs clustering on the given data set. Parameters may be estimated by the method, or other heuristics performed.
    
    Specified by:
    
    cluster in interface Clusterer
    
    Overrides:
    
    cluster in class KMeans
    
    Parameters:
    
    dataSet - the data set to perform clustering on
    
    designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
    
    Returns:
    
    an array giving the cluster designation of each data point. This is the same array as designations, or a new one if the input array was null.
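    The designations contract can be sketched in plain Java using two hypothetical stand-ins:

    - prepare mirrors the documented reuse-or-allocate behavior.
    - groupByCluster turns the returned array into per-cluster index lists. It is similar in spirit to the inherited createClusterListFromAssignmentArray, whose exact signature is not shown on this page.

    Cluster ids are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers illustrating the designations contract of cluster(...); not JSAT code. */
public final class Designations {
    /** Reuse the caller's array, or allocate a fresh one when null is passed. */
    static int[] prepare(int numPoints, int[] designations) {
        if (designations == null) return new int[numPoints];
        return designations;
    }

    /** Group point indices by cluster id (ids assumed to be 0, 1, ..., k-1). */
    static List<List<Integer>> groupByCluster(int[] designations) {
        int k = 0;
        for (int d : designations) k = Math.max(k, d + 1);
        List<List<Integer>> clusters = new ArrayList<>();
        for (int c = 0; c < k; c++) clusters.add(new ArrayList<>());
        for (int i = 0; i < designations.length; i++) clusters.get(designations[i]).add(i);
        return clusters;
    }
}
```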
  - cluster
```
public int[] cluster(DataSet dataSet,
                     ExecutorService threadpool,
                     int[] designations)
```
    Description copied from interface: Clusterer
    
    Performs clustering on the given data set. Parameters may be estimated by the method, or other heuristics performed.
    
    Specified by:
    
    cluster in interface Clusterer
    
    Overrides:
    
    cluster in class KMeans
    
    Parameters:
    
    dataSet - the data set to perform clustering on
    
    threadpool - a source of threads to run tasks
    
    designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
    
    Returns:
    
    an array giving the cluster designation of each data point. This is the same array as designations, or a new one if the input array was null.
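    The page only says the threadpool is "a source of threads to run tasks". The sketch below shows one way such a pool can be used; it is not JSAT's implementation. It works in three steps:

    1. Split a nearest-center assignment pass into disjoint chunks.
    2. Submit each chunk to the ExecutorService.
    3. Wait on the returned futures before returning the designations array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Sketch: filling a designations array with work spread over an ExecutorService (not JSAT's code). */
public final class ParallelAssign {
    static int[] assign(double[][] points, double[][] means, ExecutorService pool, int[] designations) {
        int n = points.length;
        int[] out = designations == null ? new int[n] : designations;
        int tasks = Math.max(1, Runtime.getRuntime().availableProcessors());
        int chunk = Math.max(1, (n + tasks - 1) / tasks);
        List<Future<?>> futures = new ArrayList<>();
        for (int start = 0; start < n; start += chunk) {
            final int from = start;
            final int to = Math.min(n, start + chunk);
            // each task writes a disjoint slice of out, so no locking is needed
            futures.add(pool.submit(() -> {
                for (int i = from; i < to; i++) out[i] = nearest(points[i], means);
            }));
        }
        try {
            for (Future<?> f : futures) f.get(); // wait for every task; get() also makes its writes visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return out;
    }

    /** Index of the mean with the smallest squared Euclidean distance to x. */
    static int nearest(double[] x, double[][] means) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < means.length; c++) {
            double d = 0;
            for (int j = 0; j < x.length; j++) d += (x[j] - means[c][j]) * (x[j] - means[c][j]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }
}
```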
  - cluster
```
public int[] cluster(DataSet dataSet,
                     int lowK,
                     int highK,
                     ExecutorService threadpool,
                     int[] designations)
```
    Specified by:
    
    cluster in interface KClusterer
    
    Overrides:
    
    cluster in class KMeans
  - cluster
```
public int[] cluster(DataSet dataSet,
                     int lowK,
                     int highK,
                     int[] designations)
```
    Specified by:
    
    cluster in interface KClusterer
    
    Overrides:
    
    cluster in class KMeans
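    This page does not document lowK and highK. Suppose they are inclusive lower and upper bounds on the number of clusters; this is an assumption, not something stated here. A G-means style search could then start from lowK centers and keep splitting until every cluster passes the normality test or highK is reached. In the sketch below, the splitsRequested function is a hypothetical stand-in for running the real test on each cluster.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical outer loop; lowK/highK are treated as inclusive bounds (an assumption). */
public final class KRangeLoop {
    /**
     * @param splitsRequested maps the current k to how many of its clusters failed
     *                        the normality test (a stand-in for the real test)
     * @return the number of clusters the search settles on, within [lowK, highK]
     */
    static int chooseK(int lowK, int highK, IntUnaryOperator splitsRequested) {
        if (lowK < 1 || highK < lowK) throw new IllegalArgumentException("need 1 <= lowK <= highK");
        int k = lowK;
        while (k < highK) {
            int requested = splitsRequested.applyAsInt(k);
            if (requested <= 0) break;          // every cluster looks Gaussian: stop
            k = Math.min(highK, k + requested); // each split adds one center, capped at highK
        }
        return k;
    }
}
```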
  - getIterationLimit
```
public int getIterationLimit()
```
    Description copied from class: KMeans
    
    Returns the maximum number of iterations of the ElkanKMeans algorithm that will be performed.
    
    Overrides:
    
    getIterationLimit in class KMeans
    
    Returns:
    
    the maximum number of iterations of the ElkanKMeans algorithm that will be performed.
  - setIterationLimit
```
public void setIterationLimit(int iterLimit)
```
    Description copied from class: KMeans
    
    Sets the maximum number of iterations allowed
    
    Overrides:
    
    setIterationLimit in class KMeans
    
    Parameters:
    
    iterLimit - the maximum number of iterations of the ElkanKMeans algorithm
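    To make the iteration limit concrete, the sketch below runs plain Lloyd k-means. It alternates an assignment step and an update step, and stops either when no assignment changes or after iterLimit rounds. It deliberately uses Lloyd's algorithm rather than the triangle-inequality acceleration of ElkanKMeans, and none of its names come from JSAT.

```java
import java.util.Arrays;

/** Plain Lloyd k-means with an iteration cap; illustration only, not JSAT's ElkanKMeans. */
public final class LloydSketch {
    /**
     * Refines means in place for at most iterLimit rounds (iterLimit >= 1 expected)
     * and returns the number of rounds actually run. assignment is overwritten.
     */
    static int run(double[][] points, double[][] means, int[] assignment, int iterLimit) {
        Arrays.fill(assignment, -1); // force the first round to count as a change
        int dim = means[0].length;
        int iter = 0;
        boolean changed = true;
        while (changed && iter < iterLimit) {
            changed = false;
            iter++;
            // assignment step: move each point to its nearest mean
            for (int i = 0; i < points.length; i++) {
                int best = nearest(points[i], means);
                if (best != assignment[i]) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            // update step: each mean becomes the average of its assigned points
            double[][] sums = new double[means.length][dim];
            int[] counts = new int[means.length];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) sums[assignment[i]][j] += points[i][j];
            }
            for (int c = 0; c < means.length; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its old mean
                for (int j = 0; j < dim; j++) means[c][j] = sums[c][j] / counts[c];
            }
        }
        return iter;
    }

    /** Index of the mean with the smallest squared Euclidean distance to x. */
    static int nearest(double[] x, double[][] means) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < means.length; c++) {
            double d = 0;
            for (int j = 0; j < x.length; j++) d += (x[j] - means[c][j]) * (x[j] - means[c][j]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }
}
```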
  - setSeedSelection
```
public void setSeedSelection(SeedSelectionMethods.SeedSelection seedSelection)
```
    Description copied from class: KMeans
    
    Sets the method of seed selection to use for this algorithm. SeedSelectionMethods.SeedSelection.KPP is recommended for this algorithm in particular.
    
    Overrides:
    
    setSeedSelection in class KMeans
    
    Parameters:
    
    seedSelection - the method of seed selection to use
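    For readers unfamiliar with KPP, the sketch below shows the k-means++ seeding idea. The first center is chosen uniformly at random. Each further center is a data point sampled with probability proportional to its squared distance from the nearest center chosen so far. It is a plain-Java illustration, not JSAT's SeedSelectionMethods code, which may differ in details such as tie handling.

```java
import java.util.Random;

/** Sketch of k-means++ seeding; illustrative, not JSAT's SeedSelectionMethods implementation. */
public final class KppSeeding {
    /** Picks k initial means from points; each returned mean is a copy of a data point. */
    static double[][] seed(double[][] points, int k, Random rand) {
        int n = points.length;
        if (k < 1 || k > n) throw new IllegalArgumentException("need 1 <= k <= points.length");
        double[][] means = new double[k][];
        means[0] = points[rand.nextInt(n)].clone(); // first center: uniform at random
        double[] d2 = new double[n];                // squared distance to the closest chosen center
        for (int i = 0; i < n; i++) d2[i] = sqDist(points[i], means[0]);
        for (int c = 1; c < k; c++) {
            double total = 0;
            for (double v : d2) total += v;
            int pick;
            if (total == 0) {
                pick = rand.nextInt(n); // every point already coincides with a center
            } else {
                // sample index i with probability d2[i] / total; zero-weight points are skipped
                double r = rand.nextDouble() * total;
                pick = 0;
                double acc = d2[0];
                while (acc <= r && pick < n - 1) acc += d2[++pick];
            }
            means[c] = points[pick].clone();
            for (int i = 0; i < n; i++) d2[i] = Math.min(d2[i], sqDist(points[i], means[c]));
        }
        return means;
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return s;
    }
}
```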
  - getSeedSelection
```
public SeedSelectionMethods.SeedSelection getSeedSelection()
```
    Overrides:
    
    getSeedSelection in class KMeans
    
    Returns:
    
    the method of seed selection used
  - cluster
```
protected double cluster(DataSet dataSet,
                         List<Double> accelCache,
                         int k,
                         List<Vec> means,
                         int[] assignment,
                         boolean exactTotal,
                         ExecutorService threadpool,
                         boolean returnError,
                         Vec dataPointWeights)
```
    Description copied from class: KMeans
    
    This is a helper method where the actual clustering is performed. It exists because there are multiple strategies for modifying k-means, all of which require this step.
    The distance metric used is trained if needed.
    
    Specified by:
    
    cluster in class KMeans
    
    Parameters:
    
    dataSet - The set of data points to perform clustering on
    
    accelCache - acceleration cache to use, or null. If null, the kmeans code will attempt to create one
    
    k - the number of clusters
    
    means - the initial points to use as the means. Its length is the number of means that will be searched for. These means will be altered, and should contain deep copies of the points they were drawn from. May be empty, in which case the list will be filled with some selected means
    
    assignment - an empty temp space to store the clustering classifications. Should be the same length as the number of data points
    
    exactTotal - determines how the objective function (return value) will be computed. If true, extra work will be done to compute the exact distance from each data point to its cluster. If false, an upper bound approximation will be used. This also impacts the value stored in KMeans.nearestCentroidDist
    
    threadpool - the source of threads for parallel computation. If null, single threaded execution will occur
    
    returnError - true if the sum of squared distances should be returned; false means any value can be returned. KMeans.saveCentroidDistance only applies if this is true
    
    dataPointWeights - the weight value to use for each data point. If null, assume each point has equal weight.
    
    Returns:
    
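    the sum of squared distances from each data point to its assigned mean (exact or an upper bound, depending on exactTotal) if returnError is true; otherwise, an unspecified value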
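    When returnError and exactTotal are both true, the returned value is the exact sum of squared distances from each point to its assigned mean. A plain-Java sketch of that quantity follows. It assumes dataPointWeights act as a per-point multiplier on the squared distance, with null meaning weight 1 for every point; this is an assumption about how the weights enter the objective.

```java
/** Weighted k-means objective; a plain-Java illustration, not JSAT's code. */
public final class KMeansObjective {
    /**
     * Sum over points of w_i * ||x_i - means[assignment[i]]||^2.
     * Assumption: a null weights array means every point has weight 1.
     */
    static double sumSquaredError(double[][] points, double[][] means, int[] assignment, double[] weights) {
        double total = 0;
        for (int i = 0; i < points.length; i++) {
            double[] m = means[assignment[i]];
            double d2 = 0;
            for (int j = 0; j < m.length; j++) {
                double diff = points[i][j] - m[j];
                d2 += diff * diff;
            }
            total += (weights == null ? 1.0 : weights[i]) * d2; // assumed weighting scheme
        }
        return total;
    }
}
```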
  - clone
```
public GMeans clone()
```
    Specified by:
    
    clone in interface Clusterer
    
    Specified by:
    
    clone in interface KClusterer
    
    Specified by:
    
    clone in class KMeans
