HamerlyKMeans (Java Statistical Analysis Tool 0.0.8 API)

java.lang.Object
- jsat.clustering.ClustererBase
  - jsat.clustering.KClustererBase
    - jsat.clustering.kmeans.KMeans
      - jsat.clustering.kmeans.HamerlyKMeans

All Implemented Interfaces:

Serializable, Clusterer, KClusterer, Parameterized
```
public class HamerlyKMeans
extends KMeans
```
An efficient implementation of the k-means algorithm. It uses the triangle inequality to skip distance computations while still producing exactly the same solution. This requires that the DistanceMetric in use be subadditive, that is, DistanceMetric.isSubadditive() must return true. The acceleration needs only O(n) extra memory.
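The pruning test behind this kind of acceleration can be sketched in plain Java. This is a minimal illustration of the bound check described in Hamerly (2010), not JSAT's implementation: the class `HamerlyBoundSketch` and all of its method names are hypothetical, and Euclidean distance is assumed as the subadditive metric. Each point keeps one upper bound (distance to its assigned center) and one lower bound (distance to any other center), which is where the O(n) extra memory comes from.

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical names, not part of JSAT.
class HamerlyBoundSketch {

    /** Euclidean distance, which satisfies the triangle inequality. */
    static double dist(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** For each center, half the distance to its nearest other center. */
    static double[] halfNearestCenterDist(double[][] centers) {
        double[] s = new double[centers.length];
        Arrays.fill(s, Double.POSITIVE_INFINITY);
        for (int i = 0; i < centers.length; i++)
            for (int j = 0; j < centers.length; j++)
                if (i != j)
                    s[i] = Math.min(s[i], dist(centers[i], centers[j]) / 2);
        return s;
    }

    /**
     * True when the point provably keeps its current center, so no
     * point-to-center distances need to be computed this iteration.
     * upper: upper bound on the distance to the assigned center.
     * lower: lower bound on the distance to every other center.
     */
    static boolean canSkip(double upper, double lower, double halfNearest) {
        return upper <= Math.max(lower, halfNearest);
    }

    /** After centers move, loosen both bounds so they remain valid. */
    static double[] loosen(double upper, double lower,
                           double assignedMoved, double maxOtherMoved) {
        return new double[]{upper + assignedMoved, lower - maxOtherMoved};
    }

    public static void main(String[] args) {
        double[][] centers = {{0, 0}, {10, 0}};
        double[] s = halfNearestCenterDist(centers);
        double[] x = {1, 0};
        double upper = dist(x, centers[0]);
        // The point is within half the gap to the other center, so it is skipped.
        System.out.println(canSkip(upper, 0.0, s[0])); // prints "true"
    }
}
```

Because both bounds are only ever loosened by the triangle inequality, a skipped point is guaranteed to have the same assignment it would get from a full distance scan, which is why the result matches the unaccelerated algorithm.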

See:
- Hamerly, G. (2010). Making k-means even faster. SIAM International Conference on Data Mining (SDM) (pp. 130–140).
- Ryšavý, P., & Hamerly, G. (2016). Geometric methods to accelerate k-means algorithms. In Proceedings of the 2016 SIAM International Conference on Data Mining (pp. 324–332). Philadelphia, PA: Society for Industrial and Applied Mathematics. http://doi.org/10.1137/1.9781611974348.37
Author:

Edward Raff

See Also:

Serialized Form

Field Summary
- Fields inherited from class jsat.clustering.kmeans.KMeans
  DEFAULT_SEED_SELECTION, dm, MaxIterLimit, means, nearestCentroidDist, rand, saveCentroidDistance, seedSelection, storeMeans

Constructor Summary

Constructors
Constructor and Description
`HamerlyKMeans()` Creates a new k-Means object
`HamerlyKMeans(DistanceMetric dm, SeedSelectionMethods.SeedSelection seedSelection)` Creates a new k-Means object
`HamerlyKMeans(DistanceMetric dm, SeedSelectionMethods.SeedSelection seedSelection, Random rand)` Creates a new k-Means object
`HamerlyKMeans(HamerlyKMeans toCopy)` Copy constructor

Method Summary

Modifier and Type	Method and Description
`HamerlyKMeans`	`clone()`
`protected double`	`cluster(DataSet dataSet, List<Double> accelCache, int k, List<Vec> means, int[] assignment, boolean exactTotal, ExecutorService threadpool, boolean returnError, Vec dataPointWeights)` This is a helper method where the actual clustering is performed.

Methods inherited from class jsat.clustering.kmeans.KMeans
cluster, cluster, cluster, cluster, cluster, cluster, getDistanceMetric, getIterationLimit, getListOfLists, getMeans, getParameter, getParameters, getSeedSelection, setIterationLimit, setSeedSelection, setStoreMeans, supportsWeightedData

Methods inherited from class jsat.clustering.KClustererBase
cluster, cluster, cluster, cluster

Methods inherited from class jsat.clustering.ClustererBase
cluster, cluster, createClusterListFromAssignmentArray, getDatapointsFromCluster

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface jsat.clustering.Clusterer
cluster, cluster

- Constructor Detail
  - HamerlyKMeans
```
public HamerlyKMeans(DistanceMetric dm,
                     SeedSelectionMethods.SeedSelection seedSelection,
                     Random rand)
```
    Creates a new k-Means object
    
    Parameters:
    
    dm - the distance metric to use for clustering
    
    seedSelection - the method of initial seed selection
    
    rand - the source of randomness to use
  - HamerlyKMeans
```
public HamerlyKMeans(DistanceMetric dm,
                     SeedSelectionMethods.SeedSelection seedSelection)
```
    Creates a new k-Means object
    
    Parameters:
    
    dm - the distance metric to use for clustering
    
    seedSelection - the method of initial seed selection
  - HamerlyKMeans
```
public HamerlyKMeans()
```
    Creates a new k-Means object
  - HamerlyKMeans
```
public HamerlyKMeans(HamerlyKMeans toCopy)
```
- Method Detail
  - cluster
```
protected double cluster(DataSet dataSet,
                         List<Double> accelCache,
                         int k,
                         List<Vec> means,
                         int[] assignment,
                         boolean exactTotal,
                         ExecutorService threadpool,
                         boolean returnError,
                         Vec dataPointWeights)
```
    Description copied from class: KMeans
    
    This is a helper method where the actual clustering is performed. There are multiple strategies for modifying k-means, but all of them require this step.
    The distance metric is trained first if needed.
    
    Specified by:
    
    cluster in class KMeans
    
    Parameters:
    
    dataSet - The set of data points to perform clustering on
    
    accelCache - acceleration cache to use, or null. If null, the kmeans code will attempt to create one
    
    k - the number of clusters
    
    means - the initial points to use as the means. Its length is the number of means that will be searched for. These means will be altered, and should contain deep copies of the points they were drawn from. May be empty, in which case the list will be filled with some selected means
    
    assignment - an empty temp space to store the clustering classifications. Should be the same length as the number of data points
    
    exactTotal - determines how the objective function (return value) will be computed. If true, extra work will be done to compute the exact distance from each data point to its cluster. If false, an upper bound approximation will be used. This also impacts the value stored in KMeans.nearestCentroidDist
    
    threadpool - the source of threads for parallel computation. If null, single threaded execution will occur
    
    returnError - true if the sum of squared distances should be returned. false means any value can be returned. KMeans.saveCentroidDistance only applies if this is true
    
    dataPointWeights - the weight value to use for each data point. If null, assume each point has equal weight.
    
    Returns:
    
    the sum of squared distances if returnError is true; otherwise the returned value is unspecified
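To make the return value and the dataPointWeights parameter concrete, here is a minimal sketch of a sum of squared distances in plain Java. It assumes Euclidean distance and assumes each weight multiplies its point's squared distance, which the documentation above does not spell out. The class `WeightedSseSketch` and its method are hypothetical stand-ins using arrays, not JSAT types.

```java
// Illustrative sketch only: hypothetical names, not part of JSAT.
class WeightedSseSketch {

    /**
     * Sum over all points of weight * squared Euclidean distance to the
     * assigned mean. A null weights array means every point has weight 1,
     * mirroring the "If null, assume each point has equal weight" rule.
     */
    static double weightedSse(double[][] points, double[][] means,
                              int[] assignment, double[] weights) {
        double total = 0;
        for (int i = 0; i < points.length; i++) {
            double[] mean = means[assignment[i]];
            double sq = 0;
            for (int d = 0; d < mean.length; d++) {
                double diff = points[i][d] - mean[d];
                sq += diff * diff;
            }
            total += (weights == null ? 1.0 : weights[i]) * sq;
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {2, 0}, {10, 0}};
        double[][] means = {{1, 0}, {10, 0}};
        int[] assignment = {0, 0, 1};
        System.out.println(weightedSse(points, means, assignment, null));                    // prints "2.0"
        System.out.println(weightedSse(points, means, assignment, new double[]{3, 1, 1}));   // prints "4.0"
    }
}
```

Note that assignment has one entry per data point, matching the requirement that it be the same length as the number of data points.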
  - clone
```
public HamerlyKMeans clone()
```
    Specified by:
    
    clone in interface Clusterer
    
    Specified by:
    
    clone in interface KClusterer
    
    Specified by:
    
    clone in class KMeans
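The class exposes both a copy constructor and a clone() with a covariant return type. One common way to pair the two is sketched below with a hypothetical stand-in class, `CopyableClusterer`. Whether HamerlyKMeans delegates clone() to its copy constructor internally is an assumption, and the fields shown are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a hypothetical stand-in, not JSAT code.
class CopyableClusterer {
    private final List<double[]> means = new ArrayList<>();
    private int iterationLimit = 100;

    CopyableClusterer() {}

    /** Copy constructor: deep-copies mutable state so the copy is independent. */
    CopyableClusterer(CopyableClusterer toCopy) {
        this.iterationLimit = toCopy.iterationLimit;
        for (double[] m : toCopy.means)
            this.means.add(m.clone());
    }

    void addMean(double... m) { means.add(m); }

    double[] getMean(int i) { return means.get(i); }

    /** Covariant return type: callers get a CopyableClusterer without casting. */
    @Override
    public CopyableClusterer clone() {
        return new CopyableClusterer(this);
    }
}
```

With this pattern, changing a mean on the clone leaves the original untouched, which matters when the same configured clusterer is reused across threads or runs.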
