KMeansPDN (Java Statistical Analysis Tool 0.0.8 API)

java.lang.Object
- jsat.clustering.ClustererBase
  - jsat.clustering.KClustererBase
    - jsat.clustering.kmeans.KMeans
      - jsat.clustering.kmeans.KMeansPDN

All Implemented Interfaces:

Serializable, Clusterer, KClusterer, Parameterized
```
public class KMeansPDN
extends KMeans
```
This class provides a method of performing KMeans clustering when the value of K is not known. It works by incrementing the value of k up to some specified maximum, and running a full KMeans for each value.

Note, by default this implementation uses a heuristic for the max value of K that is capped at 100 when using the ClustererBase.cluster(jsat.DataSet) type methods.

When the value of K is specified, the implementation simply calls the regular KMeans object it was constructed with.

See: Pham, D. T., Dimov, S. S., & Nguyen, C. D. (2005). Selection of K in K-means clustering. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 219(1), 103–119. doi:10.1243/095440605X8298
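To make the scoring step of the search over k more concrete, the following is a minimal, JSAT-independent sketch of the f(K) evaluation function as I read it in the cited Pham et al. paper. It assumes the distortion S_K (the summed within-cluster squared distance) has already been obtained from a full k-means run for each K = 1, 2, ... and that the data has more than one feature. The class name `FkSketch`, the method `fK`, and the sample numbers are hypothetical and are not part of the JSAT API; the internals of KMeansPDN may differ.

```java
import java.util.Arrays;

class FkSketch {
    /**
     * Hypothetical helper, not a JSAT method.
     * distortions[i] is assumed to hold S_K for K = i + 1,
     * dims is the number of features N_d (assumed > 1).
     * Returns f(K) at index K - 1.
     */
    static double[] fK(double[] distortions, int dims) {
        double[] f = new double[distortions.length];
        double alpha = 0.0; // weight alpha_K, only defined from K = 2 onward
        for (int i = 0; i < distortions.length; i++) {
            int k = i + 1;
            if (k == 1) {
                f[i] = 1.0; // f(1) is defined as 1
                continue;
            }
            // alpha_2 = 1 - 3 / (4 N_d); alpha_K = alpha_{K-1} + (1 - alpha_{K-1}) / 6
            alpha = (k == 2) ? 1.0 - 3.0 / (4.0 * dims)
                             : alpha + (1.0 - alpha) / 6.0;
            double prev = distortions[i - 1];
            // f(K) = S_K / (alpha_K * S_{K-1}), or 1 when S_{K-1} is zero
            f[i] = (prev == 0.0) ? 1.0 : distortions[i] / (alpha * prev);
        }
        return f;
    }

    public static void main(String[] args) {
        // Made-up distortions for K = 1..4 on 2-dimensional data
        double[] s = {100.0, 20.0, 15.0, 12.0};
        System.out.println(Arrays.toString(fK(s, 2)));
    }
}
```

With these made-up distortions the large drop from K = 1 to K = 2 yields the smallest f(K) at K = 2, while later values of K score above 1 because their improvement is no better than what the weight alpha_K already expects.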

Author:

Edward Raff

See Also:

Serialized Form

Field Summary
- Fields inherited from class jsat.clustering.kmeans.KMeans
  DEFAULT_SEED_SELECTION, dm, MaxIterLimit, means, nearestCentroidDist, rand, saveCentroidDistance, seedSelection, storeMeans

Constructor Summary

Constructors
Constructor and Description
`KMeansPDN()` Creates a new clusterer.
`KMeansPDN(KMeans kmeans)` Creates a new clusterer that uses the specified object to perform clustering for all `k`.
`KMeansPDN(KMeansPDN toCopy)` Copy constructor

Method Summary

Modifier and Type	Method and Description
`KMeansPDN`	`clone()`
`int[]`	`cluster(DataSet dataSet, ExecutorService threadpool, int[] designations)` Performs clustering on the given data set.
`int[]`	`cluster(DataSet dataSet, int[] designations)` Performs clustering on the given data set.
`int[]`	`cluster(DataSet dataSet, int lowK, int highK, ExecutorService threadpool, int[] designations)`
`int[]`	`cluster(DataSet dataSet, int lowK, int highK, int[] designations)`
`protected double`	`cluster(DataSet dataSet, List<Double> accelCache, int k, List<Vec> means, int[] assignment, boolean exactTotal, ExecutorService threadpool, boolean returnError, Vec dataPointWeights)` This is a helper method where the actual cluster is performed.
`double[]`	`getfKs()` Returns the array of `f(K)` values generated for the last data set.

Methods inherited from class jsat.clustering.kmeans.KMeans
cluster, cluster, getDistanceMetric, getIterationLimit, getListOfLists, getMeans, getParameter, getParameters, getSeedSelection, setIterationLimit, setSeedSelection, setStoreMeans, supportsWeightedData

Methods inherited from class jsat.clustering.KClustererBase
cluster, cluster, cluster, cluster

Methods inherited from class jsat.clustering.ClustererBase
cluster, cluster, createClusterListFromAssignmentArray, getDatapointsFromCluster

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface jsat.clustering.Clusterer
cluster, cluster

- Constructor Detail
  - KMeansPDN
```
public KMeansPDN()
```
    Creates a new clusterer.
  - KMeansPDN
```
public KMeansPDN(KMeans kmeans)
```
    Creates a new clusterer that uses the specified object to perform clustering for all k.
    
    Parameters:
    
    kmeans - the k-means object to use for clustering
  - KMeansPDN
```
public KMeansPDN(KMeansPDN toCopy)
```
    Copy constructor
    
    Parameters:
    
    toCopy - the object to copy
- Method Detail
  - getfKs
```
public double[] getfKs()
```
    Returns the array of f(K) values generated for the last data set. The value at index i is the score for k = i+1 clusters. Smaller values indicate better clusterings.
    
    Returns:
    
    the array of f(K) values, or null if no data set has been clustered
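    As an illustration of how the returned scores might be read, the sketch below picks the k with the smallest f(K), assuming the index convention stated above (index i corresponds to k = i+1). The class `BestKSketch` and method `bestK` are hypothetical names, not part of JSAT, and the sample scores are made up; in real use the array would come from getfKs().

```java
class BestKSketch {
    /**
     * Hypothetical helper, not a JSAT method.
     * Returns the k (1-based) with the smallest f(K) score,
     * or -1 when no scores are available (null or empty array).
     */
    static int bestK(double[] fKs) {
        if (fKs == null || fKs.length == 0) {
            return -1; // getfKs() is documented to return null before any clustering
        }
        int best = 0;
        for (int i = 1; i < fKs.length; i++) {
            if (fKs[i] < fKs[best]) {
                best = i; // strictly smaller, so ties keep the smaller k
            }
        }
        return best + 1; // index i corresponds to k = i + 1
    }

    public static void main(String[] args) {
        double[] scores = {1.0, 0.32, 1.09, 1.08}; // made-up f(K) values
        System.out.println(bestK(scores)); // prints 2
    }
}
```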
  - cluster
```
public int[] cluster(DataSet dataSet,
                     int[] designations)
```
    Description copied from interface: Clusterer
    
    Performs clustering on the given data set. Parameters may be estimated by the method, or other heuristics performed.
    
    Specified by:
    
    cluster in interface Clusterer
    
    Overrides:
    
    cluster in class KMeans
    
    Parameters:
    
    dataSet - the data set to perform clustering on
    
    designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
    
    Returns:
    
    an array giving the cluster designation of each data point. This is the same array as designations, or a new one if the input array was null
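    The returned array can be turned into per-cluster index lists with a few lines of plain Java. The sketch below assumes designations are 0-based cluster ids and skips negative entries; both points are assumptions, since this page does not state the encoding. `DesignationSketch` and `groupByCluster` are hypothetical names. ClustererBase lists its own helper, createClusterListFromAssignmentArray, among the inherited methods, but its signature is not shown here, so it is not used.

```java
import java.util.ArrayList;
import java.util.List;

class DesignationSketch {
    /**
     * Hypothetical helper, not a JSAT method.
     * Groups data point indices by cluster id, assuming ids are 0-based.
     * Negative ids are skipped (an assumption about unassigned points).
     */
    static List<List<Integer>> groupByCluster(int[] designations) {
        int k = 0;
        for (int d : designations) {
            k = Math.max(k, d + 1); // number of clusters = largest id + 1
        }
        List<List<Integer>> clusters = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            clusters.add(new ArrayList<>());
        }
        for (int i = 0; i < designations.length; i++) {
            if (designations[i] >= 0) {
                clusters.get(designations[i]).add(i);
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        int[] designations = {0, 1, 0, 2, 1}; // made-up output of cluster(...)
        System.out.println(groupByCluster(designations)); // prints [[0, 2], [1, 4], [3]]
    }
}
```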
  - cluster
```
public int[] cluster(DataSet dataSet,
                     ExecutorService threadpool,
                     int[] designations)
```
    Description copied from interface: Clusterer
    
    Performs clustering on the given data set. Parameters may be estimated by the method, or other heuristics performed.
    
    Specified by:
    
    cluster in interface Clusterer
    
    Overrides:
    
    cluster in class KMeans
    
    Parameters:
    
    dataSet - the data set to perform clustering on
    
    threadpool - a source of threads to run tasks
    
    designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
    
    Returns:
    
    an array giving the cluster designation of each data point. This is the same array as designations, or a new one if the input array was null
  - cluster
```
public int[] cluster(DataSet dataSet,
                     int lowK,
                     int highK,
                     ExecutorService threadpool,
                     int[] designations)
```
    Specified by:
    
    cluster in interface KClusterer
    
    Overrides:
    
    cluster in class KMeans
  - cluster
```
public int[] cluster(DataSet dataSet,
                     int lowK,
                     int highK,
                     int[] designations)
```
    Specified by:
    
    cluster in interface KClusterer
    
    Overrides:
    
    cluster in class KMeans
  - cluster
```
protected double cluster(DataSet dataSet,
                         List<Double> accelCache,
                         int k,
                         List<Vec> means,
                         int[] assignment,
                         boolean exactTotal,
                         ExecutorService threadpool,
                         boolean returnError,
                         Vec dataPointWeights)
```
    Description copied from class: KMeans
    
    This is a helper method where the actual clustering is performed. This is because there are multiple strategies for modifying k-means, but all of them require this step.
    The distance metric used is trained if needed.
    
    Specified by:
    
    cluster in class KMeans
    
    Parameters:
    
    dataSet - The set of data points to perform clustering on
    
    accelCache - acceleration cache to use, or null. If null, the kmeans code will attempt to create one
    
    k - the number of clusters
    
    means - the initial points to use as the means. Its length is the number of means that will be searched for. These means will be altered, and should contain deep copies of the points they were drawn from. May be empty, in which case the list will be filled with some selected means
    
    assignment - an empty temp space to store the clustering classifications. Should be the same length as the number of data points
    
    exactTotal - determines how the objective function (return value) will be computed. If true, extra work will be done to compute the exact distance from each data point to its cluster. If false, an upper bound approximation will be used. This also impacts the value stored in KMeans.nearestCentroidDist
    
    threadpool - the source of threads for parallel computation. If null, single threaded execution will occur
    
    returnError - true if the sum of squared distances should be returned. false means any value can be returned. KMeans.saveCentroidDistance only applies if this is true
    
    dataPointWeights - the weight value to use for each data point. If null, assume each point has equal weight.
    
    Returns:
    
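    the sum of squared distances if returnError is true (exact or an upper bound, depending on exactTotal); otherwise the returned value is unspecified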
  - clone
```
public KMeansPDN clone()
```
    Specified by:
    
    clone in interface Clusterer
    
    Specified by:
    
    clone in interface KClusterer
    
    Specified by:
    
    clone in class KMeans
