public abstract class KernelKMeans extends KClustererBase implements Parameterized
the closest cluster or finding the distance between means
Modifier and Type | Field and Description |
---|---|
protected List<Double> | accel: The acceleration cache for the kernel |
protected KernelTrick | kernel: The kernel trick to use |
protected int | maximumIterations |
protected double[] | meanSqrdNorms: The value of the un-normalized squared norm for each mean |
protected int[] | newDesignations: A temporary space for updating ownership designations for each data point |
protected double[] | normConsts: The normalizing constant for each mean |
protected double[] | ownes: The weighted number of data points owned by each mean |
protected double[] | selfK: The value of k(x,x) for every point in X |
protected Vec | W: The weight of each data point |
protected List<Vec> | X: The list of data points that this was trained on |
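The cached fields above fit the standard kernel-space expansion of the squared distance between a point and a cluster mean. The following is a sketch only: it assumes that normConsts holds the reciprocal of the squared ownership weight, which this page does not state, and it writes C_k for the set of points currently assigned to mean k.

```latex
\begin{aligned}
N_k &= \sum_{j \in C_k} w_j \\
\lVert \phi(x_i) - \mu_k \rVert^2
  &= k(x_i, x_i)
   - \frac{2}{N_k} \sum_{j \in C_k} w_j \, k(x_i, x_j)
   + \frac{1}{N_k^2} \sum_{j \in C_k} \sum_{l \in C_k} w_j w_l \, k(x_j, x_l)
\end{aligned}
```

Under that reading, k(x_i, x_i) corresponds to selfK, N_k to ownes, the double sum to meanSqrdNorms, and 1/N_k^2 to normConsts.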
Constructor and Description |
---|
KernelKMeans(KernelKMeans toCopy): Copy constructor |
KernelKMeans(KernelTrick kernel) |
Modifier and Type | Method and Description |
---|---|
protected void | applyMeanUpdates(double[] sqrdNorms, double[] ownerships) |
abstract KernelKMeans | clone() |
int[] | cluster(DataSet dataSet, ExecutorService threadpool, int[] designations): Performs clustering on the given data set. |
int[] | cluster(DataSet dataSet, int[] designations): Performs clustering on the given data set. |
int[] | cluster(DataSet dataSet, int lowK, int highK, ExecutorService threadpool, int[] designations) |
int[] | cluster(DataSet dataSet, int lowK, int highK, int[] designations) |
protected double | distance(int i, int k, int[] designations): Computes the distance between one data point and a specified mean. |
double | distance(Vec x, int k): Returns the distance between the given data point and the specified cluster. |
double | distance(Vec x, List<Double> qi, int k): Returns the distance between the given data point and the specified cluster. |
protected double | evalSumK(int i, int clusterID, int[] d): Computes the kernel sum of data point i against all the points in cluster group clusterID. |
protected double | evalSumK(Vec x, List<Double> qi, int clusterID, int[] d): Computes the kernel sum of the given data point against all the points in cluster group clusterID. |
int | findClosestCluster(Vec x): Finds the cluster ID that is closest to the given data point. |
int | findClosestCluster(Vec x, List<Double> qi): Finds the cluster ID that is closest to the given data point. |
int | getMaximumIterations(): Returns the maximum number of iterations of the KMeans algorithm that will be performed. |
Parameter | getParameter(String paramName): Returns the parameter with the given name. |
List<Parameter> | getParameters(): Returns the list of parameters that can be altered for this learner. |
double | meanToMeanDistance(int k0, int k1): Computes the distance between two of the means in the clustering. |
protected double | meanToMeanDistance(int k0, int k1, int[] assignments) |
protected double | meanToMeanDistance(int k0, int k1, int[] assignments, ExecutorService ex) |
protected double | meanToMeanDistance(int k0, int k1, int[] assignments0, int[] assignments1, double k1SqrdNorm) |
protected double | meanToMeanDistance(int k0, int k1, int[] assignments0, int[] assignments1, double k1SqrdNorm, ExecutorService ex) |
void | setMaximumIterations(int iterLimit): Sets the maximum number of iterations allowed. |
protected void | setup(int K, int[] designations, Vec W): Sets up the internal structure for KernelKMeans. |
boolean | supportsWeightedData(): Indicates whether the model knows how to cluster using weighted data points. |
protected int | updateMeansFromChange(int i, int[] designations): Updates the means based on the change of a specific data point. |
protected int | updateMeansFromChange(int i, int[] designations, double[] sqrdNorms, double[] ownership): Accumulates the updates to the means and ownership into the provided arrays. |
protected void | updateNormConsts(): Updates the normalizing constants for each mean. |
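protected KernelTrick kernel
The kernel trick to use
protected Vec W
The weight of each data point
protected double[] selfK
The value of k(x,x) for every point in X
protected double[] meanSqrdNorms
The value of the un-normalized squared norm for each mean
protected double[] normConsts
The normalizing constant for each mean
protected double[] ownes
The weighted number of data points owned by each mean
protected int[] newDesignations
A temporary space for updating ownership designations for each data point
protected int maximumIterations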
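public KernelKMeans(KernelTrick kernel)
Parameters:
kernel - the kernel to use
public KernelKMeans(KernelKMeans toCopy)
Copy constructor
Parameters:
toCopy - the object to copy
public void setMaximumIterations(int iterLimit)
Sets the maximum number of iterations allowed
Parameters:
iterLimit - the maximum number of iterations of the KMeans algorithm
public int getMaximumIterations()
Returns the maximum number of iterations of the KMeans algorithm that will be performed.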
public int[] cluster(DataSet dataSet, int[] designations)
Performs clustering on the given data set.
cluster in interface Clusterer
Parameters:
dataSet - the data set to perform clustering on
designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
public int[] cluster(DataSet dataSet, ExecutorService threadpool, int[] designations)
Performs clustering on the given data set.
cluster in interface Clusterer
Parameters:
dataSet - the data set to perform clustering on
threadpool - a source of threads to run tasks
designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
public int[] cluster(DataSet dataSet, int lowK, int highK, ExecutorService threadpool, int[] designations)
cluster in interface KClusterer
public int[] cluster(DataSet dataSet, int lowK, int highK, int[] designations)
cluster in interface KClusterer
protected double evalSumK(int i, int clusterID, int[] d)
Computes the kernel sum of data point i against all the points in cluster group clusterID.
Parameters:
i - the index of the data point to query for
clusterID - the cluster index to get the sum of kernel products
d - the array of cluster assignments
protected double evalSumK(Vec x, List<Double> qi, int clusterID, int[] d)
Computes the kernel sum of the given data point against all the points in cluster group clusterID.
Parameters:
x - the data point to get the kernel sum of
qi - the query information for the given data point generated from the kernel in use. See KernelTrick.getQueryInfo(jsat.linear.Vec)
clusterID - the cluster index to get the sum of kernel products
d - the array of cluster assignments
protected void setup(int K, int[] designations, Vec W)
Sets up the internal structure for KernelKMeans.
Parameters:
K - the number of clusters to find
designations - the initial designations array to fill with values
W - the weight for each individual data point
protected void updateNormConsts()
Updates the normalizing constants for each mean.
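protected double distance(int i, int k, int[] designations)
Computes the distance between one data point and a specified mean
Parameters:
i - the data point to get the distance for
k - the mean index to get the distance to
designations - the array of ownership designations for each cluster to use
Returns:
the distance between data point i and mean k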
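A distance of this kind can be computed from kernel evaluations alone, without ever forming the mean explicitly. The class below is a minimal sketch of that idea and is not JSAT code: the class name, the use of plain arrays instead of Vec, the unit weights, and the linear kernel are all assumptions made for illustration. It also recomputes every sum on each call, whereas the fields documented on this page suggest the real class caches them.

```java
/** Hypothetical, self-contained sketch; not part of JSAT. */
final class KernelDistanceSketch {
    /** Linear kernel k(a, b) = a . b, chosen so results match ordinary Euclidean geometry. */
    static double kernel(double[] a, double[] b) {
        double dot = 0;
        for (int d = 0; d < a.length; d++) {
            dot += a[d] * b[d];
        }
        return dot;
    }

    /** Sum of k(x, X[j]) over every j assigned to clusterID (unit weights assumed). */
    static double evalSumK(double[] x, double[][] X, int clusterID, int[] designations) {
        double sum = 0;
        for (int j = 0; j < X.length; j++) {
            if (designations[j] == clusterID) {
                sum += kernel(x, X[j]);
            }
        }
        return sum;
    }

    /** Distance in kernel space between x and the mean of cluster k. */
    static double distance(double[] x, double[][] X, int k, int[] designations) {
        int owned = 0;
        double meanSqrdNorm = 0; // un-normalized: sum of k(X[i], X[j]) over all pairs in cluster k
        for (int i = 0; i < X.length; i++) {
            if (designations[i] != k) {
                continue;
            }
            owned++;
            meanSqrdNorm += evalSumK(X[i], X, k, designations);
        }
        if (owned == 0) {
            return Double.POSITIVE_INFINITY; // empty cluster: there is no mean to compare against
        }
        double normConst = 1.0 / ((double) owned * owned);
        double sqrd = kernel(x, x)
                - 2.0 * evalSumK(x, X, k, designations) / owned
                + meanSqrdNorm * normConst;
        return Math.sqrt(Math.max(0, sqrd)); // clamp small negative values caused by round-off
    }
}
```

With the linear kernel the result equals the Euclidean distance to the cluster centroid, which makes the sketch easy to check by hand; swapping in another kernel function only changes the kernel method.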
public double distance(Vec x, int k)
Returns the distance between the given data point and the specified cluster
Parameters:
x - the data point to get the distance for
k - the cluster id to get the distance to
public double distance(Vec x, List<Double> qi, int k)
Returns the distance between the given data point and the specified cluster
Parameters:
x - the data point to get the distance for
qi - the query information for the given data point generated for the kernel in use. See KernelTrick.getQueryInfo(jsat.linear.Vec)
k - the cluster id to get the distance to
public int findClosestCluster(Vec x)
Finds the cluster ID that is closest to the given data point
Parameters:
x - the data point to get the closest cluster for
public int findClosestCluster(Vec x, List<Double> qi)
Finds the cluster ID that is closest to the given data point
Parameters:
x - the data point to get the closest cluster for
qi - the query information for the given data point generated for the kernel in use. See KernelTrick.getQueryInfo(jsat.linear.Vec)
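Finding the closest cluster amounts to taking the minimum of the point-to-mean distance over all means. The sketch below illustrates this with precomputed per-mean quantities; the class, method, and parameter names are hypothetical, and the assumption that the normalizing constant is the reciprocal of the squared ownership is mine rather than something this page states. Comparing squared distances is enough, since the square root does not change which mean is nearest.

```java
/** Hypothetical, self-contained sketch; not part of JSAT. */
final class ClosestClusterSketch {
    /**
     * @param kxx           k(x, x) for the query point
     * @param sumK          sumK[k] is the sum of k(x, x_j) over the points owned by mean k
     * @param ownes         number (or total weight) of points owned by each mean
     * @param meanSqrdNorms un-normalized squared norm of each mean
     * @param normConsts    normalizing constant of each mean, assumed to be 1 / ownes[k]^2
     * @return the index of the nearest non-empty mean, or -1 if every mean is empty
     */
    static int findClosest(double kxx, double[] sumK, double[] ownes,
                           double[] meanSqrdNorms, double[] normConsts) {
        int best = -1;
        double bestSqrd = Double.POSITIVE_INFINITY;
        for (int k = 0; k < ownes.length; k++) {
            if (ownes[k] <= 0) {
                continue; // an empty cluster has no mean to measure against
            }
            double sqrd = kxx - 2.0 * sumK[k] / ownes[k] + meanSqrdNorms[k] * normConsts[k];
            if (sqrd < bestSqrd) {
                bestSqrd = sqrd;
                best = k;
            }
        }
        return best;
    }
}
```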
protected int updateMeansFromChange(int i, int[] designations)
Updates the means based on the change of a specific data point
Parameters:
i - the index of the data point to try and update the means based on its movement
designations - the old assignments for ownership of each data point to one of the means
Returns:
1 if the index changed ownership, 0 if the index did not change ownership
protected int updateMeansFromChange(int i, int[] designations, double[] sqrdNorms, double[] ownership)
Accumulates the updates to the means and ownership into the provided arrays. This does not update meanSqrdNorms, and is meant to accumulate the change. To apply the changes pass the same arrays to applyMeanUpdates(double[], double[])
Parameters:
i - the index of the data point to try and update the means based on its movement
designations - the old assignments for ownership of each data point to one of the means
sqrdNorms - the array to place the changes to the squared norms in
ownership - the array to place the changes to the ownership counts in
Returns:
1 if the index changed ownership, 0 if the index did not change ownership
protected void applyMeanUpdates(double[] sqrdNorms, double[] ownerships)
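The accumulate-then-apply split described above can be illustrated with a small sketch. It is not JSAT's implementation: the names are hypothetical, a precomputed kernel matrix K stands in for kernel evaluations, weights are taken to be 1, and only a single moving point is handled. Moving several points in the same pass needs extra bookkeeping so that shared pairs are not counted twice, which this sketch does not attempt.

```java
/** Hypothetical, self-contained sketch; not part of JSAT. */
final class MeanUpdateSketch {
    /**
     * Records, in the delta arrays only, the effect of moving point i to cluster 'to'.
     * K[a][b] holds k(x_a, x_b). Returns 1 if the point changes owner, 0 otherwise.
     */
    static int accumulateMove(int i, int to, int[] designations, double[][] K,
                              double[] sqrdNormDeltas, double[] ownershipDeltas) {
        int from = designations[i];
        if (from == to) {
            return 0;
        }
        double sumFrom = 0; // kernel sum against the other members of the old cluster
        double sumTo = 0;   // kernel sum against the members of the new cluster
        for (int j = 0; j < K.length; j++) {
            if (j == i) {
                continue;
            }
            if (designations[j] == from) {
                sumFrom += K[i][j];
            } else if (designations[j] == to) {
                sumTo += K[i][j];
            }
        }
        // Each cross term appears twice in the double sum; the diagonal term appears once.
        sqrdNormDeltas[from] -= 2 * sumFrom + K[i][i];
        sqrdNormDeltas[to] += 2 * sumTo + K[i][i];
        ownershipDeltas[from] -= 1;
        ownershipDeltas[to] += 1;
        return 1;
    }

    /** Folds the accumulated deltas into the cached per-mean values. */
    static void applyUpdates(double[] meanSqrdNorms, double[] ownes,
                             double[] sqrdNormDeltas, double[] ownershipDeltas) {
        for (int k = 0; k < ownes.length; k++) {
            meanSqrdNorms[k] += sqrdNormDeltas[k];
            ownes[k] += ownershipDeltas[k];
        }
    }
}
```

Keeping the deltas in separate arrays means the cached values stay consistent while changes are being collected, for example when several threads each fill their own delta arrays before a single apply step.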
public double meanToMeanDistance(int k0, int k1)
Computes the distance between two of the means in the clustering
Parameters:
k0 - the index of the first mean
k1 - the index of the second mean
protected double meanToMeanDistance(int k0, int k1, int[] assignments)
protected double meanToMeanDistance(int k0, int k1, int[] assignments, ExecutorService ex)
protected double meanToMeanDistance(int k0, int k1, int[] assignments0, int[] assignments1, double k1SqrdNorm)
Parameters:
k0 - the index of the first cluster
k1 - the index of the second cluster
assignments0 - the array of assignments to use for index k0
assignments1 - the array of assignments to use for index k1
k1SqrdNorm - the normalized squared norm for the mean indicated by k1 (i.e., meanSqrdNorms multiplied by normConsts)
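The distance between two means can also be written purely in terms of kernel values: the normalized squared norm of each mean, minus twice the normalized cross term between the two clusters. The sketch below shows that computation under simplifying assumptions that are mine, not the library's: a single assignment array is used for both clusters, weights are 1, K is a precomputed kernel matrix, and the class and method names are hypothetical.

```java
/** Hypothetical, self-contained sketch; not part of JSAT. */
final class MeanToMeanSketch {
    /** Kernel-space distance between the means of clusters k0 and k1; both must be non-empty. */
    static double meanToMeanDistance(int k0, int k1, int[] designations, double[][] K) {
        double n0 = 0, n1 = 0;   // cluster sizes
        double s00 = 0, s11 = 0; // un-normalized squared norms of the two means
        double s01 = 0;          // un-normalized dot product between the two means
        for (int i = 0; i < K.length; i++) {
            if (designations[i] == k0) n0++;
            if (designations[i] == k1) n1++;
            for (int j = 0; j < K.length; j++) {
                if (designations[i] == k0 && designations[j] == k0) s00 += K[i][j];
                if (designations[i] == k1 && designations[j] == k1) s11 += K[i][j];
                if (designations[i] == k0 && designations[j] == k1) s01 += K[i][j];
            }
        }
        double sqrd = s00 / (n0 * n0) + s11 / (n1 * n1) - 2 * s01 / (n0 * n1);
        return Math.sqrt(Math.max(0, sqrd)); // clamp small negative values caused by round-off
    }
}
```

Passing an already normalized squared norm for one mean, as the k1SqrdNorm parameter above allows, would let a caller skip recomputing the s11 term.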
protected double meanToMeanDistance(int k0, int k1, int[] assignments0, int[] assignments1, double k1SqrdNorm, ExecutorService ex)
Parameters:
k0 - the index of the first cluster
k1 - the index of the second cluster
assignments0 - the array of assignments to use for index k0
assignments1 - the array of assignments to use for index k1
k1SqrdNorm - the normalized squared norm for the mean indicated by k1 (i.e., meanSqrdNorms multiplied by normConsts)
ex - source of threads for parallel execution
public abstract KernelKMeans clone()
clone in interface Clusterer
clone in interface KClusterer
clone in class KClustererBase
public boolean supportsWeightedData()
Indicates whether the model knows how to cluster using weighted data points.
supportsWeightedData in interface Clusterer
supportsWeightedData in class ClustererBase
public List<Parameter> getParameters()
Returns the list of parameters that can be altered for this learner.
getParameters in interface Parameterized
public Parameter getParameter(String paramName)
Returns the parameter with the given name.
getParameter in interface Parameterized
Parameters:
paramName - the name of the parameter to obtain
Copyright © 2017. All rights reserved.