public class HDBSCAN extends ClustererBase implements Parameterized
HDBSCAN is a density-based clustering algorithm that improves on DBSCAN.
Unlike its predecessor, HDBSCAN works with variable density
datasets and does not need a search radius to be specified. The original
paper presents HDBSCAN with two parameters, m_pts and m_clSize, but
recommends setting them to the same value, so that the algorithm effectively
behaves as if only one parameter exists. This implementation allows both to
be set independently, but the single-parameter constructors use the same
value for both parameters.
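The relationship between the one-parameter and two-parameter forms can be sketched as plain constructor delegation. The class below is a hypothetical stand-in written for illustration, not the JSAT class itself; the field names `mPts` and `mClSize` are assumptions, and applying the default of 15 to both values is also an assumption, since the documentation only states a threshold of 15 points.

```java
// Hypothetical stand-in for the parameter handling described above; not the JSAT class.
class HdbscanParamsSketch {
    final int mPts;     // number of neighbors to consider
    final int mClSize;  // minimum number of points needed to form a cluster

    // Mirrors the no-argument form; using 15 for both values is an assumption.
    HdbscanParamsSketch() {
        this(15);
    }

    // Simplified form: a single value drives both parameters.
    HdbscanParamsSketch(int mPts) {
        this(mPts, mPts);
    }

    // Full form: the two parameters are set independently.
    HdbscanParamsSketch(int mPts, int mClSize) {
        this.mPts = mPts;
        this.mClSize = mClSize;
    }

    public static void main(String[] args) {
        HdbscanParamsSketch simple = new HdbscanParamsSketch(10);
        HdbscanParamsSketch full = new HdbscanParamsSketch(10, 25);
        System.out.println(simple.mPts + " " + simple.mClSize);
        System.out.println(full.mPts + " " + full.mClSize);
    }
}
```

With the real class, the equivalent choice is between `HDBSCAN(int m_pts)` and the four-argument constructor, or calling `setMinPoints` and `setMinClusterSize` separately after construction.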
Constructor | Description |
---|---|
HDBSCAN() | Creates a new HDBSCAN object using a threshold of 15 points to form a cluster. |
HDBSCAN(DistanceMetric dm, int m_pts) | Creates a new HDBSCAN using the simplified form, where the only parameter is a single value. |
HDBSCAN(DistanceMetric dm, int m_pts, int m_clSize, VectorCollectionFactory<Vec> vcf) | Creates a new HDBSCAN using the full specification of the algorithm, where two parameters may be altered. |
HDBSCAN(DistanceMetric dm, int m_pts, VectorCollectionFactory<Vec> vcf) | Creates a new HDBSCAN using the simplified form, where the only parameter is a single value. |
HDBSCAN(HDBSCAN toCopy) | Copy constructor. |
HDBSCAN(int m_pts) | Creates a new HDBSCAN using the simplified form, where the only parameter is a single value. |
Modifier and Type | Method | Description |
---|---|---|
HDBSCAN | clone() | |
int[] | cluster(DataSet dataSet, ExecutorService threadpool, int[] designations) | Performs clustering on the given data set. |
int[] | cluster(DataSet dataSet, int[] designations) | Performs clustering on the given data set. |
DistanceMetric | getDistanceMetrics() | |
int | getMinClusterSize() | |
int | getMinPoints() | |
Parameter | getParameter(String paramName) | Returns the parameter with the given name. |
List<Parameter> | getParameters() | Returns the list of parameters that can be altered for this learner. |
void | setDistanceMetrics(DistanceMetric dm) | Sets the distance metric to use for determining closeness between data points. |
void | setMinClusterSize(int m_clSize) | |
void | setMinPoints(int m_pts) | |
Inherited methods: cluster, cluster, createClusterListFromAssignmentArray, getDatapointsFromCluster, supportsWeightedData
public HDBSCAN()
public HDBSCAN(int m_pts)
m_pts - the minimum number of points needed to form a cluster and the number of neighbors to consider
public HDBSCAN(DistanceMetric dm, int m_pts)
dm - the distance metric to use for finding nearest neighbors
m_pts - the minimum number of points needed to form a cluster and the number of neighbors to consider
public HDBSCAN(DistanceMetric dm, int m_pts, VectorCollectionFactory<Vec> vcf)
dm - the distance metric to use for finding nearest neighbors
m_pts - the minimum number of points needed to form a cluster and the number of neighbors to consider
vcf - the vector collection to use for accelerating nearest neighbor queries
public HDBSCAN(DistanceMetric dm, int m_pts, int m_clSize, VectorCollectionFactory<Vec> vcf)
dm - the distance metric to use for finding nearest neighbors
m_pts - the number of neighbors to consider, acts as a smoothing over the density estimate
m_clSize - the minimum number of data points needed to form a cluster
vcf - the vector collection to use for accelerating nearest neighbor queries
public HDBSCAN(HDBSCAN toCopy)
toCopy - the object to copy
public void setMinClusterSize(int m_clSize)
m_clSize - the minimum number of data points needed to form a cluster
public int getMinClusterSize()
public void setDistanceMetrics(DistanceMetric dm)
dm - the distance metric to determine nearest neighbors with
public DistanceMetric getDistanceMetrics()
public void setMinPoints(int m_pts)
m_pts - the number of neighbors to consider, acts as a smoothing over the density estimate
public int getMinPoints()
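To make the smoothing role of m_pts concrete, the sketch below computes the core distance and mutual reachability distance as they are usually defined in the HDBSCAN literature. It is a self-contained illustration, not JSAT's code: the class and method names are hypothetical, Euclidean distance is hard-coded in place of a `DistanceMetric`, a brute-force scan replaces the vector collection, and counting the query point as its own first neighbor is an assumed convention that the documentation above does not confirm.

```java
import java.util.Arrays;

// Illustrative sketch only; names and neighbor-counting convention are assumptions.
class MutualReachabilitySketch {
    // Plain Euclidean distance, standing in for a configurable distance metric.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Core distance: distance from point i to its mPts-th nearest neighbor,
    // counting the point itself as the first neighbor (assumed convention).
    static double coreDistance(double[][] data, int i, int mPts) {
        double[] dists = new double[data.length];
        for (int j = 0; j < data.length; j++) {
            dists[j] = euclidean(data[i], data[j]);
        }
        Arrays.sort(dists);
        // Clamp mPts into [1, data.length] so the index is always valid.
        int idx = Math.max(1, Math.min(mPts, data.length)) - 1;
        return dists[idx];
    }

    // Mutual reachability: the largest of the two core distances and the raw distance.
    static double mutualReachability(double[][] data, int i, int j, int mPts) {
        double d = euclidean(data[i], data[j]);
        double coreI = coreDistance(data, i, mPts);
        double coreJ = coreDistance(data, j, mPts);
        return Math.max(d, Math.max(coreI, coreJ));
    }

    public static void main(String[] args) {
        double[][] data = {{0}, {1}, {2}, {10}};
        System.out.println(mutualReachability(data, 0, 1, 2));
        System.out.println(mutualReachability(data, 0, 1, 3));
    }
}
```

For the four one-dimensional points in `main`, the first two points are at mutual reachability distance 1.0 with mPts = 2 and 2.0 with mPts = 3: a larger value pushes points in sparser neighborhoods further apart, which is the smoothing effect on the density estimate.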
public HDBSCAN clone()
Specified by: clone in interface Clusterer
Overrides: clone in class ClustererBase
public int[] cluster(DataSet dataSet, int[] designations)
Description copied from interface Clusterer: Performs clustering on the given data set.
Specified by: cluster in interface Clusterer
dataSet - the data set to perform clustering on
designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
public int[] cluster(DataSet dataSet, ExecutorService threadpool, int[] designations)
Description copied from interface Clusterer: Performs clustering on the given data set.
Specified by: cluster in interface Clusterer
dataSet - the data set to perform clustering on
threadpool - a source of threads to run tasks
designations - the array which will contain the designated values. The array will be altered and returned by the function. If null is given, a new array will be created and returned.
public List<Parameter> getParameters()
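Description copied from interface Parameterized: Returns the list of parameters that can be altered for this learner.
Specified by: getParameters in interface Parameterized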
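The designations contract described for the two cluster methods (a supplied array is altered and returned; null causes a new array to be created) can be sketched with a small stand-in. The method `clusterStub` below is hypothetical and does no real clustering: it assigns every point to cluster 0 purely to show how the array is handled, and it assumes a supplied array has at least as many entries as there are points.

```java
import java.util.Arrays;

// Hypothetical stand-in for the array handling of cluster(...); not the JSAT implementation.
class DesignationsContractSketch {
    // Assigns every point to cluster 0 and returns the designation array.
    static int[] clusterStub(int numPoints, int[] designations) {
        if (designations == null) {
            // Nothing supplied by the caller: create a new array.
            designations = new int[numPoints];
        }
        // Overwrite the first numPoints entries in place (assumes the array is long enough).
        Arrays.fill(designations, 0, numPoints, 0);
        return designations;
    }

    public static void main(String[] args) {
        int[] fresh = clusterStub(4, null);          // new array is created
        int[] reused = new int[] {7, 7, 7, 7};
        int[] returned = clusterStub(4, reused);     // same array is altered and returned
        System.out.println(fresh.length);
        System.out.println(returned == reused);
    }
}
```

Passing an existing array avoids a fresh allocation when clustering is run repeatedly on data sets of the same size.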
public Parameter getParameter(String paramName)
Description copied from interface Parameterized: Returns the parameter with the given name.
Specified by: getParameter in interface Parameterized
paramName - the name of the parameter to obtain
Copyright © 2017. All rights reserved.