public class MetricKDE extends MultivariateKDE implements Parameterized
MetricKDE generalizes the KernelDensityEstimator to the multivariate case.
A KernelFunction is used to weight the contribution of each data point, and a DistanceMetric is used to effectively alter the shape of the kernel. The MetricKDE uses one bandwidth parameter, which can be estimated using a nearest neighbor approach or tuned by hand.
The bandwidth of the MetricKDE cannot be estimated in the same way as in the univariate case.

Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_K - When estimating the bandwidth, the distances of the k'th nearest neighbors are used to perform the estimate. This is the default value of k. |
static KernelFunction | DEFAULT_KF - The default kernel function to use when none is given. |
static double | DEFAULT_STND_DEV - When estimating the bandwidth, a multiple of the standard deviation of the neighbor distances is added to their mean. This is the default multiple. |

Constructor and Description |
---|
MetricKDE() - Creates a new KDE object that still needs a data set to model the distribution of. |
MetricKDE(DistanceMetric distanceMetric) - Creates a new KDE object that still needs a data set to model the distribution of. |
MetricKDE(DistanceMetric distanceMetric, VectorCollectionFactory<VecPaired<Vec,Integer>> vcf) - Creates a new KDE object that still needs a data set to model the distribution of. |
MetricKDE(KernelFunction kf, DistanceMetric distanceMetric) |
MetricKDE(KernelFunction kf, DistanceMetric distanceMetric, VectorCollectionFactory<VecPaired<Vec,Integer>> vcf) - Creates a new KDE object that still needs a data set to model the distribution of. |
MetricKDE(KernelFunction kf, DistanceMetric distanceMetric, VectorCollectionFactory<VecPaired<Vec,Integer>> vcf, int defaultK, double defaultStndDev) - Creates a new KDE object that still needs a data set to model the distribution of. |

Modifier and Type | Method and Description |
---|---|
MetricKDE | clone() |
double | getBandwith() - Returns the current bandwidth used. |
int | getDefaultK() - Returns the default value of the k'th nearest neighbor to use when not specified. |
double | getDefaultStndDev() - Returns the multiple of the standard deviation that is added to the bandwidth estimate. |
DistanceMetric | getDistanceMetric() - Returns the distance metric that is used for density estimation. |
KernelFunction | getKernelFunction() |
List<? extends VecPaired<VecPaired<Vec,Integer>,Double>> | getNearby(Vec x) - Returns the list of vectors that have a nonzero contribution to the density of the query point x. |
List<? extends VecPaired<VecPaired<Vec,Integer>,Double>> | getNearbyRaw(Vec x) - Returns the list of vectors that have a nonzero contribution to the density of the query point x. |
Parameter | getParameter(String paramName) - Returns the parameter with the given name. |
List<Parameter> | getParameters() - Returns the list of parameters that can be altered for this learner. |
double | pdf(Vec x) - Returns the probability of a given vector from this distribution. |
List<Vec> | sample(int count, Random rand) - Sampling not yet supported. |
void | scaleBandwidth(double scale) - A caller may want to increase the bandwidth after training has been completed to get a smoother model, or decrease it to observe behavior. |
void | setBandwith(double bandwidth) - Sets the bandwidth used to estimate the density of the underlying distribution. |
void | setDefaultK(int defaultK) - When estimating the bandwidth, the mean distance to the k'th nearest neighbor of each data point is used. |
void | setDefaultStndDev(double defaultStndDev) - When estimating the bandwidth, the mean of the neighbor distances is used, and a multiple of their standard deviation is added. |
void | setDistanceMetric(DistanceMetric distanceMetric) - Sets the distance metric that is used for density estimation. |
void | setKernelFunction(KernelFunction kf) |
<V extends Vec> boolean | setUsingData(List<V> dataSet) - Sets the parameters of the distribution to attempt to fit the given list of vectors. |
<V extends Vec> boolean | setUsingData(List<V> dataSet, double bandwith) - Sets the KDE to model the density of the given data set with the specified bandwidth. |
<V extends Vec> boolean | setUsingData(List<V> dataSet, double bandwith, ExecutorService threadpool) - Sets the KDE to model the density of the given data set with the specified bandwidth. |
<V extends Vec> boolean | setUsingData(List<V> dataSet, ExecutorService threadpool) - Sets the parameters of the distribution to attempt to fit the given list of vectors. |
<V extends Vec> boolean | setUsingData(List<V> dataSet, int k) - Sets the KDE to model the density of the given data set, estimating the bandwidth from the k nearest neighbors of each data point. |
<V extends Vec> boolean | setUsingData(List<V> dataSet, int k, double stndDevs) - Sets the KDE to model the density of the given data set, estimating the bandwidth from the k nearest neighbors of each data point. |
<V extends Vec> boolean | setUsingData(List<V> dataSet, int k, double stndDevs, ExecutorService threadpool) - Sets the KDE to model the density of the given data set, estimating the bandwidth from the k nearest neighbors of each data point. |
<V extends Vec> boolean | setUsingData(List<V> dataSet, int k, ExecutorService threadpool) - Sets the KDE to model the density of the given data set, estimating the bandwidth from the k nearest neighbors of each data point. |
boolean | setUsingDataList(List<DataPoint> dataPoints) - Sets the parameters of the distribution to attempt to fit the given list of data points. |
boolean | setUsingDataList(List<DataPoint> dataPoints, ExecutorService threadpool) - Sets the parameters of the distribution to attempt to fit the given list of data points. |

Inherited methods: logPdf, logPdf, pdf, setUsingData, setUsingData
public static final int DEFAULT_K
public static final double DEFAULT_STND_DEV
public static final KernelFunction DEFAULT_KF
public MetricKDE()
public MetricKDE(DistanceMetric distanceMetric)
distanceMetric - the distance metric to use
public MetricKDE(DistanceMetric distanceMetric, VectorCollectionFactory<VecPaired<Vec,Integer>> vcf)
distanceMetric - the distance metric to use
vcf - a factory to generate vector collections from
public MetricKDE(KernelFunction kf, DistanceMetric distanceMetric)
public MetricKDE(KernelFunction kf, DistanceMetric distanceMetric, VectorCollectionFactory<VecPaired<Vec,Integer>> vcf)
kf - the kernel function to use
distanceMetric - the distance metric to use
vcf - a factory to generate vector collections from
public MetricKDE(KernelFunction kf, DistanceMetric distanceMetric, VectorCollectionFactory<VecPaired<Vec,Integer>> vcf, int defaultK, double defaultStndDev)
kf - the kernel function to use
distanceMetric - the distance metric to use
vcf - a factory to generate vector collections from
defaultK - the default neighbor to use when estimating the bandwidth
defaultStndDev - the default multiple of standard deviations to add when estimating the bandwidth
public void setBandwith(double bandwidth)
bandwidth - the bandwidth to use for estimation
ArithmeticException - if the bandwidth given is not a positive number
public double getBandwith()
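As a rough illustration of what the bandwidth controls, the sketch below averages the kernel weights K(d(x, x_i) / h) over a small data set. It is not the library's implementation: the class and method names are hypothetical, the Epanechnikov kernel and Euclidean distance are assumed choices, and the normalizing constant that would make the result a proper density is omitted. It rejects a non-positive bandwidth with an ArithmeticException, mirroring the behavior documented for setBandwith.

```java
/** Illustrative only: how a single bandwidth enters a metric-based kernel sum. */
final class BandwidthRoleSketch {

    /** Epanechnikov profile: 0.75 * (1 - u^2) for |u| <= 1, otherwise 0. */
    static double epanechnikov(double u) {
        return Math.abs(u) > 1.0 ? 0.0 : 0.75 * (1.0 - u * u);
    }

    /** Euclidean distance between two points of equal dimension (assumed metric). */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Average kernel weight of the data points around x:
     * (1/n) * sum of K(d(x, x_i) / bandwidth).
     * The normalizing constant is deliberately left out.
     */
    static double score(double[][] data, double[] x, double bandwidth) {
        if (!(bandwidth > 0.0) || Double.isInfinite(bandwidth)) {
            throw new ArithmeticException("bandwidth must be a positive number");
        }
        double sum = 0.0;
        for (double[] xi : data) {
            sum += epanechnikov(euclidean(x, xi) / bandwidth);
        }
        return sum / data.length;
    }
}
```

With a larger bandwidth, more distant points receive a nonzero weight, which is what makes the resulting estimate smoother.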
public void setDefaultK(int defaultK)
defaultK - the default neighbor to use when estimating the bandwidth
public int getDefaultK()
public void setDefaultStndDev(double defaultStndDev)
defaultStndDev - the multiple of the standard deviation to add to the bandwidth estimate
public double getDefaultStndDev()
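The nearest neighbor bandwidth estimate described for setDefaultK and setDefaultStndDev (the mean of the k'th nearest neighbor distances plus a multiple of their standard deviation) can be sketched in plain Java. This is a brute-force illustration under stated assumptions, not the library's code: the names are hypothetical, Euclidean distance is assumed, and the population standard deviation is used.

```java
import java.util.Arrays;

/** Illustrative only: brute-force k'th-nearest-neighbor bandwidth estimate. */
final class KnnBandwidthSketch {

    /** Euclidean distance between two points of equal dimension (assumed metric). */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns mean + stndDevs * standardDeviation of the distances from each
     * point to its k'th nearest neighbor (the point itself is not counted).
     * The population standard deviation is an assumption of this sketch.
     */
    static double estimateBandwidth(double[][] data, int k, double stndDevs) {
        int n = data.length;
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must be in [1, n-1]");
        }
        double[] kthDist = new double[n];
        for (int i = 0; i < n; i++) {
            double[] dists = new double[n - 1];
            int pos = 0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    dists[pos++] = euclidean(data[i], data[j]);
                }
            }
            Arrays.sort(dists);
            kthDist[i] = dists[k - 1]; // distance to the k'th nearest neighbor
        }
        double mean = 0.0;
        for (double d : kthDist) {
            mean += d;
        }
        mean /= n;
        double variance = 0.0;
        for (double d : kthDist) {
            variance += (d - mean) * (d - mean);
        }
        variance /= n;
        return mean + stndDevs * Math.sqrt(variance);
    }
}
```

A larger k or a larger multiple of the standard deviation yields a wider bandwidth and therefore a smoother estimate. The quadratic-time neighbor search here is only for clarity.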
public DistanceMetric getDistanceMetric()
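The class description notes that the DistanceMetric effectively alters the shape of the kernel. A minimal sketch of that idea follows, with hypothetical names and two assumed metrics (Euclidean and Chebyshev) standing in for DistanceMetric implementations. The same kernel and bandwidth give a different weight to the same point, because the region of nonzero weight is a disc under one metric and a square under the other.

```java
import java.util.function.ToDoubleBiFunction;

/** Illustrative only: the metric determines the shape of the kernel's support. */
final class MetricShapeSketch {

    /** Epanechnikov profile: 0.75 * (1 - u^2) for |u| <= 1, otherwise 0. */
    static double epanechnikov(double u) {
        return Math.abs(u) > 1.0 ? 0.0 : 0.75 * (1.0 - u * u);
    }

    /** Euclidean (L2) distance. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Chebyshev (L-infinity) distance: the largest coordinate difference. */
    static double chebyshev(double[] a, double[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    /** Kernel weight of x relative to center under the given metric and bandwidth. */
    static double weight(ToDoubleBiFunction<double[], double[]> metric,
                         double[] center, double[] x, double bandwidth) {
        return epanechnikov(metric.applyAsDouble(center, x) / bandwidth);
    }
}
```

For a center at the origin, a bandwidth of 1, and the point (0.8, 0.8), the Euclidean distance exceeds the bandwidth and the weight is zero, while the Chebyshev distance is 0.8 and the weight is positive.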
public void setDistanceMetric(DistanceMetric distanceMetric)
distanceMetric - the metric to use
public MetricKDE clone()
clone in interface MultivariateDistribution
clone in class MultivariateKDE
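public List<? extends VecPaired<VecPaired<Vec,Integer>,Double>> getNearby(Vec x)
Description copied from class: MultivariateKDE
Returns the list of vectors that have a nonzero contribution to the density of the query point x.
getNearby in class MultivariateKDE
x - the query point
public List<? extends VecPaired<VecPaired<Vec,Integer>,Double>> getNearbyRaw(Vec x)
Description copied from class: MultivariateKDE
Returns the list of vectors that have a nonzero contribution to the density of the query point x.
getNearbyRaw in class MultivariateKDE
x - the query point
public double pdf(Vec x)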
Description copied from interface: MultivariateDistribution
Returns the probability of a given vector from this distribution.
pdf in interface MultivariateDistribution
x - the vector to get the probability of
public <V extends Vec> boolean setUsingData(List<V> dataSet, double bandwith)
dataSet - the data set to model the density of
bandwith - the bandwidth
public <V extends Vec> boolean setUsingData(List<V> dataSet, double bandwith, ExecutorService threadpool)
dataSet - the data set to model the density of
bandwith - the bandwidth
threadpool - the source of threads for parallel construction
public <V extends Vec> boolean setUsingData(List<V> dataSet, int k)
dataSet - the data set to model the density of
k - the number of neighbors to use to estimate the bandwidth
public <V extends Vec> boolean setUsingData(List<V> dataSet, int k, ExecutorService threadpool)
dataSet - the data set to model the density of
k - the number of neighbors to use to estimate the bandwidth
threadpool - the source of threads for computation
public <V extends Vec> boolean setUsingData(List<V> dataSet, int k, double stndDevs)
dataSet - the data set to model the density of
k - the number of neighbors to use to estimate the bandwidth
stndDevs - the multiple of the standard deviation to add to the mean of the distances
public <V extends Vec> boolean setUsingData(List<V> dataSet, int k, double stndDevs, ExecutorService threadpool)
dataSet - the data set to model the density of
k - the number of neighbors to use to estimate the bandwidth
stndDevs - the multiple of the standard deviation to add to the mean of the distances
threadpool - the source of threads to use for computation
public <V extends Vec> boolean setUsingData(List<V> dataSet)
Description copied from interface: MultivariateDistribution
Sets the parameters of the distribution to attempt to fit the given list of vectors.
setUsingData in interface MultivariateDistribution
V - the vector type
dataSet - the list of data points
public <V extends Vec> boolean setUsingData(List<V> dataSet, ExecutorService threadpool)
Description copied from interface: MultivariateDistribution
Sets the parameters of the distribution to attempt to fit the given list of vectors.
setUsingData in interface MultivariateDistribution
setUsingData in class MultivariateDistributionSkeleton
V - the vector type
dataSet - the list of data points
threadpool - the source of threads for computation
public boolean setUsingDataList(List<DataPoint> dataPoints)
Description copied from interface: MultivariateDistribution
Sets the parameters of the distribution to attempt to fit the given list of data points. The weights of the data points will be used.
setUsingDataList in interface MultivariateDistribution
dataPoints - the list of data points to use
public boolean setUsingDataList(List<DataPoint> dataPoints, ExecutorService threadpool)
Description copied from interface: MultivariateDistribution
Sets the parameters of the distribution to attempt to fit the given list of data points. The weights of the data points will be used.
setUsingDataList in interface MultivariateDistribution
setUsingDataList in class MultivariateDistributionSkeleton
dataPoints - the list of data points to use
threadpool - the source of threads for computation
public List<Vec> sample(int count, Random rand)
sample in interface MultivariateDistribution
count -
rand -
UnsupportedOperationException - not yet implemented
public KernelFunction getKernelFunction()
getKernelFunction in class MultivariateKDE
public void setKernelFunction(KernelFunction kf)
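The kernel function decides how much weight a data point receives given its bandwidth-scaled distance from the query point. The sketch below uses java.util.function.DoubleUnaryOperator as a stand-in for KernelFunction (an assumption made to keep the example self-contained, not the library's interface) and compares three common kernel profiles in their normalized one-dimensional form. The class and member names are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Illustrative only: kernel profiles as functions of scaled distance u = d / h. */
final class KernelWeightSketch {

    /** Uniform kernel: constant weight inside |u| <= 1, zero outside. */
    static final DoubleUnaryOperator UNIFORM =
            u -> Math.abs(u) <= 1.0 ? 0.5 : 0.0;

    /** Epanechnikov kernel: weight falls off quadratically and reaches zero at |u| = 1. */
    static final DoubleUnaryOperator EPANECHNIKOV =
            u -> Math.abs(u) <= 1.0 ? 0.75 * (1.0 - u * u) : 0.0;

    /** Gaussian kernel: weight is positive for every finite u. */
    static final DoubleUnaryOperator GAUSSIAN =
            u -> Math.exp(-0.5 * u * u) / Math.sqrt(2.0 * Math.PI);

    /** Kernel weight for each distance after scaling by the bandwidth. */
    static double[] weights(DoubleUnaryOperator kernel, double[] distances, double bandwidth) {
        double[] w = new double[distances.length];
        for (int i = 0; i < distances.length; i++) {
            w[i] = kernel.applyAsDouble(distances[i] / bandwidth);
        }
        return w;
    }

    /** Number of points that contribute a strictly positive weight. */
    static int countNonZero(double[] weights) {
        int count = 0;
        for (double w : weights) {
            if (w > 0.0) {
                count++;
            }
        }
        return count;
    }
}
```

Kernels with bounded support give exactly zero weight beyond one bandwidth, which is consistent with getNearby returning only the vectors that have a nonzero contribution to the density of the query point.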
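public void scaleBandwidth(double scale)
Description copied from class: MultivariateKDE
A caller may want to increase the bandwidth after training has been completed to get a smoother model, or decrease it to observe behavior.
scaleBandwidth in class MultivariateKDE
scale - the factor by which to scale the bandwidth used
public List<Parameter> getParameters()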
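Description copied from interface: Parameterized
Returns the list of parameters that can be altered for this learner.
getParameters in interface Parameterized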
public Parameter getParameter(String paramName)
Description copied from interface: Parameterized
Returns the parameter with the given name.
getParameter in interface Parameterized
paramName - the name of the parameter to obtain
Copyright © 2017. All rights reserved.