Modifier and Type | Class and Description |
---|---|
class |
DataSet<Type extends DataSet>
This is the base class for representing a data set.
|
Modifier and Type | Class and Description |
---|---|
class |
SimpleDataSet
SimpleDataSet is a basic implementation of a data set.
|
Modifier and Type | Method and Description |
---|---|
DataSet |
DataSet.getTwiceShallowClone()
Returns a new version of this data set that is of the same type, and
contains a different list pointing to shallow copies of the data points.
|
abstract DataSet<Type> |
DataSet.shallowClone()
Returns a new version of this data set that is of the same type, and
contains a different list pointing to the same data points.
|
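The difference between the two clone flavors above is easiest to see in code. A minimal sketch using only the methods listed here; the comments paraphrase the descriptions above:

```java
import jsat.DataSet;
import jsat.SimpleDataSet;

public class CloneSketch
{
    public static void cloneExamples(SimpleDataSet data)
    {
        // New backing list, but the same DataPoint objects: changes to a point
        // are visible through both data sets
        DataSet sharedPoints = data.shallowClone();

        // New backing list pointing at shallow copies of the data points, so the
        // two data sets can diverge at the per-point level
        DataSet copiedPoints = data.getTwiceShallowClone();
    }
}
```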
Modifier and Type | Method and Description |
---|---|
static void |
ARFFLoader.writeArffFile(DataSet data,
OutputStream os) |
static void |
ARFFLoader.writeArffFile(DataSet data,
OutputStream os,
String relation)
Writes out the dataset as an ARFF file to the given stream.
|
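A hedged sketch of writing a data set out as ARFF with the overloads above (ARFFLoader is assumed to live in the top-level jsat package; the file and relation names are made up):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import jsat.ARFFLoader;
import jsat.DataSet;

public class ArffWriteSketch
{
    public static void save(DataSet data) throws IOException
    {
        try (OutputStream os = new FileOutputStream("data.arff"))
        {
            // The three-argument overload also sets the @relation name in the header
            ARFFLoader.writeArffFile(data, os, "my-relation");
        }
    }
}
```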
Modifier and Type | Class and Description |
---|---|
class |
ClassificationDataSet
ClassificationDataSet is a data set meant specifically for classification problems.
|
Constructor and Description |
---|
ClassificationDataSet(DataSet dataSet,
int predicting)
Creates a new data set for classification problems.
|
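The constructor above turns a generic DataSet into a classification problem by nominating one categorical attribute as the target. A small sketch, assuming the attribute at index 0 is the label to predict:

```java
import jsat.DataSet;
import jsat.classifiers.ClassificationDataSet;

public class ToClassificationSketch
{
    public static ClassificationDataSet toClassification(DataSet raw)
    {
        // Use categorical attribute 0 as the class label; the remaining
        // attributes become the predictive features
        return new ClassificationDataSet(raw, 0);
    }
}
```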
Modifier and Type | Method and Description |
---|---|
static Distribution |
EmphasisBoost.guessLambda(DataSet d)
Guesses the distribution to use for the λ parameter
|
Modifier and Type | Method and Description |
---|---|
static Distribution |
DANN.guessK(DataSet d)
Guesses the distribution to use for the number of neighbors to consider
|
static Distribution |
DANN.guessKn(DataSet d)
Guesses the distribution to use for the number of neighbors to consider
|
static Distribution |
NearestNeighbour.guessNeighbors(DataSet d)
Guesses the distribution to use for the number of neighbors to consider
|
static Distribution |
LWL.guessNeighbors(DataSet d)
Guesses the distribution to use for the number of neighbors to consider
|
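These guess* methods return a Distribution describing a plausible search range rather than a single value. A sketch of pulling one number out of such a guess, assuming the returned jsat.distributions.Distribution exposes a median() method; in practice these guesses are usually fed to a parameter search instead (see the example near the end of this page):

```java
import jsat.DataSet;
import jsat.classifiers.knn.NearestNeighbour;
import jsat.distributions.Distribution;

public class GuessNeighborsSketch
{
    public static int pickNeighbors(DataSet data)
    {
        // Ask for a reasonable range of k values for this data set...
        Distribution kGuess = NearestNeighbour.guessNeighbors(data);
        // ...and take its median as a simple starting point (median() assumed)
        return (int) Math.round(kGuess.median());
    }
}
```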
Modifier and Type | Method and Description |
---|---|
static Distribution |
NewGLMNET.guessAlpha(DataSet d)
Guess the distribution to use for the trade-off term
α in Elastic Net regularization. |
static Distribution |
SPA.guessC(DataSet d)
Guess the distribution to use for the regularization term
C in Support PassiveAggressive. |
static Distribution |
SCW.guessC(DataSet d)
Guess the distribution to use for the regularization term
C . |
static Distribution |
PassiveAggressive.guessC(DataSet d)
Guess the distribution to use for the regularization term
C in PassiveAggressive. |
static Distribution |
NHERD.guessC(DataSet d)
Guess the distribution to use for the regularization term
C . |
static Distribution |
NewGLMNET.guessC(DataSet d)
Guess the distribution to use for the regularization term
C in Logistic Regression. |
static Distribution |
LogisticRegressionDCD.guessC(DataSet d)
Guess the distribution to use for the regularization term
C in Logistic Regression. |
static Distribution |
SCW.guessEta(DataSet d)
Guess the distribution to use for the regularization term
η . |
static Distribution |
LinearSGD.guessLambda0(DataSet d)
Guess the distribution to use for the regularization term
λ0 . |
static Distribution |
LinearBatch.guessLambda0(DataSet d)
Guess the distribution to use for the regularization term
λ0 . |
static Distribution |
LinearSGD.guessLambda1(DataSet d)
Guess the distribution to use for the regularization term
λ1 . |
static Distribution |
AROW.guessR(DataSet d)
Guess the distribution to use for the regularization term
r . |
Constructor and Description |
---|
GradFunction(DataSet D,
LossFunc loss) |
LossFunction(DataSet D,
LossFunc loss) |
Modifier and Type | Method and Description |
---|---|
static Distribution |
ALMA2K.guessAlpha(DataSet d)
Guesses the distribution to use for the α parameter
|
static Distribution |
DUOL.guessC(DataSet d)
Guesses the distribution to use for the C parameter
|
static Distribution |
BOGD.guessEta(DataSet d)
Guesses the distribution to use for the η parameter
|
static Distribution |
KernelSGD.guessLambda(DataSet d)
Guess the distribution to use for the regularization term
λ . |
static Distribution |
BOGD.guessMaxCoeff(DataSet d)
Guesses the distribution to use for the MaxCoeff parameter
|
static Distribution |
OSKL.guessR(DataSet d)
Guesses the distribution to use for the R parameter
|
static Distribution |
CSKLRBatch.guessR(DataSet d)
Guesses the distribution to use for the R parameter
|
static Distribution |
CSKLR.guessR(DataSet d)
Guesses the distribution to use for the R parameter
|
static Distribution |
BOGD.guessRegularization(DataSet d)
Guesses the distribution to use for the Regularization parameter
|
Modifier and Type | Method and Description |
---|---|
protected abstract double[] |
RBFNet.Phase2Learner.estimateBandwidths(double alpha,
int p,
DataSet data,
List<Vec> centroids,
List<Double> centroidDistCache,
DistanceMetric dm,
ExecutorService threadpool) |
void |
RBFNet.fit(DataSet data) |
protected abstract List<Vec> |
RBFNet.Phase1Learner.getCentroids(DataSet data,
int centroids,
DistanceMetric dm,
ExecutorService ex)
Obtains the centroids for the given data set
|
static Distribution |
RBFNet.guessAlpha(DataSet data)
Guesses the distribution for the
RBFNet.setAlpha(double) parameter |
static Distribution |
RBFNet.guessNumCentroids(DataSet data)
Guesses the distribution for the
RBFNet.setNumCentroids(int) parameter |
static Distribution |
RBFNet.guessP(DataSet data)
Guesses the distribution for the
RBFNet.setP(int) parameter |
Modifier and Type | Method and Description |
---|---|
static Distribution |
PlattSMO.guessC(DataSet d)
Guess the distribution to use for the regularization term
C in a SVM. |
static Distribution |
LSSVM.guessC(DataSet d)
Guess the distribution to use for the regularization term
C in a LS-SVM. |
static Distribution |
DCDs.guessC(DataSet d)
Guess the distribution to use for the regularization term
C in a SVM. |
static Distribution |
Pegasos.guessRegularization(DataSet d)
Guess the distribution to use for the regularization term
Pegasos.setRegularization(double) in Pegasos. |
Modifier and Type | Method and Description |
---|---|
static Distribution |
CPM.guessEntropyThreshold(DataSet d)
Provides a distribution of reasonable values for the
CPM.setEntropyThreshold(double) parameter |
static Distribution |
OnlineAMM.guessLambda(DataSet d)
Guess the distribution to use for the regularization term
λ in AMM. |
static Distribution |
CPM.guessLambda(DataSet d)
Provides a distribution of reasonable values for the
λ parameter |
Modifier and Type | Method and Description |
---|---|
<Type extends DataSet> |
ERTrees.evaluateFeatureImportance(DataSet<Type> data)
Measures the statistics of feature importance from the trees in this
forest.
|
<Type extends DataSet> |
ERTrees.evaluateFeatureImportance(DataSet<Type> data,
TreeFeatureImportanceInference imp)
Measures the statistics of feature importance from the trees in this
forest.
|
<Type extends DataSet> |
TreeFeatureImportanceInference.getImportanceStats(TreeLearner model,
DataSet<Type> data) |
<Type extends DataSet> |
MDI.getImportanceStats(TreeLearner model,
DataSet<Type> data) |
<Type extends DataSet> |
MDA.getImportanceStats(TreeLearner model,
DataSet<Type> data) |
<Type extends DataSet> |
ImportanceByUses.getImportanceStats(TreeLearner model,
DataSet<Type> data) |
Modifier and Type | Method and Description |
---|---|
<Type extends DataSet> |
ERTrees.evaluateFeatureImportance(DataSet<Type> data)
Measures the statistics of feature importance from the trees in this
forest.
|
<Type extends DataSet> |
ERTrees.evaluateFeatureImportance(DataSet<Type> data,
TreeFeatureImportanceInference imp)
Measures the statistics of feature importance from the trees in this
forest.
|
<Type extends DataSet> |
TreeFeatureImportanceInference.getImportanceStats(TreeLearner model,
DataSet<Type> data) |
<Type extends DataSet> |
MDI.getImportanceStats(TreeLearner model,
DataSet<Type> data) |
<Type extends DataSet> |
MDA.getImportanceStats(TreeLearner model,
DataSet<Type> data) |
<Type extends DataSet> |
ImportanceByUses.getImportanceStats(TreeLearner model,
DataSet<Type> data) |
Modifier and Type | Method and Description |
---|---|
List<List<DataPoint>> |
ClustererBase.cluster(DataSet dataSet) |
List<List<DataPoint>> |
Clusterer.cluster(DataSet dataSet)
Performs clustering on the given data set.
|
protected double |
PAM.cluster(DataSet data,
boolean doInit,
int[] medioids,
int[] assignments,
List<Double> cacheAccel)
Performs the actual work of PAM.
|
protected double |
CLARA.cluster(DataSet data,
boolean doInit,
int[] medioids,
int[] assignments,
List<Double> cacheAccel) |
List<List<DataPoint>> |
DBSCAN.cluster(DataSet dataSet,
double eps,
int minPts) |
List<List<DataPoint>> |
DBSCAN.cluster(DataSet dataSet,
double eps,
int minPts,
ExecutorService threadpool) |
int[] |
DBSCAN.cluster(DataSet dataSet,
double eps,
int minPts,
ExecutorService threadpool,
int[] designations) |
int[] |
DBSCAN.cluster(DataSet dataSet,
double eps,
int minPts,
int[] designations) |
List<List<DataPoint>> |
ClustererBase.cluster(DataSet dataSet,
ExecutorService threadpool) |
List<List<DataPoint>> |
Clusterer.cluster(DataSet dataSet,
ExecutorService threadpool)
Performs clustering on the given data set.
|
int[] |
PAM.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
OPTICS.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
MeanShift.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
LSDBC.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
HDBSCAN.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
GapStatistic.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
FLAME.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
EMGaussianMixture.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
DBSCAN.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
Clusterer.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations)
Performs clustering on the given data set.
|
List<List<DataPoint>> |
KClustererBase.cluster(DataSet dataSet,
int clusters) |
List<List<DataPoint>> |
KClusterer.cluster(DataSet dataSet,
int clusters)
Performs clustering on the given data set.
|
List<List<DataPoint>> |
DBSCAN.cluster(DataSet dataSet,
int minPts) |
int[] |
PAM.cluster(DataSet dataSet,
int[] designations) |
int[] |
OPTICS.cluster(DataSet dataSet,
int[] designations) |
int[] |
MeanShift.cluster(DataSet dataSet,
int[] designations) |
int[] |
LSDBC.cluster(DataSet dataSet,
int[] designations) |
int[] |
HDBSCAN.cluster(DataSet dataSet,
int[] designations) |
int[] |
GapStatistic.cluster(DataSet dataSet,
int[] designations) |
int[] |
FLAME.cluster(DataSet dataSet,
int[] designations) |
int[] |
EMGaussianMixture.cluster(DataSet dataSet,
int[] designations) |
int[] |
DBSCAN.cluster(DataSet dataSet,
int[] designations) |
int[] |
Clusterer.cluster(DataSet dataSet,
int[] designations)
Performs clustering on the given data set.
|
List<List<DataPoint>> |
KClustererBase.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool) |
List<List<DataPoint>> |
KClusterer.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool)
Performs clustering on the given data set.
|
List<List<DataPoint>> |
DBSCAN.cluster(DataSet dataSet,
int minPts,
ExecutorService threadpool) |
int[] |
PAM.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
KClusterer.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
GapStatistic.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
EMGaussianMixture.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
DBSCAN.cluster(DataSet dataSet,
int minPts,
ExecutorService threadpool,
int[] designations) |
List<List<DataPoint>> |
KClustererBase.cluster(DataSet dataSet,
int lowK,
int highK) |
List<List<DataPoint>> |
KClusterer.cluster(DataSet dataSet,
int lowK,
int highK)
Performs clustering on the given data set.
|
int[] |
PAM.cluster(DataSet dataSet,
int clusters,
int[] designations) |
int[] |
KClusterer.cluster(DataSet dataSet,
int clusters,
int[] designations) |
int[] |
GapStatistic.cluster(DataSet dataSet,
int clusters,
int[] designations) |
int[] |
EMGaussianMixture.cluster(DataSet dataSet,
int clusters,
int[] designations) |
int[] |
DBSCAN.cluster(DataSet dataSet,
int minPts,
int[] designations) |
int[] |
CLARA.cluster(DataSet dataSet,
int clusters,
int[] designations) |
List<List<DataPoint>> |
KClustererBase.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool) |
List<List<DataPoint>> |
KClusterer.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool)
Performs clustering on the given data set.
|
int[] |
PAM.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
KClusterer.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
GapStatistic.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
EMGaussianMixture.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
PAM.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
KClusterer.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
GapStatistic.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
EMGaussianMixture.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
protected double |
EMGaussianMixture.cluster(DataSet dataSet,
List<Double> accelCache,
int K,
List<Vec> means,
int[] assignment,
boolean exactTotal,
ExecutorService threadpool,
boolean returnError) |
protected double |
EMGaussianMixture.clusterCompute(int K,
DataSet dataSet,
int[] assignment,
List<Vec> means,
List<Matrix> covs,
ExecutorService execServ) |
static List<List<DataPoint>> |
ClustererBase.createClusterListFromAssignmentArray(int[] assignments,
DataSet dataSet)
Convenient helper method.
|
static List<DataPoint> |
ClustererBase.getDatapointsFromCluster(int c,
int[] assignments,
DataSet dataSet,
int[] indexFrom)
Gets a list of the datapoints in a data set that belong to the indicated cluster
|
static void |
SeedSelectionMethods.selectIntialPoints(DataSet d,
int[] indices,
DistanceMetric dm,
List<Double> accelCache,
Random rand,
SeedSelectionMethods.SeedSelection selectionMethod)
Selects seeds from a data set to use for a clustering algorithm.
|
static void |
SeedSelectionMethods.selectIntialPoints(DataSet d,
int[] indices,
DistanceMetric dm,
List<Double> accelCache,
Random rand,
SeedSelectionMethods.SeedSelection selectionMethod,
ExecutorService threadpool)
Selects seeds from a data set to use for a clustering algorithm.
|
static void |
SeedSelectionMethods.selectIntialPoints(DataSet d,
int[] indices,
DistanceMetric dm,
Random rand,
SeedSelectionMethods.SeedSelection selectionMethod)
Selects seeds from a data set to use for a clustering algorithm.
|
static void |
SeedSelectionMethods.selectIntialPoints(DataSet d,
int[] indices,
DistanceMetric dm,
Random rand,
SeedSelectionMethods.SeedSelection selectionMethod,
ExecutorService threadpool)
Selects seeds from a data set to use for a clustering algorithm.
|
static List<Vec> |
SeedSelectionMethods.selectIntialPoints(DataSet d,
int k,
DistanceMetric dm,
List<Double> accelCache,
Random rand,
SeedSelectionMethods.SeedSelection selectionMethod) |
static List<Vec> |
SeedSelectionMethods.selectIntialPoints(DataSet d,
int k,
DistanceMetric dm,
List<Double> accelCache,
Random rand,
SeedSelectionMethods.SeedSelection selectionMethod,
ExecutorService threadpool)
Selects seeds from a data set to use for a clustering algorithm.
|
static List<Vec> |
SeedSelectionMethods.selectIntialPoints(DataSet d,
int k,
DistanceMetric dm,
Random rand,
SeedSelectionMethods.SeedSelection selectionMethod)
Selects seeds from a data set to use for a clustering algorithm.
|
static List<Vec> |
SeedSelectionMethods.selectIntialPoints(DataSet d,
int k,
DistanceMetric dm,
Random rand,
SeedSelectionMethods.SeedSelection selectionMethod,
ExecutorService threadpool)
Selects seeds from a data set to use for a clustering algorithm.
|
boolean |
EMGaussianMixture.setUsingData(DataSet dataSet) |
boolean |
EMGaussianMixture.setUsingData(DataSet dataSet,
ExecutorService threadpool) |
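A hedged sketch of the basic Clusterer workflow listed above, assuming HDBSCAN offers a no-argument constructor and that passing null for the designations array asks the method to allocate one:

```java
import java.util.List;

import jsat.DataSet;
import jsat.classifiers.DataPoint;
import jsat.clustering.Clusterer;
import jsat.clustering.HDBSCAN;

public class ClusterSketch
{
    public static int[] clusterExample(DataSet data)
    {
        Clusterer clusterer = new HDBSCAN(); // no-arg constructor assumed

        // Entry i of the result is the cluster index assigned to data point i;
        // the cast disambiguates from the ExecutorService overload
        int[] assignments = clusterer.cluster(data, (int[]) null);

        // The List<List<DataPoint>> overload groups the points per cluster instead
        List<List<DataPoint>> grouped = clusterer.cluster(data);

        return assignments;
    }
}
```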
Modifier and Type | Method and Description |
---|---|
static double[][] |
AbstractClusterDissimilarity.createDistanceMatrix(DataSet dataSet,
ClusterDissimilarity cd)
Creates an upper triangular matrix containing the distance between all
points in the data set.
|
Modifier and Type | Method and Description |
---|---|
double |
NormalizedMutualInformation.evaluate(int[] designations,
DataSet dataSet) |
double |
DunnIndex.evaluate(int[] designations,
DataSet dataSet) |
double |
DaviesBouldinIndex.evaluate(int[] designations,
DataSet dataSet) |
double |
ClusterEvaluationBase.evaluate(int[] designations,
DataSet dataSet) |
double |
ClusterEvaluation.evaluate(int[] designations,
DataSet dataSet)
Evaluates the quality of the clustering indicated by the given designations.
|
double |
AdjustedRandIndex.evaluate(int[] designations,
DataSet dataSet) |
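A sketch of scoring a clustering with one of the evaluators above, assuming AdjustedRandIndex has a no-argument constructor (it also needs a labeled data set to compare against):

```java
import jsat.DataSet;
import jsat.clustering.evaluation.AdjustedRandIndex;
import jsat.clustering.evaluation.ClusterEvaluation;

public class EvaluateClusteringSketch
{
    public static double score(int[] designations, DataSet labeledData)
    {
        ClusterEvaluation eval = new AdjustedRandIndex(); // no-arg constructor assumed
        // Compares the produced designations against the data set's known labels
        return eval.evaluate(designations, labeledData);
    }
}
```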
Modifier and Type | Method and Description |
---|---|
double |
SumOfSqrdPairwiseDistances.evaluate(int[] designations,
DataSet dataSet,
int clusterID) |
double |
SoSCentroidDistance.evaluate(int[] designations,
DataSet dataSet,
int clusterID) |
double |
MeanDistance.evaluate(int[] designations,
DataSet dataSet,
int clusterID) |
double |
MeanCentroidDistance.evaluate(int[] designations,
DataSet dataSet,
int clusterID) |
double |
MaxDistance.evaluate(int[] designations,
DataSet dataSet,
int clusterID) |
double |
IntraClusterEvaluation.evaluate(int[] designations,
DataSet dataSet,
int clusterID)
Evaluates the cluster represented by the given list of data points.
|
Modifier and Type | Method and Description |
---|---|
int[] |
SimpleHAC.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
PriorityHAC.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
NNChainHAC.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
DivisiveLocalClusterer.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
DivisiveGlobalClusterer.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
SimpleHAC.cluster(DataSet dataSet,
int[] designations) |
int[] |
PriorityHAC.cluster(DataSet dataSet,
int[] designations) |
int[] |
NNChainHAC.cluster(DataSet dataSet,
int[] designations) |
int[] |
DivisiveLocalClusterer.cluster(DataSet dataSet,
int[] designations) |
int[] |
DivisiveGlobalClusterer.cluster(DataSet dataSet,
int[] designations) |
int[] |
SimpleHAC.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
PriorityHAC.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
NNChainHAC.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
DivisiveLocalClusterer.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
DivisiveGlobalClusterer.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
SimpleHAC.cluster(DataSet dataSet,
int clusters,
int[] designations) |
int[] |
PriorityHAC.cluster(DataSet dataSet,
int clusters,
int[] designations) |
int[] |
NNChainHAC.cluster(DataSet dataSet,
int clusters,
int[] designations) |
int[] |
DivisiveLocalClusterer.cluster(DataSet dataSet,
int clusters,
int[] designations) |
int[] |
DivisiveGlobalClusterer.cluster(DataSet dataSet,
int clusters,
int[] designations) |
int[] |
SimpleHAC.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
PriorityHAC.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
NNChainHAC.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
DivisiveLocalClusterer.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
DivisiveGlobalClusterer.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
SimpleHAC.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
PriorityHAC.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
NNChainHAC.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
DivisiveLocalClusterer.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
DivisiveGlobalClusterer.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
List<List<DataPoint>> |
NNChainHAC.getClusterDesignations(int clusters,
DataSet data)
Returns the cluster designations that would have been computed for the
previously clustered data set with the desired number of clusters.
|
Modifier and Type | Method and Description |
---|---|
int[] |
XMeans.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
MiniBatchKMeans.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
KMeansPDN.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
KMeans.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
KernelKMeans.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
GMeans.cluster(DataSet dataSet,
ExecutorService threadpool,
int[] designations) |
int[] |
XMeans.cluster(DataSet dataSet,
int[] designations) |
int[] |
MiniBatchKMeans.cluster(DataSet dataSet,
int[] designations) |
int[] |
KMeansPDN.cluster(DataSet dataSet,
int[] designations) |
int[] |
KMeans.cluster(DataSet dataSet,
int[] designations) |
int[] |
KernelKMeans.cluster(DataSet dataSet,
int[] designations) |
int[] |
GMeans.cluster(DataSet dataSet,
int[] designations) |
int[] |
MiniBatchKMeans.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
LloydKernelKMeans.cluster(DataSet dataSet,
int K,
ExecutorService threadpool,
int[] designations) |
int[] |
KMeans.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
ElkanKernelKMeans.cluster(DataSet dataSet,
int clusters,
ExecutorService threadpool,
int[] designations) |
int[] |
MiniBatchKMeans.cluster(DataSet dataSet,
int clusters,
int[] designations) |
int[] |
LloydKernelKMeans.cluster(DataSet dataSet,
int K,
int[] designations) |
int[] |
KMeans.cluster(DataSet dataSet,
int clusters,
int[] designations) |
int[] |
ElkanKernelKMeans.cluster(DataSet dataSet,
int clusters,
int[] designations) |
protected double |
ElkanKernelKMeans.cluster(DataSet dataSet,
int k,
int[] assignment,
boolean exactTotal,
ExecutorService threadpool)
This is a helper method where the actual clustering is performed.
|
int[] |
XMeans.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
MiniBatchKMeans.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
KMeansPDN.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
KMeans.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
KernelKMeans.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
GMeans.cluster(DataSet dataSet,
int lowK,
int highK,
ExecutorService threadpool,
int[] designations) |
int[] |
XMeans.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
MiniBatchKMeans.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
KMeansPDN.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
KMeans.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
KernelKMeans.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
int[] |
GMeans.cluster(DataSet dataSet,
int lowK,
int highK,
int[] designations) |
protected double |
XMeans.cluster(DataSet dataSet,
List<Double> accelCache,
int k,
List<Vec> means,
int[] assignment,
boolean exactTotal,
ExecutorService threadpool,
boolean returnError,
Vec dataPointWeights) |
protected double |
NaiveKMeans.cluster(DataSet dataSet,
List<Double> accelCacheInit,
int k,
List<Vec> means,
int[] assignment,
boolean exactTotal,
ExecutorService threadpool,
boolean returnError,
Vec dataPointWeights) |
protected double |
KMeansPDN.cluster(DataSet dataSet,
List<Double> accelCache,
int k,
List<Vec> means,
int[] assignment,
boolean exactTotal,
ExecutorService threadpool,
boolean returnError,
Vec dataPointWeights) |
protected abstract double |
KMeans.cluster(DataSet dataSet,
List<Double> accelCache,
int k,
List<Vec> means,
int[] assignment,
boolean exactTotal,
ExecutorService threadpool,
boolean returnError,
Vec dataPointWeights)
This is a helper method where the actual clustering is performed.
|
protected double |
HamerlyKMeans.cluster(DataSet dataSet,
List<Double> accelCache,
int k,
List<Vec> means,
int[] assignment,
boolean exactTotal,
ExecutorService threadpool,
boolean returnError,
Vec dataPointWeights) |
protected double |
GMeans.cluster(DataSet dataSet,
List<Double> accelCache,
int k,
List<Vec> means,
int[] assignment,
boolean exactTotal,
ExecutorService threadpool,
boolean returnError,
Vec dataPointWeights) |
protected double |
ElkanKMeans.cluster(DataSet dataSet,
List<Double> accelCache,
int k,
List<Vec> means,
int[] assignment,
boolean exactTotal,
ExecutorService threadpool,
boolean returnError,
Vec dataPointWeights) |
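A sketch of asking one of the k-means variants above for a fixed number of clusters, assuming HamerlyKMeans has a no-argument constructor with a default metric and seeding:

```java
import jsat.DataSet;
import jsat.clustering.kmeans.HamerlyKMeans;
import jsat.clustering.kmeans.KMeans;

public class KMeansSketch
{
    public static int[] kMeans(DataSet data, int k)
    {
        KMeans km = new HamerlyKMeans(); // no-arg constructor assumed

        // Ask for exactly k clusters; the cast picks the int[] overload and
        // null lets the method allocate the assignment array
        return km.cluster(data, k, (int[]) null);
    }
}
```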
Modifier and Type | Method and Description |
---|---|
void |
ZeroMeanTransform.fit(DataSet dataset) |
void |
WhitenedZCA.fit(DataSet dataSet) |
void |
WhitenedPCA.fit(DataSet dataSet) |
void |
UnitVarianceTransform.fit(DataSet d) |
void |
StandardizeTransform.fit(DataSet dataset) |
void |
RemoveAttributeTransform.fit(DataSet data) |
void |
PolynomialTransform.fit(DataSet data) |
void |
PNormNormalization.fit(DataSet data) |
void |
PCA.fit(DataSet dataSet) |
void |
NumericalToHistogram.fit(DataSet dataSet) |
void |
NominalToNumeric.fit(DataSet data) |
void |
LinearTransform.fit(DataSet dataSet) |
void |
JLTransform.fit(DataSet data) |
void |
InverseOfTransform.fit(DataSet data) |
void |
InsertMissingValuesTransform.fit(DataSet data) |
void |
Imputer.fit(DataSet d) |
void |
FastICA.fit(DataSet data) |
void |
DenseSparceTransform.fit(DataSet data) |
void |
DataTransformProcess.fit(DataSet data) |
void |
DataTransform.fit(DataSet data)
Fits this transform to the given dataset.
|
void |
AutoDeskewTransform.fit(DataSet dataSet) |
static Distribution |
WhitenedPCA.guessDimensions(DataSet d) |
static Distribution |
NumericalToHistogram.guessNumberOfBins(DataSet data)
Attempts to guess the number of bins to use
|
static Distribution |
JLTransform.guessProjectedDimension(DataSet d) |
void |
DataTransformProcess.leanTransforms(DataSet dataSet)
Learns the transforms for the given data set.
|
void |
DataTransformProcess.learnApplyTransforms(DataSet dataSet)
Learns the transforms for the given data set.
|
protected void |
RemoveAttributeTransform.setUp(DataSet dataSet,
Set<Integer> categoricalToRemove,
Set<Integer> numericalToRemove)
Sets up the Remove Attribute Transform properly
|
Constructor and Description |
---|
AutoDeskewTransform(DataSet dataSet)
Creates a new deskewing object from the given data set
|
AutoDeskewTransform(DataSet dataSet,
boolean ignorZeros,
List<Double> lambdas)
Creates a new deskewing object from the given data set
|
AutoDeskewTransform(DataSet dataSet,
List<Double> lambdas)
Creates a new deskewing object from the given data set
|
FastICA(DataSet data,
int C)
Creates a new FastICA transform
|
FastICA(DataSet data,
int C,
FastICA.NegEntropyFunc G,
boolean preWhitened)
Creates a new FastICA transform
|
Imputer(DataSet<?> data) |
Imputer(DataSet<?> data,
Imputer.NumericImputionMode mode) |
LinearTransform(DataSet dataSet)
Creates a new Linear Transformation for the input data set so that all
values are in the [0, 1] range.
|
LinearTransform(DataSet dataSet,
double A,
double B)
Creates a new Linear Transformation for the input data set.
|
NominalToNumeric(DataSet dataSet)
Creates a new transform to convert categorical to numeric features for the given dataset
|
NumericalToHistogram(DataSet dataSet)
Creates a new transform which will use O(sqrt(n)) bins for each numeric
feature, where n is the number of data points in the dataset.
|
NumericalToHistogram(DataSet dataSet,
int n)
Creates a new transform which will use the specified number of bins for
each numeric feature.
|
PCA(DataSet dataSet)
Performs PCA analysis using the given data set, so that transformations may be performed on future data points.
|
PCA(DataSet dataSet,
int maxPCs)
Performs PCA analysis using the given data set, so that transformations may be performed on future data points.
|
PCA(DataSet dataSet,
int maxPCs,
double threshold)
Performs PCA analysis using the given data set, so that transformations may be performed on future data points.
|
RemoveAttributeTransform(DataSet dataSet,
Set<Integer> categoricalToRemove,
Set<Integer> numericalToRemove)
Creates a new transform for removing specified features from a data set
|
StandardizeTransform(DataSet dataset)
Creates a new object for standardizing datasets, fit to the given dataset
|
UnitVarianceTransform(DataSet d)
Creates a new object for making datasets unit variance fit to the given
dataset
|
WhitenedPCA(DataSet dataSet)
Creates a new WhitenedPCA.
|
WhitenedPCA(DataSet dataSet,
double regularization)
Creates a new WhitenedPCA, the dimensions will be chosen so that the
subset of dimensions is of full rank.
|
WhitenedPCA(DataSet dataSet,
double regularization,
int dims)
Creates a new WhitenedPCA from the given dataset
|
WhitenedPCA(DataSet dataSet,
int dims)
Creates a new WhitenedPCA.
|
WhitenedZCA(DataSet dataSet)
Creates a new Whitened ZCA transform from the given data.
|
WhitenedZCA(DataSet dataSet,
double regularization)
Creates a new Whitened ZCA transform from the given data.
|
ZeroMeanTransform(DataSet dataset)
Creates a new object for transforming datapoints by centering the data
|
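Most of the constructors above fit a transform to a data set at construction time; the fitted transform is then applied to that (or a compatible) data set. A hedged sketch, assuming DataSet exposes an applyTransform(DataTransform) method:

```java
import jsat.DataSet;
import jsat.datatransform.LinearTransform;
import jsat.datatransform.PCA;

public class TransformSketch
{
    public static void rescaleAndProject(DataSet data)
    {
        // Rescale every numeric feature into [0, 1]; the constructor fits the
        // transform to this data set (applyTransform on DataSet is assumed)
        data.applyTransform(new LinearTransform(data));

        // Then project onto at most 10 principal components, fit on the rescaled data
        data.applyTransform(new PCA(data, 10));
    }
}
```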
Modifier and Type | Method and Description |
---|---|
void |
SFS.fit(DataSet data) |
void |
SBS.fit(DataSet data) |
void |
ReliefF.fit(DataSet data) |
void |
MutualInfoFS.fit(DataSet data) |
void |
LRS.fit(DataSet data) |
void |
BDS.fit(DataSet data) |
void |
ReliefF.fit(DataSet data,
ExecutorService threadPool) |
protected static double |
SFS.getScore(DataSet workOn,
Object evaluater,
int folds,
Random rand)
The score function for a data set and a learner by cross validation of a
classifier
|
protected static int |
SBS.SBSRemoveFeature(Set<Integer> available,
DataSet dataSet,
Set<Integer> catToRemove,
Set<Integer> numToRemove,
Set<Integer> catSelecteed,
Set<Integer> numSelected,
Object evaluater,
int folds,
Random rand,
int maxFeatures,
double[] PbestScore,
double maxDecrease)
Attempts to remove one feature from the list while maintaining its
accuracy
|
protected static int |
SFS.SFSSelectFeature(Set<Integer> available,
DataSet dataSet,
Set<Integer> catToRemove,
Set<Integer> numToRemove,
Set<Integer> catSelecteed,
Set<Integer> numSelected,
Object evaluater,
int folds,
Random rand,
double[] PbestScore,
int minFeatures)
Attempts to add one feature to the list of features while increasing or
maintaining the current accuracy
|
Modifier and Type | Method and Description |
---|---|
void |
RFF_RBF.fit(DataSet data) |
void |
Nystrom.fit(DataSet dataset) |
void |
KernelPCA.fit(DataSet ds) |
static Distribution |
KernelPCA.guessDimensions(DataSet d) |
Distribution |
RFF_RBF.guessSigma(DataSet d)
Guess the distribution to use for the kernel width term
σ in the RBF kernel being approximated. |
static List<Vec> |
Nystrom.sampleBasisVectors(KernelTrick k,
DataSet dataset,
List<Vec> X,
Nystrom.SamplingMethod method,
int basisSize,
boolean sampleWithReplacment,
Random rand)
Performs sampling of a data set for a subset of the vectors that make a
good set of basis vectors for forming an approximation of a full kernel
space.
|
Constructor and Description |
---|
KernelPCA(KernelTrick k,
DataSet ds,
int dimensions,
int basisSize,
Nystrom.SamplingMethod samplingMethod)
Creates a new Kernel PCA transform object
|
Nystrom(KernelTrick k,
DataSet dataset,
int basisSize,
Nystrom.SamplingMethod method)
Creates a new Nystrom approximation object
|
Nystrom(KernelTrick k,
DataSet dataset,
int basisSize,
Nystrom.SamplingMethod method,
double ridge,
boolean sampleWithReplacment)
Creates a new Nystrom approximation object
|
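A sketch of building a Nystrom kernel-space approximation with the constructor above, assuming the enum constant Nystrom.SamplingMethod.UNIFORM exists and that DataSet exposes applyTransform; the RBF width of 0.5 and basis size of 100 are arbitrary:

```java
import jsat.DataSet;
import jsat.datatransform.kernel.Nystrom;
import jsat.distributions.kernels.RBFKernel;

public class NystromSketch
{
    public static void approximateKernelSpace(DataSet data)
    {
        // Approximate the RBF kernel's feature space with 100 basis vectors
        Nystrom nystrom = new Nystrom(new RBFKernel(0.5), data, 100,
                Nystrom.SamplingMethod.UNIFORM); // sampling-method constant assumed

        // The transformed data can then be handed to a fast linear learner
        data.applyTransform(nystrom);
    }
}
```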
Modifier and Type | Method and Description |
---|---|
<Type extends DataSet> |
VisualizationTransform.transform(DataSet<Type> d)
Transforms the given data set, returning a dataset of the same type.
|
<Type extends DataSet> |
TSNE.transform(DataSet<Type> d) |
<Type extends DataSet> |
MDS.transform(DataSet<Type> d) |
<Type extends DataSet> |
LargeViz.transform(DataSet<Type> d) |
<Type extends DataSet> |
Isomap.transform(DataSet<Type> d) |
<Type extends DataSet> |
VisualizationTransform.transform(DataSet<Type> d,
ExecutorService ex)
Transforms the given data set, returning a dataset of the same type.
|
<Type extends DataSet> |
TSNE.transform(DataSet<Type> d,
ExecutorService ex) |
<Type extends DataSet> |
MDS.transform(DataSet<Type> d,
ExecutorService ex) |
<Type extends DataSet> |
LargeViz.transform(DataSet<Type> d,
ExecutorService ex) |
<Type extends DataSet> |
Isomap.transform(DataSet<Type> d,
ExecutorService ex) |
Modifier and Type | Method and Description |
---|---|
protected static void |
TSNE.computeP(DataSet d,
ExecutorService ex,
Random rand,
int knn,
int[][] nearMe,
double[][] nearMePij,
DistanceMetric dm,
double perplexity) |
<Type extends DataSet> |
VisualizationTransform.transform(DataSet<Type> d)
Transforms the given data set, returning a dataset of the same type.
|
<Type extends DataSet> |
TSNE.transform(DataSet<Type> d) |
<Type extends DataSet> |
MDS.transform(DataSet<Type> d) |
<Type extends DataSet> |
LargeViz.transform(DataSet<Type> d) |
<Type extends DataSet> |
Isomap.transform(DataSet<Type> d) |
<Type extends DataSet> |
VisualizationTransform.transform(DataSet<Type> d,
ExecutorService ex)
Transforms the given data set, returning a dataset of the same type.
|
<Type extends DataSet> |
TSNE.transform(DataSet<Type> d,
ExecutorService ex) |
<Type extends DataSet> |
MDS.transform(DataSet<Type> d,
ExecutorService ex) |
<Type extends DataSet> |
LargeViz.transform(DataSet<Type> d,
ExecutorService ex) |
<Type extends DataSet> |
Isomap.transform(DataSet<Type> d,
ExecutorService ex) |
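A sketch of the transform contract above: the output is a data set of the same type as the input, with the numeric features replaced by a low-dimensional embedding. It assumes TSNE has a no-argument constructor with a default (2-D) target dimension:

```java
import jsat.SimpleDataSet;
import jsat.datatransform.visualization.TSNE;

public class VisualizationSketch
{
    public static SimpleDataSet embed(SimpleDataSet data)
    {
        TSNE tsne = new TSNE(); // no-arg constructor and 2-D default assumed
        // Returns a SimpleDataSet whose numeric features are the t-SNE embedding
        return tsne.transform(data);
    }
}
```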
Modifier and Type | Method and Description |
---|---|
static Distribution |
SigmoidKernel.guessAlpha(DataSet d)
Guesses a distribution for the α parameter
|
static Distribution |
SigmoidKernel.guessC(DataSet d)
Guesses a distribution for the c parameter
|
static Distribution |
RationalQuadraticKernel.guessC(DataSet d)
Guess the distribution to use for the C parameter.
|
static Distribution |
PolynomialKernel.guessDegree(DataSet d)
Guesses the distribution to use for the degree parameter
|
static Distribution |
PukKernel.guessOmega(DataSet d)
Guesses the distribution to use for the ω parameter
|
static Distribution |
RBFKernel.guessSigma(DataSet d)
Guess the distribution to use for the kernel width term
σ in the RBF kernel. |
static Distribution |
PukKernel.guessSigma(DataSet d)
Guesses the distribution to use for the σ parameter
|
Distribution |
GeneralRBFKernel.guessSigma(DataSet d)
Guess the distribution to use for the kernel width term
σ in the General RBF kernel. |
static Distribution |
GeneralRBFKernel.guessSigma(DataSet d,
DistanceMetric dist)
Guess the distribution to use for the kernel width term
σ in the General RBF kernel. |
Modifier and Type | Method and Description |
---|---|
boolean |
MultivariateDistributionSkeleton.setUsingData(DataSet dataSet) |
boolean |
MultivariateDistribution.setUsingData(DataSet dataSet)
Sets the parameters of the distribution to attempt to fit the given list of data points.
|
boolean |
MultivariateDistributionSkeleton.setUsingData(DataSet dataSet,
ExecutorService threadpool) |
boolean |
MultivariateDistribution.setUsingData(DataSet dataSet,
ExecutorService threadpool)
Sets the parameters of the distribution to attempt to fit the given list of data points.
|
Modifier and Type | Method and Description |
---|---|
static <Type extends DataSet<Type>> |
JSATData.FloatStorageMethod.getMethod(DataSet<Type> data,
JSATData.FloatStorageMethod method) |
static <Type extends DataSet<Type>> |
JSATData.writeData(DataSet<Type> dataset,
OutputStream outRaw)
This method writes out a JSAT dataset to a binary format that can be read
in again later, and could be read in other languages.
The format that is used will understand both ClassificationDataSet and RegressionDataSet datasets as
special cases, and will store the target values in the binary file. |
static <Type extends DataSet<Type>> |
JSATData.writeData(DataSet<Type> dataset,
OutputStream outRaw,
JSATData.FloatStorageMethod fpStore)
This method writes out a JSAT dataset to a binary format that can be read
in again later, and could be read in other languages.
The format that is used will understand both ClassificationDataSet and RegressionDataSet datasets as
special cases, and will store the target values in the binary file. |
Modifier and Type | Method and Description |
---|---|
static DataSet<?> |
JSATData.load(InputStream inRaw)
This loads a JSAT dataset from an input stream, and will not do any of
its own buffering.
|
protected static DataSet<?> |
JSATData.load(InputStream inRaw,
boolean forceAsStandard)
This loads a JSAT dataset from an input stream, and will not do any of
its own buffering.
|
Modifier and Type | Method and Description |
---|---|
static <Type extends DataSet<Type>> |
JSATData.FloatStorageMethod.getMethod(DataSet<Type> data,
JSATData.FloatStorageMethod method) |
static void |
CSV.write(DataSet<?> data,
Path path)
Writes out the given dataset as a CSV file.
|
static void |
CSV.write(DataSet<?> data,
Path path,
char delimiter)
Writes out the given dataset as a CSV file.
|
static void |
CSV.write(DataSet<?> data,
Writer writer)
Writes out the given dataset as a CSV file.
|
static void |
CSV.write(DataSet<?> data,
Writer writer,
char delimiter)
Writes out the given dataset as a CSV file.
|
static <Type extends DataSet<Type>> |
JSATData.writeData(DataSet<Type> dataset,
OutputStream outRaw)
This method writes out a JSAT dataset to a binary format that can be read
in again later, and could be read in other languages.
The format that is used will understand both ClassificationDataSet and RegressionDataSet datasets as
special cases, and will store the target values in the binary file. |
static <Type extends DataSet<Type>> |
JSATData.writeData(DataSet<Type> dataset,
OutputStream outRaw,
JSATData.FloatStorageMethod fpStore)
This method writes out a JSAT dataset to a binary format that can be read
in again later, and could be read in other languages.
The format that is used will understand both ClassificationDataSet and RegressionDataSet datasets as
special cases, and will store the target values in the binary file. |
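A sketch of the I/O round trip described above, using a ClassificationDataSet so the generic bound on writeData is satisfied; the file names are made up:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Paths;

import jsat.DataSet;
import jsat.classifiers.ClassificationDataSet;
import jsat.io.CSV;
import jsat.io.JSATData;

public class IOSketch
{
    public static void roundTrip(ClassificationDataSet data) throws IOException
    {
        // Human-readable export with an explicit delimiter
        CSV.write(data, Paths.get("data.csv"), ',');

        // Compact binary export; class/regression targets are stored as well
        try (FileOutputStream out = new FileOutputStream("data.jsat"))
        {
            JSATData.writeData(data, out);
        }

        // Read the binary file back; the concrete subtype is decided by the file
        try (FileInputStream in = new FileInputStream("data.jsat"))
        {
            DataSet<?> loaded = JSATData.load(in);
        }
    }
}
```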
Modifier and Type | Method and Description |
---|---|
static Vec |
MatrixStatistics.covarianceDiag(Vec means,
DataSet dataset)
Computes the weighted diagonal of the covariance matrix, which is the
standard deviations of the columns of all values.
|
static void |
MatrixStatistics.covarianceDiag(Vec means,
Vec diag,
DataSet dataset)
Computes the weighted diagonal of the covariance matrix, which is the
standard deviations of the columns of all values.
|
static Matrix |
MatrixStatistics.covarianceMatrix(Vec mean,
DataSet dataSet)
Computes the weighted covariance matrix of the data set
|
static void |
MatrixStatistics.covarianceMatrix(Vec mean,
DataSet dataSet,
Matrix covariance)
Computes the weighted covariance matrix of the given data set.
|
static void |
MatrixStatistics.covarianceMatrix(Vec mean,
DataSet dataSet,
Matrix covariance,
double sumOfWeights,
double sumOfSquaredWeights)
Computes the weighted covariance matrix of the given data set.
|
static Vec |
MatrixStatistics.meanVector(DataSet dataSet)
Computes the weighted mean of the given data set.
|
static void |
MatrixStatistics.meanVector(Vec mean,
DataSet dataSet)
Computes the weighted mean of the data set
|
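A sketch combining the two statistics above to get the weighted mean and covariance of a data set's numeric features (MatrixStatistics is assumed to live in jsat.linear):

```java
import jsat.DataSet;
import jsat.linear.Matrix;
import jsat.linear.MatrixStatistics;
import jsat.linear.Vec;

public class StatsSketch
{
    public static Matrix covarianceOf(DataSet data)
    {
        // Weighted mean of the numeric features...
        Vec mean = MatrixStatistics.meanVector(data);
        // ...and the weighted covariance around that mean
        return MatrixStatistics.covarianceMatrix(mean, data);
    }
}
```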
Modifier and Type | Method and Description |
---|---|
abstract void |
TrainableDistanceMetric.train(DataSet dataSet)
Trains this metric on the given data set
|
void |
NormalizedEuclideanDistance.train(DataSet dataSet) |
void |
MahalanobisDistance.train(DataSet dataSet) |
abstract void |
TrainableDistanceMetric.train(DataSet dataSet,
ExecutorService threadpool)
Trains this metric on the given data set
|
void |
NormalizedEuclideanDistance.train(DataSet dataSet,
ExecutorService threadpool) |
void |
MahalanobisDistance.train(DataSet dataSet,
ExecutorService threadpool) |
static void |
TrainableDistanceMetric.trainIfNeeded(DistanceMetric dm,
DataSet dataset)
Static helper method for training a distance metric only if it is needed.
|
static void |
TrainableDistanceMetric.trainIfNeeded(DistanceMetric dm,
DataSet dataset,
ExecutorService threadpool)
Static helper method for training a distance metric only if it is needed.
|
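A sketch of preparing a data-dependent metric with the helpers above, assuming MahalanobisDistance has a no-argument constructor:

```java
import jsat.DataSet;
import jsat.linear.distancemetrics.MahalanobisDistance;
import jsat.linear.distancemetrics.TrainableDistanceMetric;

public class MetricSketch
{
    public static MahalanobisDistance fitMetric(DataSet data)
    {
        // Mahalanobis distance needs the data's covariance, so it must be trained
        MahalanobisDistance dist = new MahalanobisDistance(); // no-arg constructor assumed
        // trainIfNeeded only trains metrics that actually require it
        TrainableDistanceMetric.trainIfNeeded(dist, data);
        return dist;
    }
}
```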
Modifier and Type | Method and Description |
---|---|
int |
RandomSearch.autoAddParameters(DataSet data)
This method will automatically populate the search space with parameters
based on which Parameter objects return non-null distributions.
Note, using this method with Cross Validation has the potential for over-estimating the accuracy of results if the data set is actually used for parameter guessing. It is possible for this method to return 0, indicating that no default parameters could be found. |
int |
GridSearch.autoAddParameters(DataSet data)
This method will automatically populate the search space with parameters
based on which Parameter objects return non-null distributions.
|
int |
GridSearch.autoAddParameters(DataSet data,
int paramsEach)
This method will automatically populate the search space with parameters
based on which Parameter objects return non-null distributions.
Note, using this method with Cross Validation has the potential for over-estimating the accuracy of results if the data set is actually used for parameter guessing. |
Distribution |
IntParameter.getGuess(DataSet data)
This method allows one to obtain a distribution that represents a
reasonable "guess" at the range of values that would work for this
parameter.
|
Distribution |
DoubleParameter.getGuess(DataSet data)
This method allows one to obtain a distribution that represents a
reasonable "guess" at the range of values that would work for this
parameter.
|
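autoAddParameters is the usual consumer of the many guess*(DataSet) methods listed throughout this page. A hedged sketch, assuming RandomSearch can be built from a base Classifier plus a fold count and that the 2017-era trainC method is the training entry point; the PlattSMO/RBFKernel choice and the numbers are arbitrary:

```java
import jsat.classifiers.ClassificationDataSet;
import jsat.classifiers.svm.PlattSMO;
import jsat.distributions.kernels.RBFKernel;
import jsat.parameters.RandomSearch;

public class AutoTuneSketch
{
    public static RandomSearch tune(ClassificationDataSet data)
    {
        // Wrap a base learner in a random parameter search over 3 CV folds
        // (constructor shape assumed)
        RandomSearch search = new RandomSearch(new PlattSMO(new RBFKernel(0.5)), 3);

        // Populate the search space from the learner's guess*(DataSet) distributions
        int added = search.autoAddParameters(data);

        if (added > 0)            // 0 means no default parameter ranges were found
            search.trainC(data);  // trainC assumed from the 2017-era Classifier API
        return search;
    }
}
```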
Modifier and Type | Class and Description |
---|---|
class |
RegressionDataSet
A RegressionDataSet is a data set specifically for the task of performing regression.
|
Modifier and Type | Method and Description |
---|---|
static Distribution |
KernelRidgeRegression.guessLambda(DataSet d)
Guesses the distribution to use for the λ parameter
|
Modifier and Type | Method and Description |
---|---|
DataSet |
TextDataLoader.getDataSet()
Returns a new data set containing the original data points that were
loaded with this loader.
|
DataSet |
HashedTextDataLoader.getDataSet()
Returns a new data set containing the original data points that were
loaded with this loader.
|
Modifier and Type | Method and Description |
---|---|
void |
OnlineLDAsvi.model(DataSet dataSet,
int topics)
Fits the LDA model against the given data set
|
void |
OnlineLDAsvi.model(DataSet dataSet,
int topics,
ExecutorService ex)
Fits the LDA model against the given data set
|
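A sketch of fitting the LDA model above to a bag-of-words data set, assuming OnlineLDAsvi has a no-argument constructor with default priors; the topic count of 20 is arbitrary:

```java
import jsat.DataSet;
import jsat.text.topicmodel.OnlineLDAsvi;

public class LdaSketch
{
    public static OnlineLDAsvi fitTopics(DataSet bagOfWords)
    {
        // Each data point is expected to be a (sparse) word-count vector
        OnlineLDAsvi lda = new OnlineLDAsvi(); // no-arg constructor assumed

        // Fit a 20-topic model to the corpus
        lda.model(bagOfWords, 20);
        return lda;
    }
}
```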