public class ClassificationDataSet extends DataSet<ClassificationDataSet>
Modifier and Type | Field and Description |
---|---|
protected IntList | category |
protected List<DataPoint> | datapoints |
protected CategoricalData | predicting: The categories for the predicted value |
Fields inherited from class DataSet: categories, columnVecCache, numericalVariableNames, numNumerVals
Constructor and Description |
---|
ClassificationDataSet(DataSet dataSet, int predicting): Creates a new data set for classification problems. |
ClassificationDataSet(int numerical, CategoricalData[] categories, CategoricalData predicting): Creates a new, empty data set for classification problems. |
ClassificationDataSet(List<DataPoint> data, int predicting): Creates a new data set for classification problems from the given list of data points. |
ClassificationDataSet(List<DataPointPair<Integer>> data, CategoricalData predicting): Creates a new data set for classification problems from the given list of data points. |
Modifier and Type | Method and Description |
---|---|
void | addDataPoint(DataPoint dp, int classification): Creates a new data point and adds it to this data set. |
void | addDataPoint(Vec v, int classification): Creates a new data point with no categorical variables and adds it to this data set. |
void | addDataPoint(Vec v, int[] classes, int classification): Creates a new data point and adds it to this data set. |
void | addDataPoint(Vec v, int[] classes, int classification, double weight): Creates a new data point and adds it to this data set. |
void | addDataPoint(Vec v, int classification, double weight): Creates a new data point with no categorical variables and adds it to this data set. |
int | classSampleCount(int targetClass): Returns the number of data points that belong to the specified class, irrespective of the weights of the individual points. |
static ClassificationDataSet | comineAllBut(List<ClassificationDataSet> list, int exception): A helper method meant to be used with DataSet.cvSet(int), this combines all classification data sets in a given list while holding out the indicated data set. |
List<DataPointPair<Integer>> | getAsDPPList(): Returns the data set as a list of DataPointPair. |
List<DataPointPair<Double>> | getAsFloatDPPList(): Returns the data set as a list of DataPointPair. |
int | getClassSize(): Returns the number of target classes in this classification data set. |
DataPoint | getDataPoint(int i): Returns the i'th data point from the data set. |
int | getDataPointCategory(int i): Returns the integer value corresponding to the true category of the i'th data point. |
DataPointPair<Integer> | getDataPointPair(int i): Returns the i'th data point from the data set, paired with the integer indicating its true class. |
CategoricalData | getPredicting() |
double[] | getPriors(): Computes the prior probabilities of each class, and returns an array containing the values. |
List<DataPoint> | getSamples(int category): Returns the list of all examples that belong to the given category. |
int | getSampleSize(): Returns the number of data points in this data set. |
Vec | getSampleVariableVector(int category, int n): This method is a counterpart to DataSet.getNumericColumn(int). |
protected ClassificationDataSet | getSubset(List<Integer> indicies): Creates a new dataset that is a subset of this dataset. |
ClassificationDataSet | getTwiceShallowClone(): Returns a new version of this data set that is of the same type, and contains a different list pointing to shallow data point copies. |
void | setDataPoint(int i, DataPoint dp): Replaces an already existing data point with the one given. |
ClassificationDataSet | shallowClone(): Returns a new version of this data set that is of the same type, and contains a different list pointing to the same data points. |
List<ClassificationDataSet> | stratSet(int folds, Random rnd) |
Methods inherited from class DataSet: applyTransform, applyTransform, applyTransform, applyTransform, countMissingValues, cvSet, cvSet, getCategories, getCategoryName, getColumnMeanVariance, getDataMatrix, getDataMatrixView, getDataPointIterator, getDataPoints, getDataVectors, getDataWeights, getMissingDropped, getNumCategoricalVars, getNumericColumn, getNumericColumns, getNumericColumns, getNumericName, getNumFeatures, getNumNumericalVars, getOnlineColumnStats, getOnlineDenseStats, getSparsityStats, randomSplit, randomSplit, replaceNumericFeatures, setNumericName
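
The hold-out pattern that comineAllBut is described as supporting (combine every fold except one, then evaluate on the held-out fold) can be illustrated without the library. The following is a minimal sketch, assuming folds are plain lists of strings; the class HoldOutSketch and its combineAllBut method are hypothetical stand-ins and not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the hold-out pattern; not the library's code.
class HoldOutSketch {
    // Concatenates every fold except the one at index `exception`.
    static <T> List<T> combineAllBut(List<List<T>> folds, int exception) {
        List<T> combined = new ArrayList<T>();
        for (int i = 0; i < folds.size(); i++) {
            if (i != exception) {
                combined.addAll(folds.get(i));
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        List<List<String>> folds = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c", "d"),
                Arrays.asList("e", "f"));
        // Each fold takes one turn as the held-out test set.
        for (int held = 0; held < folds.size(); held++) {
            List<String> train = combineAllBut(folds, held);
            List<String> test = folds.get(held);
            System.out.println("train=" + train + " test=" + test);
        }
    }
}
```

In a cross-validation loop, each index takes one turn as the exception, so every fold is used for testing exactly once.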
protected CategoricalData predicting
protected IntList category
public ClassificationDataSet(DataSet dataSet, int predicting)
Parameters:
dataSet - the source data set
predicting - the categorical attribute to use as the target class

public ClassificationDataSet(List<DataPoint> data, int predicting)
Parameters:
data - the list of data points for the problem
predicting - the categorical attribute to use as the target class

public ClassificationDataSet(List<DataPointPair<Integer>> data, CategoricalData predicting)
Parameters:
data - the list of data points, paired with their class values
predicting - the information about the target class

public ClassificationDataSet(int numerical, CategoricalData[] categories, CategoricalData predicting)
Parameters:
numerical - the number of numerical attributes for the problem
categories - the information about each categorical variable in the problem
predicting - the information about the target class

public int getClassSize()
Related expression: getPredicting().getNumOfCategories()

public static ClassificationDataSet comineAllBut(List<ClassificationDataSet> list, int exception)
A helper method meant to be used with DataSet.cvSet(int), this combines all classification data sets in a given list while holding out the indicated data set.
Parameters:
list - a list of data sets
exception - the one data set in the list NOT to combine with the others

public DataPoint getDataPoint(int i)
getDataPoint in class DataSet<ClassificationDataSet>
Parameters:
i - the i'th data point in this set

public DataPointPair<Integer> getDataPointPair(int i)
Parameters:
i - the i'th data point in this set

public void setDataPoint(int i, DataPoint dp)
setDataPoint in class DataSet<ClassificationDataSet>
Parameters:
i - the i'th data point to set
dp - the data point to set at the specified index

public int getDataPointCategory(int i)
Parameters:
i - the i'th data point
Throws:
IndexOutOfBoundsException - if i is not a valid index into the data set

protected ClassificationDataSet getSubset(List<Integer> indicies)
getSubset in class DataSet<ClassificationDataSet>
Parameters:
indicies - the indices of the data points to insert into the new dataset, placed in the order listed

public List<ClassificationDataSet> stratSet(int folds, Random rnd)
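
No description of stratSet survives on this page. Going only by its name and signature, it presumably builds stratified folds, meaning folds in which each class appears in roughly the same proportion as in the full data set. The following is a minimal sketch of that idea under that assumption, with labels held in a plain int array; StratifiedFoldsSketch and stratifiedFolds are hypothetical names, and the library's actual behavior may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of stratified folds; not the library's code.
class StratifiedFoldsSketch {
    // Returns `folds` lists of point indices. Each class's indices are
    // shuffled and then dealt round-robin across the folds.
    static List<List<Integer>> stratifiedFolds(int[] labels, int numClasses,
                                               int folds, Random rnd) {
        // Group point indices by class label.
        List<List<Integer>> byClass = new ArrayList<List<Integer>>();
        for (int c = 0; c < numClasses; c++) {
            byClass.add(new ArrayList<Integer>());
        }
        for (int i = 0; i < labels.length; i++) {
            byClass.get(labels[i]).add(i);
        }

        List<List<Integer>> result = new ArrayList<List<Integer>>();
        for (int f = 0; f < folds; f++) {
            result.add(new ArrayList<Integer>());
        }

        int next = 0; // continue dealing where the previous class stopped
        for (List<Integer> members : byClass) {
            Collections.shuffle(members, rnd);
            for (int idx : members) {
                result.get(next).add(idx);
                next = (next + 1) % folds;
            }
        }
        return result;
    }
}
```

Carrying the position `next` over from one class to the following one keeps the fold sizes within one point of each other even when class counts are not multiples of the fold count.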

public void addDataPoint(Vec v, int[] classes, int classification)
Parameters:
v - the numerical values for the data point
classes - the categorical values for the data point
classification - the true class value for the data point
Throws:
IllegalArgumentException - if the given values are inconsistent with the data this class stores

public void addDataPoint(Vec v, int classification)
Parameters:
v - the numerical values for the data point
classification - the true class value for the data point
Throws:
IllegalArgumentException - if the given values are inconsistent with the data this class stores

public void addDataPoint(Vec v, int classification, double weight)
Parameters:
v - the numerical values for the data point
classification - the true class value for the data point
weight - the weight value to give to the data point
Throws:
IllegalArgumentException - if the given values are inconsistent with the data this class stores

public void addDataPoint(Vec v, int[] classes, int classification, double weight)
Parameters:
v - the numerical values for the data point
classes - the categorical values for the data point
classification - the true class value for the data point
weight - the weight value to give to the data point
Throws:
IllegalArgumentException - if the given values are inconsistent with the data this class stores

public void addDataPoint(DataPoint dp, int classification)
Parameters:
dp - the data point to add to this set
classification - the label for this data point

public List<DataPoint> getSamples(int category)
Parameters:
category - the category desired

public Vec getSampleVariableVector(int category, int n)
This method is a counterpart to DataSet.getNumericColumn(int). Instead of returning all values for a given attribute, it returns only the values of that attribute for data points that are members of a specific class.
Parameters:
category - the category desired
n - the n'th numerical variable

public CategoricalData getPredicting()
Returns:
the CategoricalData object for the variable that is to be predicted

public List<DataPointPair<Integer>> getAsDPPList()
Returns the data set as a list of DataPointPair. Each data point is paired with its true class value. Altering the data points will affect the data set. Altering the list will not.
See Also:
getDataPoint(int)
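
The distinction drawn above, that changes to the data points are visible in the data set while changes to the returned list are not, is the usual behavior of a fresh list holding shared references. The following is a minimal sketch of that behavior, assuming a simple mutable point type; ListCopySketch and its nested Point class are hypothetical and not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container showing "new list, same elements" semantics.
class ListCopySketch {
    // Hypothetical mutable point, standing in for a data point.
    static class Point {
        double value;
        Point(double value) { this.value = value; }
    }

    private final List<Point> points = new ArrayList<Point>();

    void add(Point p) { points.add(p); }
    int size() { return points.size(); }
    Point get(int i) { return points.get(i); }

    // Returns a new list object that holds the same Point references.
    List<Point> asList() { return new ArrayList<Point>(points); }

    public static void main(String[] args) {
        ListCopySketch set = new ListCopySketch();
        set.add(new Point(1.0));

        List<Point> copy = set.asList();
        copy.get(0).value = 5.0; // mutates the shared point
        copy.clear();            // only empties the returned list

        System.out.println(set.size());
        System.out.println(set.get(0).value);
    }
}
```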

public List<DataPointPair<Double>> getAsFloatDPPList()
Returns the data set as a list of DataPointPair. Each data point is paired with its true class value, which is stored in a double. Altering the data points will affect the data set. Altering the list will not.
See Also:
getDataPoint(int)

public double[] getPriors()
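
As an illustration of what a class prior is, the sketch below estimates each prior as the fraction of points carrying that label. It assumes unweighted points, a non-empty label array, and labels in the range 0 to numClasses - 1; whether the library's getPriors accounts for data point weights is not stated on this page. PriorsSketch and priors are hypothetical names.

```java
// Hypothetical illustration of class priors; not the library's code.
class PriorsSketch {
    // Relative frequency of each class label in `labels`.
    // Assumes labels.length > 0 and 0 <= label < numClasses.
    static double[] priors(int[] labels, int numClasses) {
        double[] p = new double[numClasses];
        for (int label : labels) {
            p[label] += 1.0;            // count points per class
        }
        for (int c = 0; c < numClasses; c++) {
            p[c] /= labels.length;      // normalize so the priors sum to 1
        }
        return p;
    }

    public static void main(String[] args) {
        double[] p = priors(new int[] {0, 1, 1, 1}, 2);
        System.out.println(java.util.Arrays.toString(p));
    }
}
```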

public int classSampleCount(int targetClass)
Parameters:
targetClass - the target class

public int getSampleSize()
getSampleSize in class DataSet<ClassificationDataSet>

public ClassificationDataSet shallowClone()
shallowClone in class DataSet<ClassificationDataSet>

public ClassificationDataSet getTwiceShallowClone()
getTwiceShallowClone in class DataSet<ClassificationDataSet>
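
The summary describes shallowClone as producing a different list pointing to the same data points, and getTwiceShallowClone as producing a different list pointing to shallow data point copies. The following is a minimal sketch of that difference, assuming a point that holds a value array and a weight; CloneDepthSketch, Point, and the choice of which fields a shallow point copy shares are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the two clone depths; not the library's code.
class CloneDepthSketch {
    static class Point {
        final double[] values; // shared between shallow point copies
        double weight;         // copied per point object

        Point(double[] values, double weight) {
            this.values = values;
            this.weight = weight;
        }

        // New Point object that reuses the same values array.
        Point shallowCopy() { return new Point(values, weight); }
    }

    // New list, same Point objects.
    static List<Point> shallowClone(List<Point> src) {
        return new ArrayList<Point>(src);
    }

    // New list, new Point objects, shared value arrays.
    static List<Point> twiceShallowClone(List<Point> src) {
        List<Point> out = new ArrayList<Point>(src.size());
        for (Point p : src) {
            out.add(p.shallowCopy());
        }
        return out;
    }
}
```

Under these assumptions, changing a point's weight through the twice-shallow clone leaves the original point untouched, while the same change through the shallow clone is visible in the original.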
Copyright © 2017. All rights reserved.