public class SFS extends Object implements DataTransform
Constructor and Description |
---|
SFS(int minFeatures, int maxFeatures, ClassificationDataSet dataSet, Classifier evaluater, int folds, double maxIncrease): Performs SFS feature selection for a classification problem |
SFS(int minFeatures, int maxFeatures, Classifier evaluater, double maxIncrease): Performs SFS feature selection for a classification problem |
SFS(int minFeatures, int maxFeatures, RegressionDataSet dataSet, Regressor regressor, int folds, double maxIncrease): Performs SFS feature selection for a regression problem |
SFS(int minFeatures, int maxFeatures, Regressor regressor, double maxIncrease): Creates an SFS feature selector for a regression problem |
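The parameters minFeatures, maxFeatures and maxIncrease control a greedy forward search. The sketch below is a minimal, self-contained illustration of sequential forward selection under stated assumptions; it is not JSAT's implementation. The class SfsSketch and its select method are hypothetical, and the error callback stands in for the cross-validated error that SFS obtains from the classifier or regressor. The acceptance rule is also an assumption: while fewer than minFeatures features are selected, the best candidate is always added; afterwards it is added only if its error exceeds the current error by at most maxIncrease.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical sketch of greedy Sequential Forward Selection (not JSAT code).
 * The error function is a stand-in for a cross-validated error estimate.
 */
final class SfsSketch {

    private SfsSketch() {}

    /**
     * @param nFeatures   total number of candidate features, indexed [0, nFeatures)
     * @param minFeatures always add the best candidate until this many are selected
     * @param maxFeatures never select more than this many features
     * @param maxIncrease largest tolerated rise in error when a feature is added
     * @param error       error of a candidate subset (lower is better)
     * @return the selected feature indices, in the order they were added
     */
    static List<Integer> select(int nFeatures, int minFeatures, int maxFeatures,
                                double maxIncrease, ToDoubleFunction<Set<Integer>> error) {
        List<Integer> selected = new ArrayList<>();
        Set<Integer> current = new HashSet<>();
        double currentError = error.applyAsDouble(current); // error with no features
        while (selected.size() < Math.min(maxFeatures, nFeatures)) {
            int bestFeature = -1;
            double bestError = Double.POSITIVE_INFINITY;
            // Try each unused feature and remember the one with the lowest error.
            for (int f = 0; f < nFeatures; f++) {
                if (current.contains(f)) {
                    continue;
                }
                current.add(f);
                double e = error.applyAsDouble(current);
                current.remove(f);
                if (e < bestError) {
                    bestError = e;
                    bestFeature = f;
                }
            }
            // Assumed rule: below minFeatures accept unconditionally; otherwise
            // stop if the best candidate raises the error by more than maxIncrease.
            boolean belowMin = selected.size() < minFeatures;
            if (!belowMin && bestError > currentError + maxIncrease) {
                break;
            }
            current.add(bestFeature);
            selected.add(bestFeature);
            currentError = bestError;
        }
        return selected;
    }
}
```

For example, with an error function that rewards features 0 and 2 and slightly penalizes every other feature, select(4, 1, 4, 0.0, err) stops after [0, 2], while a tolerance of 0.1 lets the search continue until maxFeatures is reached.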
Modifier and Type | Method and Description |
---|---|
protected static void | addFeature(int curBest, int nCat, Set<Integer> catF, Set<Integer> numF) |
SFS | clone() |
void | fit(DataSet data): Fits this transform to the given dataset. |
int | getFolds() |
int | getMaxFeatures(): Returns the maximum number of features to find |
double | getMaxIncrease() |
int | getMinFeatures(): Returns the minimum number of features to find |
protected static double | getScore(DataSet workOn, Object evaluater, int folds, Random rand): The score function for a data set and a learner (classifier or regressor), computed by cross validation |
Set<Integer> | getSelectedCategorical(): Returns a copy of the set of categorical features selected by the search algorithm |
Set<Integer> | getSelectedNumerical(): Returns a copy of the set of numerical features selected by the search algorithm |
protected static void | removeFeature(int feature, int nCat, Set<Integer> catF, Set<Integer> numF) |
void | setFolds(int folds): Sets the number of folds to use for cross validation when estimating the error rate |
void | setMaxFeatures(int maxFeatures): Sets the maximum number of features that may be selected |
void | setMaxIncrease(double maxIncrease): Sets the maximum tolerable increase in error when a feature is added |
void | setMinFeatures(int minFeatures): Sets the minimum number of features that must be selected |
protected static int | SFSSelectFeature(Set<Integer> available, DataSet dataSet, Set<Integer> catToRemove, Set<Integer> numToRemove, Set<Integer> catSelecteed, Set<Integer> numSelected, Object evaluater, int folds, Random rand, double[] PbestScore, int minFeatures): Attempts to add one feature to the list of features while increasing or maintaining the current accuracy |
DataPoint | transform(DataPoint dp): Returns a new data point that is a transformation of the original data point. |
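getScore is documented as a score function computed by cross validation, taking a fold count and a Random source. As a hedged illustration of that idea (not JSAT's getScore), the sketch below shuffles row indices with the supplied Random, assigns them round-robin to folds, trains on the other folds, and reports the overall error rate. CvScoreSketch, its Learner interface, and the 1-nearest-neighbor learner are hypothetical stand-ins for JSAT's data set and Classifier types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

/** Hypothetical sketch of a k-fold cross-validated error rate (not JSAT code). */
final class CvScoreSketch {

    private CvScoreSketch() {}

    /** Assumed minimal learner contract: train on rows, return a predictor. */
    interface Learner {
        Function<double[], Integer> train(List<double[]> x, List<Integer> y);
    }

    /**
     * Fraction of rows misclassified when each row is predicted by a model
     * trained without that row's fold. Assumes 2 <= folds <= x.length.
     */
    static double errorRate(double[][] x, int[] y, Learner learner, int folds, Random rand) {
        int n = x.length;
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
        }
        Collections.shuffle(order, rand); // the Random source decides fold membership
        int mistakes = 0;
        for (int k = 0; k < folds; k++) {
            List<double[]> trainX = new ArrayList<>();
            List<Integer> trainY = new ArrayList<>();
            List<Integer> testRows = new ArrayList<>();
            for (int pos = 0; pos < n; pos++) {
                int row = order.get(pos);
                if (pos % folds == k) {
                    testRows.add(row); // held out in fold k
                } else {
                    trainX.add(x[row]);
                    trainY.add(y[row]);
                }
            }
            Function<double[], Integer> model = learner.train(trainX, trainY);
            for (int row : testRows) {
                if (model.apply(x[row]) != y[row]) {
                    mistakes++;
                }
            }
        }
        // Every row is tested exactly once, so this is the overall error rate.
        return mistakes / (double) n;
    }

    /** A 1-nearest-neighbor learner (squared Euclidean distance) for the sketch. */
    static Learner oneNearestNeighbor() {
        return (trainX, trainY) -> v -> {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int j = 0; j < trainX.size(); j++) {
                double[] u = trainX.get(j);
                double d = 0.0;
                for (int c = 0; c < v.length; c++) {
                    d += (u[c] - v[c]) * (u[c] - v[c]);
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = j;
                }
            }
            return trainY.get(best);
        };
    }
}
```

Round-robin assignment after a shuffle keeps fold sizes within one row of each other, and passing the Random in makes the split reproducible for a fixed seed.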
public SFS(int minFeatures, int maxFeatures, Classifier evaluater, double maxIncrease)
minFeatures - the minimum number of features to find
maxFeatures - the maximum number of features to find
evaluater - the classifier to use in determining accuracy given a feature subset
maxIncrease - the maximum tolerable increase in error when a feature is added

public SFS(int minFeatures, int maxFeatures, ClassificationDataSet dataSet, Classifier evaluater, int folds, double maxIncrease)
minFeatures - the minimum number of features to find
maxFeatures - the maximum number of features to find
dataSet - the data set to perform feature selection on
evaluater - the classifier to use in determining accuracy given a feature subset
folds - the number of cross validation folds to use in selection
maxIncrease - the maximum tolerable increase in error when a feature is added

public SFS(int minFeatures, int maxFeatures, Regressor regressor, double maxIncrease)
minFeatures - the minimum number of features to find
maxFeatures - the maximum number of features to find
regressor - the regressor to use in determining accuracy given a feature subset
maxIncrease - the maximum tolerable increase in error when a feature is added

public SFS(int minFeatures, int maxFeatures, RegressionDataSet dataSet, Regressor regressor, int folds, double maxIncrease)
minFeatures - the minimum number of features to find
maxFeatures - the maximum number of features to find
dataSet - the data set to perform feature selection on
regressor - the regressor to use in determining accuracy given a feature subset
folds - the number of cross validation folds to use in selection
maxIncrease - the maximum tolerable increase in error when a feature is added

public void fit(DataSet data)
Description copied from interface: DataTransform
Fits this transform to the given dataset. A FailedToFitException exception may be thrown.
Specified by: fit in interface DataTransform
data - the dataset to fit this transform to

protected static void addFeature(int curBest, int nCat, Set<Integer> catF, Set<Integer> numF)
curBest - the value of curBest
nCat - the value of nCat
catF - the value of catF
numF - the value of numF

protected static void removeFeature(int feature, int nCat, Set<Integer> catF, Set<Integer> numF)
feature - the value of feature
nCat - the value of nCat
catF - the value of catF
numF - the value of numF

public DataPoint transform(DataPoint dp)
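Description copied from interface: DataTransform
Returns a new data point that is a transformation of the original data point.
Specified by: transform in interface DataTransform
dp - the data point to apply a transformation to

public SFS clone()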
Specified by: clone in interface DataTransform
Overrides: clone in class Object
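addFeature and removeFeature take a single index together with nCat and separate categorical (catF) and numerical (numF) index sets, but their parameter descriptions do not say how that index is interpreted. One plausible reading, and purely an assumption here, is a combined index in which [0, nCat) refers to categorical features and [nCat, n) to numerical features offset by nCat, which would fit the [0, n) range that SFSSelectFeature uses for available. The sketch below encodes that assumed mapping with hypothetical method names; it is not JSAT's source.

```java
import java.util.Set;

/**
 * Hypothetical sketch. ASSUMPTION: combined indices [0, nCat) are categorical
 * features and [nCat, n) are numerical features shifted by nCat.
 * This is a guess, not documented JSAT behavior.
 */
final class FeatureIndexSketch {

    private FeatureIndexSketch() {}

    /** Records combined index `feature` in the categorical or numerical set. */
    static void addByCombinedIndex(int feature, int nCat, Set<Integer> catF, Set<Integer> numF) {
        if (feature < nCat) {
            catF.add(feature);
        } else {
            numF.add(feature - nCat);
        }
    }

    /** Undoes addByCombinedIndex for the same combined index. */
    static void removeByCombinedIndex(int feature, int nCat, Set<Integer> catF, Set<Integer> numF) {
        if (feature < nCat) {
            catF.remove(feature);
        } else {
            numF.remove(feature - nCat);
        }
    }
}
```

Under this assumption, with nCat = 3 the combined index 4 refers to numerical feature 1.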
public Set<Integer> getSelectedCategorical()
public Set<Integer> getSelectedNumerical()
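getSelectedCategorical and getSelectedNumerical return copies of the selected index sets, and transform returns a new data point rather than modifying the original. Assuming the transform keeps only the selected features, the following sketch shows both ideas for a plain numeric vector: a defensive copy of a selection set, and a projection onto the selected indices in ascending order. SelectionSketch and its methods are hypothetical, use double[] in place of JSAT's DataPoint, and are not the library's transform.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; plain arrays stand in for JSAT's DataPoint. */
final class SelectionSketch {

    private SelectionSketch() {}

    /** Defensive copy, so callers cannot modify the stored selection. */
    static Set<Integer> copyOf(Set<Integer> selected) {
        return new TreeSet<Integer>(selected);
    }

    /** Returns a new vector holding only the selected indices, in ascending order. */
    static double[] keepSelected(double[] values, Set<Integer> selected) {
        double[] out = new double[selected.size()];
        int k = 0;
        for (int idx : new TreeSet<Integer>(selected)) {
            out[k++] = values[idx]; // the input array is never modified
        }
        return out;
    }
}
```

Sorting the indices before copying gives every transformed vector the same feature order, regardless of the iteration order of the set passed in.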
protected static int SFSSelectFeature(Set<Integer> available, DataSet dataSet, Set<Integer> catToRemove, Set<Integer> numToRemove, Set<Integer> catSelecteed, Set<Integer> numSelected, Object evaluater, int folds, Random rand, double[] PbestScore, int minFeatures)
available - the set of available features from [0, n) to consider for adding
dataSet - the original data set to perform feature selection from
catToRemove - the current set of categorical features to remove
numToRemove - the current set of numerical features to remove
catSelecteed - the current set of categorical features we are keeping
numSelected - the current set of numerical features we are keeping
evaluater - the classifier or regressor to perform evaluations with
folds - the number of cross validation folds to determine performance
rand - the source of randomness
PbestScore - an array to behave as a pointer to the best score seen so far
minFeatures - the minimum number of features needed

protected static double getScore(DataSet workOn, Object evaluater, int folds, Random rand)
workOn - the transformed data set to test with cross validation
evaluater - the learning algorithm to use
folds - the number of cross validation folds to perform
rand - the source of randomness

public void setMaxIncrease(double maxIncrease)
maxIncrease - the maximum tolerable increase in error when a feature is added

public double getMaxIncrease()
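SFSSelectFeature returns an int and also receives PbestScore, described above as an array that behaves as a pointer to the best score seen so far. Java passes array references by value, so a one-element array lets a method update a double owned by its caller. The sketch below shows that idiom with a hypothetical pickBest method, treating lower scores as better (as with an error rate); it deliberately omits any minFeatures or maxIncrease handling and is not JSAT's selection logic.

```java
/** Hypothetical sketch of a one-element array used as an out-parameter. */
final class OutParamSketch {

    private OutParamSketch() {}

    /**
     * Returns the index of the lowest score if it beats bestScore[0], else -1.
     * On success the new best is written back through bestScore[0].
     */
    static int pickBest(double[] candidateScores, double[] bestScore) {
        int best = -1;
        for (int i = 0; i < candidateScores.length; i++) {
            if (candidateScores[i] < bestScore[0]) {
                bestScore[0] = candidateScores[i]; // visible to the caller
                best = i;
            }
        }
        return best;
    }
}
```

A caller keeps a double[] of length 1 across calls, so the best score persists between successive selection steps without a wrapper class.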
public void setMinFeatures(int minFeatures)
minFeatures - the minimum number of features to learn

public int getMinFeatures()
public void setMaxFeatures(int maxFeatures)
maxFeatures - the maximum number of features to find

public int getMaxFeatures()
public void setFolds(int folds)
folds - the number of folds to use for cross validation when estimating the error rate

public int getFolds()
Copyright © 2017. All rights reserved.