A New Evaluation Measure for Feature Subset Selection with Genetic Algorithm
Author(s) -
Saptarsi Goswami,
Sourav Saha,
Subhayu Chakravorty,
Amlan Chakrabarti,
Basabi Chakraborty
Publication year - 2015
Publication title -
International Journal of Intelligent Systems and Applications
Language(s) - English
Resource type - Journals
eISSN - 2074-9058
pISSN - 2074-904X
DOI - 10.5815/ijisa.2015.10.04
Subject(s) - computer science , feature selection , cardinality (data modeling) , measure (data warehouse) , minimum redundancy feature selection , preprocessor , fitness function , redundancy (engineering) , feature (linguistics) , data mining , artificial intelligence , pattern recognition (psychology) , relevance (law) , selection (genetic algorithm) , data pre processing , machine learning , genetic algorithm , algorithm , linguistics , philosophy , political science , law , operating system
Feature selection is one of the most important preprocessing steps in data mining, pattern recognition and machine learning problems. Finding an optimal subset of features among all possible combinations is an NP-complete problem. Although a lot of research has been done on feature selection, datasets keep growing in size and optimality is a subjective notion, so further research is needed to find better techniques. In this paper, a genetic-algorithm-based feature subset selection method is proposed, with a novel feature evaluation measure as the fitness function. The evaluation measure differs in three primary ways: a) it considers the information content of the features in addition to their relevance to the target; b) redundancy is penalized only when it exceeds a threshold value; c) the cardinality of the subset is penalized less. Because the measure accepts a few parameters, it can be tuned to the needs of a particular problem domain. Experiments conducted on 21 well-known, publicly available datasets show superior performance, and hypothesis testing finds the accuracy improvement to be statistically significant.
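The abstract does not give the formula for the measure, so the following is only a minimal Python sketch, assuming discrete features, of a fitness function with the three stated properties plugged into a plain bit-string genetic algorithm. The functional form, the weights `beta`, `tau`, `lam` and `gamma`, and the names `fitness` and `genetic_search` are hypothetical illustrations, not the paper's actual measure or GA operators.

```python
import math
import random
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def fitness(mask, columns, target, beta=0.1, tau=0.2, lam=1.0, gamma=0.5):
    """Hypothetical subset score; weights are illustrative, not the paper's."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen:
        return float("-inf")  # never select the empty subset
    # (a) relevance to the target plus a small bonus for each feature's own entropy
    gain = sum(mutual_information(columns[i], target) + beta * entropy(columns[i])
               for i in chosen)
    # (b) penalize only the part of pairwise redundancy that exceeds threshold tau
    redundancy = sum(max(0.0, mutual_information(columns[i], columns[j]) - tau)
                     for i, j in combinations(chosen, 2))
    # (c) dividing by |S|**gamma with gamma < 1 penalizes size less than a plain mean
    return (gain - lam * redundancy) / len(chosen) ** gamma


def tournament(pop, score, rng):
    """Binary tournament selection: the fitter of two random masks wins."""
    a, b = rng.sample(pop, 2)
    return a if score(a) >= score(b) else b


def genetic_search(columns, target, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Hypothetical GA over 0/1 masks; returns (best_mask, best_score)."""
    rng = random.Random(seed)
    n = len(columns)

    def score(mask):
        return fitness(mask, columns, target)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop, score, rng), tournament(pop, score, rng)
            # uniform crossover, then independent bit-flip mutation
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    y = [0, 1] * 10
    cols = [y[:], y[:], [0, 0, 1, 1] * 5]  # relevant, redundant copy, irrelevant
    print(genetic_search(cols, y))
```

In the toy run, the second column duplicates the first, so its pairwise mutual information exceeds `tau` and the penalty outweighs its gain, while the third column is independent of the target and adds only the small entropy bonus; a single relevant feature should therefore win. Dividing by `len(chosen) ** gamma` rather than `len(chosen)` is one simple way to express "lesser penalization towards cardinality".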