Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach

Mohsen Biglari, Fatemeh Mirzaei, Hamid Hassanpour

Open Access

Publication year - 2020

Publication title - International Journal of Engineering, Transactions B: Applications

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.213

H-Index - 17

ISSN - 1728-144X

DOI - 10.5829/ije.2020.33.02b.05

Subject(s) - pattern recognition (psychology), computer science, feature selection, cluster analysis, feature (linguistics), heuristic, set (abstract data type), sample (material), data mining, selection (genetic algorithm), data set, artificial intelligence, sample size determination, function (biology), mathematics, statistics, programming language, philosophy, linguistics, chemistry, chromatography, evolutionary biology, biology

Feature selection can be decisive when analyzing high-dimensional data, especially when only a small number of samples is available. Feature extraction methods do not perform well under these conditions. With small sample sets and high-dimensional data, exploring a large search space and learning from insufficient samples become extremely hard. As a result, neural networks and clustering algorithms perform poorly on this kind of data. In this paper, a novel hybrid feature selection technique is proposed that drastically reduces the number of features with an acceptable loss of prediction accuracy. The proposed approach operates in multiple stages. It first removes irrelevant features with low discrimination power and then eliminates those with a low variation range. Next, within each set of features with high cross-correlation, only a single feature that is strongly correlated with the output is kept. Finally, a Genetic Algorithm with a customized cost function selects a small subset of the remaining features. To show the effectiveness of the proposed approach, we investigated two challenging case studies with sample set sizes of about 100 and more than 1000 features. The experimental results are promising: the number of features decreased by more than 99%, with a prediction accuracy of more than 92%.
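The staged procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not state which measures, thresholds, classifier, or cost function the authors used, so everything below is an assumption made for illustration: discrimination power is approximated by the absolute Pearson correlation with the class label, the variation range by max minus min, and the cost by leave-one-out nearest-centroid error plus a penalty on subset size. The function names (`pearson`, `filter_stages`, `cost`, `genetic_search`) and all default parameters are hypothetical and are not taken from the paper.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0

def filter_stages(columns, y, min_relevance=0.3, min_range=0.1, max_cross_corr=0.9):
    """Return indices of the feature columns that survive the three filters."""
    relevance = [abs(pearson(col, y)) for col in columns]
    # Stages 1 and 2 (assumed measures): drop weakly relevant and narrow-range features.
    candidates = [j for j, col in enumerate(columns)
                  if relevance[j] >= min_relevance and max(col) - min(col) >= min_range]
    # Stage 3: visit candidates from most to least relevant and keep one only
    # if it is not highly correlated with a feature that was already kept.
    kept = []
    for j in sorted(candidates, key=lambda j: -relevance[j]):
        if all(abs(pearson(columns[j], columns[k])) < max_cross_corr for k in kept):
            kept.append(j)
    return kept

def cost(mask, columns, y, size_penalty=0.5):
    """Assumed cost: leave-one-out nearest-centroid error plus a subset-size penalty.

    Requires at least two samples per class.
    """
    chosen = [col for col, use in zip(columns, mask) if use]
    if not chosen:
        return math.inf
    classes = sorted(set(y))
    count = {c: y.count(c) for c in classes}
    total = {c: [sum(v for v, label in zip(col, y) if label == c) for col in chosen]
             for c in classes}
    errors = 0
    for i, label in enumerate(y):
        best_class, best_dist = None, math.inf
        for c in classes:
            own = label == c  # leave sample i out of its own class centroid
            n = count[c] - own
            dist = sum((col[i] - (t - own * col[i]) / n) ** 2
                       for col, t in zip(chosen, total[c]))
            if dist < best_dist:
                best_class, best_dist = c, dist
        errors += best_class != label
    return errors / len(y) + size_penalty * len(chosen) / len(mask)

def genetic_search(columns, y, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Search for a boolean feature mask with low cost; returns the best mask found."""
    rng = random.Random(seed)
    n = len(columns)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: cost(m, columns, y))
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [bit != (rng.random() < mutation_rate) for bit in child]  # bit flips
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda m: cost(m, columns, y))
```

In this sketch, `columns` is a list of feature columns (one list of numbers per feature) and `y` is a list of class labels. A typical use would be to call `filter_stages(columns, y)` first, build the reduced list of columns from the returned indices, and then run `genetic_search` on that reduced list. Running the filters first matters because the genetic search space grows exponentially with the number of features, which is the difficulty the abstract attributes to small sample sets with high dimensional data.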
