
Subsumption is a Novel Feature Reduction Strategy for High Dimensionality Datasets
Author(s) - Donald C. Wunsch, Daniel B. Hier
Publication year - 2022
Publication title - European Scientific Journal
Language(s) - English
Resource type - Journals
eISSN - 1857-7881
pISSN - 1857-7431
DOI - 10.19044/esj.2022.v18n4p20
Subject(s) - dimensionality reduction , artificial intelligence , principal component analysis , feature (linguistics) , computer science , pattern recognition (psychology) , reduction (mathematics) , machine learning , mathematics , philosophy , linguistics , geometry
High dataset dimensionality poses challenges for machine learning classifiers because of high computational costs and the adverse consequences of redundant features. Feature reduction is an attractive remedy for high dimensionality. Three feature reduction strategies (subsumption, Relief F, and principal component analysis) were evaluated with four machine learning classifiers on a high-dimensional dataset with 474 unique features, 20 diagnoses, and 364 instances. All three strategies achieved substantial feature reduction while maintaining classification accuracy. At high levels of feature reduction, principal component analysis outperformed Relief F and subsumption. Subsumption is a novel feature reduction strategy that applies when features are organized in a hierarchical ontology.
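The abstract does not spell out how subsumption reduces features. A minimal sketch, assuming binary (present/absent) features and an ontology supplied as a child-to-parent map, is to replace each feature with its ancestor at a chosen depth and mark that ancestor present if any subsumed descendant is present (a logical OR). This is an illustration under those assumptions, not the authors' implementation; the function names, the `max_depth` parameter, and the toy concepts below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ontology format: child concept -> parent concept (None marks the root).
Ontology = Dict[str, Optional[str]]


def depth(concept: str, parent: Ontology) -> int:
    """Count the edges from a concept up to the root (the root has depth 0)."""
    d = 0
    while parent[concept] is not None:
        concept = parent[concept]
        d += 1
    return d


def subsuming_ancestor(concept: str, parent: Ontology, max_depth: int) -> str:
    """Walk up the hierarchy until the concept is no deeper than max_depth."""
    while depth(concept, parent) > max_depth:
        concept = parent[concept]
    return concept


def subsume_features(
    names: List[str], X: List[List[int]], parent: Ontology, max_depth: int
) -> Tuple[List[str], List[List[int]]]:
    """Collapse binary features onto their ancestors at max_depth (assumed OR rule)."""
    targets = [subsuming_ancestor(n, parent, max_depth) for n in names]
    new_names = list(dict.fromkeys(targets))  # unique ancestors, first-seen order
    index = {name: i for i, name in enumerate(new_names)}
    X_new = []
    for row in X:
        new_row = [0] * len(new_names)
        for j, value in enumerate(row):
            k = index[targets[j]]
            new_row[k] = max(new_row[k], value)  # OR for 0/1 features
        X_new.append(new_row)
    return new_names, X_new


# Toy example with hypothetical clinical findings (not the paper's data).
parent: Ontology = {
    "finding": None,
    "motor": "finding",
    "sensory": "finding",
    "hemiparesis": "motor",
    "ataxia": "motor",
    "numbness": "sensory",
    "paresthesia": "sensory",
}
names = ["hemiparesis", "ataxia", "numbness", "paresthesia"]
X = [
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
new_names, X_new = subsume_features(names, X, parent, max_depth=1)
print(new_names, X_new)
```

With `max_depth=1`, the four leaf findings collapse to two features (`motor`, `sensory`); with `max_depth=0`, everything collapses to the single root concept. Lowering the depth trades finer clinical distinctions for fewer features, which is the lever this sketch assumes subsumption provides.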