Decomposition feature selection with applications in detecting correlated biomarkers of bipolar disorders
Author(s) - Huang Hailin, Li Yuanzhang, Liang Hua, Wu Colin O.
Publication year - 2019
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.8317
Subject(s) - covariate, feature selection, lasso (programming language), selection (genetic algorithm), computer science, disjoint sets, uncorrelated, feature (linguistics), outcome (game theory), artificial intelligence, data mining, pattern recognition (psychology), mathematics, statistics, machine learning, linguistics, philosophy, mathematical economics, combinatorics, world wide web
Feature selection is an important initial step of exploratory analysis in biomedical studies. Its main objective is to eliminate the covariates that are uncorrelated with the outcome. For highly correlated covariates, traditional feature selection methods, such as the Lasso, tend to select one of them and eliminate the others, although some of the eliminated ones are still scientifically valuable. To alleviate this drawback, we propose a feature selection method based on covariate space decomposition, referred to herein as the “Decomposition Feature Selection” (DFS), and show that this method can lead to scientifically meaningful results in studies with correlated high-dimensional data. The DFS consists of two steps: (i) decomposing the covariate space into disjoint subsets such that each subset contains only uncorrelated covariates and (ii) identifying significant predictors by traditional feature selection within each covariate subset. We demonstrate through simulation studies that the DFS has superior practical performance over Lasso-type methods when multiple highly correlated covariates need to be retained. Application of the DFS is demonstrated through a study of bipolar disorders with correlated biomarkers.
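The two-step structure described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not say how the decomposition is computed, so the greedy partition, the correlation `threshold`, and the stand-in `marginal_selector` (used here in place of a Lasso fit) are all assumptions made for illustration, as are the function names and the toy data.

```python
import math
from typing import Callable, Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def decompose(columns: Dict[str, Sequence[float]],
              threshold: float = 0.5) -> List[List[str]]:
    """Step (i), assumed greedy variant: place each covariate in the first
    subset whose members all have |r| <= threshold with it, otherwise open
    a new subset. "Uncorrelated" is approximated by the threshold."""
    subsets: List[List[str]] = []
    for name, col in columns.items():
        for subset in subsets:
            if all(abs(pearson(col, columns[m])) <= threshold for m in subset):
                subset.append(name)
                break
        else:
            subsets.append([name])
    return subsets


def marginal_selector(columns: Dict[str, Sequence[float]],
                      names: List[str],
                      y: Sequence[float],
                      cutoff: float = 0.5) -> List[str]:
    """Hypothetical stand-in for step (ii): keep covariates whose marginal
    |r| with the outcome reaches the cutoff. A Lasso fit restricted to
    `names` could be substituted here."""
    return [n for n in names if abs(pearson(columns[n], y)) >= cutoff]


def dfs_sketch(columns: Dict[str, Sequence[float]],
               y: Sequence[float],
               threshold: float = 0.5,
               select: Callable = marginal_selector) -> List[str]:
    """Run the selector separately inside each subset and pool the results."""
    selected: List[str] = []
    for subset in decompose(columns, threshold):
        selected.extend(select(columns, subset, y))
    return selected


# Toy data: x1 and x2 are nearly collinear and both track y; x3 is noise.
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [1.1, 2.0, 2.9, 4.2, 5.1, 5.9],
    "x3": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
y = [1.2, 1.9, 3.1, 3.8, 5.2, 6.1]

print(decompose(columns))      # [['x1', 'x3'], ['x2']]
print(dfs_sketch(columns, y))  # ['x1', 'x2']
```

Because x1 and x2 land in different subsets, the selector never has to choose between them, so both are retained. This is the behavior the abstract contrasts with the Lasso applied to all covariates at once.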