Group variable selection via convex log‐exp‐sum penalty with application to a breast cancer survivor study
Author(s) -
Geng Zhigeng,
Wang Sijian,
Yu Menggang,
Monahan Patrick O.,
Champion Victoria,
Wahba Grace
Publication year - 2015
Publication title -
Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/biom.12230
Subject(s) - covariate , selection (genetic algorithm) , consistency (knowledge bases) , group selection , breast cancer , group (periodic table) , mathematics , coordinate descent , feature selection , variable (mathematics) , mathematical optimization , convexity , statistics , computer science , medicine , cancer , machine learning , discrete mathematics , mathematical analysis , chemistry , organic chemistry , financial economics , economics
Summary In many scientific and engineering applications, covariates are naturally grouped. When group structures are available among covariates, it is often of interest to identify both the important groups and the important variables within the selected groups. Among existing group variable selection methods, some cannot perform within‐group selection. Others perform both group and within‐group selection, but their objective functions are non‐convex, and this non‐convexity may require extra numerical effort. In this article, we propose a novel Log‐Exp‐Sum (LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within each group. We develop an efficient group‐level coordinate descent algorithm to fit the model. We also derive non‐asymptotic error bounds and asymptotic group selection consistency for our method in the high‐dimensional setting, where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We applied the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and may help design intervention programs to improve the quality of life for breast cancer survivors.
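The summary names the penalty but does not state its formula. As a rough illustration only, the sketch below assumes the form suggested by the name: for each group, the log of the sum of exponentials of the scaled absolute coefficients, summed over groups. The function name `les_penalty`, the parameters `lam`, `alpha`, and `weights`, and the example groups are all hypothetical choices made here, not taken from the article.

```python
import math

def les_penalty(beta, groups, lam=1.0, alpha=1.0, weights=None):
    """Log-exp-sum style group penalty (assumed form, not the article's definition).

    beta    : list of coefficients
    groups  : list of index lists, one per covariate group
    lam     : overall penalty strength (assumed name)
    alpha   : scale applied to |beta_j| inside the exponential (assumed name)
    weights : optional per-group weights, defaulting to 1 for every group
    """
    total = 0.0
    for k, idx in enumerate(groups):
        z = [alpha * abs(beta[j]) for j in idx]
        m = max(z)
        # Subtract the maximum before exponentiating for numerical stability.
        lse = m + math.log(sum(math.exp(v - m) for v in z))
        w = 1.0 if weights is None else weights[k]
        total += w * lse
    return lam * total

# Two hypothetical groups: covariates {0, 1} and {2, 3, 4}.
groups = [[0, 1], [2, 3, 4]]
beta_a = [0.5, 0.0, 0.0, 0.0, 0.0]
beta_b = [0.0, 0.0, 1.0, -2.0, 0.0]
midpoint = [(a + b) / 2 for a, b in zip(beta_a, beta_b)]

# Convexity implies the penalty at the midpoint is at most the average
# of the penalties at the two endpoints.
print(les_penalty(midpoint, groups),
      0.5 * (les_penalty(beta_a, groups) + les_penalty(beta_b, groups)))
```

Under this assumed form the penalty is convex, because log-sum-exp is convex and nondecreasing in each argument and is composed with the convex function |β|. The final line is a single numerical spot check of that property.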