Multiple Hypothesis Testing for Variable Selection
Author(s) - Rohart, Florian
Publication year - 2016
Publication title - Australian and New Zealand Journal of Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.434
H-Index - 41
eISSN - 1467-842X
pISSN - 1369-1473
DOI - 10.1111/anzs.12157
Subject(s) - lasso, false discovery rate, feature selection, multiple comparisons problem, SCAD, statistical hypothesis testing, sample size determination, statistics, machine learning
Summary - We propose two new procedures based on multiple hypothesis testing for correct support estimation in high‐dimensional sparse linear models. We conclusively prove that both procedures are powerful and do not require a large sample size. The first procedure tackles the atypical setting of ordered variable selection by extending a testing procedure previously developed in the context of a linear hypothesis. The second procedure is the main contribution of this paper: it enables data analysts to perform support estimation in the general high‐dimensional framework of non‐ordered variable selection. A thorough simulation study and applications to real datasets using the R package mht show that our non‐ordered variable selection procedure produces excellent results in terms of correct support estimation, mean square error and false discovery rate, when compared with common methods such as the Lasso, the SCAD penalty, forward regression and the false discovery rate (FDR) procedure.
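The summary lists "the false discovery rate (FDR) procedure" as one of the baselines without saying which one. Assuming it refers to the widely used Benjamini–Hochberg step-up procedure, the sketch below shows how such a baseline turns per-variable p-values into a selected set. It is not the mht procedure proposed in the paper, and the function name `benjamini_hochberg` and the level `q` are illustrative choices, not part of any package API.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    Illustrative sketch of an FDR baseline (assumed to be BH), not the
    mht procedure from the paper.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Rank hypotheses by increasing p-value, keeping their original indices.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Step-up: reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Hypothetical p-values for five candidate variables.
    print(benjamini_hochberg([0.001, 0.2, 0.012, 0.04, 0.5], q=0.05))
```

With these made-up p-values, only variables 0 and 2 pass their thresholds (0.01 and 0.02), so the sketch returns `[0, 2]`. The step-up rule scans all ranks and keeps the largest passing one, so a p-value that fails its own threshold can still be rejected if a later, larger rank passes.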