Two tales of variable selection for high dimensional regression: Screening and model building

Liu Cong; Shi Tao; Lee Yoonkyung



Author(s) - Liu Cong, Shi Tao, Lee Yoonkyung

Publication year - 2014

Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.381

H-Index - 33

eISSN - 1932-1872

pISSN - 1932-1864

DOI - 10.1002/sam.11219

Subject(s) - lasso (programming language), feature selection, ranking (information retrieval), computer science, selection (genetic algorithm), regression, regression analysis, machine learning, variable (mathematics), independence (probability theory), relevance (law), confounding, model selection, artificial intelligence, statistics, mathematics, mathematical analysis, political science, law, world wide web

Abstract - Variable selection plays an important role in high-dimensional regression problems, where a large number of variables are given as potential predictors of a response of interest. Typically, it arises at two stages of statistical modeling, namely screening and formal model building, which have different goals. Screening aims to filter out irrelevant variables before model building, whereas model building seeks a formal description of the functional relation between the response and the variables retained by screening. Accordingly, a proper comparison of variable selection methods calls for evaluation criteria that reflect these different goals: accuracy in the ranking order of variables for screening, and prediction accuracy for formal modeling. Because this difference has often not been delineated, the literature contains confounded comparisons of various screening and selection methods, which may lead to misleading conclusions. In this paper, we present comprehensive numerical studies comparing four commonly used screening and selection procedures: correlation screening (also known as sure independence screening), forward selection, LASSO, and SCAD. By clearly differentiating screening from model building, we highlight the situations where the performance of these procedures might differ. In addition, we propose a new method for cross-validation for LASSO. Furthermore, we discuss connections to relevant comparison studies that appeared in the recent literature to clarify different findings and conclusions.
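As an illustration of the screening stage only, the following is a minimal sketch of correlation screening, in which predictors are ranked by the absolute value of their marginal Pearson correlation with the response and the top few are kept. It is not the authors' code: the function names (`pearson`, `correlation_screening`, `make_toy_data`), the toy data, and the choice to keep five variables are all assumptions made for this example.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def correlation_screening(columns, response, keep):
    """Return indices of the `keep` predictors with largest |marginal correlation|."""
    scores = [abs(pearson(col, response)) for col in columns]
    ranking = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranking[:keep]


def make_toy_data(n=200, p=50, seed=0):
    """Hypothetical data: only predictors 0 and 1 truly affect the response."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    response = [
        3.0 * columns[0][i] - 2.0 * columns[1][i] + rng.gauss(0, 0.5)
        for i in range(n)
    ]
    return columns, response


columns, response = make_toy_data()
selected = correlation_screening(columns, response, keep=5)
print(selected)  # indices of the five top-ranked predictors
```

In this toy setting the two relevant predictors are independent of the noise predictors, so a marginal ranking is expected to place them near the top. The sketch covers screening only; model building on the retained variables (for example with forward selection, LASSO, or SCAD) would be a separate step judged by prediction accuracy.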
