Variable selection in the presence of missing data: imputation‐based methods

Author(s) - Zhao Yize, Long Qi

Publication year - 2017

Publication title - Wiley Interdisciplinary Reviews: Computational Statistics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.693

H-Index - 38

eISSN - 1939-0068

pISSN - 1939-5108

DOI - 10.1002/wics.1402

Subject(s) - missing data, imputation (statistics), feature selection, computer science, resampling, data mining, variable (mathematics), selection (genetic algorithm), statistics, machine learning, artificial intelligence, mathematics, mathematical analysis

Variable selection plays an essential role in regression analysis: it identifies the important variables associated with an outcome and is known to improve the predictive accuracy of the resulting model. Variable selection methods have been widely investigated for fully observed data. In the presence of missing data, however, these methods need to be carefully designed to account for both the missing data mechanism and the statistical technique used to handle the missing values. Because imputation is arguably the most popular way of handling missing data, owing to its ease of use, variable selection methods that are combined with imputation are of particular interest. These methods, which are valid and applied under the missing at random and missing completely at random assumptions, largely fall into three general strategies. The first applies an existing variable selection method to each imputed dataset and then combines the selection results across all imputed datasets. The second applies an existing variable selection method to the stacked imputed datasets. The third combines resampling techniques such as the bootstrap with imputation. Despite recent advances, this area remains under‐developed and offers fertile ground for further research.

WIREs Comput Stat 2017, 9:e1402. doi: 10.1002/wics.1402

This article is categorized under: Statistical and Graphical Methods of Data Analysis > Bootstrap and Resampling
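The three strategies named in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the simulated data, the random hot-deck imputation (a crude stand-in for proper multiple imputation), the lasso as the "existing" selection method, the penalty `lam`, and the voting thresholds are all assumptions chosen to keep the example short.

```python
import random

def lasso_select(X, y, lam=0.5, n_iter=50):
    """Indices of variables with nonzero lasso coefficients (coordinate descent)."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):  # standardise each column so one penalty suits all
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
        cols.append([(v - mean) / sd for v in col])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # covariance of column j with its partial residual
            rho = sum(c * r for c, r in zip(cols[j], resid)) / n + beta[j]
            new = max(abs(rho) - lam, 0.0) * (1 if rho > 0 else -1)  # soft threshold
            if new != beta[j]:
                d = new - beta[j]
                resid = [r - d * c for r, c in zip(resid, cols[j])]
                beta[j] = new
    return {j for j in range(p) if beta[j] != 0.0}

def impute(X, rng):
    """Assumed imputation model: fill each None with a random observed value
    from the same column (hot-deck draw)."""
    p = len(X[0])
    observed = [[row[j] for row in X if row[j] is not None] for j in range(p)]
    return [[v if v is not None else rng.choice(observed[j])
             for j, v in enumerate(row)] for row in X]

# Simulated data (assumption): only variables 0 and 1 affect the outcome.
rng = random.Random(0)
n, p, M, B = 200, 5, 5, 20
X_full = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 1) for r in X_full]
# Missing completely at random: drop about 20% of the values in columns 1 and 2.
X_miss = [[None if j in (1, 2) and rng.random() < 0.2 else v
           for j, v in enumerate(r)] for r in X_full]

# Strategy 1: select on each imputed dataset, then combine (here: majority vote).
imputed = [impute(X_miss, rng) for _ in range(M)]
votes = [lasso_select(Xm, y) for Xm in imputed]
s1 = {j for j in range(p) if sum(j in v for v in votes) >= M / 2}

# Strategy 2: stack the M imputed datasets and select once.
X_stack = [row for Xm in imputed for row in Xm]
s2 = lasso_select(X_stack, y * M)  # y repeated M times, in matching row order

# Strategy 3: bootstrap the incomplete data, impute each resample, select,
# and keep variables chosen in at least 60% of resamples (assumed threshold).
counts = [0] * p
for _ in range(B):
    idx = [rng.randrange(n) for _ in range(n)]
    Xb = impute([X_miss[i] for i in idx], rng)
    for j in lasso_select(Xb, [y[i] for i in idx]):
        counts[j] += 1
s3 = {j for j in range(p) if counts[j] / B >= 0.6}

print(sorted(s1), sorted(s2), sorted(s3))
```

In practice the hot-deck draw would usually be replaced by an imputation model that conditions on the other variables and the outcome, and the combination rule in the first strategy (majority vote here) is only one of several possible choices.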
