Nonparametric regression with missing data
Author(s) - Efromovich, Sam
Publication year - 2014
Publication title - Wiley Interdisciplinary Reviews: Computational Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.693
H-Index - 38
eISSN - 1939-0068
pISSN - 1939-5108
DOI - 10.1002/wics.1303
Subject(s) - missing data, imputation (statistics), estimator, nonparametric statistics, statistics, conditional expectation, nonparametric regression, mathematics, regression analysis, regression, econometrics
This article considers optimal estimation of a regression function when either the response or the predictor may be missing at random. Missing at random (MAR) means that the conditional probability of missing, given the response and the predictor, does not depend on the variable whose values may be missing. Mean integrated squared error (MISE) is the statistical criterion used, and the nonparametric approach means that no assumption is made about the shape of the regression function. It is shown that optimal estimation depends on which variable, the response or the predictor, is missing. When responses are missing, optimal estimation is based only on complete cases, and incomplete cases can be ignored. When predictors are missing, optimal estimation is based on all cases, both complete and incomplete, and the procedure includes estimation of the conditional probability of missing the predictor given the response. The proposed estimators are completely data-driven, do not involve imputation of missing values, and adapt to the missing mechanism and to the smoothness of the estimated regression function. Theoretical results are complemented by an analysis of credit score survey data.
WIREs Comput Stat 2014, 6:265–275. doi: 10.1002/wics.1303
This article is categorized under: Statistical and Graphical Methods of Data Analysis > Nonparametric Methods
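The contrast between the two missing-data settings can be illustrated with a small simulation. The sketch below is not the article's estimator: it swaps in a plain Nadaraya-Watson kernel smoother with fixed, hand-picked bandwidths, and the regression function, noise level, availability probabilities, sample size, and the helper names `nw` and `m` are all assumptions made for illustration. With missing responses it fits on complete cases only. With missing predictors it first estimates the probability that the predictor is available given the response from all cases, then reweights complete cases by the inverse of that estimate.

```python
import numpy as np

def nw(x_eval, x, y, h, weights=None):
    """Nadaraya-Watson estimate of E[y | x] at x_eval (Gaussian kernel, bandwidth h)."""
    if weights is None:
        weights = np.ones(len(y))
    k = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2) * weights[None, :]
    return (k @ y) / k.sum(axis=1)

def m(x):
    """Hypothetical true regression function used only for this simulation."""
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
n = 3000
x = rng.uniform(0, 1, n)
y = m(x) + 0.5 * rng.standard_normal(n)
grid = np.linspace(0.1, 0.9, 81)  # interior points, to avoid boundary effects

# Setting 1: responses may be missing; availability depends on the predictor only (MAR).
avail_y = rng.uniform(size=n) < 0.3 + 0.6 * x
fit_cc = nw(grid, x[avail_y], y[avail_y], h=0.05)  # complete cases only
mae_missing_response = np.mean(np.abs(fit_cc - m(grid)))

# Setting 2: predictors may be missing; availability depends on the response only (MAR).
avail_x = rng.uniform(size=n) < 0.2 + 0.7 / (1 + np.exp(-3 * y))
xc, yc = x[avail_x], y[avail_x]

# Estimate P(predictor available | response) from ALL cases, complete and incomplete.
w_hat = nw(yc, y, avail_x.astype(float), h=0.2)
w_hat = np.clip(w_hat, 0.05, 1.0)  # guard against extreme weights

fit_naive = nw(grid, xc, yc, h=0.05)                     # ignores the missing mechanism
fit_ipw = nw(grid, xc, yc, h=0.05, weights=1.0 / w_hat)  # inverse-probability weighted
mae_naive = np.mean(np.abs(fit_naive - m(grid)))
mae_ipw = np.mean(np.abs(fit_ipw - m(grid)))

print(f"missing responses, complete-case fit: MAE = {mae_missing_response:.3f}")
print(f"missing predictors, naive fit:        MAE = {mae_naive:.3f}")
print(f"missing predictors, weighted fit:     MAE = {mae_ipw:.3f}")
```

In the second setting the unweighted complete-case fit is biased because cases with larger responses are more likely to be retained, so one would expect the weighted fit to track the true curve more closely. The article's estimators additionally choose their smoothing from the data, which this fixed-bandwidth sketch does not attempt.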