Premium
Strategies for imputing missing covariates in accelerated failure time models
Author(s) -
Qi Lihong,
Wang YingFang,
Chen Rongqi,
Siddique Juned,
Robbins John,
He Yulei
Publication year - 2018
Publication title -
statistics in medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.7809
Subject(s) - covariate , missing data , imputation (statistics) , computer science , conditional probability distribution , parametric statistics , econometrics , joint probability distribution , multivariate statistics , marginal distribution , observational study , data mining , statistics , machine learning , mathematics , random variable
Missing covariates often occur in biomedical studies with survival outcomes. Multiple imputation via chained equations (MICE) is a semi‐parametric and flexible approach that imputes multivariate data by a series of conditional models, one for each incomplete variable. When applying MICE, practitioners tend to specify the conditional models in a simple fashion largely dictated by the software, which could lead to suboptimal results. Practical guidelines for specifying appropriate conditional models in MICE are lacking. Motivated by a study of time to hip fractures in the Women's Health Initiative Observational Study using accelerated failure time models, we propose and experiment with some rationales leading to appropriate MICE specifications. This strategy starts with specifying a joint model for the variables involved. We first derive the conditional distribution of each variable under the joint model, then approximate these conditional distributions to the extent which can be characterized by commonly used regression models. We propose to fit separate models to impute incomplete variables by the failure status, which is key to generating appropriate MICE specifications for survival outcomes. The proposed strategy can be conveniently implemented with all available imputation software that uses fully conditional specifications. Our simulation results show that some commonly used simple MICE specifications can produce suboptimal results, while those based on the proposed strategy appear to perform well and be robust toward model misspecifications. Hence, we warn against a mechanical use of MICE and suggest careful modeling of the conditional distributions of variables to ensure proper performance.