Premium
Monte Carlo EM for Missing Covariates in Parametric Regression Models
Author(s) -
Ibrahim Joseph G.,
Chen MingHui,
Lipsitz Stuart R.
Publication year - 1999
Publication title -
biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/j.0006-341x.1999.00591.x
Subject(s) - covariate , categorical variable , missing data , mathematics , statistics , parametric statistics , gibbs sampling , conditional probability distribution , regression analysis , parametric model , econometrics , bayesian probability
Summary. We propose a method for estimating parameters for general parametric regression models with an arbitrary number of missing covariates. We allow any pattern of missing data and assume that the missing data mechanism is ignorable throughout. When the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85 , 765–769). We extend this method t o continuous or mixed categorical and continuous covariates, and for arbitrary parametric regression models, by adapting a Monte Carlo version of the EM algorithm as discussed by Wei and Tanner (1990, Journal of the American Statistical Association 85 , 699–704). In addition, we discuss the Gibbs sampler for sampling from the conditional distribution of the missing covariates given the observed data and show that the appropriate complete conditionals are log‐concave. The log‐concavity property of the conditional distributions will facilitate a straightforward implementation of the Gibbs sampler via the adaptive rejection algorithm of Gilks and Wild (1992, Applied Statistics 41 , 337–348). We assume the model for the response given the covariates is an arbitrary parametric regression model, such as a generalized linear model, a parametric survival model, or a nonlinear model. We model the marginal distribution of the covariates as a product of one‐dimensional conditional distributions. This allows us a great deal of flexibility in modeling the distribution of the covariates and reduces the number of nuisance parameters that are introduced in the E‐step. We present examples involving both simulated and real data.