Likelihood Methods for Regression Models with Expensive Variables Missing by Design
Author(s) -
Zhao Yang,
Lawless Jerald F.,
McLeish Donald L.
Publication year - 2009
Publication title -
Biometrical Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.108
H-Index - 63
eISSN - 1521-4036
pISSN - 0323-3847
DOI - 10.1002/bimj.200810487
Subject(s) - covariate, missing data, statistics, mathematics, regression analysis, restricted maximum likelihood, regression, econometrics, likelihood function, maximum likelihood
In some applications involving regression, the values of certain variables are missing by design for some individuals. For example, in two‐stage studies (Zhao and Lipsitz, 1992), data on “cheaper” variables are collected on a random sample of individuals in stage I, and then “expensive” variables are measured for a subsample of these in stage II. So the “expensive” variables are missing by design at stage I. Both estimating function and likelihood methods have been proposed for cases where either covariates or responses are missing. We extend the semiparametric maximum likelihood (SPML) method for missing covariate problems (e.g. Chen, 2004; Ibrahim et al., 2005; Zhang and Rockette, 2005, 2007) to deal with more general cases where covariates and/or responses are missing by design, and show that profile likelihood ratio tests and interval estimation are easily implemented. Simulation studies are provided to examine the performance of the likelihood methods and to compare their efficiencies with estimating function methods for problems involving (a) a missing covariate and (b) a missing response variable. We illustrate the ease of implementation of SPML and demonstrate its high efficiency. (© 2009 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
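The two-stage design described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's SPML method and does not reproduce its simulation studies. It only shows a generic inverse-probability-weighted estimating function, one simple member of the "estimating function" family the abstract mentions, applied to a missing covariate. Everything in it is an assumption made for illustration: the linear model, the parameter values, the selection probabilities 0.9 and 0.2, and the hypothetical function names `simulate_two_stage` and `fit_line`.

```python
import random


def simulate_two_stage(n, beta0=1.0, beta1=2.0, seed=0):
    """Simulate a two-stage study (illustrative assumptions only).

    Stage I: the "cheap" response y is recorded for everyone.
    Stage II: the "expensive" covariate x is measured only for a subsample,
    chosen with a probability p that depends on y and is known by design.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = beta0 + beta1 * x + rng.gauss(0.0, 1.0)
        p = 0.9 if y > beta0 else 0.2          # assumed selection rule
        selected = rng.random() < p
        rows.append((y, x if selected else None, p))
    return rows


def fit_line(rows, use_weights):
    """Fit y = b0 + b1 * x on stage II units by (weighted) least squares.

    With use_weights=True each unit is weighted by 1 / p, which solves an
    inverse-probability-weighted estimating equation. With use_weights=False
    this is an unweighted complete-case fit.
    """
    obs = [(y, x, 1.0 / p if use_weights else 1.0)
           for y, x, p in rows if x is not None]
    sw = sum(w for _, _, w in obs)
    xbar = sum(w * x for _, x, w in obs) / sw
    ybar = sum(w * y for y, _, w in obs) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for y, x, w in obs)
    sxx = sum(w * (x - xbar) ** 2 for _, x, w in obs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


rows = simulate_two_stage(20000)
print("complete-case (intercept, slope):", fit_line(rows, use_weights=False))
print("weighted      (intercept, slope):", fit_line(rows, use_weights=True))
```

Because stage II selection here depends on the response, the unweighted complete-case fit is generally biased, while the weighted fit targets the full-population regression line. Weighted estimators of this kind use only the stage II units, which is one reason likelihood methods that also use the stage I data can be more efficient.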