Utilization of multiple imperfect assessments of the dependent variable in a logistic regression analysis
Author(s) - Magder Laurence S., Sloan Michael A., Duh ShowHong, Abate Joseph F., Kittner Steven J.
Publication year - 2000
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/(sici)1097-0258(20000115)19:1<99::aid-sim327>3.0.co;2-o
Subject(s) - identifiability, logistic regression, computer science, missing data, statistics, regression analysis, variable (mathematics), econometrics, regression, data mining, machine learning, mathematics, mathematical analysis
Often, in biomedical research, there are multiple sources of imperfect information regarding a dichotomous variable of interest. For example, in a study we are conducting on the relationship between cocaine use and stroke risk, information on the cocaine use of each study patient is available from three fallible sources: patient interviews, urine toxicology testing, and medical record review. Regression analyses based on a rule for classifying patients from this information can result in biased estimation of associations and variances due to the misclassification of some subjects and to the assumption of certainty. We describe a likelihood‐based method that directly incorporates multiple sources of information regarding an outcome variable into a regression analysis and takes into account the uncertainty in the classification. The method can be applied when some sources of information are missing for some subjects. We show how the availability of multiple sources can be exploited to generate estimates of the quality (for example, sensitivity and specificity) of each source and to model the degree to which missing data are informative. A fitting algorithm and issues of identifiability are discussed. We illustrate the method using data from our study. Copyright © 2000 John Wiley & Sons, Ltd.
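
The general idea of a likelihood that treats the true dichotomous status as unobserved can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' fitting algorithm: it assumes the sources are conditionally independent given the true status, that each source has a single sensitivity and specificity, and that missing reports are ignorable (the abstract's modelling of informative missingness is not attempted here). An EM iteration is used as one plausible way to maximise such a likelihood. The function names `simulate` and `fit_em`, the parameter values, and the simulated data are all hypothetical.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate(n, beta, se, sp, rng):
    """Hypothetical data: one covariate, a latent binary status, fallible reports."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = rng.random(n) < expit(X @ beta)            # true status (never observed)
    p_report = np.where(y[:, None], se, 1.0 - sp)  # P(source reports 1 | true status)
    W = (rng.random((n, len(se))) < p_report).astype(float)
    W[rng.random(n) < 0.2, 1] = np.nan             # source 2 missing for ~20% of subjects
    return X, W

def fit_em(X, W, n_iter=200):
    """EM for logistic regression with a latent outcome and fallible binary reports.

    X: (n, p) design matrix including an intercept column.
    W: (n, J) matrix of 0/1 reports, np.nan where a source is missing.
    """
    obs = ~np.isnan(W)
    W0 = np.where(obs, W, 0.0)
    beta = np.zeros(X.shape[1])
    se = np.full(W.shape[1], 0.8)  # starting values above 0.5 fix the labelling
    sp = np.full(W.shape[1], 0.8)
    for _ in range(n_iter):
        # E-step: posterior probability that each subject is truly positive.
        prior = expit(X @ beta)
        log_l1 = (obs * (W0 * np.log(se) + (1 - W0) * np.log(1 - se))).sum(axis=1)
        log_l0 = (obs * (W0 * np.log(1 - sp) + (1 - W0) * np.log(sp))).sum(axis=1)
        num = prior * np.exp(log_l1)
        q = num / (num + (1 - prior) * np.exp(log_l0))

        # M-step for source quality: weighted proportions over observed reports.
        qw = q[:, None] * obs
        rw = (1 - q)[:, None] * obs
        se = np.clip((qw * W0).sum(axis=0) / qw.sum(axis=0), 1e-6, 1 - 1e-6)
        sp = np.clip((rw * (1 - W0)).sum(axis=0) / rw.sum(axis=0), 1e-6, 1 - 1e-6)

        # M-step for beta: a few Newton steps on the logistic likelihood
        # with the fractional outcome q in place of the unobserved status.
        for _ in range(5):
            mu = expit(X @ beta)
            grad = X.T @ (q - mu)
            hess = (X * (mu * (1 - mu))[:, None]).T @ X
            beta = beta + np.linalg.solve(hess, grad)
    return beta, se, sp

rng = np.random.default_rng(0)
true_beta = np.array([-0.5, 1.0])
true_se = np.array([0.90, 0.80, 0.70])
true_sp = np.array([0.95, 0.90, 0.85])
X, W = simulate(5000, true_beta, true_se, true_sp, rng)
beta_hat, se_hat, sp_hat = fit_em(X, W)
print("beta:", beta_hat)
print("sensitivity:", se_hat)
print("specificity:", sp_hat)
```

In this sketch a subject with a missing report simply contributes no term for that source, which is how partially observed subjects remain in the analysis. Standard errors are not computed; they would need the observed-data likelihood rather than the final weighted regression.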