A review of methods for misclassified categorical data in epidemiology
Author(s) - Chen T. Timothy
Publication year - 1989
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.4780080908
Subject(s) - categorical variable, computer science, statistics, data mining, table (database), linear model, generalized linear model, contingency table, mathematics, machine learning
Misclassification introduces errors into categorical variables. This paper reviews methods for handling misclassified categorical data in epidemiology. It first discusses different sampling schemes for a 2 × 2 × 2 table and the corresponding methods of analysis. It then defines a misclassification matrix and shows that the usual misclassification models form a subclass of log‐linear models. Well‐known results on a 2 × 2 table with misclassification and recent results on a 2 × 2 × 2 table are reviewed next. Finally, two methods of adjusting for misclassification are given: the first assumes a known misclassification matrix, and the second uses subsampling to estimate the misclassification matrix. The analysis is based on a recursive system of log‐linear models: first determine a misclassification model, then select a model for the correctly classified variables. The methods are illustrated with data from traffic safety research on the effectiveness of seatbelt use in reducing injuries.
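To make the idea of adjusting with a known misclassification matrix concrete, the following is a minimal sketch of the simple matrix-inversion correction for a binary exposure in a 2 × 2 table. It is not the recursive log‐linear procedure the abstract describes. It assumes the misclassification is nondifferential, meaning the same sensitivity and specificity apply to cases and controls. All function names, the counts, and the sensitivity and specificity values are hypothetical and are not taken from the paper.

```python
# Sketch: correcting a 2 x 2 table for exposure misclassification when the
# misclassification matrix is assumed known. All numbers are hypothetical.


def misclassification_matrix(sensitivity, specificity):
    """Return M with M[i][j] = P(observed category i | true category j).

    Index 0 = exposed, index 1 = unexposed.
    """
    return [
        [sensitivity, 1.0 - specificity],
        [1.0 - sensitivity, specificity],
    ]


def apply_matrix(matrix, counts):
    """Multiply a 2 x 2 matrix by a length-2 vector of counts."""
    return [
        matrix[0][0] * counts[0] + matrix[0][1] * counts[1],
        matrix[1][0] * counts[0] + matrix[1][1] * counts[1],
    ]


def invert_2x2(matrix):
    """Invert a 2 x 2 matrix; fails if sensitivity + specificity = 1."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("misclassification matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def correct_counts(observed, sensitivity, specificity):
    """Estimate true [exposed, unexposed] counts from observed counts."""
    m_inv = invert_2x2(misclassification_matrix(sensitivity, specificity))
    return apply_matrix(m_inv, observed)


def odds_ratio(cases, controls):
    """Exposure odds ratio from [exposed, unexposed] counts in each group."""
    return (cases[0] * controls[1]) / (cases[1] * controls[0])


# Hypothetical true table and error rates, used only to show the round trip.
SENSITIVITY, SPECIFICITY = 0.9, 0.8
true_cases, true_controls = [60.0, 40.0], [40.0, 60.0]

M = misclassification_matrix(SENSITIVITY, SPECIFICITY)
observed_cases = apply_matrix(M, true_cases)        # expected observed counts
observed_controls = apply_matrix(M, true_controls)

corrected_cases = correct_counts(observed_cases, SENSITIVITY, SPECIFICITY)
corrected_controls = correct_counts(observed_controls, SENSITIVITY, SPECIFICITY)

print("observed OR: ", odds_ratio(observed_cases, observed_controls))
print("corrected OR:", odds_ratio(corrected_cases, corrected_controls))
```

In this hypothetical example the observed odds ratio is pulled toward 1 relative to the true value, and inverting the assumed matrix recovers the true counts. With real data the corrected counts are estimates, can be negative when the assumed error rates are inconsistent with the observed table, and carry extra variance that this sketch does not account for.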