Comparison of methods for imputing ordinal data using multivariate normal imputation: a case study of non‐linear effects in a large cohort study
Author(s) - Lee Katherine J., Galati John C., Simpson Julie A., Carlin John B.
Publication year - 2012
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.5445
Subject(s) - imputation (statistics) , ordinal data , missing data , categorical variable , statistics , ordinal regression , multivariate statistics , multivariate normal distribution , rounding , mathematics , computer science , linear model , econometrics , operating system
Background: Multiple imputation is becoming increasingly popular for handling missing data, with Markov chain Monte Carlo assuming multivariate normality (MVN) a commonly used approach. Imputing categorical variables (which are clearly non‐normal) using MVN imputation is challenging, and several approaches have been suggested. However, it remains unclear which approach should be preferred.

Methods: We explore methods for imputing ordinal variables using MVN imputation, including imputing as a continuous variable and as a set of indicators, and various methods for assigning imputed values to the possible categories (rounding), for estimating a non‐linear association between an ordinal exposure and binary outcome. We introduce a new approach where we impute as continuous and assign imputed values into categories based on the mean indicators imputed in a separate round of imputation. We compare these approaches in a simple setting where we make 50% of data in an ordinal exposure missing completely at random, within an otherwise complete real dataset.

Results: Methods that impute the ordinal exposure as continuous distorted the non‐linear exposure–outcome association by biasing the relationship towards linearity irrespective of the rounding method. In contrast, imputing using indicators preserved the non‐linear association but not the marginal distribution of the ordinal variable.

Conclusions: Imputing ordinal variables as continuous can bias the estimation of the exposure–outcome association in the presence of non‐linear relationships. Further work is needed to develop optimal methods for handling ordinal (and nominal) variables when using MVN imputation.

Copyright © 2012 John Wiley & Sons, Ltd.
