Effects of categorization method, regression type, and variable distribution on the inflation of Type‐I error rate when categorizing a confounding variable
Author(s) -
Barnwell-Ménard Jean-Louis,
Li Qing,
Cohen Alan A.
Publication year - 2014
Publication title -
Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.6387
Subject(s) - statistics , confounding , econometrics , proxy (statistics) , type i and type ii errors , logistic regression , categorization , latent variable , variable (mathematics) , regression analysis , regression , mathematics , inflation (cosmology) , linear regression , computer science , artificial intelligence , mathematical analysis , physics , theoretical physics
The loss of signal associated with categorizing a continuous variable is well known, and previous studies have demonstrated that this can lead to an inflation of Type‐I error when the categorized variable is a confounder in a regression analysis estimating the effect of an exposure on an outcome. However, it is not known how the Type‐I error may vary under different circumstances, including logistic versus linear regression, different distributions of the confounder, and different categorization methods. Here, we analytically quantified the effect of categorization and then performed a series of 9600 Monte Carlo simulations to estimate the Type‐I error inflation associated with categorization of a confounder under different regression scenarios. We show that Type‐I error is unacceptably high (>10% in most scenarios and often 100%). The only exception was when the variable categorized was a continuous mixture proxy for a genuinely dichotomous latent variable, where both the continuous proxy and the categorized variable are error‐ridden proxies for the dichotomous latent variable. As expected, error inflation was also higher with larger sample size, fewer categories, and stronger associations between the confounder and the exposure or outcome. We provide online tools that can help researchers estimate the potential error inflation and understand how serious a problem this is. Copyright © 2014 John Wiley & Sons, Ltd.
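The kind of Monte Carlo experiment the abstract describes can be illustrated with a minimal sketch. It is not the authors' code or their online tool, and it covers only the linear-regression case with a normally distributed confounder and equal-frequency categories. The function names, sample size, effect sizes `a` and `b`, number of simulations, and the large-sample critical value 1.96 are all assumptions chosen for illustration. The exposure has no true effect on the outcome, so every rejection is a Type-I error.

```python
import random
import statistics


def residualize_on_groups(values, groups):
    """Subtract each group's mean (equivalent to adjusting for a categorical covariate)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def residualize_on_linear(values, covariate):
    """Remove the least-squares linear fit on a continuous covariate."""
    mv, mc = statistics.fmean(values), statistics.fmean(covariate)
    scc = sum((c - mc) ** 2 for c in covariate)
    slope = sum((c - mc) * (v - mv) for c, v in zip(covariate, values)) / scc
    return [(v - mv) - slope * (c - mc) for v, c in zip(values, covariate)]


def categorize(confounder, n_categories):
    """Assign equal-frequency (quantile) categories by rank."""
    n = len(confounder)
    order = sorted(range(n), key=lambda i: confounder[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * n_categories // n
    return groups


def type1_error_rate(n=500, n_categories=2, a=1.0, b=1.0, n_sims=200, seed=0):
    """Share of simulations that reject 'no exposure effect' although it is true.

    n_categories=None adjusts for the continuous confounder instead (reference case).
    All default values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        c = [rng.gauss(0, 1) for _ in range(n)]       # confounder
        x = [a * ci + rng.gauss(0, 1) for ci in c]    # exposure depends on c
        y = [b * ci + rng.gauss(0, 1) for ci in c]    # outcome depends on c only
        if n_categories is None:
            rx, ry = residualize_on_linear(x, c), residualize_on_linear(y, c)
            n_params = 3  # intercept, confounder slope, exposure slope
        else:
            g = categorize(c, n_categories)
            rx, ry = residualize_on_groups(x, g), residualize_on_groups(y, g)
            n_params = n_categories + 1  # one mean per category, exposure slope
        # Regressing residual on residual gives the adjusted exposure
        # coefficient (Frisch-Waugh-Lovell theorem).
        sxx = sum(v * v for v in rx)
        beta = sum(u * v for u, v in zip(rx, ry)) / sxx
        sse = sum((v - beta * u) ** 2 for u, v in zip(rx, ry))
        se = (sse / (n - n_params) / sxx) ** 0.5
        if abs(beta / se) > 1.96:  # normal approximation to the t critical value
            rejections += 1
    return rejections / n_sims
```

Calling `type1_error_rate(n_categories=2)` and `type1_error_rate(n_categories=None)` contrasts a dichotomized confounder with the continuous one. Under these assumed settings the first rate should land far above the nominal 5% and the second close to it. Raising `n_categories` or lowering `n`, `a`, or `b` lets a reader explore the trends the abstract reports.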