Bias due to two‐stage residual‐outcome regression analysis in genetic association studies
Author(s) - Demissie Serkalem, Cupples L. Adrienne
Publication year - 2011
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.20607
Subject(s) - residual, regression, stage (stratigraphy), genetic association, association (psychology), regression analysis, outcome (game theory), statistics, biology, genetics, mathematics, psychology, genotype, single nucleotide polymorphism, mathematical economics, gene, paleontology, algorithm, psychotherapist
Association studies of risk factors and complex diseases require careful assessment of potential confounding factors. Two‐stage regression analysis, sometimes referred to as residual‐ or adjusted‐outcome analysis, has been increasingly used in association studies of single nucleotide polymorphisms (SNPs) and quantitative traits. In this analysis, a residual outcome is first calculated from a regression of the outcome variable on covariates; the relationship between this adjusted outcome and the SNP is then evaluated by a simple linear regression of the adjusted outcome on the SNP. In this article, we examine the performance of this two‐stage analysis as compared with multiple linear regression (MLR) analysis. Our findings show that when a SNP and a covariate are correlated, the two‐stage approach results in a biased estimate of the genotypic effect and a loss of power. Bias is always toward the null and increases with the squared correlation between the SNP and the covariate (ρ²). For example, for ρ² = 0, 0.1, and 0.5, two‐stage analysis results in, respectively, 0, 10, and 50% attenuation in the SNP effect. As expected, MLR was always unbiased. Since individual SNPs often show little or no correlation with covariates, a two‐stage analysis is expected to perform as well as MLR in many genetic studies; however, it produces considerably different results from MLR and may lead to incorrect conclusions when independent variables are highly correlated. While a useful alternative to MLR under ρ² = 0, the two‐stage approach has serious limitations. Its use as a simple substitute for MLR should be avoided. Genet. Epidemiol. 35:592‐596, 2011. © 2011 Wiley Periodicals, Inc.
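
The attenuation described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes an additive 0/1/2 genotype with minor allele frequency 0.3, a single normally distributed covariate, unit effect sizes, and standard normal errors; the function name `simulate_bias` and all parameter values are hypothetical choices made for illustration. Under these assumptions, the two‐stage slope should fall near the true effect multiplied by one minus the squared SNP‐covariate correlation, while the MLR slope should stay near the true effect.

```python
import numpy as np


def simulate_bias(r2, beta=1.0, gamma=1.0, n=200_000, maf=0.3, seed=0):
    """Return (MLR slope, two-stage slope) for the SNP effect.

    r2 is the target squared correlation between SNP and covariate.
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    # Additive genotype coded 0/1/2 (assumed minor allele frequency `maf`).
    x = rng.binomial(2, maf, size=n).astype(float)
    x_std = (x - x.mean()) / x.std()

    # Covariate built so that corr(x, c)^2 is approximately r2.
    c = np.sqrt(r2) * x_std + np.sqrt(1.0 - r2) * rng.standard_normal(n)

    # Outcome depends on both the SNP and the covariate.
    y = beta * x + gamma * c + rng.standard_normal(n)

    ones = np.ones(n)

    # Multiple linear regression: y ~ 1 + x + c.
    coef_mlr, *_ = np.linalg.lstsq(np.column_stack([ones, x, c]), y, rcond=None)

    # Stage 1: regress y on the covariate only and keep the residuals.
    design_c = np.column_stack([ones, c])
    coef_stage1, *_ = np.linalg.lstsq(design_c, y, rcond=None)
    residual = y - design_c @ coef_stage1

    # Stage 2: simple linear regression of the residual outcome on the SNP.
    coef_stage2, *_ = np.linalg.lstsq(np.column_stack([ones, x]), residual, rcond=None)

    return coef_mlr[1], coef_stage2[1]


if __name__ == "__main__":
    for r2 in (0.0, 0.1, 0.5):
        b_mlr, b_two = simulate_bias(r2)
        print(f"r2={r2:.1f}  MLR={b_mlr:.3f}  two-stage={b_two:.3f}")
```

The two‐stage estimate shrinks because stage 1 removes from the outcome the part of the SNP effect that is linearly predictable from the covariate, and stage 2 cannot recover it.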