Detecting SNP‐expression associations: A comparison of mutual information and median test with standard statistical approaches
Author(s) - Szymczak S., Igl B.W., Ziegler A.
Publication year - 2009
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.3695
Subject(s) - international hapmap project , statistics , skewness , multiple comparisons problem , statistical hypothesis testing , analysis of variance , single nucleotide polymorphism , nominal level , genotype , statistical power , mathematics , computer science , genetics , biology , gene , confidence interval
Single nucleotide polymorphism (SNP)–gene expression associations have received increasing interest. The aim of these studies is to discover differences in the location parameters of gene expression given genotype. Because gene expression data are often highly skewed or heavy‐tailed, or data of different genotypes vary in dispersion, the median is the most appropriate measure of location. In this case, the model assumptions of standard statistical methods for comparing locations, such as analysis of variance (ANOVA) or the Kruskal–Wallis (KW) test, are violated. Alternatives that might be more appropriate are the median test (MED) and tests based on mutual information (MI). In simulation studies, these approaches and a novel MI test are compared with ANOVA and KW. Location, dispersion and skewness parameters of the gene expression distributions given genotypes are varied, as are genotype frequencies. The MED test and the novel MI‐based method keep the nominal significance level for comparing medians if gene expression data are non‐normally distributed, whereas ANOVA and KW have substantially inflated type I error rates. ANOVA and KW are, however, optimal if standard model assumptions are fulfilled. The MED test generally has greater power than MI and is therefore recommended if the model assumptions of standard procedures are violated. A 300 kb region on chromosome 9p21.3, which is associated with coronary artery disease, was analyzed using HapMap data. Only the alternative approaches identified three genes (ADM, FCGR3B and ADORA1) as promising candidates for clarifying the molecular mechanism of the genetic association. Copyright © 2009 John Wiley & Sons, Ltd.
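The abstract recommends the median test (MED) when ANOVA and KW assumptions fail. As an illustration only, the following is a minimal stdlib-only Python sketch of the textbook k-sample (Mood's) median test applied to expression values grouped by genotype. It is not the authors' implementation, and the MI-based tests (including the novel MI test) are not sketched. Assumptions: the helper names `median_test` and `chi2_sf` are hypothetical; values equal to the pooled (grand) median are counted as "not above"; the p-value uses the asymptotic chi-square distribution with k − 1 degrees of freedom; no continuity correction is applied. The example values are toy numbers, not HapMap data.

```python
import math
from statistics import median
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x^(i - 1/2) / (1*3*...*(2i - 1)).
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # i = 1 term
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def median_test(groups: Sequence[Sequence[float]]) -> tuple[float, float, float]:
    """Mood's k-sample median test (hypothetical helper, illustrative sketch).

    Returns (chi-square statistic, p-value, grand median). Values equal to the
    grand median are counted as 'not above' (an assumed tie convention), and no
    continuity correction is applied.
    """
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    grand = median(pooled)
    # 2 x k contingency table: counts above vs. at-or-below the grand median.
    above = [sum(1 for v in g if v > grand) for g in groups]
    below = [len(g) - a for g, a in zip(groups, above)]
    n = len(pooled)
    row_totals = (sum(above), sum(below))
    stat = 0.0
    for a, b, g in zip(above, below, groups):
        for observed, row_total in zip((a, b), row_totals):
            expected = row_total * len(g) / n
            if expected > 0:  # an empty row (e.g. all values tied) contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf(stat, len(groups) - 1), grand


# Toy expression values for three hypothetical genotype groups (not HapMap data).
stat, p, grand = median_test([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(f"grand median={grand}, chi2={stat:.2f}, p={p:.4f}")
```

For the toy groups above, the grand median is 6.5, the statistic is 8.00 on 2 degrees of freedom, and p ≈ 0.0183. Because the test only uses counts above or below the pooled median, it is insensitive to skewness and heavy tails in the expression values, which is the property the abstract relies on. The closed-form chi-square tail keeps the sketch free of third-party dependencies. SciPy's `scipy.stats.median_test` offers a library implementation, but its defaults (for example the continuity correction applied to two-group comparisons) may differ from this sketch.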