z-logo
Premium
Generalized linear modeling with regularization for detecting common disease rare haplotype association
Author(s) -
Guo Wei,
Lin Shili
Publication year - 2009
Publication title -
genetic epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.20382
Subject(s) - haplotype , sample size determination , single nucleotide polymorphism , genetic association , genotyping , genome wide association study , haplotype estimation , computer science , statistics , biology , computational biology , genetics , data mining , mathematics , genotype , gene
Abstract Whole genome association studies (WGAS) have surged in popularity in recent years as technological advances have made large‐scale genotyping more feasible and as new exciting results offer tremendous hope and optimism. The logic of WGAS rests upon the common disease/common variant (CD/CV) hypothesis. Detection of association under the common disease/rare variant (CD/RV) scenario is much harder, and the current practices of WGAS may be under‐power without large enough sample sizes. In this article, we propose a generalized linear model with regularization (rGLM) approach for detecting disease‐haplotype association using unphased single nucleotide polymorphisms data that is applicable to both CD/CV and CD/RV scenarios. We borrow a dimension‐reduction method from the data mining and statistical learning literature, but use it for the purpose of weeding out haplotypes that are not associated with the disease so that the associated haplotypes, especially those that are rare, can stand out and be accounted for more precisely. By using high‐dimensional data analysis techniques, which are frequently employed in microarray analyses, interacting effects among haplotypes in different blocks can be investigated without much concern about the sample size being overwhelmed by the number of haplotype combinations. Our simulation study demonstrates the gain in power for detecting associations with moderate sample sizes. For detecting association under CD/RV, regression type methods such as that implemented in hapassoc may fail to provide coefficient estimates for rare associated haplotypes, resulting in a loss of power compared to rGLM. Furthermore, our results indicate that rGLM can uncover the associated variants much more frequently than can hapassoc. Genet. Epidemiol . 2009. © 2008 Wiley‐Liss, Inc.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here