Identification of association between disease and multiple markers via sparse partial least‐squares regression
Author(s) - Chun Hyonho, Ballard David H., Cho Judy, Zhao Hongyu
Publication year - 2011
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.20596
Subject(s) - principal component analysis, identification (biology), partial least squares regression, genetic association, regression, genome wide association study, biology, computational biology, genetic marker, regression analysis, linear model, linear regression, computer science, statistics, genetics, genotype, mathematics, artificial intelligence, gene, single nucleotide polymorphism, botany
Although genome‐wide association studies have led to the identification of hundreds of genes underlying dozens of traits in recent years, most published studies have primarily used single marker‐based analysis. Intuitively, more information may be utilized when multiple markers are jointly analyzed. Therefore, many methods have been proposed in the literature for association analysis between traits and multiple markers. Among these methods, simulation and real data analyses have shown that it is often more effective first to reduce the dimensionality of the markers in a region through principal components analysis of all the markers, and then to perform association analysis between traits and those principal components that account for most of the genetic variation in the region. However, one major limitation of this approach is that the principal components are derived purely from marker genotypes, without consideration of their relevance to traits. Furthermore, these components are constructed as linear combinations of all the markers even when only a limited number are potentially relevant to traits. In this manuscript, we propose the use of sparse partial least‐squares regression to derive components that are linear combinations of only the relevant markers. This approach is able to use information from both traits and marker genotypes. Extensive simulations and real data analyses on a Crohn's disease data set suggest the superiority of this approach over existing methods. Genet. Epidemiol. 35:479‐486, 2011. © 2011 Wiley‐Liss, Inc.
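The contrast drawn in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it computes only a first component, assumes a single quantitative trait, and obtains sparsity by soft-thresholding the marker-trait covariances, which is one common way to build a sparse PLS direction. The function names, the tuning parameter `eta`, and the simulated genotypes are all hypothetical choices made for illustration.

```python
import numpy as np


def first_pc_direction(X):
    """Leading principal-component loading of the markers (the trait is not used)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def sparse_pls_direction(X, y, eta=0.5):
    """First sparse PLS-style direction for a single trait.

    Soft-thresholds the marker-trait covariances; eta in [0, 1) is a
    hypothetical tuning parameter, and larger values zero out more markers.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    z = Xc.T @ yc                          # covariance of each marker with the trait
    thresh = eta * np.max(np.abs(z))
    w = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def simulate(n=500, p=20, causal=(0, 1), seed=0):
    """Toy data: independent markers coded 0/1/2, trait driven by the causal markers."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
    y = X[:, list(causal)].sum(axis=1) + rng.normal(scale=1.0, size=n)
    return X, y


X, y = simulate()
w_pca = first_pc_direction(X)              # generally loads on every marker
w_spls = sparse_pls_direction(X, y, eta=0.5)  # loads only on trait-associated markers
```

In this toy setting the principal-component loading spreads weight across all markers, while the thresholded direction keeps nonzero weight only on markers whose covariance with the trait is large. A component score such as `X @ w_spls` could then be tested for association with the trait in place of a principal-component score.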
