A Unified Sparse Representation for Sequence Variant Identification for Complex Traits
Author(s) - Cao Shaolong, Qin Huaizhen, Deng HongWen, Wang YuPing
Publication year - 2014
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.21849
Subject(s) - covariate, regression, confounding, sequence (biology), population, computer science, identification (biology), regression analysis, artificial intelligence, statistical power, biology, pattern recognition (psychology), statistics, data mining, machine learning, mathematics, genetics, botany, demography, sociology
Joint adjustment for cryptic relatedness and population structure is necessary to reduce bias in DNA sequence analysis; however, existing sparse regression methods model these two confounders separately. Incorporating prior biological information has great potential to enhance statistical power, but such information is often overlooked in existing sparse regression models. We developed a unified sparse regression (USR) to incorporate prior information and jointly adjust for cryptic relatedness, population structure, and other environmental covariates. Our USR models cryptic relatedness as a random effect and population structure as a fixed effect, and uses weighted penalties to incorporate prior knowledge. As demonstrated by extensive simulations, our USR algorithm can discover more true causal variants and maintain a lower false discovery rate than several commonly used feature selection methods. It can handle rare and common variants simultaneously. Applying our USR algorithm to DNA sequence data of Mexican Americans from GAW18, we replicated three hypertension pathways, demonstrating its effectiveness in identifying susceptibility genetic variants.
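
To make the idea of weighted penalties concrete, the following is a minimal sketch of a weighted lasso solved by coordinate descent. It is not the authors' USR algorithm: it covers only the penalty-weighting step and assumes the phenotype `y` and genotype matrix `X` have already been adjusted for relatedness, population structure, and covariates. The function names, the objective (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|, and the toy data are all illustrative assumptions.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the lasso proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, weights, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|.

    X is a list of rows, y a list of responses. A smaller weight w_j
    penalizes variant j less, which is one way prior knowledge could
    enter the model (an assumption made for this illustration).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # inner product of column j with the partial residual
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            new = soft_threshold(rho, n * lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy data: two non-overlapping variants, the second with a weak signal.
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [2.0, 0.5, 2.0, 0.5]
equal_prior = weighted_lasso(X, y, weights=[1.0, 1.0], lam=0.2)
favored = weighted_lasso(X, y, weights=[1.0, 0.5], lam=0.2)
disfavored = weighted_lasso(X, y, weights=[1.0, 2.0], lam=0.2)
```

In this toy setting, halving the weight on the second variant lets it keep a larger coefficient, while doubling the weight shrinks it to exactly zero. The first variant's estimate is unchanged because the two columns do not overlap.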
