The Value of Statistical or Bioinformatics Annotation for Rare Variant Association With Quantitative Trait
Author(s) - Byrnes Andrea E., Wu Michael C., Wright Fred A., Li Mingyao, Li Yun
Publication year - 2013
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.21747
Subject(s) - weighting, lasso (programming language), annotation, computational biology, trait, selection (genetic algorithm), genetic association, feature selection, regression, biology, single nucleotide polymorphism, variable (mathematics), phenotype, computer science, data mining, bioinformatics, genetics, statistics, machine learning, mathematics, genotype, gene, medicine, mathematical analysis, world wide web, radiology, programming language
In the past few years, a plethora of methods for rare variant association with phenotype have been proposed. These methods aggregate information from multiple rare variants across genomic region(s), but there is little consensus as to which method is most effective. The weighting scheme adopted when aggregating information across variants is one of the primary determinants of effectiveness. Here we present a systematic evaluation of multiple weighting schemes through a series of simulations intended to mimic large sequencing studies of a quantitative trait. We evaluate existing phenotype‐independent and phenotype‐dependent methods, as well as weights estimated by penalized regression approaches including Lasso, Elastic Net, and SCAD. We find that the difference in power between phenotype‐dependent schemes is negligible when high‐quality functional annotations are available. When functional annotations are unavailable or incomplete, all methods suffer from power loss; however, the variable selection methods outperform the others at the cost of increased computational time. Therefore, in the absence of good annotation, we recommend variable selection methods (which can be viewed as “statistical annotation”) on top of regions implicated by a phenotype‐independent weighting scheme. Further, once a region is implicated, variable selection can help to identify potential causal single nucleotide polymorphisms for biological validation. These findings are supported by an analysis of a high coverage targeted sequencing study of 1,898 individuals.
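To make the idea of a phenotype-independent weighting scheme concrete, the sketch below aggregates rare variants in a region into one weighted burden score per individual and regresses a quantitative trait on that score. It is an illustration only, not the authors' implementation: the weight 1/sqrt(p(1-p)), which up-weights rarer variants, is one commonly used frequency-based choice and is an assumption here, and the genotype matrix, trait values, and all function names are hypothetical toy inputs.

```python
import math

def allele_freqs(genotypes):
    """Estimate each variant's allele frequency from 0/1/2 genotype counts."""
    n_individuals = len(genotypes)
    n_variants = len(genotypes[0])
    return [
        sum(row[j] for row in genotypes) / (2 * n_individuals)
        for j in range(n_variants)
    ]

def frequency_weights(freqs):
    """Phenotype-independent weights 1/sqrt(p(1-p)) (assumed scheme).

    Monomorphic variants (p == 0 or p == 1) carry no information and get weight 0.
    """
    return [
        1.0 / math.sqrt(p * (1.0 - p)) if 0.0 < p < 1.0 else 0.0
        for p in freqs
    ]

def burden_scores(genotypes, weights):
    """Collapse a region into one weighted sum of allele counts per individual."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def regression_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical toy data: 6 individuals (rows) by 3 rare variants (columns).
genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
trait = [0.1, 1.2, 0.9, 1.0, 2.1, -0.2]  # made-up quantitative trait values

freqs = allele_freqs(genotypes)
weights = frequency_weights(freqs)
scores = burden_scores(genotypes, weights)
beta = regression_slope(scores, trait)

print("weights:", [round(w, 3) for w in weights])
print("burden scores:", [round(s, 3) for s in scores])
print("slope of trait on burden score:", round(beta, 3))
```

A phenotype-dependent or variable selection scheme would instead estimate the weights from the trait itself, for example by fitting a penalized regression of the trait on the individual variants and using the fitted coefficients as weights.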