Post hoc Analysis for Detecting Individual Rare Variant Risk Associations Using Probit Regression Bayesian Variable Selection Methods in Case‐Control Sequencing Studies
Author(s) -
Larson Nicholas B.,
McDonnell Shan,
Albright Lisa Can,
Teerlink Craig,
Stanford Janet,
Ostrander Elaine A.,
Isaacs William B.,
Xu Jianfeng,
Cooney Kathleen A.,
Lange Ethan,
Schleutker Johanna,
Carpten John D.,
Powell Isaac,
Bailey‐Wilson Joan,
Cussenot Olivier,
Cancel‐Tassin Geraldine,
Giles Graham,
MacInnis Robert,
Maier Christiane,
Whittemore Alice S.,
Hsieh Chih‐Lin,
Wiklund Fredrik,
Catalona William J.,
Foulkes William,
Mandal Diptasri,
Eeles Rosalind,
Kote‐Jarai Zsofia,
Ackerman Michael J.,
Olson Timothy M.,
Klein Christopher J.,
Thibodeau Stephen N.,
Schaid Daniel J.
Publication year - 2016
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.21983
Subject(s) - feature selection, bayesian probability, sample size determination, genetic association, computer science, probit model, minor allele frequency, statistics, allele frequency, mathematics, artificial intelligence, machine learning, biology, genetics, allele, single nucleotide polymorphism, gene, genotype
Rare variants (RVs) have been shown to be significant contributors to complex disease risk. By definition, these variants have very low minor allele frequencies, and traditional single‐marker methods for statistical analysis are underpowered for typical sequencing study sample sizes. Multimarker burden‐type approaches attempt to identify aggregation of RVs across case‐control status by analyzing relatively small partitions of the genome, such as genes. However, the aggregative measure is generally a mixture of causal and neutral variants, and these omnibus tests do not directly indicate which RVs may be driving a given association. Recently, Bayesian variable selection approaches have been proposed to identify RV associations from a large set of RVs under consideration. Although these approaches have been shown to be powerful at detecting associations at the RV level, there are often computational limitations on the total number of RVs under consideration, and compromises are necessary for large‐scale application. Here, we propose a computationally efficient alternative formulation of this method using a probit regression approach specifically capable of simultaneously analyzing hundreds to thousands of RVs. We evaluate the ability of our approach to detect causal variation on simulated data and examine its sensitivity and specificity in instances of high RV dimensionality. We also apply it to pathway‐level RV analysis results from a prostate cancer (PC) risk case‐control sequencing study. Finally, we discuss potential extensions and future directions of this work.
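For orientation, a probit regression model with Bayesian variable selection is often written in the latent-variable form below. This is a generic textbook sketch and not necessarily the formulation used in the paper: the abstract does not state the priors or the sampling scheme, so the symbols here (the latent score z_i, the inclusion indicators gamma_j, the slab variance tau^2, and the prior inclusion probability pi) are all illustrative assumptions.

```latex
% Generic spike-and-slab probit model (illustrative assumption, not the paper's stated model).
% y_i: case-control status of subject i; x_{ij}: genotype of subject i at rare variant j.
\begin{align*}
  y_i &= \mathbf{1}(z_i > 0), \qquad
  z_i = \alpha + \sum_{j=1}^{p} x_{ij}\,\beta_j + \varepsilon_i, \qquad
  \varepsilon_i \sim \mathcal{N}(0, 1), \\
  \beta_j \mid \gamma_j &\sim \gamma_j\,\mathcal{N}(0, \tau^2) + (1 - \gamma_j)\,\delta_0, \qquad
  \gamma_j \sim \mathrm{Bernoulli}(\pi), \\
  \Pr(y_i = 1 \mid \mathbf{x}_i, \alpha, \boldsymbol{\beta})
      &= \Phi\!\Big(\alpha + \sum_{j=1}^{p} x_{ij}\,\beta_j\Big).
\end{align*}
```

Under a model of this kind, evidence that an individual RV drives an association is usually summarized by its posterior inclusion probability, Pr(gamma_j = 1 | y, X). The latent Gaussian scores z_i are what typically make the probit link computationally convenient, since conditional on z the regression is an ordinary linear model with unit error variance.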