On Efficient and Accurate Calculation of Significance P‐Values for Sequence Kernel Association Testing of Variant Set
Author(s) - Wu Baolin, Guan Weihua, Pankow James S.
Publication year - 2016
Publication title - Annals of Human Genetics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.537
H-Index - 77
eISSN - 1469-1809
pISSN - 0003-4800
DOI - 10.1111/ahg.12144
Subject(s) - kernel (algebra) , set (abstract data type) , sequence (biology) , computer science , scale (ratio) , exome , exome sequencing , algorithm , computational biology , mathematics , data mining , biology , mutation , genetics , geography , cartography , combinatorics , gene , programming language
Summary This paper develops alternative computational methods to accurately and efficiently calculate significance P‐values for the commonly used sequence kernel association test (SKAT) and the adaptive sum of SKAT and burden test (SKAT‐O) for variant set association. We show that existing software can lead to either conservative or inflated type I errors. We develop efficient computational algorithms that quickly compute the SKAT P‐value and have well‐controlled type I errors. In addition, we derive a simplified formula for the significance P‐value of SKAT‐O, which sheds light on the development of efficient and accurate numerical algorithms. We implement the proposed methods in a publicly available R package that can be readily used in, or adapted to, large‐scale sequencing studies. Because large‐scale exome and whole‐genome sequencing and re‐sequencing studies are increasingly common, the proposed methods are of practical importance. We conduct extensive numerical studies to investigate the performance of the proposed methods, and we illustrate their usefulness with an application to associations between rare exonic variants and fasting glucose levels in the Atherosclerosis Risk in Communities (ARIC) study.
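For orientation, the sketch below illustrates the kind of computation at issue. It is not the algorithm proposed in the paper and not the interface of its R package. It relies on the standard background result that, under the null hypothesis, the SKAT statistic is distributed as a weighted sum of independent 1-df chi-square variables, and it evaluates the tail probability with Imhof's (1961) numerical inversion formula. The function name `imhof_pvalue`, the truncation point `upper`, and the grid size `steps` are assumptions chosen for illustration; the fixed truncation is only reasonable when there are at least a few non-negligible weights.

```python
import math


def imhof_pvalue(q, weights, upper=200.0, steps=40_000):
    """Approximate P(sum_j w_j * chi2_1 > q) with Imhof's inversion formula.

    Illustrative only: `upper` (truncation of the infinite integral) and
    `steps` (even number of Simpson intervals) are ad hoc assumptions.
    """

    def integrand(u):
        if u == 0.0:
            # Limit of sin(theta(u)) / (u * rho(u)) as u -> 0.
            return 0.5 * (sum(weights) - q)
        # theta(u) = 1/2 * sum_j atan(w_j * u) - q * u / 2
        theta = 0.5 * sum(math.atan(w * u) for w in weights) - 0.5 * q * u
        # rho(u) = prod_j (1 + w_j^2 * u^2)^(1/4)
        rho = math.prod((1.0 + (w * u) ** 2) ** 0.25 for w in weights)
        return math.sin(theta) / (u * rho)

    # Composite Simpson rule on [0, upper].
    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    integral = total * h / 3.0

    # Imhof: P(Q > q) = 1/2 + (1/pi) * integral
    return 0.5 + integral / math.pi


if __name__ == "__main__":
    # Hypothetical weights (eigenvalues) and observed statistic.
    print(imhof_pvalue(5.0, [3.0, 1.0, 0.5]))
```

With four equal weights the statistic reduces to a scaled chi-square with 4 degrees of freedom, so the result can be checked against the closed form exp(-x/2)(1 + x/2) with x = q / w. Very small P‐values, which are the ones that matter at genome-wide significance levels, need tighter control of truncation and integration error than this fixed grid provides.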