An optimal kernel‐based U‐statistic method for quantitative gene‐set association analysis
Author(s) - He Tao, Li Shaoyu, Zhong PingShou, Cui Yuehua
Publication year - 2019
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.22170
Subject(s) - statistic, kernel (algebra), type i and type ii errors, set (abstract data type), kernel method, genetic association, mathematics, computer science, statistical power, genome wide association study, statistics, artificial intelligence, genetics, biology, gene, support vector machine, genotype, combinatorics, single nucleotide polymorphism, programming language
Abstract Single‐variant‐based genome‐wide association studies have successfully detected many genetic variants associated with a number of complex traits. However, their power is limited because marginal signals are weak and potential complex interactions among genetic variants are ignored. The set‐based strategy was proposed as a remedy: multiple genetic variants in a given set (e.g., a gene or pathway) are jointly evaluated, so that the systematic effect of the set is considered. Among many such approaches, the kernel‐based testing (KBT) framework is one of the most popular and powerful in set‐based association studies. Given a set of candidate kernels, one proposed approach is to choose the kernel with the smallest p‐value. Such an approach, however, can yield an inflated Type 1 error rate, especially when the number of variants in a set is large. Alternatively, one can obtain p‐values by permutation, which can be very time‐consuming. In this study, we propose an efficient testing procedure for quantitative trait association studies that not only controls the Type 1 error rate but also has power close to that obtained under the optimal kernel in the candidate kernel set. Our method, a maximum kernel‐based U‐statistic method, is built upon the KBT framework and is based on asymptotic results under a high‐dimensional setting. Hence, it can efficiently handle the case where the number of variants in a set is much larger than the sample size. Both simulation and real data analysis demonstrate the advantages of the method compared with its counterparts.
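To make the idea of testing with the maximum over several candidate kernels concrete, here is a minimal Python sketch. It is not the authors' procedure: the abstract describes an asymptotic, high-dimensional calibration, whereas this sketch calibrates the maximum by permutation, the slower alternative the abstract mentions. The form of the U-statistic, the two candidate kernels (linear and IBS), the per-kernel standardization, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def linear_kernel(G):
    # Linear kernel on an (n, p) genotype matrix coded 0/1/2.
    return G @ G.T

def ibs_kernel(G):
    # Identity-by-state similarity: 1 - |g_i - g_j| / 2, averaged over variants.
    diff = np.abs(G[:, None, :] - G[None, :, :])
    return 1.0 - diff.mean(axis=2) / 2.0

def u_statistic(y, K):
    # Assumed form: average of K_ij * r_i * r_j over all pairs i != j,
    # where r is the standardized quantitative trait.
    r = (y - y.mean()) / y.std(ddof=1)
    n = len(y)
    M = K * np.outer(r, r)
    return (M.sum() - np.trace(M)) / (n * (n - 1))

def max_kernel_permutation_test(y, G, kernels, n_perm=500, seed=0):
    # Permutation stand-in for the asymptotic calibration described in the abstract.
    rng = np.random.default_rng(seed)
    Ks = [kernel(G) for kernel in kernels]
    stats = np.empty((n_perm + 1, len(Ks)))
    stats[0] = [u_statistic(y, K) for K in Ks]          # observed statistics
    for b in range(1, n_perm + 1):
        y_perm = rng.permutation(y)                     # breaks the trait-genotype link
        stats[b] = [u_statistic(y_perm, K) for K in Ks]
    # Standardize each kernel's statistic by its permutation mean and SD so that
    # kernels on different scales are comparable before taking the maximum.
    z = (stats - stats[1:].mean(axis=0)) / stats[1:].std(axis=0, ddof=1)
    max_z = z.max(axis=1)
    # The observed maximum is compared with permuted maxima, so the p-value
    # accounts for having searched over all candidate kernels.
    p_value = (1 + np.sum(max_z[1:] >= max_z[0])) / (n_perm + 1)
    return max_z[0], p_value

# Simulated data (hypothetical): 60 subjects, 150 variants, 5 causal variants.
rng = np.random.default_rng(42)
G = rng.binomial(2, 0.3, size=(60, 150)).astype(float)
y = 0.5 * G[:, :5].sum(axis=1) + rng.normal(size=60)
stat, p = max_kernel_permutation_test(y, G, [linear_kernel, ibs_kernel], n_perm=200)
print(f"max standardized U = {stat:.3f}, permutation p-value = {p:.4f}")
```

Comparing the observed maximum with the permutation distribution of the maximum, rather than reporting the smallest per-kernel p-value directly, is what avoids the Type 1 error inflation described above. The cost is one pass over all kernels per permutation, which is the expense an asymptotic calibration is meant to remove.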